
Mathematical Definitions of Sufficient Statistics
Source:vignettes/sufficient_statistics.Rmd
sufficient_statistics.RmdOverview
The redeem package models the intensities of interaction formation and dissolution using a log-linear formulation. The intensity (rate) for a dyad \((i,j)\) at time \(t\) is:
\[\lambda_{i,j}(t) = \exp(s_{i,j}(\mathscr{H}_t)^\top \beta + \alpha_i + \alpha_j + f(t, \gamma))\]
where:
- \(s_{i,j}(\mathscr{H}_t)\) is a vector of sufficient statistics derived from the history of interactions \(\mathscr{H}_t\) up to time \(t\).
- \(\alpha_i, \alpha_j\) are actor popularity parameters.
- \(f(t, \gamma) = \sum_{q=1}^Q \gamma_q \mathbb{I}(c_{q-1} \le t < c_q)\) is the baseline step-function representing temporal variation over change points \(0 = c_0 < c_1 < \dots < c_Q\) (with \(\gamma_1 = 0\)).
This vignette provides precise mathematical definitions for all sufficient network statistics implemented in the package, as well as a guide for adding custom statistics.
The complete list of available formula terms is documented in the
redeem_terms reference manual page. For model estimation,
see the help pages for the models via dem and
rem.
Statistic Transformations
Each sufficient statistic \(s_{i,j}(\mathscr{H}_t)\) is defined as a transformed count \(f(N(t))\), where \(N(t)\) is a raw network statistic. The package supports five standard transformations \(f(\cdot)\):
| Transformation | Mathematical Definition |
|---|---|
identity |
\(f(s) = s\) |
log |
\(f(s) = \log(1 + s)\) |
recip |
\(f(s) = 1/(1+s)\) |
bin |
\(f(s) = \mathbb{I}(s > 0)\) |
sig |
\(f(s) = \frac{s}{s + K}\) |
Endogenous Network Statistics
These statistics capture structural dependencies within the network evolution.
1. Intercept (Intercept / intercept)
A constant term representing the baseline log-intensity. \[s_{i,j}(\mathscr{H}_t) = 1\]
2. Inertia (inertia /
number_interaction)
Counts how many times the dyad \((i,j)\) has initiated an interaction in the past. \[s_{i,j}(\mathscr{H}_t) = f(N_{i,j}(t))\] where \(N_{i,j}(t) = \sum_{k: t_k < t} \mathbb{I}(i_k = i, j_k = j)\) (or the windowed version \(N_{i,j}^w(t) = \sum_{k: t-w < t_k < t} \mathbb{I}(i_k = i, j_k = j)\)).
3. Reciprocity (reciprocity)
Models the tendency to reciprocate past interactions (directed only). \[s_{i,j}(\mathscr{H}_t) = f(N_{j,i}(t))\]
4. Duration (duration /
current_interaction)
Measures the dependency on the time since the current interaction started (DEM only). \[s_{i,j}(\mathscr{H}_t) = \begin{cases} f(t - t_{\text{start, } i,j}) & \text{if dyad } (i,j) \text{ is interacting at } t \\ 0 & \text{otherwise} \end{cases}\] where \(t_{\text{start, } i,j}\) is the timestamp of the last formation event.
5. Participation Shifts (P-shifts)
P-shifts capture the sequential dependencies between consecutive events (REM only). Let the preceding event in the stream be \(A \to B\). For a candidate event \(i \to j\) at time \(t\):
-
psABBA(Reciprocation): \(s_{i,j}(t) = \mathbb{I}(i = B, j = A)\) -
psABBY(Receiver turn-continuing): \(s_{i,j}(t) = \mathbb{I}(i = B, j \ne A, j \ne B)\) -
psABAY(Sender turn-continuing): \(s_{i,j}(t) = \mathbb{I}(i = A, j \ne A, j \ne B)\) -
psABXA(Usurpation to sender): \(s_{i,j}(t) = \mathbb{I}(i \ne A, i \ne B, j = A)\) -
psABXB(Usurpation to receiver): \(s_{i,j}(t) = \mathbb{I}(i \ne A, i \ne B, j = B)\) -
psABXY(Completely new dyad): \(s_{i,j}(t) = \mathbb{I}(i \ne A, i \ne B, j \ne A, j \ne B)\)
Triadic Closure and Shared Partners
Triadic statistics capture structural closure. They can be calculated
over active edges (designated as current_
in DEM) or historical event existence (designated as
general_ in REM/DEM). Let \(\mathcal{A}_t\) represent the set of
interacting dyads (for current_) or previously interacted
dyads (for general_) at time \(t\).
Common Partners (general_common_partners /
current_common_partners)
Counts the number of third-party actors \(k\) sharing a connection of a specified type with both \(i\) and \(j\):
-
OSP(Outgoing Shared Partner): \(s_{i,j}(t) = f(|\{k : (i,k) \in \mathcal{A}_t \land (j,k) \in \mathcal{A}_t\}|)\) -
ISP(Incoming Shared Partner): \(s_{i,j}(t) = f(|\{k : (k,i) \in \mathcal{A}_t \land (k,j) \in \mathcal{A}_t\}|)\) -
OTP(Outgoing Two-Path): \(s_{i,j}(t) = f(|\{k : (i,k) \in \mathcal{A}_t \land (k,j) \in \mathcal{A}_t\}|)\) -
ITP(Incoming Two-Path): \(s_{i,j}(t) = f(|\{k : (k,i) \in \mathcal{A}_t \land (j,k) \in \mathcal{A}_t\}|)\)
Triangles (general_triangle /
current_triangle)
Similar to common partners, but only non-zero if the focal dyad itself is active (directed only):
-
OSPTriangle: \(s_{i,j}(t) = \mathbb{I}((i,j) \in \mathcal{A}_t) \times f(|\{k : (i,k) \in \mathcal{A}_t \land (j,k) \in \mathcal{A}_t\}|)\) -
ISPTriangle: \(s_{i,j}(t) = \mathbb{I}((i,j) \in \mathcal{A}_t) \times f(|\{k : (k,i) \in \mathcal{A}_t \land (k,j) \in \mathcal{A}_t\}|)\) -
OTPTriangle: \(s_{i,j}(t) = \mathbb{I}((i,j) \in \mathcal{A}_t) \times f(|\{k : (i,k) \in \mathcal{A}_t \land (k,j) \in \mathcal{A}_t\}|)\) -
ITPTriangle: \(s_{i,j}(t) = \mathbb{I}((i,j) \in \mathcal{A}_t) \times f(|\{k : (k,i) \in \mathcal{A}_t \land (j,k) \in \mathcal{A}_t\}|)\)
Degree and Centrality Statistics
These statistics capture actor-level activity or popularity. Let \(\mathcal{D}_t\) be the network state at time \(t\), and \(d_{i, \text{out}}(t)\), \(d_{i, \text{in}}(t)\) represent the out-degree and in-degree of \(i\) in \(\mathcal{D}_t\). Let \(c_{i, \text{out}}(t)\), \(c_{i, \text{in}}(t)\) be the total out-events and in-events involving \(i\).
Degree Statistics (general_degree_out_sender,
etc.)
- Out-Degree Sender: \(s_{i,j}(t) = f(d_{i, \text{out}}(t))\)
- Out-Degree Receiver: \(s_{i,j}(t) = f(d_{j, \text{out}}(t))\)
- In-Degree Sender: \(s_{i,j}(t) = f(d_{i, \text{in}}(t))\)
- In-Degree Receiver: \(s_{i,j}(t) = f(d_{j, \text{in}}(t))\)
- Degree Sum (undirected only): \(s_{i,j}(t) = f(d_{i}(t) + d_{j}(t))\)
- Degree Absolute Difference (undirected only): \(s_{i,j}(t) = f(\|d_{i}(t) - d_{j}(t)\|)\)
Count Statistics (general_count_out_sender, etc.)
Count statistics are identical to degree statistics but use total interaction counts (\(c\)) rather than binary degrees (\(d\)):
- Out-Count Sender: \(s_{i,j}(t) = f(c_{i, \text{out}}(t))\)
- Out-Count Receiver: \(s_{i,j}(t) = f(c_{j, \text{out}}(t))\)
- In-Count Sender: \(s_{i,j}(t) = f(c_{i, \text{in}}(t))\)
- In-Count Receiver: \(s_{i,j}(t) = f(c_{j, \text{in}}(t))\)
- Count Sum (undirected only): \(s_{i,j}(t) = f(c_{i}(t) + c_{j}(t))\)
- Count Absolute Difference (undirected only): \(s_{i,j}(t) = f(\|c_{i}(t) - c_{j}(t)\|)\)
All of these degree and count statistics can be conveniently
specified in model formulas using the degree() (or
degrees()) and count() helper functions:
-
Out-Degree / Out-Count Sender:
degree(type = "out_sender")orcount(type = "out_sender") -
Out-Degree / Out-Count Receiver:
degree(type = "out_receiver")orcount(type = "out_receiver") -
In-Degree / In-Count Sender:
degree(type = "in_sender")orcount(type = "in_sender") -
In-Degree / In-Count Receiver:
degree(type = "in_receiver")orcount(type = "in_receiver") -
Degree / Count Sum:
degree(type = "sum")orcount(type = "sum") -
Degree / Count Absolute Difference:
degree(type = "absdiff")orcount(type = "absdiff")
Exogenous Statistics
Developer Guide: Adding Custom Sufficient Statistics
To add a new sufficient statistic (e.g., my_stat) to the
package, follow this two-step process:
Step 1: Implement the Update Logic in C++
Define a C++ function in the src/ directory (e.g., in a
new file or in src/sufficient_statistics.cpp) with the
signature ValidateFunction. This function computes the
change to the statistic state matrix when an event occurs:
#include "redeem/sufficient_statistics.h"
#include "redeem/extension_api.hpp"
arma::uvec stat_my_stat(
Data_DEM &object,
arma::mat &data,
unsigned int &from,
unsigned int &to,
unsigned int &number_event,
unsigned int col_number,
std::string transformation,
unsigned int type
) {
if (from == 0 || to == 0) return arma::uvec();
// 1. Calculate which dyad indices are affected and the change value (val)
double val = (type == 1) ? 1.0 : -1.0;
arma::uvec affected_indices = { object.current_stats.find_from_to(from, to) };
// 2. Apply the update using the helper (handles transformations internally)
apply_update(object, affected_indices, col_number, val, transformation, data);
// 3. Return the indices of modified dyads
return affected_indices;
}
// 4. Register the term in the global C++ Registry
TERM_REGISTER("my_stat", stat_my_stat);Step 2: Write the R Initializer Function
In R/init_terms.R, create an initialization function
named InitRedeemTerm.my_stat. This function validates
user-provided arguments and sets up initial values:
#' @keywords internal
InitRedeemTerm.my_stat <- function(arglist, n_nodes, model_type, directed, ...) {
# Validate arguments against expected types and models
arglist <- check.RedeemTerm(arglist,
model_type = model_type, directed = directed,
expected = list(transformation = c("identity", "log", "recip", "bin", "sig"), K = "numeric"),
defaults = list(transformation = "identity", K = 1)
)
# Define initial statistic values at time 0 (optional)
eval_at_zero <- matrix(0, n_nodes, n_nodes)
# Return the configuration list
list(
base_name = "my_stat",
transformation = arglist$transformation,
eval_at_zero = eval_at_zero,
event_stream = arglist$event_stream
)
}Once these steps are completed, run devtools::document()
to update namespaces and documentation, and build the package. The new
term is now ready for use in dem() and rem()
by specifying it directly in the formula (e.g.,
~ my_stat(transformation = "log")).