Multivariate Point Process Package
0.1
|
PlainTerminating implements the multivariate terminating process. More...
#include "include/PlainTerminating.h"
Classes | |
struct | OPTION |
Options used to configure the fitting of the terminating point process. More... | |
Public Types | |
enum | OptMethod { SGD, PLBFGS } |
Optimization algorithms used to fit the standard Hawkes Process. More... | |
enum | Regularizer { L1, L22, NONE } |
Supported regularizations used to fit the standard Hawkes Process. More... | |
enum | RegCoef { LAMBDA } |
Regularization coefficients. More... | |
Public Member Functions | |
PlainTerminating (const unsigned &n, const unsigned &num_dims) | |
The constructor. More... | |
PlainTerminating (const unsigned &n, const unsigned &num_dims, const Graph *graph) | |
The constructor. More... | |
void | fit (const std::vector< Sequence > &data, const OPTION &options) |
Maximum likelihood estimation for the model parameters. More... | |
virtual void | NegLoglikelihood (double &objvalue, Eigen::VectorXd &gradient) |
Negative loglikelihood of multivariate terminating point process. More... | |
virtual void | Gradient (const unsigned &k, Eigen::VectorXd &gradient) |
Returns the gradient w.r.t. the model parameters on the k-th sequence. More... | |
virtual double | Intensity (const double &t, const Sequence &data, Eigen::VectorXd &intensity_dim) |
Intensity function of each node. More... | |
virtual double | IntensityUpperBound (const double &t, const double &L, const Sequence &data, Eigen::VectorXd &intensity_upper_dim) |
Returns the upper bound of the summation of the intensity value on each dimension from time t to t + L given the history of past events in sequence data. Let \({\lambda_d^*(t)}\) be the conditional intensity function on the d-th dimension where \(d=1\dotso D\), and num_dims_ = D. This function returns \begin{align} \lambda_0^D(t) \geq \sum_{d=1}^D\sup_{\tau\in[t, t + \tau(t)]}\lambda^*_d(\tau), \end{align} where the returned value \(\lambda_0^D(t)\) will be used for Ogata's Thinning algorithm. More... | |
virtual double | IntensityIntegral (const double &lower, const double &upper, const Sequence &data) |
Returns the integral of the intensity function \(\int_{a}^b\lambda^*(\tau)d\tau\) where \(a = lower\) and \(b = upper\). More... | |
virtual double | PredictNextEventTime (const Sequence &data, const unsigned &num_simulations) |
Predict the next event timing by the expectation \(\int_{t_n}^\infty tf^*(t)dt\). Currently, we use the sample average by simulations to approximate the expectation since the conditional density \(f^*(t)\) normally does not have an analytic form. More... | |
Public Member Functions inherited from IProcess | |
IProcess (const unsigned &n, const unsigned &num_dims) | |
The constructor. More... | |
const Eigen::VectorXd & | GetParameters () |
Return the column vector of model parameters. More... | |
unsigned | GetNumDims () |
Return the number of dimensions in the process. More... | |
void | SetParameters (const Eigen::VectorXd &v) |
Set the model parameters. More... | |
void | PlotIntensityFunction (const Sequence &data) |
Plots the intensity functions based on the given sequence. It plots the intensity function and the associated event points up of each dimension in the same figure. More... | |
void | PlotIntensityFunction (const Sequence &data, const unsigned &dim_id) |
Plots the intensity function and the associated event points of the dimension dim_id. More... | |
Protected Member Functions | |
void | Initialize (const std::vector< Sequence > &data) |
initialize the temporal features arrayK and arrayG from the input sequences More... | |
void | InitializeWithGraph (const std::vector< Sequence > &data) |
Initialize the temporal features arrayK and arrayG from the input sequences where the network structure among the nodes is given. More... | |
void | PostProcessing () |
Post process the learned dependency structure. More... | |
Protected Member Functions inherited from IProcess | |
void | InitializeDimension (const std::vector< Sequence > &data) |
Protected Attributes | |
std::vector< Eigen::MatrixXd > | arrayK |
the temporal features associated with the intensity. More... | |
std::vector< Eigen::MatrixXd > | arrayG |
Intergral of the intensity. More... | |
Eigen::VectorXd | observation_window_T_ |
a column vector of length \(C\) which is the total number of sequences. Each component records the observation window in the respective sequence. More... | |
unsigned | num_sequences_ |
total number of observed sequences More... | |
OPTION | options_ |
A configuration object which saves the optimization options. More... | |
const Graph * | graph_ |
A graph object represents the dependency structure among the dimensions. More... | |
Protected Attributes inherited from IProcess | |
Eigen::VectorXd | parameters_ |
A column vector represents all model parameters of the process. More... | |
unsigned | num_dims_ |
The total number of dimensions of the process. More... | |
std::vector< std::vector< std::vector< double > > > | all_timestamp_per_dimension_ |
all_timestamp_per_dimension_ is a 3-d array where all_timestamp_per_dimension_[c][n][i] records the i-th event on the n-th dimension in the c-th sequence. More... | |
PlainTerminating implements the multivariate terminating process.
The Multivariate Terminating Point Process is an \(D\)-dimensional temporal point process with the conditional intensity function of each dimension \(d\) is given by \(\lambda_d^*(t) = \mathbb{I}\{N_d(t)\leq 1\}\cdot g(t)\) where \(N_d(t)\) is the number of events on the dimension \(d\), \(g(t)\) is a non-negative function, and \(\mathbb{I}\{{\cdot}\}\) is the indicator function. The Multivariate Terminating Process instantiates the continuous-time information diffusion model. In this class, we assume the pairwise diffusion time conforms to an exponential distribution, that is, \(f_{ji}(t) = \alpha_{ji}\).
Definition at line 21 of file PlainTerminating.h.
Optimization algorithms used to fit the standard Hawkes Process.
Enumerator | |
---|---|
SGD |
stochastic gradient descend. |
PLBFGS |
Definition at line 29 of file PlainTerminating.h.
Regularization coefficients.
Enumerator | |
---|---|
LAMBDA |
Regularization coefficient for \(\|\mathbf{A}\|\) |
Definition at line 61 of file PlainTerminating.h.
Supported regularizations used to fit the standard Hawkes Process.
Enumerator | |
---|---|
L1 |
Sparse L1 norm \(\|\cdot\|_1\) |
L22 |
L22 norm \(\|\cdot\|_2^2\) |
NONE |
No regularization |
Definition at line 43 of file PlainTerminating.h.
|
inline |
The constructor.
[in] | n | the number of parameters in total. |
[in] | num_dims | the number of dimensions in the process. |
Definition at line 159 of file PlainTerminating.h.
|
inline |
The constructor.
[in] | n | the number of parameters in total. |
[in] | num_dims | the number of dimensions in the process. |
[in] | graph | the graph object representing the dependency structure among the dimensions. |
Definition at line 172 of file PlainTerminating.h.
Maximum likelihood estimation for the model parameters.
[in] | data | vectors of observed sequences. |
[in] | options | data structure sotring different configuration for the optimization algorithm and the respective regularizations. |
Definition at line 140 of file PlainTerminating.cc.
|
virtual |
Returns the gradient w.r.t. the model parameters on the k-th sequence.
[in] | k | sequence index. |
[out] | gradient | the gradient vector w.r.t. the model parameters. |
Implements IProcess.
Definition at line 251 of file PlainTerminating.cc.
|
protected |
initialize the temporal features arrayK and arrayG from the input sequences
[in] | data | input collection of sequences |
Definition at line 13 of file PlainTerminating.cc.
|
protected |
Initialize the temporal features arrayK and arrayG from the input sequences where the network structure among the nodes is given.
[in] | data | input collection of sequences |
Definition at line 67 of file PlainTerminating.cc.
|
virtual |
Intensity function of each node.
The intensity function of each node is defined as \(\lambda^*_{c,i}(t) = \sum_{j\neq i}\mathbb{I}(t^c_j < t)\alpha_{ji}\).
[in] | t | the current given time. |
[in] | data | sequence of past events. |
[out] | intensity_dim | a column vector of size num_dims_ where each component stores the intensity value of the respetive dimension at time t given the past sequence in data. |
Implements IProcess.
Definition at line 258 of file PlainTerminating.cc.
|
virtual |
Returns the integral of the intensity function \(\int_{a}^b\lambda^*(\tau)d\tau\) where \(a = lower\) and \(b = upper\).
[in] | lower | starting point of the integral. |
[in] | upper | ending point of the integral. |
[in] | data | sequence of past events. |
Implements IProcess.
Definition at line 272 of file PlainTerminating.cc.
|
virtual |
Returns the upper bound of the summation of the intensity value on each dimension from time t to t + L given the history of past events in sequence data. Let \({\lambda_d^*(t)}\) be the conditional intensity function on the d-th dimension where \(d=1\dotso D\), and num_dims_ = D. This function returns
\begin{align} \lambda_0^D(t) \geq \sum_{d=1}^D\sup_{\tau\in[t, t + \tau(t)]}\lambda^*_d(\tau), \end{align}
where the returned value \(\lambda_0^D(t)\) will be used for Ogata's Thinning algorithm.
t | the starting time. |
L | the duration. |
data | the given sequence of the past events until time t. |
intensity_upper_dim | a column vector of size num_dims_ storing the upper bound of the intensity function on each dimension from time t to t + L. |
Implements IProcess.
Definition at line 265 of file PlainTerminating.cc.
|
virtual |
Negative loglikelihood of multivariate terminating point process.
\begin{align} \sum_{i=1}^D\bigg\{\frac{1}{C}\sum_{c=1}^C\bigg(\log(\sum_{j\neq i}\underbrace{\mathbb{I}(t^c_j<t^c_i)}_{\text{arrayK[i]}(c, j)}\alpha_{ji}) - \sum_{j\neq i}\underbrace{\mathbb{I}(t^c_j < t^c_i)(t^c_i - t^c_j)}_{\text{arrayG}[i](c,j)}\alpha_{ji}\bigg)\bigg\}, \end{align}
where \(\alpha_{ji}\) is the pairwise infection risk from node \(j\) to node \(i\).
objvalue | negative loglikelihood. |
gradient | gradient of the parameters. |
Implements IProcess.
Definition at line 165 of file PlainTerminating.cc.
|
protected |
Post process the learned dependency structure.
Definition at line 114 of file PlainTerminating.cc.
|
virtual |
Predict the next event timing by the expectation \(\int_{t_n}^\infty tf^*(t)dt\). Currently, we use the sample average by simulations to approximate the expectation since the conditional density \(f^*(t)\) normally does not have an analytic form.
[in] | data | the sequence of past events. |
[in] | num_simulations | number of simulations we use to calculate the sample average. |
Implements IProcess.
Definition at line 278 of file PlainTerminating.cc.
|
protected |
Intergral of the intensity.
The log-likelihood of observing a collection of C sequences can be derived as the following:
\begin{align} \sum_{i=1}^D\bigg\{\frac{1}{C}\sum_{c=1}^C\bigg(\log(\sum_{j\neq i}\underbrace{\mathbb{I}(t^c_j<t^c_i)}_{\text{arrayK[i]}(c, j)}\alpha_{ji}) - \sum_{j\neq i}\underbrace{\mathbb{I}(t^c_j < t^c_i)(t^c_i - t^c_j)}_{\text{arrayG}[i](c,j)}\alpha_{ji}\bigg)\bigg\}, \end{align}
where \(\alpha_{ji}\) is the pairwise infection risk from node \(j\) to node \(i\). \(\text{arrayG[i]}(c, j)\) is the time duration between the infection time \(t^c_j\) and \(t^c_i\) in the sequence \(c\).
Definition at line 111 of file PlainTerminating.h.
|
protected |
the temporal features associated with the intensity.
The log-likelihood of observing a collection of C sequences can be derived as the following:
\begin{align} \sum_{i=1}^D\bigg\{\frac{1}{C}\sum_{c=1}^C\bigg(\log(\sum_{j\neq i}\underbrace{\mathbb{I}(t^c_j<t^c_i)}_{\text{arrayK[i]}(c, j)}\alpha_{ji}) - \sum_{j\neq i}\underbrace{\mathbb{I}(t^c_j < t^c_i)(t^c_i - t^c_j)}_{\text{arrayG}[i](c,j)}\alpha_{ji}\bigg)\bigg\}, \end{align}
where \(\alpha_{ji}\) is the pairwise infection risk from node \(j\) to node \(i\). \(\text{arrayK[i]}(c, j)\) indicates whether node \(j\) is the infecting parent of node \(i\) in the sequence \(c\).
Definition at line 100 of file PlainTerminating.h.
|
protected |
A graph object represents the dependency structure among the dimensions.
Definition at line 149 of file PlainTerminating.h.
|
protected |
total number of observed sequences
Definition at line 121 of file PlainTerminating.h.
|
protected |
a column vector of length \(C\) which is the total number of sequences. Each component records the observation window in the respective sequence.
Definition at line 116 of file PlainTerminating.h.
|
protected |
A configuration object which saves the optimization options.
Definition at line 144 of file PlainTerminating.h.