Multivariate Point Process Package
0.1
|
TerminatingProcessLearningTriggeringKernel implements the multivariate terminating process where the pairwise infection risk function can be learned from the data. More...
#include "include/TerminatingProcessLearningTriggeringKernel.h"
Classes | |
struct | OPTION |
Options used to configure the fitting of the general Hawkes Process with learned triggering kernels. More... | |
Public Types | |
enum | Regularizer { L22, GROUP, NONE } |
Supported regularizations used to fit the standard Hawkes Process. More... | |
enum | RegCoef { LAMBDA } |
Regularization coefficients. More... | |
Public Member Functions | |
TerminatingProcessLearningTriggeringKernel (const unsigned &n, const unsigned &num_dims, const Eigen::VectorXd &tau, const Eigen::VectorXd &sigma) | |
The constructor. More... | |
TerminatingProcessLearningTriggeringKernel (const unsigned &n, const unsigned &num_dims, const Graph *graph, const Eigen::VectorXd &tau, const Eigen::VectorXd &sigma) | |
The constructor. More... | |
void | fit (const std::vector< Sequence > &data, const OPTION &options) |
Maximum likelihood estimation for the model parameters. More... | |
virtual void | NegLoglikelihood (double &objvalue, Eigen::VectorXd &gradient) |
Negative loglikelihood of multivariate terminating point process learning triggering kernels. More... | |
virtual void | Gradient (const unsigned &k, Eigen::VectorXd &gradient) |
Returns the gradient w.r.t. the model parameters on the k-th sequence. More... | |
virtual double | Intensity (const double &t, const Sequence &data, Eigen::VectorXd &intensity_dim) |
Intensity function of each dimension(or node) More... | |
virtual double | IntensityUpperBound (const double &t, const double &L, const Sequence &data, Eigen::VectorXd &intensity_upper_dim) |
Returns the upper bound of the summation of the intensity value on each dimension from time t to t + L given the history of past events in sequence data. Let \({\lambda_d^*(t)}\) be the conditional intensity function on the d-th dimension where \(d=1\dotso D\), and num_dims_ = D. This function returns \begin{align} \lambda_0^D(t) \geq \sum_{d=1}^D\sup_{\tau\in[t, t + \tau(t)]}\lambda^*_d(\tau), \end{align} where the returned value \(\lambda_0^D(t)\) will be used for Ogata's Thinning algorithm. More... | |
virtual double | IntensityIntegral (const double &lower, const double &upper, const Sequence &data) |
Returns the integral of the intensity function \(\int_{a}^b\lambda^*(\tau)d\tau\) where \(a = lower\) and \(b = upper\). More... | |
virtual double | PredictNextEventTime (const Sequence &data, const unsigned &num_simulations) |
Predict the next event timing by the expectation \(\int_{t_n}^\infty tf^*(t)dt\). Currently, we use the sample average by simulations to approximate the expectation since the conditional density \(f^*(t)\) normally does not have an analytic form. More... | |
void | PlotTriggeringKernel (const unsigned &dim_m, const unsigned &dim_n, const double &T, const double &delta) |
Visualize the learned triggering kernel between dimension dim_m and dimension dim_n. More... | |
Public Member Functions inherited from IProcess | |
IProcess (const unsigned &n, const unsigned &num_dims) | |
The constructor. More... | |
const Eigen::VectorXd & | GetParameters () |
Return the column vector of model parameters. More... | |
unsigned | GetNumDims () |
Return the number of dimensions in the process. More... | |
void | SetParameters (const Eigen::VectorXd &v) |
Set the model parameters. More... | |
void | PlotIntensityFunction (const Sequence &data) |
Plots the intensity functions based on the given sequence. It plots the intensity function and the associated event points up of each dimension in the same figure. More... | |
void | PlotIntensityFunction (const Sequence &data, const unsigned &dim_id) |
Plots the intensity function and the associated event points of the dimension dim_id. More... | |
Protected Member Functions | |
void | Initialize (const std::vector< Sequence > &data) |
Initialize the temporal features arrayK and arrayG from the input sequences. More... | |
void | InitializeWithGraph (const std::vector< Sequence > &data) |
Initialize the temporal features arrayK and arrayG from the input sequences where the dependency structure among the dimensions is given. More... | |
void | PostProcessing () |
Post process the learned dependency structure. More... | |
void | GetNegLoglikelihood (double &objvalue, Eigen::VectorXd &gradient) |
Compute the negative loglikelihood. More... | |
Protected Member Functions inherited from IProcess | |
void | InitializeDimension (const std::vector< Sequence > &data) |
Protected Attributes | |
const double | PI = 3.14159265358979323846 |
Eigen::VectorXd | tau_ |
A column vector of size num_rbfs_. Each component of tau_ is the location of the respective RBF basis function. More... | |
unsigned | num_rbfs_ |
Total number of RBF basis functions. More... | |
Eigen::VectorXd | sigma_ |
A column vector of size num_rbfs_. Each component of sigma_ is the bandwidth of the respective RBF basis function. More... | |
Eigen::VectorXd | sqrt2sigma_ |
Helper variable defined as \(\sqrt{2}\)sigma_. More... | |
Eigen::VectorXd | sqrt2PIsigma_ |
Helper variable defined as \(0.5\sqrt{2\pi}\)sigma_. More... | |
Eigen::VectorXd | erfctau_sigma_ |
Helper variable defined as \(\text{erfc}\bigg(\frac{tau\_}{\sqrt{2}\text{sigma_}}\bigg)\). More... | |
std::vector< std::vector< Eigen::MatrixXd > > | arrayK |
Temporal features associated with the intensity function. More... | |
std::vector< std::vector< Eigen::MatrixXd > > | arrayG |
Temporal features derived from the integral of the intensity. More... | |
Eigen::VectorXd | observation_window_T_ |
A column vector of length \(C\) which is the total number of sequences. Each component records the observation window in the respective sequence. More... | |
unsigned | num_sequences_ |
Total number of observed sequences. More... | |
OPTION | options_ |
A configuration object which saves the optimization options. More... | |
const Graph * | graph_ |
A graph object represents the dependency structure among the dimensions. More... | |
Protected Attributes inherited from IProcess | |
Eigen::VectorXd | parameters_ |
A column vector represents all model parameters of the process. More... | |
unsigned | num_dims_ |
The total number of dimensions of the process. More... | |
std::vector< std::vector< std::vector< double > > > | all_timestamp_per_dimension_ |
all_timestamp_per_dimension_ is a 3-d array where all_timestamp_per_dimension_[c][n][i] records the i-th event on the n-th dimension in the c-th sequence. More... | |
TerminatingProcessLearningTriggeringKernel implements the multivariate terminating process where the pairwise infection risk function can be learned from the data.
The Multivariate Terminating Point Process is an \(D\)-dimensional temporal point process with the conditional intensity function of each dimension \(d\) is given by \(\lambda_d^*(t) = \mathbb{I}\{N_d(t)\leq 1\}\cdot g(t)\) where \(N_d(t)\) is the number of events on the dimension \(d\), \(g(t)\) is a non-negative function, and \(\mathbb{I}\{{\cdot}\}\) is the indicator function. The Multivariate Terminating Process instantiates the continuous-time information diffusion model with general pairwise infection risk functions \(\gamma_{ji}(t, t^c_j)\), so we have \(g(t) = \sum_{j\neq i}\mathbb{I}(t^c_j < t)\gamma_{ji}(t, t^c_j)\) in a sequence \(c\).
Definition at line 25 of file TerminatingProcessLearningTriggeringKernel.h.
Regularization coefficients.
Enumerator | |
---|---|
LAMBDA |
Regularization coefficient for GROUP LASSO |
Definition at line 52 of file TerminatingProcessLearningTriggeringKernel.h.
Supported regularizations used to fit the standard Hawkes Process.
Enumerator | |
---|---|
L22 |
L22 norm \(\|\cdot\|_2^2\) |
GROUP |
GROUP LASSO \(\sum_{m}\|\boldsymbol{\alpha}_{mn}\|_2\) |
NONE |
No regularization |
Definition at line 33 of file TerminatingProcessLearningTriggeringKernel.h.
|
inline |
The constructor.
[in] | n | the number of parameters in total. |
[in] | num_dims | the number of dimensions in the process. |
[in] | tau | the location of each basis function. |
[in] | sigma | the bandwidth of each basis function. |
Definition at line 180 of file TerminatingProcessLearningTriggeringKernel.h.
|
inline |
The constructor.
[in] | n | the number of parameters in total. |
[in] | num_dims | the number of dimensions in the process. |
[in] | graph | the graph object representing the dependency structure among the dimensions. |
[in] | tau | the location of each basis function. |
[in] | sigma | the bandwidth of each basis function. |
Definition at line 200 of file TerminatingProcessLearningTriggeringKernel.h.
void TerminatingProcessLearningTriggeringKernel::fit | ( | const std::vector< Sequence > & | data, |
const OPTION & | options | ||
) |
Maximum likelihood estimation for the model parameters.
[in] | data | vectors of observed sequences. |
[in] | options | data structure sotring different configuration for the optimization algorithm and the respective regularizations. |
Definition at line 193 of file TerminatingProcessLearningTriggeringKernel.cc.
|
protected |
Compute the negative loglikelihood.
[out] | objvalue | negative loglikelihood. |
[out] | gradient | gradient of the parameters. |
Definition at line 230 of file TerminatingProcessLearningTriggeringKernel.cc.
|
virtual |
Returns the gradient w.r.t. the model parameters on the k-th sequence.
[in] | k | sequence index. |
[out] | gradient | the gradient vector w.r.t. the model parameters. |
Implements IProcess.
Definition at line 327 of file TerminatingProcessLearningTriggeringKernel.cc.
|
protected |
Initialize the temporal features arrayK and arrayG from the input sequences.
[in] | data | input collection of sequences |
Definition at line 13 of file TerminatingProcessLearningTriggeringKernel.cc.
|
protected |
Initialize the temporal features arrayK and arrayG from the input sequences where the dependency structure among the dimensions is given.
[in] | data | input collection of sequences |
Definition at line 74 of file TerminatingProcessLearningTriggeringKernel.cc.
|
virtual |
Intensity function of each dimension(or node)
\begin{align} \lambda_i^*(t) = \mathbb{I}\{N_i(t)\leq 1\}\sum_{j\neq i}\mathbb{I}(t^c_j < t)\gamma_{ji}(t, t^c_j) \end{align}
in a given sequence \(c\) where \(\gamma_{ji}(t, t^c_j) = \boldsymbol{k}_{ji}(t - t^c_j)^\top\boldsymbol{\alpha}_{ji}\). \(\boldsymbol{k}_{ji}(t)\) is a column vector where the \(l\)-th component is defined as \(\boldsymbol{k}_{jil}(t)=\exp(-(\frac{\tau_l - t}{\sqrt{2}\sigma_l})^2)\), and \(\tau_l\) and \(\sigma_l\) are the parameters of the basis function.
t | current given time. |
data | the given sequence of the past events until time t. |
intensity_dim | a column vector of size num_dims_ where each component stores the intensity value of the respetive dimension at time t given the past sequence in data. |
Implements IProcess.
Definition at line 334 of file TerminatingProcessLearningTriggeringKernel.cc.
|
virtual |
Returns the integral of the intensity function \(\int_{a}^b\lambda^*(\tau)d\tau\) where \(a = lower\) and \(b = upper\).
[in] | lower | starting point of the integral. |
[in] | upper | ending point of the integral. |
[in] | data | sequence of past events. |
Implements IProcess.
Definition at line 348 of file TerminatingProcessLearningTriggeringKernel.cc.
|
virtual |
Returns the upper bound of the summation of the intensity value on each dimension from time t to t + L given the history of past events in sequence data. Let \({\lambda_d^*(t)}\) be the conditional intensity function on the d-th dimension where \(d=1\dotso D\), and num_dims_ = D. This function returns
\begin{align} \lambda_0^D(t) \geq \sum_{d=1}^D\sup_{\tau\in[t, t + \tau(t)]}\lambda^*_d(\tau), \end{align}
where the returned value \(\lambda_0^D(t)\) will be used for Ogata's Thinning algorithm.
t | the starting time. |
L | the duration. |
data | the given sequence of the past events until time t. |
intensity_upper_dim | a column vector of size num_dims_ storing the upper bound of the intensity function on each dimension from time t to t + L. |
Implements IProcess.
Definition at line 341 of file TerminatingProcessLearningTriggeringKernel.cc.
|
virtual |
Negative loglikelihood of multivariate terminating point process learning triggering kernels.
\begin{align} \sum_{i=1}^D\bigg\{\frac{1}{C}\sum_{c = 1}^C\bigg(\log(\sum_{j\neq i}\underbrace{\mathbb{I}(t^c_j < t^c_i)\boldsymbol{k}_{ji}(t^c_i - t^c_j)^\top}_{\text{arrayK}[i][c](j,:)}\cdot\boldsymbol{\alpha}_{ji}) - \sum_{j\neq i}\underbrace{\mathbb{I}(t^c_j < t^c_i)\int_{t^c_j}^{t^c_i}\boldsymbol{k}_{ji}(t - t^c_j)^\top}_{\text{arrayG[i]}[c](j,:)} dt\cdot\boldsymbol{\alpha}_{ji}\bigg)\bigg\} \end{align}
[out] | objvalue | negative loglikelihood. |
[out] | gradient | gradient of the parameters. |
Implements IProcess.
Definition at line 295 of file TerminatingProcessLearningTriggeringKernel.cc.
void TerminatingProcessLearningTriggeringKernel::PlotTriggeringKernel | ( | const unsigned & | dim_m, |
const unsigned & | dim_n, | ||
const double & | T, | ||
const double & | delta | ||
) |
Visualize the learned triggering kernel between dimension dim_m and dimension dim_n.
dim_m | the influential dimension |
dim_n | the influenced dimension |
T | observation window. |
delta | unit step size to partition the time line. |
Definition at line 359 of file TerminatingProcessLearningTriggeringKernel.cc.
|
protected |
Post process the learned dependency structure.
Definition at line 134 of file TerminatingProcessLearningTriggeringKernel.cc.
|
virtual |
Predict the next event timing by the expectation \(\int_{t_n}^\infty tf^*(t)dt\). Currently, we use the sample average by simulations to approximate the expectation since the conditional density \(f^*(t)\) normally does not have an analytic form.
[in] | data | the sequence of past events. |
[in] | num_simulations | number of simulations we use to calculate the sample average. |
Implements IProcess.
Definition at line 354 of file TerminatingProcessLearningTriggeringKernel.cc.
|
protected |
Temporal features derived from the integral of the intensity.
The log-likelihood of observing a collection of C sequences can be derived as the following:
\begin{align} \sum_{i=1}^D\bigg\{\frac{1}{C}\sum_{c = 1}^C\bigg(\log(\sum_{j\neq i}\underbrace{\mathbb{I}(t^c_j < t^c_i)\boldsymbol{k}_{ji}(t^c_i - t^c_j)^\top}_{\text{arrayK}[i][c](j,:)}\cdot\boldsymbol{\alpha}_{ji}) - \sum_{j\neq i}\underbrace{\mathbb{I}(t^c_j < t^c_i)\int_{t^c_j}^{t^c_i}\boldsymbol{k}_{ji}(t - t^c_j)^\top}_{\text{arrayG[i]}[c](j,:)} dt\cdot\boldsymbol{\alpha}_{ji}\bigg)\bigg\} \end{align}
arrayG[i][c] is a \(D\) by num_rbfs_ matrix which stores the integral of the influence from dimension dimension \(j\) to dimension \(i\) in the sequence c.
Definition at line 128 of file TerminatingProcessLearningTriggeringKernel.h.
|
protected |
Temporal features associated with the intensity function.
The log-likelihood of observing a collection of C sequences can be derived as the following:
\begin{align} \sum_{i=1}^D\bigg\{\frac{1}{C}\sum_{c = 1}^C\bigg(\log(\sum_{j\neq i}\underbrace{\mathbb{I}(t^c_j < t^c_i)\boldsymbol{k}_{ji}(t^c_i - t^c_j)^\top}_{\text{arrayK}[i][c](j,:)}\cdot\boldsymbol{\alpha}_{ji}) - \sum_{j\neq i}\underbrace{\mathbb{I}(t^c_j < t^c_i)\int_{t^c_j}^{t^c_i}\boldsymbol{k}_{ji}(t - t^c_j)^\top}_{\text{arrayG[i]}[c](j,:)} dt\cdot\boldsymbol{\alpha}_{ji}\bigg)\bigg\} \end{align}
arrayK[i][c] is a \(D\) by num_rbfs_ matrix which stores the influence from the event at the time \(t^c_{j}\) on dimension \(j\) to the event at the time \(t^c_{i}\) on dimension \(i\) in the sequence c.
Definition at line 117 of file TerminatingProcessLearningTriggeringKernel.h.
|
protected |
Helper variable defined as \(\text{erfc}\bigg(\frac{tau\_}{\sqrt{2}\text{sigma_}}\bigg)\).
Definition at line 105 of file TerminatingProcessLearningTriggeringKernel.h.
|
protected |
A graph object represents the dependency structure among the dimensions.
Definition at line 169 of file TerminatingProcessLearningTriggeringKernel.h.
|
protected |
Total number of RBF basis functions.
Definition at line 88 of file TerminatingProcessLearningTriggeringKernel.h.
|
protected |
Total number of observed sequences.
Definition at line 137 of file TerminatingProcessLearningTriggeringKernel.h.
|
protected |
A column vector of length \(C\) which is the total number of sequences. Each component records the observation window in the respective sequence.
Definition at line 132 of file TerminatingProcessLearningTriggeringKernel.h.
|
protected |
A configuration object which saves the optimization options.
Definition at line 165 of file TerminatingProcessLearningTriggeringKernel.h.
|
protected |
Definition at line 78 of file TerminatingProcessLearningTriggeringKernel.h.
|
protected |
A column vector of size num_rbfs_. Each component of sigma_ is the bandwidth of the respective RBF basis function.
Definition at line 93 of file TerminatingProcessLearningTriggeringKernel.h.
|
protected |
Helper variable defined as \(0.5\sqrt{2\pi}\)sigma_.
Definition at line 101 of file TerminatingProcessLearningTriggeringKernel.h.
|
protected |
Helper variable defined as \(\sqrt{2}\)sigma_.
Definition at line 97 of file TerminatingProcessLearningTriggeringKernel.h.
|
protected |
A column vector of size num_rbfs_. Each component of tau_ is the location of the respective RBF basis function.
Definition at line 83 of file TerminatingProcessLearningTriggeringKernel.h.