We are surrounded by social and information share networks, over which diffusions of information, events, virus, takes place constantly. We can often observe that after some influential users adopt certain new product or idea, they actively influence the behaviors of their friends, which in turn makes more friends of friends adopt the product through word-of-mouth.
The specific questions we seek to address in this NIPS 2013 paper is to accurately estimate the number of follow-ups which can be triggered by a given set of earlier influential users, and then to identify a set of influential users, to whom we will give promotions, in order to trigger the largest expected number of follow-ups as soon as possible ? These questions are interesting for that advertisers who want to have an efficient and effective campaign for their new products. But, they are also very challenging in that these questions are time sensitive . Intuitively, 1000 buyers in a day will be much better than 1000 buyers in a year. Furthermore, the solution must be scalable enough to deal with large real-world networks.
ConTinEst is a scalable randomized algorithm for influence estimation in continuous-time diffusion networks. It can estimate the influence of every node in a network with $|\mathcal{V}|$ nodes and $|\mathcal{E}|$ edges to an accuracy of $n=\epsilon$ using $O(1/\epsilon^2)$ randomizations and $\tilde{O}(n|\mathcal{E}| + n|\mathcal{V}|)$ computations. When used as a subroutine in a greedy influence maximization approach, ConTinEst is guaranteed to find a set of $C$ nodes with the influence of at least $(1 - 1/e)OPT - 2C\epsilon$, where $OPT$ is the optimum value. Experiments on both synthetic and real-world data show that ConTinEst can easily scale up to networks of millions of nodes while significantly improves over previous state-of-the-arts in terms of the accuracy of the estimated influence and the quality of the selected nodes in maximizing the influence.
Scalable Influence Estimation in Continuous-Time Diffusion Networks (Outstanding Paper Award , full oral presentation). Nan Du, Le Song, Manuel Gomez Rodriguez, and Hongyuan Zha. Neural Information Processing Systems (NIPS). Dec. 5 - Dec. 10, 2013, Lake Tahoe, Nevada, USA. [PAPER] [SLIDE][POSTER][Bibtex]
Terms of agreement. We appreciate your interest in our research. The C++ implementation is provided and maintained for our research projects conducted in the School of Computational Science and Engineering at Georgia Institute of Technology. All rights of the source code implementations are reserved. Comments, bugs, and suggestions, if any, can be directed to Nan Du (dunan AT gatech.edu). Redistributions and use of them in source or binary forms, with or without modification, are permitted only to academic purposes. If you use this code, or any part of it, please cite our paper as
@inproceedings{DuSonGomZha13, title={Scalable Influence Estimation in Continuous-Time Diffusion Networks}, author={Nan Du and Le Song and Manuel Gomez-Rodriguez and Hongyuan Zha}, booktitle={Advances in Neural Information Processing Systems (NIPS)}, year={2013} }
This code is provided free for non-commercial use under the terms of the GNU General Public License. The current version is the 1.0 and has been released on August 27, 2013. THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. It is understood that by downloading any item from this Web page and the pages linked to by this page, one has entirely understood and fully acknowledged to agree all of the terms.
Input network format. We include a sample network (1,024 nodes) in the implementation. The input file "std_weibull_DAG_core-1024-1-network.txt" consists of two blocks separated by a blank line. The node ID starts from 0 to 1,023. In the 1st block, each line has the format
node ID, node Name
In the 2nd block, each line represents a directed edge from node ID1 to node ID2 with the following format.
node ID1, node ID2, shape, scale
Because we assume the most general form for the pairwise transmission function, for the simplicity of the demo code, we take the Weibull Distribution as the transmission function with the two parameters shape and scale, respectively. When $shape = 1.0$, it is reduced to the exponential distribution; When $shape = 2.0$, it is equivalent to the Rayleigh Distribution. A valid network format can be the following.
0,0
1,1
2,2
0,1,0.0959956,0.0888947
0,2,3.95348,9.42268
1,2,1.61835,8.38794
The implementation consists of the following five classes
Class Graph
Graph has the following properties.
Graph has the following methods.
The constructor function where
Read network structure from the input graph file.
Draw samples from the pairwise transmission functions for each edge.
An utility function to print the loaded network.
An utility function to find the node with the maximum out-degree. Return a pair of (node ID, out-degree)
Class ConTinEst
ConTinEst has the following properties.
ConTinEst has the following methods.
The constructor function where
Generate least-label lists for each node given a sampled set of transmission times, and a sampled set of random labels.
Generate all least-label lists for each node based on the required number of sampled sets of pairwise transmission times (m_num_samples), and the number of sampled sets of labels (m_num_rankings).
Estimate the influence of a given set of sources by time T.
Maximize the influence using the greedy algorithm.
Return the set of selected sources with different sizes at different time.
Compile on Linux/MacOS
To compile the sources, simply type the following to get the executable file with the name execute
tar -zxvf ConTinEst.tgz
cd ConTinEst
make
Run the demo on Linux/MacOS
To run the demo code within the directory ConTinEst, simply type
./execute
The demo code first loads the network file and calculate the least-label lists for each node.
std_weibull_DAG_core-1024-1-network.txt
Get all least-label lists : 10000 sets of transmission times; 5 sets of random labels; done
Then, it finds the node with the maximum out-degree.
node 0 has the largest out-degree 24
Next, it compares the estimated influence at different time given by ConTinEst with that given by the naive simulation for the node with the largest out-degree.
Estimate Influence T = 1 done
Simulation Check T = 1 with 10000 samples
ConTinEst : 7.70149
Simulation : 7.6682
Estimate Influence T = 2 done
Simulation Check T = 2 with 10000 samples
ConTinEst : 14.0139
Simulation : 13.8913
Estimate Influence T = 3 done
Simulation Check T = 3 with 10000 samples
ConTinEst : 24.2149
Simulation : 24.1826
Estimate Influence T = 4 done
Simulation Check T = 4 with 10000 samples
ConTinEst : 38.9468
Simulation : 39.0016
Estimate Influence T = 5 done
Simulation Check T = 5 with 10000 samples
ConTinEst : 57.3021
Simulation : 57.9743
Estimate Influence T = 6 done
Simulation Check T = 6 with 10000 samples
ConTinEst : 79.7948
Simulation : 79.656
Estimate Influence T = 7 done
Simulation Check T = 7 with 10000 samples
ConTinEst : 104.704
Simulation : 104.389
Estimate Influence T = 8 done
Simulation Check T = 8 with 10000 samples
ConTinEst : 133.896
Simulation : 133.148
Estimate Influence T = 9 done
Simulation Check T = 9 with 10000 samples
ConTinEst : 163.586
Simulation : 163.121
Estimate Influence T = 10 done
Simulation Check T = 10 with 10000 samples
ConTinEst : 192.411
Simulation : 192.212
Finally, it selects the 10 sources that can achieve the largest expected number of infected nodes by time 10.
Influence Maximization by selecting 10 nodes with T = 10 done
selected sources : 0;1;2;4;10;16;17;524;890;929;