Add files via upload
dlhu authored Nov 7, 2023
0 parents commit b5e4666
Showing 48 changed files with 119,552 additions and 0 deletions.
75 changes: 75 additions & 0 deletions README.md
@@ -0,0 +1,75 @@
<h3>
<center>Time-aware Influence Minimization via Blocking Social Networks</center>
</h3>

### Information

Version 1.0: Implementation of the algorithms for Time-aware Influence Minimization in social networks. For more details about the code, please read our paper: Xueqin C., Jiajie F., Qing L., Yunjun G., Baihua Z., Lu C., "Time-aware Influence Minimization via Blocking Social Networks".

### Introduction

1. This repository contains the full version of our paper.
2. This repository contains the code and datasets used in our paper.
3. **Time-aware Influence Minimization via Blocking Social Networks**.

Abstract: We study the problem of Time-aware Influence Minimization (Timin) in social networks, aiming to minimize the negative influence concerning a critical deadline by temporarily blocking some nodes of the given social network. To this end, first, we introduce a novel Time-delayed Linear Threshold (TLT) model by considering the time delay of influence, i.e., when a node is active, its out-neighbors receive the influence weight after a certain time delay. Building on the TLT model, we formally define the Timin problem, and prove that it is NP-hard, monotone, and supermodular. To tackle the Timin problem, we initially devise a basic greedy algorithm, Timin-Greedy, achieving ($1-1/e$) approximation. Since it is #P-hard to compute the exact negative influence spread for any node set in Timin-Greedy, we devise a Temporal Reverse Influence Sampling technique to estimate the expected negative influence spread, and propose a more efficient algorithm TESTIM, maintaining ($1-1/e-\epsilon$) approximation. To further improve the efficiency, we propose a heuristic algorithm NeighborReplace based on an important observation that potential blocking nodes are often located near the negative source. Furthermore, we investigate two variants of the Timin problem, which consider additional constraints. Finally, our extensive experiments demonstrate that (1) TESTIM is up to 10$\times$ faster than the baselines, yielding 30%-50% more negative influence spread reductions, and (2) compared with TESTIM, NeighborReplace exhibits 5$\times$ speedup while having comparable negative influence spread reductions.
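
As a rough illustration of the time-delayed threshold idea in the abstract, the sketch below simulates one possible reading of it: an active node sends its edge weight to each out-neighbor after that edge's delay, a node becomes active once the weight it has received reaches its threshold, and only activations up to the deadline are counted. The names `DelayedEdge`, `Arrival`, `LaterArrival`, and `simulateTLT`, the integer delays, and the fixed per-node thresholds are all assumptions made for this example. It is not the repository's implementation of the TLT model.

```cpp
#include <cassert>
#include <queue>
#include <vector>

// Hypothetical edge record: target node, influence weight, integer time delay.
struct DelayedEdge { int to; double weight; int delay; };

// A pending delivery of `weight` to `node` at time `time`.
struct Arrival { int time; int node; double weight; };

// Orders the priority queue so that the earliest arrival is processed first.
struct LaterArrival {
    bool operator()(const Arrival& a, const Arrival& b) const { return a.time > b.time; }
};

// Returns how many nodes are active by `deadline` when the nodes marked in
// `blocked` can neither become active nor forward influence.
int simulateTLT(const std::vector<std::vector<DelayedEdge>>& out,
                const std::vector<double>& threshold,
                const std::vector<int>& seeds,
                const std::vector<bool>& blocked,
                int deadline) {
    const int n = static_cast<int>(out.size());
    assert(static_cast<int>(threshold.size()) == n);
    assert(static_cast<int>(blocked.size()) == n);

    std::vector<double> received(n, 0.0);
    std::vector<bool> active(n, false);
    std::priority_queue<Arrival, std::vector<Arrival>, LaterArrival> pending;
    int activeCount = 0;

    // Activating a node schedules one delayed delivery per out-edge,
    // skipping deliveries that would arrive after the deadline.
    auto activate = [&](int v, int now) {
        active[v] = true;
        ++activeCount;
        for (const DelayedEdge& e : out[v]) {
            if (now + e.delay <= deadline) {
                pending.push(Arrival{now + e.delay, e.to, e.weight});
            }
        }
    };

    for (int s : seeds) {
        if (!blocked[s] && !active[s]) activate(s, 0);
    }
    while (!pending.empty()) {
        const Arrival a = pending.top();
        pending.pop();
        if (blocked[a.node] || active[a.node]) continue;
        received[a.node] += a.weight;
        if (received[a.node] >= threshold[a.node]) activate(a.node, a.time);
    }
    return activeCount;
}
```

Calling `simulateTLT` twice, once with an empty block set and once with a candidate block set, gives the reduction in spread that the candidate achieves for one fixed choice of thresholds and delays.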

### Datasets

We use six publicly available real-world networks: the EmailCore, Epinions, Amazon, Youtube, Facebook, and LiveJournal datasets.

All of them can be obtained from [1].

[1] Jure Leskovec and Andrej Krevl. 2014. SNAP Datasets: Stanford large network dataset collection. http://snap.stanford.edu/data.

### Algorithms

The following files contain the code for our proposed algorithms. All code is implemented in C++ and was developed with CLion 2022.3.2.

1. First, we use el2bin.cpp$^{[2]}$ (found in the genSeed folder) to convert a graph file in weighted edge list format into two binary files that encode the graph and its transpose:

```shell
./el2bin <input file> <output file> <transpose output file>
```
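
To make the "graph and its transpose" idea concrete, here is a minimal in-memory sketch. It assumes each input line has the form `u v w` (source, target, weight) with zero-based node ids; that line format, and the names `Arc`, `GraphPair`, and `parseEdgeList`, are assumptions for illustration. The binary layout written by el2bin is not reproduced here.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical adjacency entry: neighbor id and edge weight.
struct Arc { int node; double weight; };

// Forward adjacency lists and their transpose (in-neighbor lists).
struct GraphPair {
    std::vector<std::vector<Arc>> forward;
    std::vector<std::vector<Arc>> transpose;
};

// Parses whitespace-separated "u v w" triples and stores every edge twice:
// once under its source (forward) and once under its target (transpose).
GraphPair parseEdgeList(const std::string& text, int numNodes) {
    GraphPair g;
    g.forward = std::vector<std::vector<Arc>>(numNodes);
    g.transpose = std::vector<std::vector<Arc>>(numNodes);

    std::istringstream in(text);
    int u = 0, v = 0;
    double w = 0.0;
    while (in >> u >> v >> w) {
        assert(0 <= u && u < numNodes);
        assert(0 <= v && v < numNodes);
        g.forward[u].push_back(Arc{v, w});
        g.transpose[v].push_back(Arc{u, w});
    }
    return g;
}
```

Reverse-sampling methods walk edges backwards from a target node, so keeping the transpose next to the forward graph avoids rebuilding in-neighbor lists at run time.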

2. Then we use fake_seeds.cpp to generate the fake seeds, which can be either influential or random. For influential seeds, set `-m top`; the `-f` parameter gives a fraction f, and the seeds are drawn at random from the top f fraction of nodes, where nodes are ranked according to the singleton influence file passed with `-s`. For random seeds, pass only four parameters (`-n`, `-o`, `-k`, and `-m`) and set `-m random`. The usages are as follows (the first command generates influential seeds, the second generates random seeds):

```shell
./fake_seeds -n <number of nodes> -o <seed output file> -k <number of seeds> -m top -f <fraction> -s <singleton influence file>
```

```shell
./fake_seeds -n <number of nodes> -o <seed output file> -k <number of seeds> -m random
```
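
The two seed modes described above can be sketched as a single routine: rank nodes by singleton influence, keep the top fraction as a candidate pool, and draw k distinct seeds from that pool. Setting the fraction to 1.0 makes the pool the whole node set, which corresponds to the random mode. The function name `pickFakeSeeds`, the in-memory influence vector, and the truncation of `fraction * n` to an integer are assumptions for illustration, not the behavior of fake_seeds.cpp.

```cpp
#include <algorithm>
#include <cassert>
#include <numeric>
#include <random>
#include <vector>

// Draws k distinct node ids uniformly at random from the top `fraction`
// of nodes, ranked by descending singleton influence.
std::vector<int> pickFakeSeeds(const std::vector<double>& singletonInfluence,
                               int k, double fraction, std::mt19937& rng) {
    const int n = static_cast<int>(singletonInfluence.size());

    // order[i] is the node with the i-th largest singleton influence.
    std::vector<int> order(n);
    std::iota(order.begin(), order.end(), 0);
    std::stable_sort(order.begin(), order.end(), [&](int a, int b) {
        return singletonInfluence[a] > singletonInfluence[b];
    });

    // Candidate pool size; the product is truncated toward zero.
    const int poolSize = static_cast<int>(fraction * n);
    assert(0 < k && k <= poolSize && poolSize <= n);

    // Shuffle only the pool and keep its first k entries.
    std::shuffle(order.begin(), order.begin() + poolSize, rng);
    order.resize(k);
    return order;
}
```

Seeding the `std::mt19937` generator with a fixed value makes a seed set reproducible across runs, which helps when comparing algorithms on the same fake seeds.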

3. Then use one of the **algorithm** programs (found in the TESTIM directory) to solve the problem. The **algorithm** programs are:

- TLT: A TM-loss greedy function for finding the best seed set for temporal influence minimization;
- Advanced_tlt: The heuristic approach to solving the problem;
- deadline_solution: Solves the variant problem DSTIMIN;
- Minimal_block_solution: Solves the variant problem BSTIMIN.

The usages are listed as follows:

```shell
./TLT -i <input networkFile> -o <result output file> -fakeseeds <fakeSeed file> -k <budget of blockSet> -epsilon <xx> -t <max edge delay> -T <deadline> -lamda <edge delay parameters> -delta <xx>
```

```shell
./Advanced_tlt -i <input networkFile> -o <result output file> -fakeseeds <fakeSeed file> -k <budget of blockSet> -t <max edge delay> -T <deadline> -lamda <edge delay parameters>
```

```shell
./deadline_solution -i <input networkFile> -o <result output file> -fakeseeds <fakeSeed file> -k <budget of blockSet> -t <max edge delay> -T <deadline> -lamda <edge delay parameters> -preFile <file for determining the max_T and max_alpha> -alpha <the percentage of users influenced>
```

```shell
./Minimal_block_solution -i <input networkFile> -o <result output file> -fakeseeds <fakeSeed file> -t <max edge delay> -T <deadline> -lamda <edge delay parameters> -alpha <the percentage of users influenced>
```
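
For readers unfamiliar with sampling-based selection, the sketch below shows the generic greedy max-coverage step that reverse-sampling methods commonly use: given a collection of samples, each listing the nodes that would "cover" it, repeatedly pick the node that covers the most still-uncovered samples. How TESTIM builds its samples and decides which nodes cover them is described in the paper and is not reproduced here; `CoverResult`, `greedyCover`, and the sample representation are assumptions for illustration.

```cpp
#include <cassert>
#include <vector>

// Chosen nodes, in pick order, and the number of samples they cover.
struct CoverResult {
    std::vector<int> chosen;
    int covered;
};

// Greedy max coverage: picks k nodes, each time taking the node with the
// largest number of uncovered samples (ties go to the smaller node id).
// Each sample is assumed to list a node at most once.
CoverResult greedyCover(int numNodes,
                        const std::vector<std::vector<int>>& samples, int k) {
    assert(0 <= k && k <= numNodes);

    // samplesOf[v] holds the indices of the samples that contain node v.
    std::vector<std::vector<int>> samplesOf(numNodes);
    for (int i = 0; i < static_cast<int>(samples.size()); ++i) {
        for (int v : samples[i]) {
            assert(0 <= v && v < numNodes);
            samplesOf[v].push_back(i);
        }
    }

    // gain[v] is the number of uncovered samples that contain v.
    std::vector<int> gain(numNodes);
    for (int v = 0; v < numNodes; ++v) {
        gain[v] = static_cast<int>(samplesOf[v].size());
    }

    std::vector<bool> done(samples.size(), false);
    std::vector<bool> picked(numNodes, false);
    CoverResult result;
    result.covered = 0;

    for (int round = 0; round < k; ++round) {
        int best = -1;
        for (int v = 0; v < numNodes; ++v) {
            if (!picked[v] && (best < 0 || gain[v] > gain[best])) best = v;
        }
        picked[best] = true;
        result.chosen.push_back(best);

        // Mark the newly covered samples and lower the gain of their members.
        for (int i : samplesOf[best]) {
            if (done[i]) continue;
            done[i] = true;
            ++result.covered;
            for (int u : samples[i]) --gain[u];
        }
    }
    return result;
}
```

The fraction `covered / samples.size()` can then serve as an estimate of the quantity being optimized, which is why the quality of the result depends on how many samples are drawn.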

[2] Michael Simpson, Farnoosh Hashemi, and Laks VS Lakshmanan. 2022. Misinformation mitigation under differential propagation rates and temporal penalties. Proceedings of the VLDB Endowment 15, 10 (2022), 2216–2229.

### Running Environment

A 64-bit Linux-based OS.

GCC 4.7.2 or later.
234 changes: 234 additions & 0 deletions TESTIM/Minimal_block_solution.cpp
@@ -0,0 +1,234 @@
//
// Created by fujiajie on 7/2/23.
//
#include "option.h"
#include "hypergraph.hpp"
#include "graph.h"
#include <iostream>
#include <fstream>   // ifstream / ofstream are used below
#include <ctime>
#include <cmath>
#include <cstdlib>
#include <cstdio>
#include <vector>
#include <cstring>
#include <omp.h>

using namespace std;


/*
 * Read a single float (the fake-seed influence value) from a file.
 * Note: this helper is not called from main() in this file.
 */
float readFakeInfluence(const char* filename)
{
    float fi = 0.0f;
    ifstream in(filename);
    in >> fi;
    in.close();
    return fi;
}


int main(int argc, char ** argv)
{
    srand(time(NULL));
    // Set to true when -p is given; forwarded to addHyperedgeParallel with lamda and p.
    bool time_generate_flag = false;

    OptionParser op(argc, argv);
    if (!op.validCheck()){
        printf("Parameters error, please check the README.md file for correct format!\n");
        return -1;
    }

    // -i: input network file
    char * inFile = op.getPara("-i");
    if (inFile == NULL){
        inFile = (char*)"network";
    }

    // -o: result output file
    char * outFile = op.getPara("-o");
    if (outFile == NULL){
        outFile = (char*)"results.txt";
    }

    // -fakeseeds: fake seed file
    char * fakeSeedsFile = op.getPara("-fakeseeds");
    if (fakeSeedsFile == NULL){
        fakeSeedsFile = (char*)"fake.seeds";
    }

    // -epsilon: error parameter used in the sample bound
    char * tmp = op.getPara("-epsilon");
    float epsilon = 0.2;
    if (tmp != NULL){
        epsilon = atof(tmp);
    }

    // -t: maximum edge delay
    int t = 8;
    tmp = op.getPara("-t");
    if (tmp != NULL){
        t = atoi(tmp);
    }

    // -T: deadline
    int T = 32;
    tmp = op.getPara("-T");
    if (tmp != NULL){
        T = atoi(tmp);
    }

    // -lamda: edge delay parameter
    int lamda = 1;
    tmp = op.getPara("-lamda");
    if (tmp != NULL){
        lamda = atoi(tmp);
        time_generate_flag = false;
    }

    // -p: when given, switches time_generate_flag on (overrides -lamda's setting)
    float p = 0.3;
    tmp = op.getPara("-p");
    if (tmp != NULL){
        p = atof(tmp);
        time_generate_flag = true;
    }

    // -alpha: the percentage of users influenced (target ratio)
    float alpha = 0.25;
    tmp = op.getPara("-alpha");
    if (tmp != NULL){
        alpha = atof(tmp);
    }

    // -nub: parsed but not used further in this file
    bool nub = false;
    tmp = op.getPara("-nub");
    if (tmp != NULL){
        nub = atoi(tmp);
    }

    // -ew: edge weight passed to readGraph; a negative value (the default) leaves `fixed` false
    float ew = -1.0;
    tmp = op.getPara("-ew");
    if (tmp != NULL){
        ew = atof(tmp);
    }
    bool fixed = (ew < 0.0) ? false : true;

    cout << "\n*******************" << endl;
    cout << "\tSTART" << endl;
    cout << "*******************\n" << endl;

    // Load the graph and the fake seeds.
    Graph g(t,T);
    g.readGraph(inFile, fixed, ew);
    g.readFakeSeeds(fakeSeedsFile);
    g.init_visit_thresh_hold();
    int n = g.getNumNodes();

    // -delta: defaults to 1/n
    float delta = 1.0/n;
    tmp = op.getPara("-delta");
    if (tmp != NULL){
        delta = atof(tmp);
    }

    // -precision: parsed but not used further in this file (default 1 - 1/e)
    float precision = 1-1/exp(1);
    tmp = op.getPara("-precision");
    if (tmp != NULL){
        precision = atof(tmp);
    }

    double start_total = omp_get_wtime();
    double start = omp_get_wtime();

    cout << "fake seed set: ";
    const vi &fs = g.getFakeSeeds();
    for (unsigned int s = 0; s < g.getNumFakeSeeds(); s++) {
        cout << fs[s] << " ";
    }
    cout << endl;

    const int try_times = 3;
    start = omp_get_wtime();
    double e = exp(1);
    // double primary_bound = (8 * e * e + e * 4 + 2 * e * e * epsilon) / (epsilon * epsilon * e * e + e * 4 * epsilon + 4) * n * (log(6 / delta)+ n * log(2)) / epsilon / epsilon;
    // Sample bound: (8 + 2*epsilon) * n * (ln(6/delta) + n * ln 2) / epsilon^2
    double primary_bound = (8 + 2 * epsilon) * n * (log(6 / delta)+ n * log(2)) / epsilon / epsilon;
    float ratio = 0.0f;
    float write_ratio = 0.0f;   // initialized so the output is defined on every path
    int write_block_size = 0;
    for (int try_time = 0; try_time < try_times; try_time++) {
        // Binary-search range for the block-set size.
        int MIN_left = 1, MAX_right = n;
        HyperGraph coll_one(n);
        coll_one.hyperGT.clear();
        vector<int> RR_block_set;
        long long cur_samples = g.getNumNodes();
        long long samples_done = 0;
        unsigned int count_cnt=0;
        vector<int> is_RB_set_last;
        vector<int> set_dist_last;
        int usefulTR = 0, usefulBR = 0;
        int mid;
        float inf_number_record = 0.0, inf_number_rec = 0.0;
        vector<bool>mark_RB_set, mark_dist_last;
        coll_one.hyperG.clear();
        coll_one.hyperGB.clear();
        for(int i=0; i<n; i++)
            coll_one.hyperG.push_back(vi());
        for(int i=0; i<n; i++)
            coll_one.hyperGB.push_back(vi());
        mid = MIN_left + (MAX_right - MIN_left+1) / 2;
        // Sampling phase: double the sample count until addHyperedgeParallel
        // reports success or the sample bound is reached.
        do{
            count_cnt++;
            auto bd_return = addHyperedgeParallel(g, coll_one, cur_samples, samples_done, RR_block_set, count_cnt, delta, epsilon, is_RB_set_last, set_dist_last, usefulTR, usefulBR, mid, time_generate_flag, lamda, p);
            samples_done = cur_samples;
            if(bd_return.first){
                inf_number_rec = bd_return.second;
                inf_number_record = bd_return.second - coll_one.EstPro1;
                break;
            }
            cur_samples *= 2;
            if(cur_samples > (long long)primary_bound * (1 + coll_one.eps1) / coll_one.EstPro2){
                // NOTE: inf_number_rec is not updated on this path and keeps its initial 0.0.
                inf_number_record = bd_return.second - coll_one.EstPro1;
                cout << "Stopping: the sample count reached its upper bound\n";
            }
        }while(cur_samples < (long long)primary_bound * (1 + coll_one.eps1) / coll_one.EstPro2);
        ratio = inf_number_record / n;
        if(ratio > alpha){
            MIN_left = mid + 1;
        }
        else
            MAX_right = mid;
        // Binary search for the smallest block size whose influence ratio is at most alpha.
        while(MIN_left < MAX_right){
            mid = ( MIN_left + MAX_right ) / 2;
            //cout<< MIN_left << MAX_right << endl; //test code
            //cout << "now block " << mid << "nodes\n"; //test code
            auto saved_num = Simple_BuildBlockSet(g, coll_one, usefulBR, RR_block_set, mid, inf_number_rec);
            inf_number_record = inf_number_rec - saved_num;
            ratio = inf_number_record / n;
            if(ratio > alpha){
                MIN_left = mid + 1;
            }
            else
                MAX_right = mid;
        }
        cout<<"The Minimal block size is " << MIN_left <<"\n";
        cout<<"The last ratio " << ratio <<"\n";
        // Keep the largest block size found across all tries.
        // if(ratio <= alpha ){
        if(MIN_left > write_block_size){
            write_block_size = MIN_left;
            write_ratio = ratio;
        }
        // }
    }
    // Report the result and the average wall-clock time per try.
    double time_total = omp_get_wtime()-start_total;
    ofstream out(outFile);
    out << "The minimal block size is " << write_block_size << endl;
    out << "inf ratio after block = " << write_ratio << endl;
    out << "The average time cost is " << time_total * 1.0 / try_times << "s";
    out.close();
    cout << "\nAverage Time: " << time_total / try_times << "s" << endl;
    cout << "\n*******************" << endl;
    cout << "\tALL DONE" << endl;
    cout << "*******************" << endl;

    //
    // ALL DONE
    //
}
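
The search over block sizes in `main` follows a standard lower-bound binary search. Isolated from the sampling code, the idea can be sketched as follows, assuming the influence ratio does not increase as more nodes are blocked. The function name `minimalBlockSize` and the callback `ratioAfterBlocking` are hypothetical stand-ins for the estimate that the program derives from `Simple_BuildBlockSet`.

```cpp
#include <cassert>
#include <functional>

// Returns the smallest b in [lo, hi] with ratioAfterBlocking(b) <= alpha,
// assuming the ratio is non-increasing in b. If no size in the range
// satisfies the condition, the result is hi.
int minimalBlockSize(int lo, int hi, double alpha,
                     const std::function<double(int)>& ratioAfterBlocking) {
    assert(lo <= hi);
    while (lo < hi) {
        const int mid = lo + (hi - lo) / 2;   // avoids overflow of lo + hi
        if (ratioAfterBlocking(mid) > alpha) {
            lo = mid + 1;                     // mid nodes are not enough
        } else {
            hi = mid;                         // mid works; try fewer nodes
        }
    }
    return lo;
}
```

Each probe calls the estimator once, so a range of n candidate sizes needs on the order of log2(n) estimates rather than n.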
