Efficient Location-Aware Influence Maximization

Guoliang Li, Shuo Chen, Jianhua Feng, Kian-lee Tan, Wen-Syan Li

Abstract

Although influence maximization, which selects a set of users in a social network to maximize the expected number of users influenced by the selected users (called influence spread), has been extensively studied, existing works neglected the fact that the location information can play an important role in influence maximization. Many real-world applications such as location-aware word-of-mouth marketing have location-aware requirement. In this paper we study the location-aware influence maximization problem. One big challenge in location-aware influence maximization is to develop an efficient scheme that offers wide influence spread.

To address this challenge, we propose two greedy algorithms with 1 − 1/e approximation ratio. To meet the instantspeed requirement, we propose two efficient algorithms with ǫ · (1 − 1/e) approximation ratio for any ǫ ∈ (0, 1]. Experimental results on real datasets show our method achieves high performance while keeping large influence spread and significantly outperforms state-of-the-art algorithms.

Download

Input Format

All in plain text files (unzip first). Must be unzipped under the data/ directory. Because I only use data_name, not the file path as the input argument. For example, to use gowalla.inf and gowalla.loc dataset, only need to say gowalla in the input argument list. See details below.

Social graph

the social graph has extension '.inf'

the beginning line is the number of users (user id starts from 1 in this dataset), and the number of bi-directed edges (also half of the total rows in the file minus the first line, since each row represents a directed edge).

the rest lines are splitted by " ". First two fields are the ids, and the third is the weight from source node to target, and the last vice versa.

[# users] [# bi-directed edges]
[source-id] [target-id] [weight-1] [weight-2]

# example
3 2
1 3 0.00047619047619 0.00444444444444
3 1 0.00444444444444 0.00047619047619
1 5 0.0266666666667 0.0266666666667
5 1 0.0266666666667 0.0266666666667

Location

the social graph has extension '.loc'

the beginning line is the number of rows, also number of users (must be the same with the social graph data. user id starts from 0 in this dataset) with location information.

the rest lines are splitted by " ".

[# users]
[id] [latitude] [longitude]

# example
5
0 30.262996 -97.750338
1 46.040318 6.6285
2 34.059847 -118.341607
3 37.780893 -122.414839
4 30.286406 -97.733543

Queries

Your queries file should have the following format.

The first line is the number of queries (also number of rows in this file minus the first line).

The rest, each line is a query, with following format.

[latitude] [longitude] [width] [height] [k]

width is represented by number of latitudes. height is represented by number of longitudes.

In total they represent a box centering at (latitude, longitude), with width and height.

IMPORTANT NOTE : make sure the id in location file is 0 based and id in social graph file is 1 based. Also, make sure these are one-to-one mapped. I,E, there can not be a user in the social graph but without location data.

Program Usage

./mip -method [k] [data_name] < [queries]

Hint or assembly method may need wait for a while at the beginning, since it will build index on-the-go.

Output Format

IMPORTANT: you need to create a tmp/ folder in the current directory, that's where output files will be located.

[time] [k] [spread] [# users within query area]
[time] [k] [spread] [# users within query area]
[time] [k] [spread] [# users within query area]

Contact

For any question about this study, please feel free to contact Guoliang Li.