Please wait a minute...
Frontiers of Computer Science

ISSN 2095-2228

ISSN 2095-2236(Online)

CN 10-1014/TP

Postal Subscription Code 80-970

2018 Impact Factor: 1.129

Front. Comput. Sci.    2020, Vol. 14 Issue (6) : 146809    https://doi.org/10.1007/s11704-019-8235-9
RESEARCH ARTICLE
Matching user identities across social networks with limited profile data
Ildar NURGALIEV1,2, Qiang QU1(), Seyed Mojtaba Hosseini BAMAKAN1,3, Muhammad MUZAMMAL1,4
1. Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China
2. Sberbank, Moscow 121170, Russia
3. Department of Management, Yazd University, Yazd 89195-741, Iran
4. Department of Computer Science, Bahria University, Islamabad 44000, Pakistan
 Download: PDF(610 KB)  
 Export: BibTeX | EndNote | Reference Manager | ProCite | RefWorks
Abstract

Privacy preservation is a primary concern in social networkswhich employ a variety of privacy preservations mechanisms to preserve and protect sensitive user information including age, location, education, interests, and others. The task of matching user identities across different social networks is considered a challenging task. In this work, we propose an algorithm to reveal user identities as a set of linked accounts from different social networks using limited user profile data, i.e., user-name and friendship. Thus, we propose a framework, ExpandUIL, that includes three standalone algorithms based on (i) the percolation graph matching in ExpandFullName algorithm, (ii) a supervised machine learning algorithm that works with the graph embedding, and (iii) a combination of the two, ExpandUserLinkage algorithm. The proposed framework as a set of algorithms is significant as, (i) it is based on the network topology and requires only name feature of the nodes, (ii) it requires a considerably low initial seed, as low as one initial seed suffices, (iii) it is iterative and scalable with applicability to online incoming stream graphs, and (iv) it has an experimental proof of stability over a real ground-truth dataset. Experiments on real datasets, Instagram and VK social networks, show upto 75% recall for linked accounts with 96% accuracy using only one given seed pair.

Keywords social networks      user identity linkage      graph structure learning      maximum subgraph matching      graph percolation     
Corresponding Author(s): Qiang QU   
Just Accepted Date: 18 March 2019   Issue Date: 26 May 2020
 Cite this article:   
Ildar NURGALIEV,Qiang QU,Seyed Mojtaba Hosseini BAMAKAN, et al. Matching user identities across social networks with limited profile data[J]. Front. Comput. Sci., 2020, 14(6): 146809.
 URL:  
https://academic.hep.com.cn/fcs/EN/10.1007/s11704-019-8235-9
https://academic.hep.com.cn/fcs/EN/Y2020/V14/I6/146809
1 K Shu, S, Wang J Tang, R Zafarani, H Liu. User identity linkage across online social networks: a review. ACM SIGKDD Explorations Newsletter, 2017, 18(2): 5–17
https://doi.org/10.1145/3068777.3068781
2 F Carmagnola, F Cena. User identification for cross-system personalisation. Information Sciences, 2009, 179(1): 16–32
https://doi.org/10.1016/j.ins.2008.08.022
3 M Madden, A Lenhart, S Cortesi, U Gasser, M Duggan, A Smith, M Beaton. Teens, social media, and privacy. Pew Research Center, 2013, 21: 2–86
4 O Goga, P Loiseau, R Sommer, R Teixeira, K P Gummadi. On the reliability of profile matching across large online social networks. In: Proceedings of the 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 2015, 1799–1808
https://doi.org/10.1145/2783258.2788601
5 A Korolova, R Motwani, S U Nabar, Y Xu. Link privacy in social networks. In: Proceedings of the 17th ACM Conference on Information and Knowledge Management. 2008, 289–298
https://doi.org/10.1145/1458082.1458123
6 C F Chiasserini, M Garetto, E Leonardi. Social network deanonymization under scale-free user relations. IEEE/ACM Transactions on Networking, 2016, 24(6): 3756–3769
https://doi.org/10.1109/TNET.2016.2553843
7 A Narayanan, V Shmatikov. De-anonymizing social networks. In: Proceedings of the 30th IEEE Symposium on Security and Privacy. 2009, 173–187
https://doi.org/10.1109/SP.2009.22
8 K Sharad. True friends let you down: benchmarking social graph anonymization schemes. In: Proceedings of the 2016 ACM Workshop on Artificial Intelligence and Security. 2016, 93–104
https://doi.org/10.1145/2996758.2996765
9 C F Chiasserini, M Garetto, E Leonardi. Social network deanonymization under scale-free user relations. IEEE/ACM Transactions on Networking, 2016, 24(6): 3756–3769
https://doi.org/10.1109/TNET.2016.2553843
10 E Kazemi, S H Hassani, M Grossglauser. Growing a graph matching from a handful of seeds. Proceedings of the VLDB Endowment, 2015, 8(10): 1010–1021
https://doi.org/10.14778/2794367.2794371
11 J Vosecky, D Hong, V Y Shen. User identification across multiple social networks. In: Proceedings of the 1st International Conference on Networked Digital Technologies. 2009, 360–365
https://doi.org/10.1109/NDT.2009.5272173
12 S Vosoughi, H Zhou, D Roy. Digital stylometry: linking profiles across social networks. In: Proceedings of International Conference on Social Informatics. 2015, 164–177
https://doi.org/10.1007/978-3-319-27433-1_12
13 H Hazimeh, E Mugellini, O A Khaled, P Cudr�-Mauroux. Socialmatching++: a novel approach for interlinking user profiles on social networks. In: Proceedings of the 4th International Workshop on PROFILES Co-located with the 16th ISWC. 2017
14 A R Lorenzo Livi. The graph matching problem. Pattern Analysis and Applications, 2013, 16: 253–283
https://doi.org/10.1007/s10044-012-0284-8
15 D J Cook, L B Holder. Graph-based data mining. IEEE Intelligent Systems, 2000, 15(2): 32–41
https://doi.org/10.1109/5254.850825
16 R Singh, J Xu, B Berger. Global alignment of multiple protein interaction networks with application to functional orthology detection. Proceedings of the National Academy of Sciences, 2008, 105(35): 12763–12768
https://doi.org/10.1073/pnas.0806627105
17 E Kazemi, M Grossglauser. On the structure and efficient computation of isorank node similarities. Journal of CoRR, 2016, arXiv preprint arXiv:1602.00668
18 D Al-Azizy, D E Millard, I Symeonidis, K O’Hara, N Shadbolt. A literature survey and classifications on data deanonymisation. In: Proceedings of the 10th International Conference on Risks and Security of Internet and Systems. 2015, 36–51
https://doi.org/10.1007/978-3-319-31811-0_3
19 H Xie, K Gao, Y Zhang, J Li, H Ren. Common visual pattern discovery via graph matching. In: Proceedings of the 19th International Conference on Multimedia. 2011, 1385–1388
https://doi.org/10.1145/2072298.2072021
20 A Shaji, A Varol, L Torresani, P Fua. Simultaneous point matching and 3D deformable surface reconstruction. In: Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition. 2010, 1221–1228
https://doi.org/10.1109/CVPR.2010.5539827
21 A Sanfeliu, K Fu. A distance measure between attributed relational graphs for pattern recognition. IEEE Transactions on System, Man, and Cybemetics, 1983, 13(3): 353–362
https://doi.org/10.1109/TSMC.1983.6313167
22 B T Messmer, H Bunke. A new algorithm for error-tolerant subgraph isomorphism detection. IEEE Transaction on Pattern Analysis & Machine Intelligence, 1998, 20(5): 493–504
https://doi.org/10.1109/34.682179
23 H Bunke, G Allermann. Inexact graph matching for structural pattern recognition. Pattern Recognition Letters, 1983, 1(4): 245–253
https://doi.org/10.1016/0167-8655(83)90033-8
24 M Neuhaus, H Bunke. An error-tolerant approximate matching algorithm for attributed planar graphs and its application to fingerprint classification. In: Proceedings of Joint IAPR International Workshops on Statistical Techniques in Pattern Recognition (SPR) and Structural and Syntactic Pattern Recognition (SSPR). 2004, 180–189
https://doi.org/10.1007/978-3-540-27868-9_18
25 G Levi. A note on the derivation of maximal common subgraphs of two directed or undirected graphs. Calcolo, 1973, 9(4): 341
https://doi.org/10.1007/BF02575586
26 H Bunke, K Shearer. A graph distance metric based on the maximal common subgraph. Pattern Recognition Letters, 1998, 19(3–4): 255–259
https://doi.org/10.1016/S0167-8655(97)00179-7
27 M L Fernández, G Valiente. A graph distance metric combining maximum common subgraph and minimum common supergraph. Pattern Recognition Letters, 2001, 22(6–7): 753–758
https://doi.org/10.1016/S0167-8655(01)00017-4
28 W D Wallis, P Shoubridge, M Kraetzl, D Ray. Graph distances using graph union. Pattern Recognition Letters, 2001, 22(6–7): 701–704
https://doi.org/10.1016/S0167-8655(01)00022-8
29 H Bunke. On a relation between graph edit distance and maximum common subgraph. Pattern Recognition Letters, 1997, 18(8): 689–694
https://doi.org/10.1016/S0167-8655(97)00060-3
30 M Neuhaus, H Bunke. A probabilistic approach to learning costs for graph edit distance. In: Proceedings of the 17th International Conference on Pattern Recognition. 2004, 389–393
https://doi.org/10.1109/ICPR.2004.1334548
31 A P Dempster, N M Laird, D B Rubin. Maximum likelihood from incomplete data via the em algorithm. Journal of the Royal Statistical Society: Series B (Methodological), 1977, 39(1): 1–22
https://doi.org/10.1111/j.2517-6161.1977.tb01600.x
32 R A Redner, H F Walker. Mixture densities, maximum likelihood and the em algorithm. SIAM Review, 1984, 26(2): 195–239
https://doi.org/10.1137/1026034
33 F M Bianchi, L Livi, A Rizzi, A Sadeghian. A granular computing approach to the design of optimized graph classification systems. Soft Computing, 2014, 18(2): 393–412
https://doi.org/10.1007/s00500-013-1065-z
34 F M Bianchi, L Livi, A Rizzi. Two density-based k-means initialization algorithms for non-metric data clustering. Pattern Analysis and Applications, 2016, 19(3): 745–763
https://doi.org/10.1007/s10044-014-0440-4
35 M A Lozano, F Escolano. Graph matching and clustering using kernel attributes. Neurocomputing, 2013, 113: 177–194
https://doi.org/10.1016/j.neucom.2013.01.015
36 S Janson, T Łuczak, T Turova, T Vallier. Bootstrap percolation on the random graph Gn,p. The Annals of Applied Probability, 2012, 22(5): 1989–2047
https://doi.org/10.1214/11-AAP822
37 L Yartseva , M Grossglauser. On the performance of percolation graph matching. Proceedings of the 1st ACM Conference on Online Social Networks. 2013, 119–130
https://doi.org/10.1145/2512938.2512952
38 P Erdös, A Rényi. On the evolution of random graphs. Publication of the Mathematical Institute of the Hungarian Academy of Sciences, 1960, 5(1): 17–60
39 D J Watts, S H Strogatz. Collective dynamics of ‘small-world’ networks. Nature, 1998, 393(6684): 440
https://doi.org/10.1038/30918
40 A L Barabási, R Albert. Emergence of scaling in random networks. Science, 1999, 286(5439): 509–512
https://doi.org/10.1126/science.286.5439.509
41 F Chung, L Lu. Connected components in random graphs with given expected degree sequences. Annals of Combinatorics, 2002, 6(2): 125–145
https://doi.org/10.1007/PL00012580
42 K Sharad. Change of guard: the next generation of social graph deanonymization attacks. In: Proceedings of ACM Workshop on Artificial Intelligence and Security. 2016, 105–116
https://doi.org/10.1145/2996758.2996763
43 T K Ho. Random decision forests. In: Proceedings of the 3rd International Conference on Document Analysis and Recognition. 1995, 278–282
[1] Article highlights Download
[1] Gang WU, Zhiyong CHEN, Jia LIU, Donghong HAN, Baiyou QIAO. Task assignment for social-oriented crowdsourcing[J]. Front. Comput. Sci., 2021, 15(2): 152316-.
[2] Kai LI, Guangyi LV, Zhefeng WANG, Qi LIU, Enhong CHEN, Lisheng QIAO. Understanding the mechanism of social tie in the propagation process of social network with communication channel[J]. Front. Comput. Sci., 2019, 13(6): 1296-1308.
[3] Zhongqing WANG, Shoushan LI, Guodong ZHOU. Personal summarization from profile networks[J]. Front. Comput. Sci., 2017, 11(6): 1085-1097.
[4] Yuan SU,Xi ZHANG,Lixin LIU,Shouyou SONG,Binxing FANG. Understanding information interactions in diffusion: an evolutionary game-theoretic perspective[J]. Front. Comput. Sci., 2016, 10(3): 518-531.
[5] Rong-Hua LI, Jianquan LIU, Jeffrey Xu YU, Hanxiong CHEN, Hiroyuki KITAGAWA. Co-occurrence prediction in a large location-based social network[J]. Front Comput Sci, 2013, 7(2): 185-194.
[6] Xiaolong ZHENG, Yongguang ZHONG, Daniel ZENG, Fei-Yue WANG. Social influence and spread dynamics in social networks[J]. Front Comput Sci, 2012, 6(5): 611-620.
[7] Guojun WANG, Jie WU. FlowTrust: trust inference with network flows[J]. Front Comput Sci Chin, 2011, 5(2): 181-194.
Viewed
Full text


Abstract

Cited

  Shared   
  Discussed