Joint user profiling with hierarchical attention networks
Xiaojian LIU1,2, Yi ZHU1,2,3, Xindong WU1,2,4
1. Key Laboratory of Knowledge Engineering with Big Data (Hefei University of Technology), Ministry of Education, Hefei 230009, China
2. School of Computer Science and Information Engineering, Hefei University of Technology, Hefei 230009, China
3. School of Information Engineering, Yangzhou University, Yangzhou 225009, China
4. Mininglamp Academy of Sciences, Mininglamp, Beijing 100193, China
User profiling, which infers user attributes such as age and gender, plays an increasingly important role in many real-world applications. Most existing methods for user profiling either use only one type of data or fail to handle noise in the data. Moreover, they usually consider the problem from only one perspective. In this paper, we propose a joint user profiling model with hierarchical attention networks (JUHA) to learn informative user representations for user profiling. JUHA performs user profiling based on both inner-user and inter-user features. We explore inner-user features from user behaviors (e.g., purchased items and posted blogs), and inter-user features from a user-user graph (in which similar users can be connected to each other). As a first round of data preparation, JUHA learns basic sentence and bag representations from multiple separate sources of data (user behaviors). In this module, convolutional neural networks (CNNs) are introduced to capture word and sentence features related to age and gender, while the self-attention mechanism is exploited to down-weight noisy data. Following this, we build another bag that contains a user-user graph. Inter-user features are learned from this bag by propagating information between linked users in the graph. To acquire more robust data, inter-user features and other inner-user bag representations are joined into each sentence in the current bag to learn the final bag representation. Subsequently, all of the bag representations are integrated by the self-attention mechanism to learn a comprehensive user representation. Our experimental results demonstrate that our approach outperforms several state-of-the-art methods and improves prediction performance.
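The hierarchy described above (sentences pooled into bag representations, bags pooled into a user representation, plus information propagated over a user-user graph) can be illustrated with a minimal NumPy sketch. This is not the JUHA implementation: the dot-product attention with a single shared `query` vector, the mean aggregation in `propagate`, the random stand-ins for CNN sentence features, and all function and variable names are assumptions made only for illustration.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over a 1-D score vector.
    e = np.exp(x - np.max(x))
    return e / e.sum()

def attention_pool(vectors, query):
    # Score each row against `query` (assumed dot-product attention),
    # normalize the scores, and return the weighted sum plus the weights.
    weights = softmax(vectors @ query)
    return weights @ vectors, weights

def propagate(features, adjacency):
    # One round of mean aggregation over each user's neighbours,
    # with a self-loop so a user's own features are retained.
    a = adjacency + np.eye(adjacency.shape[0])
    return (a / a.sum(axis=1, keepdims=True)) @ features

rng = np.random.default_rng(0)
dim = 4
query = rng.normal(size=dim)  # hypothetical attention parameter

# Two behaviour bags for one user; each row stands in for a CNN sentence vector.
bags = [rng.normal(size=(3, dim)), rng.normal(size=(5, dim))]

# Level 1: sentences -> bag representation. Level 2: bags -> user representation.
bag_reprs = np.stack([attention_pool(b, query)[0] for b in bags])
user_repr, bag_weights = attention_pool(bag_reprs, query)

# Toy user-user graph with 3 users: user 1 is linked to users 0 and 2.
adjacency = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)
user_feats = rng.normal(size=(3, dim))
inter_user = propagate(user_feats, adjacency)
```

Because the attention weights sum to one, a sentence or bag with a low score contributes little to the pooled vector, which is the sense in which attention can down-weight noisy inputs in this sketch.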