Frontiers of Computer Science

ISSN 2095-2228

ISSN 2095-2236(Online)

CN 10-1014/TP

Postal distribution code: 80-970

2019 Impact Factor: 1.275

Frontiers of Computer Science  2020, Vol. 14 Issue (5): 145612   https://doi.org/10.1007/s11704-019-8201-6
Event detection and evolution in multi-lingual social streams
Yaopeng LIU1,2, Hao PENG1,2, Jianxin LI1,2, Yangqiu SONG3, Xiong LI4
1. Beijing Advanced Innovation Center for Big Data and Brain Computing, Beihang University, Beijing 100083, China
2. State Key Laboratory of Software Development Environment, Beihang University, Beijing 100083, China
3. Department of Computer Science and Engineering, HKUST, Hong Kong 99907, China
4. National Computer Network Emergency Response Technical Team/Coordination Center of China, Beijing 100029, China
Abstract

Real-life events continually emerge and evolve in social and news streams. Recent methods have succeeded in capturing designed features of monolingual events, but they lack interpretability and do not consider multi-lingual settings. To this end, we propose a multi-lingual event mining model, namely MLEM, to automatically detect events and generate evolution graphs in multi-lingual hybrid-length text streams including English, Chinese, French, German, Russian and Japanese. Specifically, we merge identical entities and similar phrases, and present multiple similarity measures based on an incremental word2vec model. We propose an 8-tuple to describe each event for correlation analysis and evolution graph generation. We evaluate the MLEM model using a massive human-generated dataset containing real-world events. Experimental results show that MLEM outperforms the baseline method in both efficiency and effectiveness.
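
As a rough illustration of the phrase-merging idea mentioned in the abstract, the sketch below groups phrases whose embedding vectors have high cosine similarity. It is not MLEM's actual procedure: the greedy grouping strategy, the function names `cosine` and `merge_similar`, the threshold of 0.9, and the toy 3-dimensional vectors are all assumptions made for illustration. In practice the vectors would come from a trained word2vec model.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def merge_similar(phrases, vectors, threshold=0.9):
    """Greedily group phrases by similarity to each group's first member.

    Hypothetical helper: a phrase joins the first group whose representative
    (first member) is at least `threshold` similar; otherwise it starts a new group.
    """
    groups = []
    for phrase in phrases:
        for group in groups:
            if cosine(vectors[phrase], vectors[group[0]]) >= threshold:
                group.append(phrase)
                break
        else:
            groups.append([phrase])
    return groups


# Toy vectors invented for this example; they are not learned embeddings.
toy_vectors = {
    "earthquake": [0.9, 0.1, 0.0],
    "quake": [0.85, 0.15, 0.05],
    "election": [0.0, 0.2, 0.95],
}
groups = merge_similar(list(toy_vectors), toy_vectors)
print(groups)
```

With these toy vectors, "earthquake" and "quake" fall into one group and "election" stays on its own. A greedy single pass is order-dependent; a clustering method would be a more robust choice for real data.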

Key words: event detection, event evolution, stream processing, multi-lingual anomaly detection
Received: 2018-06-05      Published: 2020-03-17
Corresponding Author(s): Jianxin LI   
Cite this article:
Yaopeng LIU, Hao PENG, Jianxin LI, Yangqiu SONG, Xiong LI. Event detection and evolution in multi-lingual social streams. Front. Comput. Sci., 2020, 14(5): 145612.
Link to this article:
https://academic.hep.com.cn/fcs/CN/10.1007/s11704-019-8201-6
https://academic.hep.com.cn/fcs/CN/Y2020/V14/I5/145612