Frontiers of Electrical and Electronic Engineering

ISSN 2095-2732

ISSN 2095-2740(Online)

CN 10-1028/TM

Front Elect Electr Eng Chin    2009, Vol. 4 Issue (1) : 20-23    https://doi.org/10.1007/s11460-009-0002-5
RESEARCH ARTICLE
Approach to extracting hot topics based on network traffic content
Yadong ZHOU1, Xiaohong GUAN2, Qindong SUN3, Wei LI1, Jing TAO1
1. MOE Key Lab for Intelligent Networks and Network Security, State Key Lab for Manufacturing Systems, Xi’an Jiaotong University, Xi’an 710049, China
2. Department of Automation, Tsinghua National Lab for Information Science and Technology, Tsinghua University, Beijing 100084, China
3. School of Computer Science and Engineering, Xi’an University of Technology, Xi’an 710048, China
Abstract

This article gives a formal definition and description of popular topics on the Internet and analyzes the relationship between popular words and topics. It then introduces a method that uses statistics and correlation of the popular words in traffic content and network flow characteristics as input for extracting popular topics on the Internet. On this basis, a clustering algorithm is adapted to extract popular topics, and the results are given in formalized form. The test results show that this method has an accuracy of 16.7% in extracting popular topics on the Internet. Compared with web mining and topic detection and tracking (TDT), it can provide a more suitable data source for effective recovery of Internet public opinions.
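
As a rough illustration of the idea of grouping popular words by their correlation, the following minimal sketch links words that tend to occur in the same traffic records. It is not the clustering algorithm adapted in the article: the input format (`flows`, one set of words per record), the Jaccard co-occurrence measure, the thresholds `min_count` and `min_corr`, and the function name are all assumptions made for illustration, and network flow characteristics are left out.

```python
from itertools import combinations


def cluster_popular_words(flows, min_count=2, min_corr=0.5):
    """Group popular words whose co-occurrence correlation is high.

    flows: list of sets of words, one set per traffic record (hypothetical input).
    min_count: a word is "popular" if it occurs in at least this many records.
    min_corr: Jaccard co-occurrence threshold for linking two words.
    """
    # Word statistics: which records contain each word.
    occurrences = {}
    for idx, words in enumerate(flows):
        for word in words:
            occurrences.setdefault(word, set()).add(idx)
    popular = sorted(w for w, seen in occurrences.items() if len(seen) >= min_count)

    # Union-find over popular words; strongly correlated pairs are merged.
    parent = {w: w for w in popular}

    def find(word):
        while parent[word] != word:
            parent[word] = parent[parent[word]]
            word = parent[word]
        return word

    for a, b in combinations(popular, 2):
        both = len(occurrences[a] & occurrences[b])
        either = len(occurrences[a] | occurrences[b])
        if both / either >= min_corr:
            parent[find(a)] = find(b)

    # Collect the word groups; each group is a candidate topic.
    clusters = {}
    for word in popular:
        clusters.setdefault(find(word), []).append(word)
    return sorted(clusters.values())


# Toy records, invented for this example only.
flows = [
    {"enrollment", "majors", "science"},
    {"enrollment", "majors", "software"},
    {"history", "anniversary", "Nanyang"},
    {"history", "anniversary", "tradition"},
]
topics = cluster_popular_words(flows)
```

Words that appear in only one record are dropped as not popular, and the remaining words fall into two groups, one per candidate topic. Threshold linking was chosen here only because it is short; a density-based method could be substituted without changing the input or output.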

Keywords: hot topic extraction; network traffic content; Internet public opinion analysis
Corresponding Author(s): ZHOU Yadong, Email: yadongzhou@gmai.com
Issue Date: 05 March 2009
 Cite this article:   
Yadong ZHOU, Xiaohong GUAN, Qindong SUN, et al. Approach to extracting hot topics based on network traffic content[J]. Front Elect Electr Eng Chin, 2009, 4(1): 20-23.
 URL:  
https://academic.hep.com.cn/fee/EN/10.1007/s11460-009-0002-5
https://academic.hep.com.cn/fee/EN/Y2009/V4/I1/20
P | (W1, W2, …, Wl) | (T1, T2, …, Tm) | S1, S2, …, Sn
1 (l=81, m=3, n=1) | (biology, electric, energy, science, business, commerce, accountant, automation, computer, pathology, software, algorithm, …, economy) | (enrolling new students, intro to the majors, enrollment mark of masters in 2003) | http://www.xjtu.edu.cn
2 (l=35, m=3, n=2) | (predecessor, Shanghai, Jiaotong, Xi’an, history, industry, address of university, a hundred years, tradition, western China, creation, finance, run a school, Nanyang, higher education, internal, history of university, …, sites) | (introduction to Xi’an Jiaotong University, history of Xi’an Jiaotong University, 110th school anniversary) | (http://www.xjtu.edu.cn, http://newsxq.xjtu.edu.cn)
Tab.1  Description of hot topics
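
The structure shown in Tab.1 can be sketched as a small record type. This is only an illustration: the class name `HotTopic` and the field names `words`, `titles`, and `sites` are assumptions read off the table contents, and the word tuple below holds just the first five words listed for topic 1, not all 81.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class HotTopic:
    """One hot topic P as laid out in Tab.1 (field names are hypothetical)."""
    words: Tuple[str, ...]   # W1, W2, ..., Wl
    titles: Tuple[str, ...]  # T1, T2, ..., Tm
    sites: Tuple[str, ...]   # S1, S2, ..., Sn

    @property
    def sizes(self) -> Tuple[int, int, int]:
        """Return (l, m, n), the lengths of the three components."""
        return len(self.words), len(self.titles), len(self.sites)


# Topic 1 from Tab.1, with the word list truncated to its first five entries.
topic1 = HotTopic(
    words=("biology", "electric", "energy", "science", "business"),
    titles=(
        "enrolling new students",
        "intro to the majors",
        "enrollment mark of masters in 2003",
    ),
    sites=("http://www.xjtu.edu.cn",),
)
l, m, n = topic1.sizes
```

With the full word list, `l` would be 81 as in the table; `m` and `n` already match the table values of 3 and 1.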