Frontiers of Computer Science

ISSN 2095-2228

ISSN 2095-2236(Online)

CN 10-1014/TP

Postal distribution code 80-970

2019 Impact Factor: 1.275

Frontiers of Computer Science  2018, Vol. 12 Issue (1): 135-145   https://doi.org/10.1007/s11704-016-5415-8
A multi-level approach to highly efficient recognition of Chinese spam short messages
Weimin WANG¹, Dan ZHOU²
1. School of Computer Science & Engineering, Jiangsu University of Science and Technology, Jiangsu 212003, China
2. School of Computer and Control Engineering, University of Chinese Academy of Sciences, Beijing 100190, China
Abstract

Recognizing spam short messages (SMS) involves many aspects of natural language processing. A good solution not only improves the quality of users' mobile experience but also advances the analysis of the short texts found in current mobile applications, such as Webchat and microblogs. Because spam SMSes are characterized by sparsity, transformation, and real-time requirements, we propose three methods at different levels: recognition based on symbolic features, recognition based on text similarity, and recognition based on pattern matching. Combining these methods yields a multi-level approach to spam SMS recognition. To enrich the pattern base while reducing manual labor and time, we propose a quasi-pattern learning method that utilizes quasi-pattern matching results from the pattern matching process. The method can learn many interesting new patterns from the SMS corpus. Finally, a comprehensive analysis indicates that our spam SMS recognition approach achieves a precision as high as 95.18% and a recall of 95.51%.
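
The abstract names three recognition levels but does not say how they are combined. As a rough illustration only, the sketch below chains them as a cheapest-first cascade in which any level may flag a message. The function names, the symbol set, the thresholds, the sample spam text, and the regular expression are all hypothetical and are not taken from the paper; `difflib.SequenceMatcher` stands in for whatever similarity measure the authors actually use.

```python
import re
from difflib import SequenceMatcher

# Everything below is an illustrative assumption, not the paper's configuration.
SYMBOL_RE = re.compile(r"[!!$¥%*#@]|\d{7,}")        # marks and long digit runs
KNOWN_SPAM = ["恭喜您中奖,请回电领取奖金"]              # hypothetical labelled spam
PATTERNS = [re.compile(r"(发票|贷款).{0,10}(电话|联系)")]  # hypothetical pattern base


def symbol_level(text, max_ratio=0.3):
    """Level 1: flag messages dominated by symbols or long digit runs."""
    if not text:
        return False
    hits = sum(len(m.group()) for m in SYMBOL_RE.finditer(text))
    return hits / len(text) > max_ratio


def similarity_level(text, threshold=0.8):
    """Level 2: flag messages that closely resemble a known spam message."""
    return any(
        SequenceMatcher(None, text, spam).ratio() >= threshold
        for spam in KNOWN_SPAM
    )


def pattern_level(text):
    """Level 3: flag messages that match any entry in the pattern base."""
    return any(p.search(text) is not None for p in PATTERNS)


def is_spam(text):
    """Run the levels in order and stop at the first one that fires."""
    return symbol_level(text) or similarity_level(text) or pattern_level(text)
```

Under these assumptions, `is_spam("办理贷款请联系我")` is caught only by the pattern level, while a near-duplicate of the stored spam message is caught by the similarity level before any pattern is tried.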

Key words: spam short message, spam recognition, similarity computing, pattern learning
Received: 2015-10-06      Published: 2018-01-12
Corresponding Author(s): Weimin WANG   
Cite this article:
Weimin WANG, Dan ZHOU. A multi-level approach to highly efficient recognition of Chinese spam short messages. Front. Comput. Sci., 2018, 12(1): 135-145.
Link to this article:
https://academic.hep.com.cn/fcs/CN/10.1007/s11704-016-5415-8
https://academic.hep.com.cn/fcs/CN/Y2018/V12/I1/135