Please wait a minute...
Frontiers of Computer Science

ISSN 2095-2228

ISSN 2095-2236(Online)

CN 10-1014/TP

Postal Subscription Code 80-970

2018 Impact Factor: 1.129

Front. Comput. Sci.    2017, Vol. 11 Issue (1) : 119-129    https://doi.org/10.1007/s11704-016-6053-x
RESEARCH ARTICLE
Fast counting the cardinality of flows for big traffic over sliding windows
Jingsong SHAN1,2,Yinjin FU1,Guiqiang NI1(),Jianxin LUO1,Zhaofeng WU1
1. College of Command Information Systems, PLA University of Science and Technology, Nanjing 223002, China
2. Huaiyin Institute of Technology, Huaian 221002, China
 Download: PDF(1167 KB)  
 Export: BibTeX | EndNote | Reference Manager | ProCite | RefWorks
Abstract

Counting the cardinality of flows for massive high-speed traffic over sliding windows is still a challenging work under time and space constrains, but plays a key role in many network applications, such as traffic management and routing optimization in software defined network. In this paper, we propose a novel data structure (called LRU-Sketch) to address the problem. The significant contributions are as follows. 1) The proposed data structure adapts a well-known probabilistic sketch to sliding window model; 2) By using the least-recently used (LRU) replacement policy, we design a highly time-efficient algorithm for timely forgetting stale information, which takes constant (O(1)) time per time slot; 3) Moreover, a further memory-reducing schema is given at a cost of very little loss of accuracy; 4) Comprehensive experiments, performed on two real IP trace files, confirm that the proposed schema attains high accuracy and high time efficiency.

Keywords probabilistic data structure      sketch      streaming data      cardinality      flow     
Corresponding Author(s): Guiqiang NI   
Just Accepted Date: 19 July 2016   Issue Date: 11 January 2017
 Cite this article:   
Jingsong SHAN,Yinjin FU,Guiqiang NI, et al. Fast counting the cardinality of flows for big traffic over sliding windows[J]. Front. Comput. Sci., 2017, 11(1): 119-129.
 URL:  
https://academic.hep.com.cn/fcs/EN/10.1007/s11704-016-6053-x
https://academic.hep.com.cn/fcs/EN/Y2017/V11/I1/119
1 Callegari C, Giordano S, Pagano M, Procissi G. Opencounter: counting unknown flows in software defined networks. In: Proceedings of the International Symposium on Performance Evaluation of Computer and Telecommunication Systems. 2015, 1–7
https://doi.org/10.1109/spects.2015.7285295
2 Callegari C, Pietro A D, Giordano S, Pepe T, Procissi G. The loglog counting reversible sketch: a distributed architecture for detecting anomalies in backbone networks. In: Proceedings of IEEE International Conference on Communications. 2012, 1287–1291
https://doi.org/10.1109/icc.2012.6363825
3 Estan C, Varghese G. New directions in traffic measurement and accounting: focusing on the elephants, ignoring the mice. ACM Transactions on Computer Systems, 2003, 21(3): 270–313
https://doi.org/10.1145/859716.859719
4 Estan C, Varghese G, Fisk M E. Bitmap algorithms for counting active flows on high-speed links. IEEE/ACM Transactions on Networking, 2006, 14(5): 925–937
https://doi.org/10.1109/TNET.2006.882836
5 Chen W J, Liu Y, Guan Y. Cardinality change-based early detection of large-scale cyber-attacks. In: Proceedings of IEEE International Conference on Communications. 2013, 1788–1796
https://doi.org/10.1109/infcom.2013.6566977
6 Cao J, Jin Y, Chen A, Bu T, Zhang Z L. Identifying high cardinality internet hosts. In: Proceedings of IEEE International Conference on Communications. 2009, 810–818
https://doi.org/10.1109/infcom.2009.5061990
7 Zheng Y Q, Li M. ZOE: fast cardinality estimation for large-scale RFID systems. In: Proceedings of IEEE International Conference on Communications. 2013, 908–916
https://doi.org/10.1109/infcom.2013.6566879
8 Chen A, Li L E, Cao J. Tracking cardinality distributions in network traffic. In: Proceedings of IEEE International Conference on Communications. 2009, 819–827
https://doi.org/10.1109/infcom.2009.5061991
9 Huang Q, Lee P P C. Ld-sketch: a distributed sketching design for accurate and scalable anomaly detection in network data streams. In: Proceedings of IEEE International Conference on Communications. 2014, 1420–1428
https://doi.org/10.1109/infocom.2014.6848076
10 Huang Q, Lee P P C. A hybrid local and distributed sketching design for accurate and scalable heavy key detection in network data streams. Computer Networks, 2015, 91: 298–315
https://doi.org/10.1016/j.comnet.2015.08.025
11 Babcock B, Babu S, Datar M, Motwani R, Widom J. Models and issues in data stream systems. In: Proceedings of the 21st Symposium on Principles of Database Systems. 2002, 1–16
https://doi.org/10.1145/543613.543615
12 Gibbons P B, Matias Y. Synopsis data structures for massive data sets. In: Proceedings of the 10th Annual ACM-SIAM Symposium on Discrete Algorithms. 1999, 909–910
13 Hua Y, Xiao B, Veeravalli B, Feng D. Locality-sensitive bloom filter for approximate membership query. IEEE Transactions on Computers, 2012, 61(6): 817–830
https://doi.org/10.1109/TC.2011.108
14 Yu Y, Qian C, Li X. Distributed and collaborative traffic monitoring in software defined networks. In: Proceedings of the 3rd Workshop on Hot Topics in Software Defined Networking. 2014, 85–90
https://doi.org/10.1145/2620728.2620739
15 Whang K, Zanden B T V, Taylor H M. A linear-time probabilistic counting algorithm for database applications. ACM Transactions on Database Systems, 1990, 15(02): 208–229
https://doi.org/10.1145/78922.78925
16 Flajolet P, Martin G N. Probabilistic counting algorithms for data base applications. Journal of Computer and System Sciences, 1985, 31(2): 182–209
https://doi.org/10.1016/0022-0000(85)90041-8
17 Giroire F. Order statistics and estimating cardinalities of massive data sets. Discrete Applied Mathematics, 2009, 157(2): 406–427
https://doi.org/10.1016/j.dam.2008.06.020
18 Durand M, Flajolet P. Loglog counting of large cardinalities (extended abstract). In: Proceedings of European Symposium on Algorithms. 2003, 605–617
19 Ońeil E, Ońeil P, Weikum G. The LRU-K page replacement algorithm for database disk buffering. In: Proceedings of ACM Special Interest Group on Management Of Data. 1993, 297–306
https://doi.org/10.1145/170035.170081
20 Heule S, Nunkesser M, Hall A. Hyperloglog in practice: algorithmic engineering of a state of the art cardinality estimation algorithm. In: Proceedings of Joint International Conference on Extending Database Technology. 2013, 683–692
https://doi.org/10.1145/2452376.2452456
21 Chen A, Cao J. Distinct counting with a self-learning bitmap. In: Proceedings of the 25th International Council for Open and Distance Education. 2009, 1171–1174
https://doi.org/10.1109/icde.2009.193
22 Metwally A, Agrawal D, El Abbadi A. Why go logarithmic if we can go linear?: Towards effective distinct counting of search traffic. In: Proceedings of Joint International Conference on Extending Database Technology. 2008, 618–629
https://doi.org/10.1145/1353343.1353418
23 Aouiche K, Lemire D. A comparison of five probabilistic view-size estimation techniques in OLAP. In: Proceedings of the 10th ACM International Workshop on Data Warehousing and OLAP. 2007, 17–24
https://doi.org/10.1145/1317331.1317335
24 Chabchoub Y, Chiky R, Dogan B. How can sliding hyperloglog and EWMA detect port scan attacks in IP traffic? EURASIP Journal on Information Security, 2014, 2014: 5
25 Ben-Basat R, Einziger G, Friedman R, Kassner Y. Heavy hitters in streams and sliding windows. In: Proceedings of IEEE International Conference on Communications. 2016
https://doi.org/10.1109/infocom.2016.7524364
26 Datar M, Gionis A, Indyk P, Motwani R. Maintaining stream statistics over sliding windows. SIAM Journal on Computing, 2002, 31(6): 1794–1813
https://doi.org/10.1137/S0097539701398363
27 Kim H, O’Hallaron D R. Counting network flows in real time. In: Proceedings of IEEE Global Communications Conference. 2003, 3888–3893
28 Sanjuàs-Cuxart J, Barlet-Ros P, Solé-Pareta J. Counting flows over sliding windows in high speed networks. In: Proceedings of International Conference on Research in Networking. 2009, 79–91
https://doi.org/10.1007/978-3-642-01399-7_7
29 Yi K, Wang L, Wei Z W. Indexing for summary queries: theory and practice. ACM Transactions on Database Systems, 2014, 39(1): 2
https://doi.org/10.1145/2508702
30 Zhang Z, Wang B Q, Lan J L. Identifying elephant flows in Internet backbone traffic with bloom filters and LRU. Computer Communications, 2015, 61: 70–78
https://doi.org/10.1016/j.comcom.2014.12.003
31 Mitzenmacher M, Upfal E. Probability and Computing: Randomized Algorithms and Probabilistic Analysis. New York: Cambridge University Press, 2005
https://doi.org/10.1017/CBO9780511813603
32 Jain R, Rountier S A. Packet trains-measurements and a new model for computer network traffic. IEEE Journal on Selected Areas in Communications, 1986, 4(6): 986–995
https://doi.org/10.1109/JSAC.1986.1146410
[1] Genan DAI, Xiaoyang HU, Youming GE, Zhiqing NING, Yubao LIU. Attention based simplified deep residual network for citywide crowd flows prediction[J]. Front. Comput. Sci., 2021, 15(2): 152317-.
[2] Cheng WANG, Kyung Tae KIM, Hee Yong YOUN. PopFlow: a novel flow management scheme for SDN switch of multiple flow tables based on flow popularity[J]. Front. Comput. Sci., 2020, 14(6): 146505-.
[3] Gan HUANG, Hee Yong YOUN. Proactive eviction of flow entry for SDN based on hidden Markov model[J]. Front. Comput. Sci., 2020, 14(4): 144502-.
[4] Tian WANG, Meina QIAO, Aichun ZHU, Guangcun SHAN, Hichem SNOUSSI. Abnormal event detection via the analysis of multi-frame optical flow information[J]. Front. Comput. Sci., 2020, 14(2): 304-313.
[5] Xuepeng FAN, Xiaofei LIAO, Hai JIN. FunctionFlow: coordinating parallel tasks[J]. Front. Comput. Sci., 2019, 13(1): 73-85.
[6] Xiaohui TAN, Yachun FAN, Ruiliang GUO. Local features and manifold ranking coupled method for sketch-based 3D model retrieval[J]. Front. Comput. Sci., 2018, 12(5): 1000-1012.
[7] Chao DING,Ligang LIU. A survey of sketch based modeling systems[J]. Front. Comput. Sci., 2016, 10(6): 985-999.
[8] Yili GONG,Wei HUANG,Wenjie WANG,Yingchun LEI. A survey on software defined networking and its applications[J]. Front. Comput. Sci., 2015, 9(6): 827-845.
[9] Genggeng LIU,Wenzhong GUO,Rongrong LI,Yuzhen NIU,Guolong CHEN. XGRouter: high-quality global router in X-architecture with particle swarm optimization[J]. Front. Comput. Sci., 2015, 9(4): 576-594.
[10] Amit Kumar SINGH, Akash KUMAR, Jigang WU, Thambipillai SRIKANTHAN. CADSE: communication aware design space exploration for efficient run-time MPSoC management[J]. Front Comput Sci, 2013, 7(3): 416-430.
[11] Zheng WANG, Geguang PU, Jiangwen LI, Yuxiang CHEN, Yongxin ZHAO, Mingsong CHEN, Bin GU, Mengfei YANG, Jifeng HE. A novel requirement analysis approach for periodic control systems[J]. Front Comput Sci, 2013, 7(2): 214-235.
[12] Biao LENG, Jiabei ZENG, Zhang XIONG, Weifeng LV, Yueliang WAN. Probability tree based passenger flow prediction and its application to the Beijing subway system[J]. Front Comput Sci, 2013, 7(2): 195-203.
[13] Jia LIU, Zhiguo JIANG, Hongjun LI, Xiaopeng ZHANG. Easy modeling of realistic trees from freehand sketches[J]. Front Comput Sci, 2012, 6(6): 756-768.
[14] Robert GWADERA. Multi-stream join answering for mining significant cross-stream correlations[J]. Front Comput Sci, 2012, 6(2): 131-142.
[15] Wei WANG. Certifying assembly programs with trails[J]. Front Comput Sci Chin, 2011, 5(4): 472-485.
Viewed
Full text


Abstract

Cited

  Shared   
  Discussed