Frontiers of Computer Science

ISSN 2095-2228

ISSN 2095-2236(Online)

CN 10-1014/TP

Front. Comput. Sci.    2020, Vol. 14 Issue (3) : 143801    https://doi.org/10.1007/s11704-018-7451-z
RESEARCH ARTICLE
TPII: tracking personally identifiable information via user behaviors in HTTP traffic
Yi LIU1,2, Tian SONG1, Lejian LIAO1
1. School of Computer Science, Beijing Institute of Technology, Beijing 100081, China
2. Network Information Center, Yan’an University, Yan’an 716000, China
Abstract

Application service providers (ASPs) commonly have their mobile applications actively collect non-critical personally identifiable information (PII) from users' devices and send it to the cloud in order to provide precise recommendation services. Meanwhile, Internet service providers (ISPs) and local network providers also have strong requirements to collect PIIs for finer-grained traffic control and security services. However, locating PIIs accurately in massive volumes of network traffic is a challenge, much like looking for a needle in a haystack. In this paper, we address this challenge by presenting an efficient and lightweight approach, namely TPII, which can locate and track PIIs at the HTTP layer rebuilt from raw network traffic. The approach collects only three features from HTTP fields as users' behaviors and then establishes a tree-based decision model to mine PIIs efficiently and accurately. Without any prior knowledge, TPII can identify any type of PII from any mobile application, which gives it broad applicability. We evaluate the proposed approach on a real dataset collected from a campus network with more than 13k users. The experimental results show that the precision and recall of TPII are 91.72% and 94.51%, respectively, and that a parallel implementation of TPII can mine and label 213 million records within one hour, which comes close to supporting 1 Gbps wire-speed inspection in practice. Our approach provides network service providers with a practical way to collect PIIs for better services.
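To put the reported figures in perspective, the following is a minimal back-of-envelope sketch in Python, not code from the paper. It converts 213 million records per hour into a per-second rate, computes how many bytes of traffic each record would have to represent on average for that rate to match a fully loaded 1 Gbps link, and derives an F1 score from the stated precision and recall. The function names are hypothetical, the F1 score is derived here and is not reported in the abstract, and the bytes-per-record figure rests on the assumption that the link is saturated and that every byte belongs to some record.

```python
def records_per_second(records: int, seconds: float) -> float:
    """Average processing rate in records per second."""
    return records / seconds


def implied_bytes_per_record(link_bps: float, rate: float) -> float:
    """Average traffic volume per record (bytes) if the given record rate
    exactly matched a saturated link of link_bps bits per second.
    Assumption: all link traffic is attributed to records."""
    return (link_bps / 8) / rate


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Figures taken from the abstract.
    rate = records_per_second(213_000_000, 3600)
    size = implied_bytes_per_record(1e9, rate)
    f1 = f1_score(0.9172, 0.9451)

    print(f"records per second: {rate:,.0f}")
    print(f"implied bytes per record at 1 Gbps: {size:,.0f}")
    print(f"derived F1 score: {f1:.4f}")
```

Under these assumptions the rate works out to roughly 59 thousand records per second, so the near-1 Gbps claim is plausible if an HTTP record corresponds to about 2 KB of traffic on average. The abstract does not state the average record size, so this is only a consistency check.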

Keywords: network traffic analysis, personally identifiable information, privacy leakage, mobile applications, HTTP
Corresponding Author(s): Tian SONG   
Issue Date: 10 January 2020
 Cite this article:   
Yi LIU,Tian SONG,Lejian LIAO. TPII: tracking personally identifiable information via user behaviors in HTTP traffic[J]. Front. Comput. Sci., 2020, 14(3): 143801.
 URL:  
https://academic.hep.com.cn/fcs/EN/10.1007/s11704-018-7451-z
https://academic.hep.com.cn/fcs/EN/Y2020/V14/I3/143801