Please wait a minute...
Frontiers of Computer Science

ISSN 2095-2228

ISSN 2095-2236(Online)

CN 10-1014/TP

Postal Subscription Code 80-970

2018 Impact Factor: 1.129

Front. Comput. Sci.    2022, Vol. 16 Issue (5) : 165825    https://doi.org/10.1007/s11704-021-0412-y
RESEARCH ARTICLE
Efficient protocols for heavy hitter identification with local differential privacy
Dan ZHAO1,2, Suyun ZHAO1,2, Hong CHEN1,2(), Ruixuan LIU1,2, Cuiping LI1,2, Wenjuan LIANG1,2
1. Key Laboratory of Data Engineering and Knowledge Engineering of Ministry of Education, Renmin University of China, Beijing 100872, China
2. School of Information, Renmin University of China, Beijing 100872, China
 Download: PDF(5598 KB)   HTML
 Export: BibTeX | EndNote | Reference Manager | ProCite | RefWorks
Abstract

Local differential privacy (LDP), which is a technique that employs unbiased statistical estimations instead of real data, is usually adopted in data collection, as it can protect every user’s privacy and prevent the leakage of sensitive information. The segment pairs method (SPM), multiple-channel method (MCM) and prefix extending method (PEM) are three known LDP protocols for heavy hitter identification as well as the frequency oracle (FO) problem with large domains. However, the low scalability of these three LDP algorithms often limits their application. Specifically, communication and computation strongly affect their efficiency. Moreover, excessive grouping or sharing of privacy budgets makes the results inaccurate. To address the above-mentioned problems, this study proposes independent channel (IC) and mixed independent channel (MIC), which are efficient LDP protocols for FO with a large domains. We design a flexible method for splitting a large domain to reduce the number of sub-domains. Further, we employ the false positive rate with interaction to obtain an accurate estimation. Numerical experiments demonstrate that IC outperforms all the existing solutions under the same privacy guarantee while MIC performs well under a small privacy budget with the lowest communication cost.

Keywords local differential privacy      frequency oracle      heavy hitter     
Corresponding Author(s): Hong CHEN   
About author:

Miaojie Yang and Mahmood Brobbey Oppong contributed equally to this work.

Just Accepted Date: 21 June 2021   Issue Date: 28 April 2022
 Cite this article:   
Dan ZHAO,Suyun ZHAO,Hong CHEN, et al. Efficient protocols for heavy hitter identification with local differential privacy[J]. Front. Comput. Sci., 2022, 16(5): 165825.
 URL:  
https://academic.hep.com.cn/fcs/EN/10.1007/s11704-021-0412-y
https://academic.hep.com.cn/fcs/EN/Y2022/V16/I5/165825
Protocol Total communication Computation
User side Aggregator side Aggregator side
SPM O(max(log?n,1glog?(d))) ? O(n32)
MCM O(log?n) ? O(n32)
PEM O(log?n) O(nd1g) O(n32)
IC O(k?g) O(n) O(n)
MIC O(log?(k?g)) O(n) O(n)
Tab.1  Total communication cost and computational complexity
Problem Metrics
Method Communication (user side) Advance Contribution Drawback
FO RAPPOR [6] O(d) Additional regression and encoding First general protocol High error
SH [5] O(log?n) Using matrix calculation Reduce communication High error
OUE [42] O(d) Additional encoding and mapping Improve accuracy High communication
OLH [42] O(log?n) Additional hashing and mapping Improve accuracy High computation
HR [43] O(log?d) Using matrix calculation Reduce communication and computation Medium error
Subset [3] O(log?d) Additional calculating the element frequency according to the set frequency Improve accuracy High computation
Heavy Hitter SPM(O-RAPPOR) [44] O(d) m Same as RAPPOR Interaction High error
MCM [5] O(log?n) Same as OUE Effective interaction Medium error
PEM [10] O(log?n) Same as OLH Progressive iteration Low error
Tab.2  Protocols of FO and heavy hitter
Notation Definition
vi Value that user i holds
V Domain of users’ values
d Size of domain, i.e., d=|V|
m Length of a value in V
Hr Hadamard matrices of order 2r for every nonnegative integer r
g Number of groups
b Hash bits vector
Tab.3  Notation definitions
Fig.1  Illustration of IC and MIC. The users to the left are partitioned into g+1 groups. The aggregator to the right updates frequent hash-bits (FH) for each collection interactively. Then, the aggregator estimates the top- k heavy hitters while satisfying the whole FH in group g+1
Fig.2  
d, g, t Group
1 2 3 4 5
216, 3, 18.3 0.04089 0.02729 0.0136 ? ?
216, 4, 8.8 0.11171 0.08441 0.05657 0.02837 ?
216, 5, 5.9 0.20003 0.1632 0.12444 0.0839 0.0422
220, 3, 46.2 0.01622 0.01082 0.00541 ? ?
220, 4, 17.7 0.05625 0.04227 0.02821 0.01412 ?
220, 5, 10.4 0.1179 0.09498 0.07161 0.0479 0.024
Tab.4  Example: FPR for different d, g, t
Fig.3  Example in group 1: the red circle represents the top- k values. We only need to ensure that the red circles are detected in each group
Fig.4  
Fig.5  Illustration of HHR
Fig.6  
Fig.7  Evaluation of different protocols with k=16. (a) URL, NCR, vary ε; (b) URL, F1, vary ε; (c) AOL, NCR, vary ε; (d) AOL, F1, vary ε
Fig.8  Evaluation of protocols with different k under varying ε. (a) URL, NCR, k=8; (b) URL, F1, k=8; (c) AOL, NCR, k=8; (d) AOL, F1, k=8; (e) URL, NCR, k=32; (f) URL, F1, k=32; (g) AOL, NCR, k=32; (h) AOL, F1, k=32
Fig.9  Evaluation of group with varying ε and k=16. (a) IC, URL, NCR, k=16; (b) IC, URL, F1, k=16; (c) MIC,URL, NCR, k=16; (d) MIC, URL, F1, k=16; (e) IC, AOL, NCR, k=16; (f) IC, AOL, F1, k=16; (g) MIC,AOL, NCR, k=16; (h) MIC, AOL, F1, k=16
Fig.10  Evaluation protocols with varying k and ε=2. (a) URL, NCR, ε=2; (b) URL, F1, ε=2; (c) AOL, NCR, ε=2; (d) AOL, F1, ε=2
Fig.11  Running time with varying ε
  
  
  
  
  
  
1 G Cormode S Jha T Kulkarni N Li D Srivastava T Wang. Privacy at scale: local differential privacy in practice. In: Proceedings of 2018 International Conference on Management of Data. 2018, 1655− 1658
2 K Peter O Sewoong V Pramod. Extremal mechanisms for local differential privacy. Advances in Neural Information Processing Systems (NIPS), 2014, 4( January): 2879– 2887
3 M Ye , A Barg . Optimal schemes for discrete distribution estimation under locally differential privacy. IEEE Transactions on Information Theory, 2018, 64( 8): 5662– 5676
4 J Acharya Z Sun H Zhang. Communication efficient, sample optimal, linear time locally private discrete distribution estimation. 2018, arXiv preprint arXiv: 1802.04705
5 R Bassily A Smith. Local, private, efficient protocols for succinct histograms. In: Proceedings of the 47th Annual ACM Symposium on Theory of Computing. 2015, 127− 135
6 Ú Erlingsson V Pihur A Korolova. RAPPOR: randomized aggregatable privacy-preserving ordinal response. In: Proceedings of 2014 ACM SIGSAC Conference on Computer and Communications Security. 2014, 1054− 1067
7 G T Abhradeep H V Andrew S V Umesh K Gaurav F Julien R S Vivek D Doug. Learning new words. US Patent, No. 9594741, 14 Mar. 2017
8 B Ding J Kulkarni S Yekhanin. Collecting telemetry data privately. In: Proceedings of the 31st International Conference on Neural Information Processing Systems. 2017, 3574− 3583
9 N Mishra M Sandler. Privacy via pseudorandom sketches. In: Proceedings of the 25th ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems. 2006, 143− 152
10 T Wang , N Li , S Jha . Locally differentially private heavy hitter identification. IEEE Transactions on Dependable and Secure Computing, 2021, 18( 2): 982– 993
11 C Dwork F McSherry K Nissim A Smith. Calibrating noise to sensitivity in private data analysis. In: Proceedings of the 3rd Theory of Cryptography Conference. 2006, 265− 284
12 K Chatzikokolakis M E Andrés N E Bordenabe C Palamidessi. Broadening the scope of differential privacy using metrics. In: Proceedings of the 13th International Symposium on Privacy Enhancing Technologies Symposium. 2013, 82− 102
13 D Kifer A Machanavajjhala. A rigorous and customizable framework for privacy. In: Proceedings of the 31st ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems. 2012, 77− 88
14 X He A Machanavajjhala B Ding. Blowfish privacy: tuning privacy-utility trade-offs using policies. In: Proceedings of 2014 ACM SIGMOD International Conference on Management of Data. 2014, 1447− 1458
15 M Bun T Steinke. Concentrated differential privacy: simplifications, extensions, and lower bounds. In: Proceedings of the 14th International Conference on Theory of Cryptography. 2016, 635− 658
16 Z Jorgensen T Yu G Cormode. Conservative or liberal? Personalized differential privacy. In: Proceedings of the IEEE the 31st International Conference on Data Engineering. 2015, 1023− 1034
17 C Dwork. Differential privacy: a survey of results. In: Proceedings of the 5th International Conference on Theory and Applications of Models of Computation. 2008, 1− 19
18 Z Qin Y Yang T Yu I Khalil X Xiao K Ren. Heavy hitter estimation over set-valued data with local differential privacy. In: Proceedings of 2016 ACM SIGSAC Conference on Computer and Communications Security. 2016, 192− 203
19 J C Duchi M I Jordan M J Wainwright. Local privacy and minimax bounds: sharp rates for probability estimation. In: Proceedings of the 26th International Conference on Neural Information Processing Systems. 2013, 1529− 1537
20 J Hsu S Khanna A Roth. Distributed private heavy hitters. In: Proceedings of the 39th International Colloquium on Automata, Languages, and Programming. 2012, 461− 472
21 S P Kasiviswanathan H K Lee K Nissim S Raskhodnikova A Smith. What can we learn privately? SIAM Journal on Computing, 2011, 40( 3): 793− 826
22 Z Li T Wang M Lopuhaä-Zwakenberg N Li B Škoric. Estimating numerical distributions under local differential privacy. In: Proceedings of 2020 ACM SIGMOD International Conference on Management of Data. 2020, 621− 635
23 J Jia N Z Gong. Calibrate: frequency estimation and heavy hitter identification with local differential privacy via incorporating prior knowledge. In: Proceedings of 2019 IEEE Conference on Computer Communications. 2019, 2008− 2016
24 J C Duchi , M I Jordan , M J Wainwright . Minimax optimal procedures for locally private estimation. Journal of the American Statistical Association, 2018, 113( 521): 182– 201
25 N Wang X Xiao Y Yang J Zhao S C Hui H Shin J Shin G Yu. Collecting and analyzing multidimensional data with local differential privacy. In: Proceedings of the 35th IEEE International Conference on Data Engineering (ICDE). 2019, 638− 649
26 S Wang , Y Qian , J Du , W Yang , L Huang , H Xu . Set-valued data publication with local privacy: tight error bounds and efficient mechanisms. Proceedings of the VLDB Endowment, 2020, 13( 8): 1234– 1247
27 Q Ye H Hu X Meng H Zheng. PrivKV: key-value data collection with local differential privacy. In: Proceedings of 2019 IEEE Symposium on Security and Privacy (SP). 2019, 317− 331
28 X Gu M Li Y Cheng L Xiong Y Cao. PCKV: locally differentially private correlated key-value data collection with optimized utility. In: Proceedings of the 29th USENIX Conference on Security Symposium. 2020, 55
29 G Cormode , T Kulkarni , D Srivastava . Answering range queries under local differential privacy. Proceedings of the VLDB Endowment, 2019, 12( 10): 1126– 1138
30 N Wang X Xiao Y Yang T D Hoang H Shin J Shin G Yu. PrivTrie: effective frequent term discovery under local differential privacy. In: Proceedings of the 34th IEEE International Conference on Data Engineering (ICDE). 2018, 821− 832
31 T Wang M Lopuhaä-Zwakenberg Z Li B Skoric N Li. Locally differentially private frequency estimation with consistency. In: Proceedings of the 27th Annual Network and Distributed System Security Symposium. 2020
32 R Chen H Li A K Qin S P Kasiviswanathan H Jin. Private spatial data aggregation in the local setting. In: Proceedings of the 32nd IEEE International Conference on Data Engineering (ICDE). 2016, 289− 300
33 Y Nie , W Yang , L Huang , X Xie , Z Zhao , S Wang . A utility-optimized framework for personalized private histogram estimation. IEEE Transactions on Knowledge and Data Engineering, 2019, 31( 4): 655– 669
34 M E Andrés N E Bordenabe K Chatzikokolakis C Palamidessi. Geo-indistinguishability: Differential privacy for location-based systems. In: Proceedings of 2013 ACM SIGSAC Conference on Computer & Communications Security. 2013, 901− 914
35 S Wang Y Nie P Wang H Xu W Yang L Huang. Local private ordinal data distribution estimation. In: Proceedings of 2017 IEEE Conference on Computer Communications. 2017, 1− 9
36 X L Gu M Li Y Cao L Xiong. Supporting both range queries and frequency estimation with local differential privacy. In: Proceedings of 2019 IEEE Conference on Communications and Network Security (CNS). 2019, 124− 132
37 M E Gursoy , A Tamersoy , S Truex , W Wei , L Liu . Secure and utility-aware data collection with condensed local differential privacy. IEEE Transactions on Dependable and Secure Computing, 2021, 18( 5): 2365– 2378
38 T Murakami Y Kawamoto. Utility-optimized local differential privacy mechanisms for distribution estimation. In: Proceedings of the 28th USENIX Conference on Security Symposium. 2019, 1877− 1894
39 B Balle J Bell A Gascón K Nissim. The privacy blanket of the shuffle model. In: Proceedings of the 39th Annual International Cryptology Conference. 2019, 638− 667
40 A Cheu A Smith J Ullman D Zeber M Zhilyaev. Distributed differential privacy via shuffling. In: Proceedings of the 38th Annual International Conference on the Theory and Applications of Cryptographic Techniques. 2019, 375− 403
41 Ú Erlingsson V Feldman I Mironov A Raghunathan K Talwar A Thakurta. Amplification by shuffling: from local to central differential privacy via anonymity. In: Proceedings of the 30th Annual ACM-SIAM Symposium on Discrete Algorithms. 2019, 2468− 2479
42 T Wang J Blocki N Li S Jha. Locally differentially private protocols for frequency estimation. In: Proceedings of the 26th USENIX Security Symposium. 2017, 729− 745
43 J Acharya Z Sun H Zhang. Hadamard response: estimating distributions privately, efficiently, and with little communication. In: Proceedings of the 22nd International Conference on Artificial Intelligence and Statistics. 2019, 1120− 1129
44 G Fanti , V Pihur , Ú Erlingsson . Building a RAPPOR with the unknown: privacy-preserving learning of associations and data dictionaries. Proceedings on Privacy Enhancing Technologies, 2016, 2016( 3): 41– 61
45 S L Warner . Randomized response: a survey technique for eliminating evasive answer bias. Journal of the American Statistical Association, 1965, 60( 309): 63– 69
46 M Bun , J Nelson , U Stemmer . Heavy hitters and the structure of local privacy. ACM Transactions on Algorithms, 2019, 15( 4): 51
47 P Kairouz K A Bonawitz D Ramage. Discrete distribution estimation under local privacy. In: Proceedings of the 33rd International Conference on Machine Learning. 2016, 2436− 2444
48 R Bassily K Nissim U Stemmer A Thakurta. Practical locally private heavy hitters. In: Proceedings of the 31st International Conference on Neural Information Processing Systems. 2017, 2285− 2293
49 A G Thakurta A H Vyrros U S Vaishampayan G Kapoor J Freudinger V V Prakash A Legendre S Duplinsky. Emoji frequency detection and deep link frequency: 9705908. 2020-07-11
50 A Beimel K Nissim U Stemmer. Private learning and sanitization: pure vs. approximate differential privacy. In: Proceedings of the 16th International Workshop on Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques. 2013, 363− 378
51 M Bun K Nissim U Stemmer S Vadhan. Differentially private release and learning of threshold functions. In: Proceedings of the 56th IEEE Annual Symposium on Foundations of Computer Science. 2015, 634− 649
52 G Cormode T Kulkarni D Srivastava. Marginal release under local differential privacy. In: Proceedings of 2018 International Conference on Management of Data. 2018, 131− 146
53 T T Nguyên X Xiao Y Yang S C Hui H Shin J Shin. Collecting and analyzing data from smart device users with local differential privacy. 2016, arXiv preprint arXiv: 1606.05053
54 C D Manning P Raghavan H Schütze. Introduction to Information Retrieval. Cambridge: Cambridge University Press, 2008
[1] Qiao XUE, Youwen ZHU, Jian WANG. Mean estimation over numeric data with personalized local differential privacy[J]. Front. Comput. Sci., 2022, 16(3): 163806-.
Viewed
Full text


Abstract

Cited

  Shared   
  Discussed