Please wait a minute...
Frontiers of Computer Science

ISSN 2095-2228

ISSN 2095-2236(Online)

CN 10-1014/TP

Postal Subscription Code 80-970

2018 Impact Factor: 1.129

Front. Comput. Sci.    2023, Vol. 17 Issue (4) : 174809    https://doi.org/10.1007/s11704-022-1651-2
RESEARCH ARTICLE
Differential privacy histogram publishing method based on dynamic sliding window
Qian CHEN1,2, Zhiwei NI1,2(), Xuhui ZHU1,2, Pingfan XIA1,2
1. School of Management, Hefei University of Technology, Hefei 230009, China
2. Key Laboratory of Process Optimization and Intelligent Decision-making, Ministry of Education, Hefei 230009, China
 Download: PDF(5502 KB)   HTML
 Export: BibTeX | EndNote | Reference Manager | ProCite | RefWorks
Abstract

Differential privacy has recently become a widely recognized strict privacy protection model of data release. Differential privacy histogram publishing can directly show the statistical data distribution under the premise of ensuring user privacy for data query, sharing, and analysis. The dynamic data release is a study with a wide range of current industry needs. However, the amount of data varies considerably over different periods. Unreasonable data processing will result in the risk of users’ information leakage and unavailability of the data. Therefore, we designed a differential privacy histogram publishing method based on the dynamic sliding window of LSTM (DPHP-DL), which can improve data availability on the premise of guaranteeing data privacy. DPHP-DL is integrated by DSW-LSTM and DPHK+. DSW-LSTM updates the size of sliding windows based on data value prediction via long short-term memory (LSTM) networks, which evenly divides the data stream into several windows. DPHK+ heuristically publishes non-isometric histograms based on k-mean++ clustering of automatically obtaining the optimal K, so as to achieve differential privacy histogram publishing of dynamic data. Extensive experiments on real-world dynamic datasets demonstrate the superior performance of the DPHP-DL.

Keywords differential privacy      dynamic data      histogram publishing      sliding window     
Corresponding Author(s): Zhiwei NI   
Just Accepted Date: 21 July 2022   Issue Date: 12 December 2022
 Cite this article:   
Qian CHEN,Zhiwei NI,Xuhui ZHU, et al. Differential privacy histogram publishing method based on dynamic sliding window[J]. Front. Comput. Sci., 2023, 17(4): 174809.
 URL:  
https://academic.hep.com.cn/fcs/EN/10.1007/s11704-022-1651-2
https://academic.hep.com.cn/fcs/EN/Y2023/V17/I4/174809
Symbol Description
? Privacy budget
D The data stream
xi Data point on the timestamp of i
wj The window with the number of j
L(w) The window size
H(w) The histograms of each window
l The fixed part of the window size
Δl The variable part of the window
s The average statistic value of window data
s The predicted statistic value of window data
K The number of clustering centers
SSE Square sum of errors
k The slope of K and SSE
r The slope change degree
N The total number of windows
W The workload error
E The publishing error
R The reconstruction error
n The number of buckets in a histogram
Tab.1  Summary of frequently used symbols
Fig.1  Basic structure diagram of LSTM
Fig.2  Schematic diagram of LSTM cell logical architecture
Fig.3  Dynamic data histogram publishing model of disease surveillance
  
  
Fig.4  Stacked LSTM model structure
wi l λ s δ s Δl L
w2 30 5 15
w3 24 2 10
w4 10 1 20 0.3 20 0 10
w5 18 ?1 10
w6 10 ?5 5
Tab.2  An example of various windows size updates
  
Fig.5  An example of relation graph between K and SSE
k1,2 k2,3 k3,4 k4,5 k5,6 k6,7
?1771 ?741 ?117 ?22 ?9 ?4
r2 r3 r4 r5 r6
0.817 1.200 0.236 0.034 0.013
Tab.3  The example of the calculation of k and r for choosing optimal value of K
Methods Characteristics The window size (l=100,150,200,250,300)
FSW Fixed size of sliding window l
DSW-speed The dynamic sliding window based on the data speed l+Δd
DSW-LSTM The dynamic sliding window based on LSTM l+Δl
DPHK Differential privacy histogram publishing based on K-means l
DPHK+ Differential privacy histogram publishing based on K-means++ l
Baseline The method adding Laplace noise directly to the original histogram l
FSW-DPHK DPHK with fixed sliding window size l
FSW-DPHK+ DPHK+ with fixed sliding window size l
DSWL-DPHK DPHK with DSW-LSTM l+Δl
SHP Using adaptive sampling method to predict the next arriving count value l
DP-FC Using fractal dimension to cluster datasets l
DPHP-DL The method of this paper (DSW-LSTM + DPHK+) l+Δl
Tab.4  The characteristics and the window size setting of all compared methods
Fig.6  The variance of data statistics of windows in a data stream
Fig.7  SSE of clustering in fixed window
Fig.8  Runtime of clustering in fixed window
Fig.9  Workload error with various privacy budgets on the Adult Dataset
Fig.10  Workload error with various privacy budgets on the Cancer Patient Dataset
Fig.11  Workload error with various privacy budgets on the Ocular Disease Dataset
Fig.12  Workload error with various initial window sizes on the Adult Dataset
Fig.13  Workload error with various initial window sizes on the Cancer Patient Dataset
Fig.14  Workload error with various initial window sizes on the Ocular Disease Dataset
Adult Cancer patient Ocular disease
Baseline 0.37 0.42 0.39
FSW-DPHK 1.30 1.87 1.35
FSW-DPHK+ 0.43 0.52 0.45
DSWL-DPHK 4.20 5.01 4.54
SHP 4.12 4.88 4.05
DP-FC 3.26 3.95 3.28
DPHP-DL 3.40 3.81 3.31
Tab.5  The runtime(s) of methods on various datasets
Time complexity Workload Error/106 Run time/s
DSW-LSTM O(N2) 6.73 2.47
DPHK+ O(N) 7.62 0.72
DPHP-DL O(N2) 5.39 3.40
Tab.6  The study of components of the proposed method
  
  
  
  
1 C Dwork . Differential privacy: a survey of results. In: Proceedings of the 5th International Conference on Theory and Applications of Models of Computation. 2008, 1–19
2 T, Zhu G, Li W, Zhou P S Yu . Differentially private data publishing and analysis: a survey. IEEE Transactions on Knowledge and Data Engineering, 2017, 29( 8): 1619–1638
3 T H H, Chan E, Shi D Song . Private and continual release of statistics. In: Proceedings of the 37th International Colloquium on Automata, Languages, and Programming. 2010, 405–417
4 G, Acs C, Castelluccia R Chen . Differentially private histogram publishing through lossy compression. In: Proceedings of the 12th International Conference on Data Mining. 2012, 1–10
5 Dwork C, Naor M, Pitassi T, Rothblum G N. Differential privacy under continual observation. In: Proceedings of the 42nd ACM Symposium on Theory of Computing. 2010, 715–724
6 C, Fang E C Chang . Differential privacy with δ-neighbourhood for spatial and dynamic datasets. In: Proceedings of the 9th ACM Symposium on Information, Computer and Communications Security. 2014, 159–170
7 Xu J, Zhang Z, Xiao X, Yang Y, Yu G. Differentially private histogram publication. In: Proceedings of the 28th IEEE International Conference on Data Engineering. 2012, 32–43
8 M Aissaoui . Proportional differential privacy (PDP): a new approach for differentially private histogram release based on buckets densities. In: Proceedings of the 9th IFIP International Conference on Performance Evaluation and Modeling in Wireless Networks. 2020, 1–7
9 X J, Zhang X F Meng . Streaming histogram publication method with differential privacy. Journal of Software, 2016, 27( 2): 381–393
10 L, Yang X, Zheng W Zhao . Non-equal-width histogram publishing method based on differential privacy. Chinese Journal of Network and Information Security, 2020, 6( 3): 39–49
11 F, Yan X, Zhang C, Li W, Li S, Li F Sun . Differentially private histogram publishing through fractal dimension for dynamic datasets. In: Proceedings of the 13th IEEE Conference on Industrial Electronics and Applications. 2018, 1542–1546
12 Z, Zheng T, Wang J, Wen S, Mumtaz A K, Bashir S H Chauhdary . Differentially private high-dimensional data publication in internet of things. IEEE Internet of Things Journal, 2020, 7( 4): 2640–2650
13 N, Wang Y, Gu J, Xu F, Li G Yu . Differentially private high-dimensional data publication via grouping and truncating techniques. Frontiers of Computer Science, 2019, 13( 2): 382–395
14 K, Shah D Jinwala . Privacy preserving secure expansive aggregation with malicious node identification in linear wireless sensor networks. Frontiers of Computer Science, 2021, 15( 6): 156813
15 Q, He P, Xia B, Li J B Liu . Evaluating investors’ recognition abilities for risk and profit in online loan markets using nonlinear models and financial big data. Journal of Function Spaces, 2021, 2021: 5178970
16 R, Chen Y, Shen H Jin . Private analysis of infinite data streams via retroactive grouping. In: Proceedings of the 24th ACM International on Conference on Information and Knowledge Management. 2015, 1061–1070
17 G, Jo K, Jung S Park . An adaptive window size selection method for differentially private data publishing over infinite trajectory stream. Journal of Advanced Transportation, 2018, 2018: 8297678
18 Z, Zhang H, Wang W, Xue Y Xia . Approach for data streams clustering over dynamic sliding windows. Computer Engineering and Applications, 2011, 47( 7): 135–138
19 Q, Wang Y, Zhang X, Lu Z, Wang Z, Qin K Ren . Real-time and spatio-temporal crowd-sourced social network data publishing with differential privacy. IEEE Transactions on Dependable and Secure Computing, 2018, 15( 4): 591–606
20 X, Xiong S, Liu D, Li Z, Cai X Niu . Real-time and private spatio-temporal data aggregation with local differential privacy. Journal of Information Security and Applications, 2020, 55: 102633
21 P, Tang X, Cheng S, Su R, Chen H Shao . Differentially private publication of vertically partitioned data. IEEE Transactions on Dependable and Secure Computing, 2021, 18( 2): 780–795
22 D, Ye T, Zhu S, Shen W Zhou . A differentially private game theoretic approach for deceiving cyber adversaries. IEEE Transactions on Information Forensics and Security, 2021, 16: 569–584
23 Q, Xue Y, Zhu J Wang . Mean estimation over numeric data with personalized local differential privacy. Frontiers of Computer Science, 2022, 16( 3): 163806
24 Z, Huo P, He L, Hu H Zhao . DP-UserPro: differentially private user profile construction and publication. Frontiers of Computer Science, 2021, 15( 5): 155811
25 H, Li L, Xiong X, Jiang J Liu . Differentially private histogram publication for dynamic datasets: an adaptive sampling approach. In: Proceedings of the 24th ACM International on Conference on Information and Knowledge Management. 2015, 1001–1010
26 R, Gao X Ma . Dynamic data histogram publishing based on differential privacy. In: Proceedings of 2018 IEEE International Conference on Parallel & Distributed Processing with Applications, Ubiquitous Computing & Communications, Big Data & Cloud Computing, Social Computing & Networking, Sustainable Computing & Communications. 2018, 737–743
27 C, Dwork F, McSherry K, Nissim A Smith . Calibrating noise to sensitivity in private data analysis. In: Proceedings of the 3rd Theory of Cryptography Conference. 2006, 265–284
28 Arthur D, Vassilvitskii S. k-means++: the advantages of careful seeding. In: Proceedings of 18th Annual ACM-SIAM Symposium on Discrete Algorithms. 2007, 1027–1035
[1] FCS-21651-OF-QC_suppl_1 Download
[1] Dan ZHAO, Suyun ZHAO, Hong CHEN, Ruixuan LIU, Cuiping LI, Wenjuan LIANG. Efficient protocols for heavy hitter identification with local differential privacy[J]. Front. Comput. Sci., 2022, 16(5): 165825-.
[2] Qiao XUE, Youwen ZHU, Jian WANG. Mean estimation over numeric data with personalized local differential privacy[J]. Front. Comput. Sci., 2022, 16(3): 163806-.
[3] Zheng HUO, Ping HE, Lisha HU, Huanyu ZHAO. DP-UserPro: differentially private user profile construction and publication[J]. Front. Comput. Sci., 2021, 15(5): 155811-.
[4] Zhiyun ZHENG, Ke RUAN, Mengyao YU, Xingjin ZHANG, Ning WANG, Dun LI. k-dominant Skyline query algorithm for dynamic datasets[J]. Front. Comput. Sci., 2021, 15(1): 151602-.
[5] Ning WANG, Yu GU, Jia XU, Fangfang LI, Ge YU. Differentially private high-dimensional data publication via grouping and truncating techniques[J]. Front. Comput. Sci., 2019, 13(2): 382-395.
[6] Chen LUO, Fei HE. SMT-based query tracking for differentially private data analytics systems[J]. Front. Comput. Sci., 2018, 12(6): 1192-1207.
[7] Jiali MAO, Qiuge SONG, Cheqing JIN, Zhigang ZHANG, Aoying ZHOU. Online clustering of streaming trajectories[J]. Front. Comput. Sci., 2018, 12(2): 245-263.
[8] Xiaojian ZHANG,Xiaofeng MENG. Discovering top-k patterns with differential privacy–an accurate approach[J]. Front. Comput. Sci., 2014, 8(5): 816-827.
[9] Zhenyong CHEN, Wei FAN, Zhang XIONG, Pingan ZHANG, Lixin LUO, . Visual data security and management for smart cities[J]. Front. Comput. Sci., 2010, 4(3): 386-393.
Viewed
Full text


Abstract

Cited

  Shared   
  Discussed