Please wait a minute...
Frontiers of Computer Science

ISSN 2095-2228

ISSN 2095-2236(Online)

CN 10-1014/TP

Postal Subscription Code 80-970

2018 Impact Factor: 1.129

Front. Comput. Sci.    2019, Vol. 13 Issue (2) : 382-395    https://doi.org/10.1007/s11704-017-6591-x
RESEARCH ARTICLE
Differentially private high-dimensional data publication via grouping and truncating techniques
Ning WANG1, Yu GU1, Jia XU2, Fangfang LI1, Ge YU()
1. School of Computer Science and Engineering, Northeastern University, Shenyang 110819, China
2. School of Computer, Electronics and Information, Guangxi University, Guangxi 530004, China
 Download: PDF(822 KB)  
 Export: BibTeX | EndNote | Reference Manager | ProCite | RefWorks
Abstract

The count of one column for high-dimensional datasets, i.e., the number of records containing this column, has been widely used in numerous applications such as analyzing popular spots based on check-in location information and mining valuable items from shopping records. However, this poses a privacy threat when directly publishing this information. Differential privacy (DP), as a notable paradigm for strong privacy guarantees, is thereby adopted to publish all column counts. Prior studies have verified that truncating records or grouping columns can effectively improve the accuracy of published results. To leverage the advantages of the two techniques, we combine these studies to further boost the accuracy of published results. However, the traditional penalty function, which measures the error imported by a given pair of parameters including truncating length and group size, is so sensitive that the derived parameters deviate from the optimal parameters significantly. To output preferable parameters, we first design a smart penalty function that is less sensitive than the traditional function. Moreover, a two-phase selection method is proposed to compute these parameters efficiently, together with the improvement in accuracy. Extensive experiments on a broad spectrum of real-world datasets validate the effectiveness of our proposals.

Keywords differential privacy      high-dimensional data      truncation optimization      grouping      penalty function     
Corresponding Author(s): Ge YU   
Just Accepted Date: 03 May 2017   Online First Date: 25 May 2018    Issue Date: 08 April 2019
 Cite this article:   
Ning WANG,Yu GU,Jia XU, et al. Differentially private high-dimensional data publication via grouping and truncating techniques[J]. Front. Comput. Sci., 2019, 13(2): 382-395.
 URL:  
https://academic.hep.com.cn/fcs/EN/10.1007/s11704-017-6591-x
https://academic.hep.com.cn/fcs/EN/Y2019/V13/I2/382
1 CDwork, F McSherry, KNissim, ASmith. Calibrating noise to sensitivity in private data analysis. In: Proceedings of Theory of Cryptography Conference. 2006, 265–284
https://doi.org/10.1007/11681878_14
2 CDwork, G N Rothblum, S PVadhan. Boosting and differential privacy. In: Proceedings of Annual IEEE Symposium on Foundations of Computer Science. 2010, 51–60
https://doi.org/10.1109/FOCS.2010.12
3 FMcSherry, KTalwar. Mechanism design via differential privacy. In: Proceedings of Annual IEEE Symposium on Foundations of Computer Science. 2007, 94–103
https://doi.org/10.1109/FOCS.2007.66
4 FMcSherry. Privacy integrated queries: an extensible platform for privacy-preserving data analysis. In: Proceedings of ACM SIGMOD International Conference on Management of Data. 2009, 19–30
https://doi.org/10.1145/1559845.1559850
5 VRastogi, SNath. Differentially private aggregation of distributed time-series with transformation and encryption. In: Proceedings of ACM SIGMOD International Conference on Management of Data. 2010, 735–746
https://doi.org/10.1145/1807167.1807247
6 GKellaris, S Papadopoulos. Practical differential privacy via grouping and smoothing. In: Proceedings of International Conference on Very Large Data Bases. 2013, 301–312
https://doi.org/10.14778/2535573.2488337
7 W YDay, N HLi. Differentially private publishing of high-dimensional data using sensitivity control. In: Proceedings of ACM Symposium on Information, Computer and Communications Security. 2015, 451–462
https://doi.org/10.1145/2714576.2714621
8 W YDay, N HLi, MLyu . Publishing graph degree distribution with node differential privacy. In: Proceedings of ACM SIGMOD International Conference on Management of Data. 2016, 123–138
https://doi.org/10.1145/2882903.2926745
9 MHardt, KLigett, FMcSherry. A simple and practical algorithm for differentially private data release. In: Proceedings of Annual Conference on Neural Information Processing Systems. 2012, 2348–2356
10 X KXiao, G ZWang, JGehrke. Differential privacy via wavelet transforms. In: Proceedings of International Conference on Data Engineering. 2010, 225–236
https://doi.org/10.1109/ICDE.2010.5447831
11 X JZhang, RChen, J LXu, X F Meng, Y TXie. Towards accurate histogram publication under differential privacy. In: Proceedings of SIAM International Conference on Data Mining. 2014, 587–595
https://doi.org/10.1137/1.9781611973440.68
12 JXu, Z JZhang, X KXiao, Y Yang, GYu. Differentially private histogram publication. In: Proceedings of IEEE International Conference on Data Engineering. 2012, 32–43
https://doi.org/10.1109/ICDE.2012.48
13 MHay, V Rastogi, GMiklau, DSuciu. Boosting the accuracy of differentially private histograms through consistency. In: Proceedings of International Conference on Very Large Data Bases. 2010, 1021–1032
https://doi.org/10.14778/1920841.1920970
14 CLi, MHay, VRastogi, G Miklau, AMcGregor. Optimizing linear counting queries under differential privacy. In: Proceedings of ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems. 2010, 123–134
https://doi.org/10.1145/1807085.1807104
15 W HQardaji, W NYang, N HLi. Understanding hierarchical methods for differentially private histograms. In: Proceedings of International Conference on Very Large Data Bases. 2013, 1954–1965
https://doi.org/10.14778/2556549.2556576
16 CLi, MHay, GMiklau, Y Wang. A data- and workload-aware query answering algorithm for range queries under differential privacy. In: Proceedings of International Conference on Very Large Data Bases. 2014, 341–352
17 RChen, N Mohammed, C MFung, B CDesai, LXiong. Publishing set-valued data via differential privacy. In: Proceedings of International Conference on Very Large Data Bases. 2011, 1087–1098
18 RChen, G Ács, CCastelluccia. Differentially private sequential data publication via variable-length n-grams. In: Proceedings of ACM Conference on Computer and Communications Security. 2012, 638–649
https://doi.org/10.1145/2382196.2382263
19 JZhang, G Cormode, C MProcopiuc, DSrivastava, X KXiao. Privbayes: private data release via bayesian networks. In: Proceedings of ACM SIGMOD International Conference on Management of Data. 2014, 1423–1434
https://doi.org/10.1145/2588555.2588573
20 W HQardaji, W NYang, N HLi. Priview: practical differentially private release of marginal contingency tables. In: Proceedings of ACM SIGMOD International Conference on Management of Data. 2014, 1435–1446
https://doi.org/10.1145/2588555.2588575
21 XHe, G Cormode, AMachanavajjhala, C MProcopiuc, D Srivastava. DPT: differentially private trajectory synthesis using hierarchical reference systems. In: Proceedings of International Conference on Very Large Data Bases. 2015, 1154–1165
https://doi.org/10.14778/2809974.2809978
22 S PKasiviswanathan, K Nissim, SRaskhodnikova, ASmith. Analyzing graphs with node differential privacy. In: Proceedings of Theory of Cryptography Conference. 2013, 457–476
https://doi.org/10.1007/978-3-642-36594-2_26
[1] Chen LUO, Fei HE. SMT-based query tracking for differentially private data analytics systems[J]. Front. Comput. Sci., 2018, 12(6): 1192-1207.
[2] Xiaojian ZHANG,Xiaofeng MENG. Discovering top-k patterns with differential privacy–an accurate approach[J]. Front. Comput. Sci., 2014, 8(5): 816-827.
Viewed
Full text


Abstract

Cited

  Shared   
  Discussed