Please wait a minute...
Frontiers of Computer Science

ISSN 2095-2228

ISSN 2095-2236(Online)

CN 10-1014/TP

Postal Subscription Code 80-970

2018 Impact Factor: 1.129

Front. Comput. Sci.    2022, Vol. 16 Issue (2) : 162801    https://doi.org/10.1007/s11704-020-0102-1
RESEARCH ARTICLE
Return just your search: privacy-preserving homoglyph search for arbitrary languages
Bowen ZHAO1, Shaohua TANG1,2, Ximeng LIU3(), Yiming WU1
1. School of Computer Science and Engineering, South China University of Technology, Guangzhou 510006, China
2. Zheng Zhiming Lab, Peng Cheng Laboratory, Shenzhen 518055, China
3. College of Mathematics and Computer Science, Fuzhou University, Fuzhou 350108, China
 Download: PDF(11395 KB)   HTML
 Export: BibTeX | EndNote | Reference Manager | ProCite | RefWorks
Abstract

Searchable encryption is an effective way to ensure the security and availability of encrypted outsourced cloud data. Among existing solutions, the keyword exact search solution is relatively inflexible, while the fuzzy keyword search solution either has a high index overhead or suffers from the false-positive. Furthermore, no existing fuzzy keyword search solution considers the homoglyph search on encrypted data. In this paper, we propose an efficient privacy-preserving homoglyph search scheme supporting arbitrary languages (POSA, in short). We enhance the performance of the fuzzy keyword search in three aspects. Firstly, we formulate the similarity of homoglyph and propose a privacy-preserving homoglyph search. Secondly, we put forward an index build mechanism without the false-positive, which reduces the storage overhead of the index and is suitable for arbitrary languages. Thirdly, POSA returns just the user’s search, i.e., all returned documents contain the search keyword or its homoglyph. The theoretical analysis and experimental evaluations on real-world datasets demonstrate the effectiveness and efficiency of POSA.

Keywords searchable encryption      cloud storage      fuzzy search      privacy preservation      arbitrary languages     
Corresponding Author(s): Ximeng LIU   
Just Accepted Date: 15 May 2020   Issue Date: 08 September 2021
 Cite this article:   
Bowen ZHAO,Shaohua TANG,Ximeng LIU, et al. Return just your search: privacy-preserving homoglyph search for arbitrary languages[J]. Front. Comput. Sci., 2022, 16(2): 162801.
 URL:  
https://academic.hep.com.cn/fcs/EN/10.1007/s11704-020-0102-1
https://academic.hep.com.cn/fcs/EN/Y2022/V16/I2/162801
Schemes Small overhead No false-positive Fuzzy keyword sets Arbitrary language
[ 6] × × ×
[ 7] × ×
[ 11] × × ×
[ 13] × × ×
[ 14] × × ×
[ 15] × ×
POSA
Tab.1  Comparison with previous solutions
Fig.1  System model of POSA
Notations Descriptions
D Plaintext collection of ndocuments D = { d 1 , d 2 , , d n }
λ Security parameter, i.e., a given positive integer
k d , k t Data encryption/decryption key and the trapdoor generation key, respectively
C Ciphertext collection of ndocuments C = { c 1 , c 2 , , c n }
Γ Plaintext inverted index of D
W Distinct keywords set of Γ , and we call the keyword in Was the initial keyword
w A keyword or word

a
If a is a set, a denotes the cardinality of a; If a is a string, a denotes the bit length of a
I d ( d i ) Identity of document d i containing w
D w Set of the identity of documents containing w, D w is denoted by D w = { I d ( d i ) }, where I d ( d i ) = λ
C w C w denotes the encrypted documents of D w, and is a subset of C
I The securely searchable index
θ A letter
S θ A set of characters that θ is commonly misspelled
S Common homoglyph character set S = { S θ 1 , , S θ t }
Operation of concatenating two strings
Ω w Homoglyph set Ω w = { w , w ~ 1 , w ~ 2 , , w ~ η } of wand Ω w W = w
Θ Fuzzy plaintext inverted index of Γ , Θ = { ξ j }
W Distinct keywords set of Θ
F A pseudo-random function F : { 0 , 1 } λ { 0 , 1 } ? × { 0 , 1 } λ
t p The search trapdoor
H A hash function H : { 0 , 1 } 2 λ { 0 , 1 } ?
P w, R Both of them are random numbers of λ bits
B r A B-tree
? R Choosing independently random
Tab.2  Notations used in this paper
Fig.2  
Fig.3  Example of construction of fuzzy keyword sets Ω w.
Fig.4  The main steps of POSA
Fig.5  
Fig.6  
Fig.7  
Parameters Values
λ (security parameter) 128
F (the pseudo-random function) HMAC-MD5
H (the hash function) SHA256
S (the number of S) 29 (English), 725 (Chinese)
d (edit distance) 0, 1, 2, 3
W (the number of W) [1,000, 10,000]
Tab.3  Experimental parameter settings
Schemes Fuzzy processing Searchable index Trapdoor computation Communication Search
Storage Computation Storage Computation
[ 6] O ( m ? d ) O ( m ? d ) O ( m n ? d ) O ( m n ? d ) O ( 1 ) O ( λ ) O ( m n 2 ? d )
[ 13] O ( n B F ) O ( n m l ν ( ? ? 1 ) ) O ( n B F ) O ( n B F 2 ) O ( l ν ( ? ? 1 ) + B F 2 ) O ( B F ) O ( n B F 2 )
[ 14] O ( n B F ) O ( n m l ν ? ) O ( n B F ) O ( n B F 2 ) O ( l ν ? + B F 2 ) O ( B F ) O ( n B F 2 )
POSA O ( m S θ ? ) O ( m S ? S θ ) O ( m n S θ ? ) O ( m n S θ ? ) O ( 1 ) O ( λ ) O ( log ? ( m n 2 S θ ? )
Tab.4  Overhead comparisons between POSA and previous solutions
Fig.8  The construction time of fuzzy keyword sets of different languages. (a) Chinese; (b) English
Schemes Number of distinct keywords
1,016 1,609 5,305 7,788 9,174 10,003
baseline 26 43 178 299 385 444
d1 184 303 1,288 2,164 2,780 3,207
d2 668 1,089 4,735 7,911 10,149 11,685
d3 1,785 2,855 12,572 20,829 26,726 30,708
POSA 56 93 417 700 895 1,028
Tab.5  Index storage overhead for Chinese documents (KB)
Schemes Number of distinct keywords
1,064 2,016 5,008 7,999 9,001 10,000
baseline 21 41 129 251 300 346
d1 271 536 1,694 3,269 3,911 4,526
d2 1,844 3,681 11,626 22,241 26,717 31,270
POSA 70 137 432 839 1,011 1,169
Tab.6  Index storage overhead for English documents (KB)
Fig.9  The building time of searchable index for different languages. (a) Chinese; (b) English
Fig.10  The searching time of different languages. (a) Chinese; (b) English
1 X Li , J Li , S Yiu , C Gao , J Xiong . Privacy-preserving edge-assisted image retrieval and classification in IoT. Frontiers of Computer Science, 2019, 13( 5): 1136– 1147
https://doi.org/10.1007/s11704-018-8067-z
2 Z Shen , J Shu , W Xue . Preferred search over encrypted data. Frontiers of Computer Science, 2018, 12( 3): 593– 607
https://doi.org/10.1007/s11704-016-6244-5
3 P Xu , Q Wu , W Wang , W Susilo , J Domingoferrer , H Jin . Generating searchable public-key ciphertexts with hidden structures for fast keyword search. IEEE Transactions on Information Forensics and Security, 2015, 10( 9): 1993– 2006
https://doi.org/10.1109/TIFS.2015.2442220
4 Chen B, Wu L, Kumar N, Choo K K R, He D. Lightweight searchable public-key encryption with forward privacy over IIoT outsourced data. IEEE Transactions on Emerging Topics in Computing, 2019, DOI:10.1109/TETC.2019.2737789
5 Kamara S, Papamanthou C, Roeder T. Dynamic searchable symmetric encryption. In: Proceedings of ACM Conference on Computer and Communications Security. 2012, 965–976
6 Li J, Wang Q, Wang C, Cao N, Ren K, Lou W. Fuzzy keyword search over encrypted data in cloud computing. In: Proceedings of IEEE International Conference on Computer Communications. 2010, 1–5
7 Yang Y, Liu X, Deng R H, Weng J. Flexible wildcard searchable encryption system. IEEE Transactions on Services Computing, 2020, 13(3): 464−477
8 Woodbridge J, Anderson H S, Ahuja A, Grant D. Detecting homoglyph attacks with a siamese neural network. In: Procedings of IEEE Symposium on Security and Privacy Workshops. 2018, 22–28
9 Wang C, Ren K, Yu S, Urs K M R. Achieving usable, privacy-assured similarity search over outsourced cloud data. In: Proceedings of IEEE International Conference on Computer Communications. 2012, 451–459
10 Zhu X, Wang G, Xie D. Fuzzy and semantic search over encrypted data in the cloud. In: Proceedings of International Conference on Security Privacy and Anonymity in Computation Communication and Storage. 2016, 332–341
11 A Awad , A Matthews , Y Qiao , B Lee . Chaotic searchable encryption for mobile cloud storage. IEEE Transactions on Cloud Computing, 2018, 6( 2): 440– 452
https://doi.org/10.1109/TCC.2015.2511747
12 Bringer J, Chabanne H, Kindarji B. Error-tolerant searchable encryption. In: Proceedings of IEEE International Conference on Computer Communications. 2009, 1–6
13 Wang B, Yu S, Lou W, Hou Y T. Privacy-preserving multi-keyword fuzzy search over encrypted data in the cloud. In: Proceedings of IEEE International Conference on Computer Communications. 2014, 2112–2120
14 Z Fu , X Wu , C Guan , X Sun , K Ren . Toward efficient multi-keyword fuzzy search over encrypted outsourced data with accuracy improvement. IEEE Transactions on Information Forensics and Security, 2016, 11( 12): 2706– 2716
https://doi.org/10.1109/TIFS.2016.2596138
15 Yang Y, Liu X, Deng R H. Multi-user multi-keyword rank search over encrypted data in arbitrary language. IEEE Transactions on Dependable and Secure Computing, 2020, 17(2): 320−334
16 S Yang , S Tang , X Zhang . Privacy-preserving k nearest neighbor query with authentication on road networks. Journal of Parallel and Distributed Computing, 2019, 134 : 25– 36
https://doi.org/10.1016/j.jpdc.2019.07.013
17 Y Wu , S Tang , B Zhao , Z Peng . BPTM: blockchain-based privacy-preserving task matching in crowdsourcing. IEEE Access, 2019, 7 : 45605– 45617
https://doi.org/10.1109/ACCESS.2019.2908265
18 Shu J, Yang K, Jia X, Liu X, Wang C, Deng R H. Proxy-free privacy-preserving task matching with efficient revocation in crowdsourcing. IEEE Transactions on Dependable and Secure Computing, 2021, 18(1): 117−130
19 Park H A, Kim B H, Lee D H, Chung Y D, Zhan J. Secure similarity search. In: Proceedings of IEEE International Conference on Granular Computing. 2007, 598–598
20 Kuzu M, Islam M S, Kantarcioglu M. Efficient similarity search over encrypted data. In: Proceedings of IEEE International Conference on Data Engineering. 2012, 1156–1167
21 J Wang , X Yu , M Zhao . Privacy-preserving ranked multi-keyword fuzzy search on cloud encrypted data supporting range query. Arabian Journal for Science and Engineering, 2015, 40( 8): 2375– 2388
https://doi.org/10.1007/s13369-015-1737-3
22 Zhu X, Liu Q, Wang G. Verifiable dynamic fuzzy search over encrypted data in cloud computing. In: Proceedings of International Conference on Algorithms and Architectures for Parallel Processing. 2015, 655–666
23 X Yuan , X Wang , C Wang , C Yu , S Nutanong . Privacy-preserving similarity joins over encrypted data. IEEE Transactions on Information Forensics and Security, 2017, 12( 11): 2763– 2775
https://doi.org/10.1109/TIFS.2017.2721221
24 Homann D, Göge C, Wiese L. Dynamic similarity search over encrypted data with low leakage. In: Proceedings of International Workshop on Security and Trust Management. 2017, 19–35
25 Liu Z, Jia C, Yang J, Yuan K. Format-preserving fuzzy query mechanism. In: Proceedings of IEEE International Conference on Emerging Intelligent Data and Web Technologies. 2013, 220–226
26 Zhu H, Mei Z, Wu B, Li H, Cui Z. Fuzzy keyword search and access control over ciphertexts in cloud computing. In: Proceedings of Australasian Conference on Information Security and Privacy. 2017, 248–265
27 Y Hua , B Xiao , X Liu , D Feng . The design and implementations of locality-aware approximate queries in hybrid storage systems. IEEE Transactions on Parallel and Distributed Systems, 2014, 26( 11): 3194– 3207
28 Y Yang , S Yang , M Ke . Ranked fuzzy keyword search based on simhash over encrypted cloud data. Chinese Journal of Computers, 2017, 40( 2): 431– 444
29 S Raghavendra , S Girish , C M Geeta , R Buyya , K R Venugopal , S S Iyengar , L M Patnaik . Split keyword fuzzy and synonym search over encrypted cloud data. Multimedia Tools and Applications, 2018, 77( 8): 10135– 10156
https://doi.org/10.1007/s11042-017-5055-5
30 W Zhou , L Liu , H Jing , C Zhang , S Yao , S Wang . K-gram based fuzzy keyword search over encrypted cloud computing. Journal of Software Engineering and Applications, 2013, 6( 1): 29– 33
https://doi.org/10.4236/jsea.2013.61004
31 Ahsan M A M, Chowdhury F Z, Sabilah M, Wahab A W B A, Idris M Y I B. An efficient fuzzy keyword matching technique for searching through encrypted cloud data. In: Proceedings of IEEE International Conference on Research and Innovation in Information Systems. 2017, 1–5
32 Krishna C R, Mittal S A. Privacy preserving synonym based fuzzy multi-keyword ranked search over encrypted cloud data. In: Proceedings of IEEE International Conference on Computing Communication and Automation. 2016, 1187–1194
33 Shi X J, Hu S P. Fuzzy multi-keyword query on encrypted data in the cloud. In: Proceedings of IEEE Internation Conference on Applied Computing and Information Technology/Computational Science/Intelligence and Applied Informatics/Big Data Cloud Computing Data Science & Engineering. 2016, 419–425
34 Ding W, Liu Y, Zhang J. Chinese-keyword fuzzy search and extraction over encrypted patent documents. In: Proceedings of IEEE International Joint Conference on Knowledge Discovery Knowledge Engineering and Knowledge Management. 2015, 168–176
35 V Levenshtein . Binary codes capable of correcting spurious insertions and deletion of ones. Problems of information Transmission, 1965, 1( 1): 8– 17
36 Ran H, Wang Q, Jiang C. Similar Chinese Characters Dictionary. 1st ed. Beijing: Foreign Language Teaching and Research Press, 2010
37 R Curtmola , J Garay , S Kamara , R Ostrovsky . Searchable symmetric encryption: improved definitions and efficient constructions. Journal of Computer Security, 2011, 19( 5): 895– 934
https://doi.org/10.3233/JCS-2011-0426
38 Xu P, Liang S, Wang W, Susilo W, Wu Q, Jin H. Dynamic searchable symmetric encryption with physical deletion and small leakage. In: Proceddings of Australasian Conference on Information Security and Privacy. 2017, 207–226
39 M F Porter . An algorithm for suffix stripping. Program, 2006, 14( 3): 130– 137
40 C Wang , N Cao , K Ren , W Lou . Enabling secure and efficient ranked keyword search over outsourced cloud data. IEEE Transactions on Parallel and Distributed Systems, 2011, 23( 8): 1467– 1479
41 Lai J, Zhou X, Deng R H, Li Y, Chen K. Expressive search on encrypted data. In: Proceedings of ACM SIGSAC Symposium on Information Computer and Communications Security. 2013, 243–252
42 C Bösch , P Hartel , W Jonker , A Peter . A survey of provably secure searchable encryption. ACM Computing Surveys, 2015, 47( 2): 1– 51
43 J Ning , J Xu , K Liang , F Zhang , E C Chang . Passive attacks against searchable encryption. IEEE Transactions on Information Forensics and Security, 2019, 14( 3): 789– 802
https://doi.org/10.1109/TIFS.2018.2866321
44 Cash D, Grubbs P, Perry J, Ristenpart T. Leakage-abuse attacks against searchable encryption. In: Proceedings of ACM Conference on Computer and Communications Security. 2015, 668–679
45 Xu P, Tang S, Xu P, Wu Q, Hu H, Susilo W. Practical multi-keyword and boolean search over encrypted e-mail in cloud server. IEEE Transactions on Services Computing, 2019, DOI:10.1109/TSC.2019.2903502
[1] Qingqing GAN, Joseph K. LIU, Xiaoming WANG, Xingliang YUAN, Shi-Feng SUN, Daxin HUANG, Cong ZUO, Jianfeng WANG. Verifiable searchable symmetric encryption for conjunctive keyword queries in cloud storage[J]. Front. Comput. Sci., 2022, 16(6): 166820-.
[2] Zhangjie FU, Yan WANG, Xingming SUN, Xiaosong ZHANG. Semantic and secure search over encrypted outsourcing cloud based on BERT[J]. Front. Comput. Sci., 2022, 16(2): 162802-.
[3] Abhishek MAJUMDAR, Arpita BISWAS, Atanu MAJUMDER, Sandeep Kumar SOOD, Krishna Lal BAISHNAB. A novel DNA-inspired encryption strategy for concealing cloud storage[J]. Front. Comput. Sci., 2021, 15(3): 153807-.
[4] Xiao PAN, Xiaofeng MENG. Preserving location privacy without exact locations in mobile services[J]. Front Comput Sci, 2013, 7(3): 317-340.
[5] Yubao LIU, Xiuwei CHEN, Zhan LI, Zhijie LI, Raymond Chi-Wing WONG. An efficient method for privacy preserving location queries[J]. Front Comput Sci, 2012, 6(4): 409-420.
Viewed
Full text


Abstract

Cited

  Shared   
  Discussed