Please wait a minute...
Frontiers of Computer Science

ISSN 2095-2228

ISSN 2095-2236(Online)

CN 10-1014/TP

Postal Subscription Code 80-970

2018 Impact Factor: 1.129

Front. Comput. Sci.    2015, Vol. 9 Issue (5) : 741-750    https://doi.org/10.1007/s11704-015-4192-0
RESEARCH ARTICLE
Fast approximate matching of binary codes with distinctive bits
Chenggang Clarence YAN2,3, Hongtao XIE1(), Bing ZHANG4, Yanping MA5, Qiong DAI1, Yizhi LIU6
1. Institute of Information Engineering, Chinese Academy of Sciences, National Engineering Laboratory for Information Security Technologies, Beijing 100093, China
2. Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China
3. Department of Automation, Tsinghua University, Beijing 100083, China
4. School of Physics, Beijing Institute of Technology, Beijing 100081, China
5. School of Information Science and Engineering, Ocean University of China, Qingdao 266100, China
6. School of Computer Science and Engineering, Hunan University of Science and Technology, Xiangtan 411201, China
 Download: PDF(442 KB)  
 Export: BibTeX | EndNote | Reference Manager | ProCite | RefWorks
Abstract

Although the distance between binary codes can be computed fast in Hamming space, linear search is not practical for large scale datasets. Therefore attention has been paid to the efficiency of performing approximate nearest neighbor search, in which hierarchical clustering trees (HCT) are widely used. However, HCT select cluster centers randomly and build indexes with the entire binary code, this degrades search performance. In this paper, we first propose a new clustering algorithm, which chooses cluster centers on the basis of relative distances and uses a more homogeneous partition of the dataset than HCT has to build the hierarchical clustering trees. Then, we present an algorithm to compress binary codes by extracting distinctive bits according to the standard deviation of each bit. Consequently, a new index is proposed using compressed binary codes based on hierarchical decomposition of binary spaces. Experiments conducted on reference datasets and a dataset of one billion binary codes demonstrate the effectiveness and efficiency of our method.

Keywords binary codes      approximate nearest neighbor search      hierarchical clustering index     
Corresponding Author(s): Hongtao XIE   
Just Accepted Date: 16 January 2015   Issue Date: 24 September 2015
 Cite this article:   
Chenggang Clarence YAN,Hongtao XIE,Bing ZHANG, et al. Fast approximate matching of binary codes with distinctive bits[J]. Front. Comput. Sci., 2015, 9(5): 741-750.
 URL:  
https://academic.hep.com.cn/fcs/EN/10.1007/s11704-015-4192-0
https://academic.hep.com.cn/fcs/EN/Y2015/V9/I5/741
1 W Zhang, K Gao, Y Zhang, J Li. Efficient approximate nearest neighbor search with integrated binary codes. In: Proceedings of ACM International Conference on Multimedia. 2011, 1189−1192
https://doi.org/10.1145/2072298.2071971
2 W Chu, C Li, S Tseng. Travelmedia: an intelligent management system for media captured in travel. Journal of Visual Communication and Image Representation, 2011, 22(1): 93−104
https://doi.org/10.1016/j.jvcir.2010.10.008
3 M Wang, H Li, D Tao, K Lu, X Wu. Multimodal graph-based reranking for Web image search. IEEE Transactions on Image Processing, 2012, 21(11): 4649−4661
https://doi.org/10.1109/TIP.2012.2207397
4 M Wang, G Li, Z Lu, Y Gao, T Chua. When amazon meets google: product visualization by exploring multiple Web sources. ACM Transactions on Internet Technology, 2013, 12(4): 12
https://doi.org/10.1145/2499926.2492690
5 Y Zhang, C Yan, F Dai, Y Ma. Efficient parallel framework for H.264/AVC deblocking filter on many-core platform. IEEE Transactions on Multimedia, 2012, 14(3): 510−524
https://doi.org/10.1109/TMM.2012.2190391
6 C Yan, Y Zhang, J Xu, F Dai, L Li, Q Dai, F Wu. A highly parallel framework for HEVC coding unit partitioning tree decision on manycore processors. IEEE Signal Processing letters, 2014, 21(5): 573−576
https://doi.org/10.1109/LSP.2014.2310494
7 C Yan, Y Zhang, J Xu, F Dai, J Zhang, Q Dai, F Wu. Efficient parallel framework for HEVC motion estimation on many-core processors. IEEE Transactions on Circuits and Systems for Video Technology, 2014, 24(12): 2077−2089
https://doi.org/10.1109/TCSVT.2014.2335852
8 A Torralba, R Fergus, Y Weiss. Small codes and large image databases for recognition. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition. 2008, 1−8
https://doi.org/10.1109/cvpr.2008.4587633
9 L Zhang, Y Zhang, J Tang, X Gu, J Li, Q Tian. Topology preserving hashing for similarity search. In: Proceedings of ACM International Conference on Multimedia. 2013, 123−132
https://doi.org/10.1145/2502081.2502091
10 H Xie, Y Zhang, J Tan, L Guo, J Li. Contextual query expansion for image retrieval. IEEE Transactions on Multimedia, 2014, 16(4): 1104−1114
https://doi.org/10.1109/TMM.2014.2305909
11 R Salakhutdinov, G Hinton. Semantic hashing. International Journal of Approximate Reasoning, 2009, 50(7): 969−978
https://doi.org/10.1016/j.ijar.2008.11.006
12 C Strecha, A Bronstein, M Bronstein, P Fua. LDAHash: improved matching with smaller descriptors. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2012, 34(1): 66−78
https://doi.org/10.1109/TPAMI.2011.103
13 E Rublee, V Rabaud, K Konolige, G Bradski. ORB: an efficient alternative to SIFT or SURF. In: Proceedings of IEEE International Conference on Computer Vision. 2011, 2564−2571
https://doi.org/10.1109/iccv.2011.6126544
14 M Norouzi, A Punjani, D J Fleet. Fast search in hamming space with multi-index hashing. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition. 2012: 3108−3115
https://doi.org/10.1109/cvpr.2012.6248043
15 M Muja, D G Lowe. Fast matching of binary features. In: Proceedings of Computer and Robot Vision. 2012: 404−410
https://doi.org/10.1109/crv.2012.60
16 M Muja, D G Lowe. Flann, fast library for approximate nearest neighbors.
17 C L Zitnick. Binary coherent edge descriptors. Computer Vision− ECCV 2010. Springer Berlin Heidelberg, 2010, 170−182
https://doi.org/10.1007/978-3-642-15552-9_13
18 Y Weiss, A Torralba, R Fergus. Spectral hashing. In: Proceedings of Advances in Neural Information Processing Systems. 2008, 1753−1760
19 R W Yeung. Information Theory and Network Coding. Springer, 2008
20 M Ou, P Cui, F Wang, J Wang, W Zhu, S Yang. Comparing apples to oranges: a scalable solution with heterogeneous hashing. In: Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 2013, 230−238
https://doi.org/10.1145/2487575.2487668
21 Y Wei, Y Song, Y Zhen, B Liu, Q Yang. Scalable heterogeneous translated hashing. In: Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and DataMining. 2014, 791−800
https://doi.org/10.1145/2623330.2623688
22 S Liu, P Cui, W Zhu, S Yang, Q Tian. Social embedding image distance learning. In: Proceedings of the ACM International Conference on Multimedia. 2014, 617−626
https://doi.org/10.1145/2647868.2654905
23 L Zhang, Y Zhang, J Tang, J Tang, K Lu, Q Tian. Binary code ranking with weighted hamming distance. In: Proceedings of 2013 IEEE Conference on Computer Vision and Pattern Recognition. 2013, 1586−1593
https://doi.org/10.1109/CVPR.2013.208
24 H Jegou, M Douze, C Schmid. Product quantization for nearest neighbor search. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2011, 33(1): 117−128
https://doi.org/10.1109/TPAMI.2010.57
25 Y Gong, S Lazebnik. Iterative quantization: A procrustean approach to learning binary codes. In: Proceedings of 2011 IEEE Conference on Computer Vision and Pattern Recognition. 2011, 817−824
https://doi.org/10.1109/cvpr.2011.5995432
26 J P Heo, Y Lee, J He, S Chang, S Yoon. Spherical hashing. In: Proceedings of 2012 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 2012, 2957−2964
27 P Li, M Wang, J Cheng, C Xu, H Lu. Spectral hashing with semantically consistent graph for image indexing. IEEE Transactions on Multimedia, 2013, 15(1): 141−152
https://doi.org/10.1109/TMM.2012.2199970
28 M M Esmaeili, R K Ward, M Fatourechi. A fast approximate nearest neighbor search algorithm in the hamming space. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2012, 34(12): 2481−2488
https://doi.org/10.1109/TPAMI.2012.170
29 X Zhang, J Qin, W Wang, et al. Hmsearch: An efficient hamming distance query processing algorithm. In: Proceedings of the 25th International Conference on Scientific and Statistical Database Management. 2013, 19
https://doi.org/10.1145/2484838.2484842
30 M Aly, M Munich, P Perona. Distributed kd-trees for retrieval from very large image collections. In: Proceedings of British Machine Vision Conference. 2011
31 A Babenko, V Lempitsky. The inverted multi-index. In: Proceedings of 2012 IEEE Conference on Computer Vision and Pattern Recognition. 2012, 3069−3076
https://doi.org/10.1109/CVPR.2012.6248038
32 C Silpa-Anan, R Hartley. Optimised KD-trees for fast image descriptor matching. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition. 2008, 1−8
https://doi.org/10.1109/cvpr.2008.4587638
33 A Gionis, P Indyk, R Motwani. Similarity search in high dimensions via hashing. In: Proceedings of the International Conference on Very Large Data Bases. 1999, 99: 518−529
34 A.Z Broder,. On the resemblance and containment of documents. In: Proceedings of IEEE Compression and Complexity of Sequences. 1997, 21−29
35 H S Park, C H Jun. A simple and fast algorithm for K-medoids clustering. Expert Systems with Applications, 2009, 36(2): 3336−3341
https://doi.org/10.1016/j.eswa.2008.01.039
36 J M Bland, D G Altman. Statistics notes: measurement error. BMJ, 1996, 312(7047): 1654
https://doi.org/10.1136/bmj.312.7047.1654
37 H Jégou, M Douze, C Schmid. Improving bag-of-features for large scale image search. International Journal of Computer Vision, 2010, 87(3): 316−336
https://doi.org/10.1007/s11263-009-0285-2
38 C Yan, Y Zhang, F Dai, X Wang, L Li, Q Dai. Parallel deblocking filter for HEVC on many-core processor. Electronics Letters, 2014, 50(5): 367−368
https://doi.org/10.1049/el.2013.3235
39 C Yan, Y Zhang, F Dai, L Li. Highly parallel framework for HEVC motion estimation on many-core platform. In: Proceedings of Data Compression Conference. 2013, 63−72
40 C Yan, F Dai, Y Zhang, Y Ma. Parallel deblocking filter for H.264/AVC implemented on Tile64 Platform. In: Proceedings of International Conference on Multimedia and Expo. 2011, 1−6
41 C Yan, Y Zhang, F Dai, J Zhang, L Li, Q Dai. Efficient parallel HEVC intra prediction on many-core processor. Electronics Letters, 2014, 50(11): 805−806
https://doi.org/10.1049/el.2014.0611
[1] Supplementary Material-Highlights in 3-page ppt
Download
[1] Linhao LI, Qinghua HU. Optimized high order product quantization for approximate nearest neighbors search[J]. Front. Comput. Sci., 2020, 14(2): 259-272.
Viewed
Full text


Abstract

Cited

  Shared   
  Discussed