Please wait a minute...
Frontiers of Computer Science

ISSN 2095-2228

ISSN 2095-2236(Online)

CN 10-1014/TP

Postal Subscription Code 80-970

2018 Impact Factor: 1.129

Front. Comput. Sci.    2021, Vol. 15 Issue (1) : 151308    https://doi.org/10.1007/s11704-019-9059-3
REVIEW ARTICLE
A survey of density based clustering algorithms
Panthadeep BHATTACHARJEE(), Pinaki MITRA()
Department of Computer Science and Engineering, Indian Institute of Technology, Guwahati 781039, India
 Download: PDF(915 KB)  
 Export: BibTeX | EndNote | Reference Manager | ProCite | RefWorks
Abstract

Density based clustering algorithms (DBCLAs) rely on the notion of density to identify clusters of arbitrary shapes, sizes with varying densities. Existing surveys on DBCLAs cover only a selected set of algorithms. These surveys fail to provide an extensive information about a variety of DBCLAs proposed till date including a taxonomy of the algorithms. In this paper we present a comprehensive survey of various DBCLAs over last two decades along with their classification. We group the DBCLAs in each of the four categories: density definition, parameter sensitivity, execution mode and nature of data and further divide them into various classes under each of these categories. In addition, we compare the DBCLAs through their common features and variations in citation and conceptual dependencies. We identify various application areas of DBCLAs in domains such as astronomy, earth sciences, molecular biology, geography, multimedia. Our survey also identifies probable future directions of DBCLAs where involvement of density based methods may lead to favorable results.

Keywords clustering      density based clustering      survey      classification      common properties      applications     
Corresponding Author(s): Panthadeep BHATTACHARJEE   
Just Accepted Date: 18 September 2019   Issue Date: 24 September 2020
 Cite this article:   
Panthadeep BHATTACHARJEE,Pinaki MITRA. A survey of density based clustering algorithms[J]. Front. Comput. Sci., 2021, 15(1): 151308.
 URL:  
https://academic.hep.com.cn/fcs/EN/10.1007/s11704-019-9059-3
https://academic.hep.com.cn/fcs/EN/Y2021/V15/I1/151308
1 A K Jain, M N Murty, P J Flynn. Data clustering: a review. ACM Computing Surveys, 1999, 31(3): 264–323
https://doi.org/10.1145/331499.331504
2 Z Yan, W Luo, C Bu, L Ni. Clustering spatial data by the neighbors intersection and the density difference. In: Proceedings of the 3rd IEEE/ACM International Conference on Big Data Computing Applications and Technologies (BDCAT). 2016, 217–226
https://doi.org/10.1145/3006299.3006332
3 R Xu, D Wunsch. Survey of clustering algorithms. IEEE Transactions on Neural Networks, 2005, 16(3): 645–678
https://doi.org/10.1109/TNN.2005.845141
4 L Ertöz, M Steinbach, V Kumar. Finding clusters of different sizes, shapes and densities in noisy, high dimensional data. In: Proceedings of the 2003 SIAM International Conference on Data Mining. 2003, 47–58
https://doi.org/10.1137/1.9781611972733.5
5 G Karypis, E H Han, V Kumar. Chameleon: hierarchical clustering using dynamic modeling. Computer, 1999, 32(8): 68–75
https://doi.org/10.1109/2.781637
6 M Ester, H P Kriegel, J Sander, X Xu. A density-based algorithm for discovering clusters in large spatial databases with noise. In: Proceeedings of International Conference on KDD. 1996, 226–231
7 E Keogh, A Mueen. Encyclopedia ofMachine Learning and DataMining. Curse of Dimensionality. 2nd ed. Springer, Boston, MA, 2017, 314–315
https://doi.org/10.1007/978-1-4899-7687-1_192
8 T N Raymond, J Han. Clarans: a method for clustering objects for spatial data mining. IEEE Transactions on Knowledge and Data Engineering, 2002, 14(5): 1003–1016
https://doi.org/10.1109/TKDE.2002.1033770
9 J A Hartigan, M A Wong. Algorithm as 136: a k-means clustering algorithm. Journal of the Royal Statistical Society. Series C (Applied Statistics), 1979, 28(1): 100–108
https://doi.org/10.2307/2346830
10 B W Silverman. Density Estimation for Statistics and Data Analysis. Chapman and Hall/CRC press, 1998
11 C Aggarwal, C K Reddy. Data Clustering: Algorithms and Applications. CRC press, 2013
https://doi.org/10.1201/b15410
12 R Agrawal, J Gehrke, D Gunopulos, P Raghavan. Automatic subspace clustering of high dimensional data for data mining applications. In: Proceedings of the ACM SIGMOD International Conference onManagement of data. 1998, 94–105
https://doi.org/10.1145/276305.276314
13 M Parimala, D Lopez, N C Senthilkumar. A survey on density based clustering algorithms for mining large spatial databases. International Journal of Advanced Science and Technology, 2011, 31(1): 59–66
14 M Ankerst, M M Breunig, H P Kriegel, J Sander. Optics: ordering points to identify the clustering structure. ACM Sigmod Record, 1999, 28: 49–60
https://doi.org/10.1145/304181.304187
15 A Ram, S Jalal, A S Jalal, M Kumar. A density based algorithm for discovering density varied clusters in large spatial databases. International Journal of Computer Applications, 2010, 3(6): 1–4
https://doi.org/10.5120/739-1038
16 D Birant, A Kut. ST-DBSCAN: an algorithm for clustering spatialtemporal data. Data & Knowledge Engineering, 2007, 60(1): 208–221
https://doi.org/10.1016/j.datak.2006.01.013
17 R Bhuyan, S Borah. A survey of some density based clustering techniques. In: Proceedings of the 2013 Conference on Advancements in Information, Computer and Communication. 2013
18 S Chaudhary, R Chauhan, P Batra. A survey of density based clustering algorithms. International Journal of Computer Science and Technology, 2014, 5(2): 169–171
19 A Hinneburg, D A Keim. An efficient approach to clustering in large multimedia databases with noise. In: Proceedings of the 4th International Conference on Knowledge Discovery and Data Mining. 1998, 58–65
20 P B Nagpal, P A Mann. Comparative study of density based clustering algorithms. International Journal of Computer Applications, 2011, 27(11): 421–435
https://doi.org/10.5120/3341-4600
21 X Xu, M Ester, H P Kriegel, J Sander. A distribution-based clustering algorithm for mining in large spatial databases. In: Proceedings of the 14th IEEE International Conference on Data Engineering. 1998, 324–331
22 V Thiagarasu, R Prabahari. A comparative analysis of density based clustering techniques for outlier mining. International Journal of Engineering Sciences and Research Technology, 2014, 3(11): 132–136
23 A Rajavat, P Dubey. Comparative study between density based clustering- DBSCAN and OPTICS. International Journal of Advance Computational Engineering and Networking, 2016, 4(12): 34–37
24 A Smiti, Z Eloudi. Soft DBSCAN: improving DBSCAN clustering method using fuzzy set theory. In: Proceedings of the 6th International Conference on Human System Interactions. 2013, 380–385
https://doi.org/10.1109/HSI.2013.6577851
25 W K Loh, Y H Park. A survey on density-based clustering algorithms. In: Jeong Y S, Park Y H, Hsu C H, Park J, eds. Ubiquitous Information Technologies and Applications. Springer, Berlin, Heidelberg, 2014, 775–780
https://doi.org/10.1007/978-3-642-41671-2_98
26 X Xu, J Jäger, H P Kriegel. A fast parallel clustering algorithm for large spatial databases. In: Guo Y, Grossman R, eds. High Performance Data Mining. Springer, Boston, MA, 1999, 263–290
https://doi.org/10.1007/0-306-47011-X_3
27 C Böhm, R Noll, C Plant, B Wackersreuther. Density-based clustering using graphics processors. In: Proceedings of the 18th ACM Conference on Information and Knowledge Management. 2009, 661–670
https://doi.org/10.1145/1645953.1646038
28 W K Loh, Y S Moon, Y H Park. Erratum: fast density-based clustering using graphics processing units. IEICE TRANSACTIONS on Information and Systems, 2014, 97(7): 1947–1951
https://doi.org/10.1587/transinf.E97.D.1947
29 K M Ting, Y Zhu, M J Carman, Y Zhu, Z H Zhou. Overcoming key weaknesses of distance-based neighbourhood methods using a data dependent dissimilarity measure. In: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 2016, 1205–1214.
https://doi.org/10.1145/2939672.2939779
30 J Sander, M Ester, H P Kriegel, X Xu. Density-based clustering in spatial databases: the algorithm GDBSCAN and its applications. Data Mining and Knowledge Discovery, 1998, 2(2): 169–194
https://doi.org/10.1023/A:1009745219419
31 M Ester, H P Kriegel, J Sander, M Wimmer, X Xu. Incremental clustering for mining in a data warehousing environment. In: Proceedings of the 24th International Conference on Very Large Data Bases. 1998, 323–333
32 S Zhou, A Zhou, J Cao, J Wen, Y Fan, Y Hu. Combining sampling technique with DBSCAN algorithm for clustering large spatial databases. In: Proceedings of Pacific-Asia Conference on Knowledge Discovery and Data Mining. 2000, 169–172
https://doi.org/10.1007/3-540-45571-X_20
33 R A Jarvis, E A Patrick. Clustering using a similarity measure based on shared near neighbors. IEEE Transactions on Computers, 1973, 100(11): 1025–1034
https://doi.org/10.1109/T-C.1973.223640
34 E Januzaj, H P Kriegel, M Pfeifle. DBDC: density based distributed clustering. Advances in Database Technology-EDBT 2004, 2004, 529–530
https://doi.org/10.1007/978-3-540-24741-8_7
35 B Borah, D K Bhattacharyya. An improved sampling-based DBSCAN for large spatial databases. In: Proceedings of IEEE International Conference on Intelligent Sensing and Information Processing. 2004, 92–96
36 C F Tsai, C W Liu. KIDBSCAN: a newefficient data clustering algorithm. In: Proceedings of International Conference on Artifical Intelligence and Soft Computing. 2006, 702–711
https://doi.org/10.1007/11785231_73
37 S Kisilevich, F Mansmann, D Keim. P-DBSCAN: a density based clustering algorithm for exploration and analysis of attractive areas using collections of geo-tagged photos. In: Proceedings of the 1st International Conference and Exhibition on Computing for Geospatial Research and Application. 2010
https://doi.org/10.1145/1823854.1823897
38 N An, L Khac, M T Kechadi. On a distributed approach for density-based clustering. In: Proceedings of the 10th International Conference on Machine Learning and Applications and Workshops. 2011, 283–286
39 Y He, H Tan, W Luo, H Mao, D Ma, S Feng, J Fan. MR-DBSCAN: an efficient parallel density-based clustering algorithm using MapReduce. In: Proceedings of the 17th IEEE International Conference on Parallel and Distributed Systems. 2011, 473–480
https://doi.org/10.1109/ICPADS.2011.83
40 R J Campello, D Moulavi, J Sander. Density-based clustering based on hierarchical density estimates. In: Proceedings of Pacific-Asia Conference on Knowledge Discovery and Data Mining. 2013, 160–172
https://doi.org/10.1007/978-3-642-37456-2_14
41 Y Yu, J Zhao, X Wang, Q Wang, Y Zhang. Cludoop: an efficient distributed density-based clustering for big data using hadoop. International Journal of Distributed Sensor Networks, 2015, 11(6): 379–391
https://doi.org/10.1155/2015/579391
42 J Hou, H Gao, X Li. DSets-DBSCAN: a parameter-free clustering algorithm. IEEE Transactions on Image Processing, 2016, 25(7): 3182–3193
https://doi.org/10.1109/TIP.2016.2559803
43 M Pavan, M Pelillo. Dominant sets and pairwise clustering. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2006, 29(1): 167–172
https://doi.org/10.1109/TPAMI.2007.250608
44 J Gan, Y Tao. Dynamic density based clustering. In: Proceedings of the 2017 ACM International Conference on Management of Data. 2017, 1493–1507
https://doi.org/10.1145/3035918.3064050
45 W Wang, J Yang, R Muntz. Sting: a statistical information grid approach to spatial data mining. In: Proceedings of the 23rd International Conference on Very Large Data Bases. 1997, 186–195
46 L Davis. Bit-climbing, representational bias, and test suite design. In: Proceedings of the 4th International Conference on Genetic Algorithms. 1991, 18–23
47 A Hinneburg, D A Keim. Optimal grid-clustering: towards breaking the curse of dimensionality in high-dimensional clustering. In: Proceedings of the 25th International Conference on Very Large Data Bases. 1999
48 T Zhang, R Ramakrishnan, M Livny. BIRCH: a new data clustering algorithm and its applications. Data Mining and Knowledge Discovery, 1997, 1(2): 141–182
https://doi.org/10.1023/A:1009783824328
49 G Sheikholeslami, S Chatterjee, A Zhang. WaveCluster: a multiresolution clustering approach for very large spatial databases. In: Proceedings of the 24th International Conference on Very Large Data Bases. 1998, 428–439
50 C C Aggarwal. A human-computer interactive method for projected clustering. IEEE Transactions on Knowledge and Data Engineering, 2004, 16(4): 448–460
https://doi.org/10.1109/TKDE.2004.1269669
51 Y Chen, L Tu. Density-based clustering for real-time stream data. In: Proceedings of the 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 2007, 133–142
https://doi.org/10.1145/1281192.1281210
52 C C Aggarwal. Outlier Analysis. In: Data Mining. Springer, 2015, 237–263
https://doi.org/10.1007/978-3-319-14142-8_8
53 H P Kriegel, M Pfeifle. Hierarchical density-based clustering of uncertain data. In: Proceedings of the 5th IEEE International Conference on Data Mining. 2005
https://doi.org/10.1145/1081870.1081955
54 H P Kriegel, M Pfeifle. Density-based clustering of uncertain data. In: Proceedings of the 11th ACM SIGKDD International Conference on Knowledge Discovery in Data Mining. 2005, 672–677
https://doi.org/10.1145/1081870.1081955
55 M Stonebraker, J Frew, J Dozier. The SEQUOIA 2000 Project. In: Proceedings of International Symposium on Spatial Databases. 1993, 397–412
https://doi.org/10.1007/3-540-56869-7_22
56 T Zhang, R Ramakrishnan, M Livny. BIRCH: an efficient data clustering method for very large databases. ACM Sigmod Record, 1996, 25(2): 103–114
https://doi.org/10.1145/235968.233324
57 Y Zheng, X Xie, W Y Ma. Geolife: a collaborative social networking service among user, location and trajectory. IEEE Data Engineering Bulletin, 2010, 33(2): 32–39
58 J Yuan, Y Zheng, X Xie, G Sun. T-drive: enhancing driving directions with taxi drivers’ intelligence. IEEE Transactions on Knowledge and Data Engineering, 2011, 25(1): 220–232, 2011
https://doi.org/10.1109/TKDE.2011.200
59 K Y Yeung, C Fraley, A Murua, A E Raftery, W L Ruzzo. Model-based clustering and data transformations for gene expression data. Bioinformatics, 2001, 17(10): 977–987
https://doi.org/10.1093/bioinformatics/17.10.977
60 C M Bishop. Pattern Recognition and Machine Learning. Springer, 2006
61 C D Lu, T Y Zhang, X Z Du, C P Li. A robust kernel PCA algorithm. In: Proceedings of 2004 International Conference on Machine Learning and Cybernetics. 2004, 3084–3087
62 S Balakrishnama, A Ganapathiraju. Linear discriminant analysis-a brief tutorial. Institute for Signal and Information Processing, 1998, 18: 1–8
63 G Baudat, F Anouar. Generalized discriminant analysis using a kernel approach. Neural Computation, 2000, 12(10): 2385–2404
https://doi.org/10.1162/089976600300014980
64 V A Epanechnikov. Non-parametric estimation of a multivariate probability density. Theory of Probability & Its Applications, 1969, 14(1): 153–158
https://doi.org/10.1137/1114019
65 S Guha, R Rastogi, K Shim. CURE: an efficient clustering algorithm for large databases. ACM Sigmod Record, 1998, 27(2): 73–84
https://doi.org/10.1145/276305.276312
66 M Tavallaee, E Bagheri, W Lu, A A Ghorbani. A detailed analysis of the KDD cup 99 data set. In: Proceedings of IEEE Symposium on Computational Intelligence for Security and Defense Applications. 2009, 1–6
https://doi.org/10.1109/CISDA.2009.5356528
67 G D Bader, C W V Hogue. An automated method for finding molecular complexes in large protein interaction networks. BMC Bioinformatics, 2003, 4(1): 2
https://doi.org/10.1186/1471-2105-4-2
68 M L Connolly. Measurement of protein surface shape by solid angles. Journal of Molecular Graphics, 1986, 4(1): 3–6
https://doi.org/10.1016/0263-7855(86)80086-8
69 R H Becker, R L White, D J Helfand. The first survey: faint images of the radio sky at twenty centimeters. The Astrophysical Journal, 1995, 450: 559
https://doi.org/10.1086/176166
70 A F Zepka, J M Cordes, I Wasserman. Signal detection amid noise with known statistics. The Astrophysical Journal, 1994, 427: 438–445
https://doi.org/10.1086/174154
71 N Weir, U M Fayyad, S Djorgovski. Automated star/galaxy classification for digitized POSS-II. The Astronomical Journal, 1995, 109: 2401
https://doi.org/10.1086/117459
72 N R Chrisman, J A Dougenik, D White. Lessons for the design of polygon overlay processing from the odyssey whirlpool algorithm. In: Proceedings of the 5th International Symposium on Spatial Data Handling. 1992, 401–410
73 Q Du, Z Dong, C Huang, F Ren. Density-based clustering with geographical background constraints using a semantic expression model. ISPRS International Journal of Geo-Information, 2016, 5(5): 72
https://doi.org/10.3390/ijgi5050072
74 J Hendler. Data integration for heterogenous datasets. Big Data, 2014, 2(4): 205–215
https://doi.org/10.1089/big.2014.0068
75 T E Allen, M J Herrgård, M Liu, Y Qiu, J Glasner, F R Blattner, B Palsson. Genome-scale analysis of the uses of the escherichia coli genome: modeldriven analysis of heterogeneous data sets. Journal of Bacteriology, 2003, 185(21): 6392–6399
https://doi.org/10.1128/JB.185.21.6392-6399.2003
76 P Pavlidis, J Weston, J Cai, W N Grundy. Gene functional classification from heterogeneous data. In: Proceedings of the 5th Annual International Conference on Computational Biology, 2001, 249–255
https://doi.org/10.1145/369133.369228
77 J Angelier. Tectonic analysis of fault slip data sets. Journal of Geophysical Research: Solid Earth, 1984, 89(B7): 5835–5848
https://doi.org/10.1029/JB089iB07p05835
78 M Kowalski, D Rubin, G Aldering, R J Agostinho, A Amadon, R Amanullah, C Balland, K Barbary, G Blanc, P J Challis, et al. Improved cosmological constraints from new, old, and combined supernova data sets. The Astrophysical Journal, 2008, 686(2): 749
https://doi.org/10.1086/589937
79 M D Nguyen, W Y Shin. DBSTexC: density-based spatio–textual clustering on twitter. In: Proceedings of the 2017 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining. 2017, 23–26
https://doi.org/10.1145/3110025.3110096
[1] Article highlights Download
[1] Huiyan XU, Zhongqing WANG, Yifei ZHANG, Xiaolan WENG, Zhijian WANG, Guodong ZHOU. Document structure model for survey generation using neural network[J]. Front. Comput. Sci., 2021, 15(4): 154325-.
[2] Miao CAI, Hao HUANG. A survey of operating system support for persistent memory[J]. Front. Comput. Sci., 2021, 15(4): 154207-.
[3] Dongjie CHEN, Yanyan JIANG, Chang XU, Xiaoxing MA. On interleaving space exploration of multi-threaded programs[J]. Front. Comput. Sci., 2021, 15(4): 154206-.
[4] Syed Farooq ALI, Muhammad Aamir KHAN, Ahmed Sohail ASLAM. Fingerprint matching, spoof and liveness detection: classification and literature review[J]. Front. Comput. Sci., 2021, 15(1): 151310-.
[5] Yunyun WANG, Jiao HAN, Yating SHEN, Hui XUE. Pointwise manifold regularization for semi-supervised learning[J]. Front. Comput. Sci., 2021, 15(1): 151303-.
[6] Qianchen YU, Zhiwen YU, Zhu WANG, Xiaofeng WANG, Yongzhi WANG. Estimating posterior inference quality of the relational infinite latent feature model for overlapping community detection[J]. Front. Comput. Sci., 2020, 14(6): 146323-.
[7] Zhihan JIANG, Yan LIU, Xiaoliang FAN, Cheng WANG, Jonathan LI, Longbiao CHEN. Understanding urban structures and crowd dynamics leveraging large-scale vehicle mobility data[J]. Front. Comput. Sci., 2020, 14(5): 145310-.
[8] Parnika PARANJAPE, Meera DHABU, Parag DESHPANDE. A novel classifier for multivariate instance using graph class signatures[J]. Front. Comput. Sci., 2020, 14(4): 144307-.
[9] Muhammad Aminur RAHAMAN, Mahmood JASIM, Md. Haider ALI, Md. HASANUZZAMAN. Bangla language modeling algorithm for automatic recognition of hand-sign-spelled Bangla sign language[J]. Front. Comput. Sci., 2020, 14(3): 143302-.
[10] Yi LIU, Tian SONG, Lejian LIAO. TPII: tracking personally identifiable information via user behaviors in HTTP traffic[J]. Front. Comput. Sci., 2020, 14(3): 143801-.
[11] Hui XUE, Haiming XU, Xiaohong CHEN, Yunyun WANG. A primal perspective for indefinite kernel SVM problem[J]. Front. Comput. Sci., 2020, 14(2): 349-363.
[12] Xibin DONG, Zhiwen YU, Wenming CAO, Yifan SHI, Qianli MA. A survey on ensemble learning[J]. Front. Comput. Sci., 2020, 14(2): 241-258.
[13] Ratha PECH, Dong HAO, Hong CHENG, Tao ZHOU. Enhancing subspace clustering based on dynamic prediction[J]. Front. Comput. Sci., 2019, 13(4): 802-812.
[14] Hui XUE, Sen LI, Xiaohong CHEN, Yunyun WANG. A maximum margin clustering algorithm based on indefinite kernels[J]. Front. Comput. Sci., 2019, 13(4): 813-827.
[15] Gaoqi HE, Qi CHEN, Dongxu JIANG, Yubo YUAN, Xingjian LU. Physical-barrier detection based collective motion analysis[J]. Front. Comput. Sci., 2019, 13(2): 426-436.
Viewed
Full text


Abstract

Cited

  Shared   
  Discussed