Please wait a minute...
Frontiers of Computer Science

ISSN 2095-2228

ISSN 2095-2236(Online)

CN 10-1014/TP

Postal Subscription Code 80-970

2018 Impact Factor: 1.129

Front. Comput. Sci.    2017, Vol. 11 Issue (4) : 608-621    https://doi.org/10.1007/s11704-016-5003-y
RESEARCH ARTICLE
A parallel computing framework for big data
Guoliang CHEN1,2, Rui MAO1,2(), Kezhong LU1,2
1. Guangdong Province Key Laboratory of Popular High Performance Computers, Shenzhen 518060, China
2. College of Computer Science and Software Engineering, Shenzhen University, Shenzhen 518060, China
 Download: PDF(627 KB)  
 Export: BibTeX | EndNote | Reference Manager | ProCite | RefWorks
Abstract

Big data has received great attention in research and application. However, most of the current efforts focus on system and application to handle the challenges of “volume” and “velocity”, and not much has been done on the theoretical foundation and to handle the challenge of “variety”. Based on metric-space indexing and computationalcomplexity theory, we propose a parallel computing framework for big data. This framework consists of three components, i.e., universal representation of big data by abstracting various data types into metric space, partitioning of big data based on pair-wise distances in metric space, and parallel computing of big data with the NC-class computing theory.

Keywords NC-computing      metric space      data partitioning      parallel computing     
Corresponding Author(s): Rui MAO   
Just Accepted Date: 29 March 2016   Online First Date: 17 March 2017    Issue Date: 26 July 2017
 Cite this article:   
Guoliang CHEN,Rui MAO,Kezhong LU. A parallel computing framework for big data[J]. Front. Comput. Sci., 2017, 11(4): 608-621.
 URL:  
https://academic.hep.com.cn/fcs/EN/10.1007/s11704-016-5003-y
https://academic.hep.com.cn/fcs/EN/Y2017/V11/I4/608
1 ZhouB, Liu W, FanC . Big Data: Strategy, Technology and Practice. Beijing: Publishing House of Electronics Industry, 2013
2 TuZ. Big Data. Guilin: Guangxi Normal University Press, 2012
3 FanW, GeertsF, NevenF. Making queries tractable on big data with preprocessing: through the eyes of complexity theory. Proceedings of the VLDB Endowment, 2013, 6(9): 685–696
https://doi.org/10.14778/2536360.2536368
4 ChenG. Design and Analysis of Parallel Algorithms. Beijing: Higher Education Press, 2011
5 CookS A. Deterministic CFL’s are accepted simultaneously in polynomial time and log squared space. In: Proceedings of the 11th Annual ACM Symposium on Theory of Computing. 1979, 338–345
https://doi.org/10.1145/800135.804426
6 MaoR, XuH L, WuW B, Li J Q, LiY , LuM H. Overcoming the challenge of variety: big data abstraction, the next evolution of data management for AAL communication system. IEEE Communications Magazine, 2015, 53(1): 42–47
https://doi.org/10.1109/MCOM.2015.7010514
7 XiongJ, LuJ, TanF. Topology. Beijing:China Machine Press, 2013
8 ChávezE, Navarro G, Baeza-YatesR , MarroquínJ L. Searching in metric spaces. ACM Computing Surveys, 2001, 33(3): 273–321
https://doi.org/10.1145/502807.502808
9 ZezulaP, AmatoG, DohnalV, Batko M. Similarity Search: the Metric Space Approach. Springer Science & Business Media, 2006
10 MaoR, ZhangP H, LiX L, Liu X, LuM H . Pivot selection for metricspace indexing. International Journal of Machine Learning and Cybernetics, 2016, 7(2): 311–323
https://doi.org/10.1007/s13042-016-0504-4
11 MaoR, Miranker W, MirankerD P . Pivot selection: dimension reduction for distance-based indexing. Journal of Discrete Algorithms, 2012, 13: 32–46
https://doi.org/10.1016/j.jda.2011.10.004
12 ShiH M, Schaeffer J. Parallel sorting by regular sampling. Journal of Parallel and Distributed Computing, 1992, 14(4): 361–372
https://doi.org/10.1016/0743-7315(92)90075-X
13 ValiantL G. Parallelism in comparison problems. SIAM Journal on Computing, 1975, 4(3): 348–355
https://doi.org/10.1137/0204030
14 ShiloachY, Vishkin U. Finding the maximum. Merging and Sorting in a Parallel Computational Model, 1981, 2(1): 88–102
15 ChenG. Balanced (n, m)-selection networks. Computer Research and Development, 1984, 21(11): 9–22
16 Samet, H. Foundations of Multidimensional and Metric Data Structures. San Francisco: Morgan-Kaufmann, 2006
17 HjaltasonG R, SametH. Index-driven similarity search in metric spaces. ACM Transactions on Database Systems, 2003. 28(4): 517–580
https://doi.org/10.1145/958942.958948
18 UhlmannJ K. Satisfying general proximity/similarity queries withmetric trees. Information Processing Letter, 1991, 40(4): 175–179
https://doi.org/10.1016/0020-0190(91)90074-R
19 YianilosP N. Data structures and algorithms for nearest neighbor search in general metric spaces. In: Proceedings of the 4th Annual ACM-SIAM Symposium on Discrete Algorithms. 1993
20 MaoR, LiuS, XuH L, Zhang D, MirankerD P . On data partitioning in tree structure metric-space indexes. In: Proceedings of the 19th International Conference on Database Systems for Advanced Applications. 2014, 141–155
https://doi.org/10.1007/978-3-319-05810-8_10
21 BozkayaT, Ozsoyoglu M. Indexing large metric spaces for similarity search queries. ACM Transactions on Database Systems, 1999, 24(3): 361–404
https://doi.org/10.1145/328939.328959
22 SergeyB. Near neighbor search in large metric spaces. In: Proceedings of the 21st International Conference on Very Large Data Bases. 1995
23 CiacciaP, Patella M, ZezulaP . M-tree: an efficient access method for similarity search in metric spaces. In: proceedings of the 23rd International Conference on Very Large Data Bases. 1997
24 NavarroG. Searching in metric spaces by spatial approximation. The VLDB Journal, 2002, 11(1): 28–46
https://doi.org/10.1007/s007780200060
25 JagadishH V, OoiB C, TanK L, Yu C, ZhangR . iDistance: an adaptive B+-tree based indexing method for nearest neighbor search. ACM Transactions on Database Systems (TODS), 2005, 30(2): 364–397
https://doi.org/10.1145/1071610.1071612
26 ChavezE, Navarro G. Unbalancing: the key to index high dimensional metric spaces. Technical Report. 1999
27 MaoR, XuW, RamakrishnanS . NuckollsG, Miranker D P. On optimizing distance-based similarity search for biological databases. In: Proceedings of the 2005 IEEE Computational Systems Bioinformatics Conference. 2005, 351–361
https://doi.org/10.1109/CSB.2005.42
28 MacQueenJ B. Some methods for classification and analysis of multivariate observations. In: Proceedings of the 5th Berkeley Symposium on Mathematical Statistics and Probability. 1967, 281–297
29 EsterM, Kriegel H P, SanderJ , XuX. A density-based algorithm for discovering clusters in large spatial databases with noise. In: Proceedings of the 2nd ACM SIGKDD Conference. 1996. 226–231
30 CullerD, KarpR, PattersonD, Sahay A, SchauserK E , SantosE, Subramonian R, von EickenT . LogP: towards a realistic model of parallel computation. In: Proceedings of the 4th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming. 1993, 1–12
https://doi.org/10.1145/155332.155333
31 TuB B, ZouM, ZhanJ F, Zhao X F, FanJ P . Research on parallel computation model with memory hierarchy on multi-core clusters. Chinese Journal of Computers, 2008, 31(11): 1948–1955
https://doi.org/10.3724/SP.J.1016.2008.01948
32 LadnerR E. The circuit value problem is log space complete for P. ACM SIGACT News, 1975, 7(1): 18–20
https://doi.org/10.1145/990518.990519
33 PippengerN J. On simultaneous resource bounds. In: Proceedings of the 20th IEEE Annual Symposium on Foundations of Computer Science. 1979, 307–311
https://doi.org/10.1109/sfcs.1979.29
34 ChandraA K, Stockmeyer L J. Alternation. In: Proceedings of the 17th IEEE Annual Symposium on Foundations of Computer Science. 1976, 98–108
https://doi.org/10.1109/sfcs.1976.4
35 GoldschlagerL M. Synchronous parallel computation. Dissertation for the Doctoral Degree. Toronto: University of Toronto, 1977
[1] FCS-0608-15003-RM_suppl_1 Download
[1] Wenjie LIU, Zhanhuai LI. An efficient parallel algorithm of N-hop neighborhoods on graphs in distributed environment[J]. Front. Comput. Sci., 2019, 13(6): 1309-1325.
[2] Xiaoyan WANG,Tao YANG,Jinchuan CHEN,Long HE,Xiaoyong DU. RDF partitioning for scalable SPARQL query processing[J]. Front. Comput. Sci., 2015, 9(6): 919-933.
[3] Zeyao MO, Aiqing ZHANG, Xiaolin CAO, Qingkai LIU, Xiaowen XU, Hengbin AN, Wenbing PEI, Shaoping ZHU, . JASMIN: a parallel software infrastructure for scientific computing[J]. Front. Comput. Sci., 2010, 4(4): 480-488.
Viewed
Full text


Abstract

Cited

  Shared   
  Discussed