Please wait a minute...
Frontiers of Computer Science

ISSN 2095-2228

ISSN 2095-2236(Online)

CN 10-1014/TP

Postal Subscription Code 80-970

2018 Impact Factor: 1.129

Front. Comput. Sci.    2015, Vol. 9 Issue (1) : 111-127    https://doi.org/10.1007/s11704-014-3480-4
RESEARCH ARTICLE
A new fragment re-allocation strategy for NoSQL database systems
Zhikun CHEN1,2(), Shuqiang YANG1, Shuang TAN1, Li HE1, Hong YIN1, Ge ZHANG3
1. Department of Computer Science, National University of Defense Technology, Changsha 410073, China
2. Unit 91655, People’s Liberation Army, Beijing 100036, China
3. Beijing Aeronautics Engineering Technology Research Center, Beijing 100076, China
 Download: PDF(736 KB)  
 Export: BibTeX | EndNote | Reference Manager | ProCite | RefWorks
Abstract

NoSQL databases are famed for the characteristics of high scalability, high availability, and high faulttolerance. So NoSQL databases are used in a lot of applications. The data partitioning strategy and fragment allocation strategy directly affect NoSQL database systems’ performance. The data partition strategy of large, global databases is performed by horizontally, vertically partitioning or combination of both. In the general way the system scatters the related fragments as possible to improve operations’ parallel degree. But the operations are usually not very complicated in some applications, and an operation may access to more than one fragment. At the same time, those fragments which have to be accessed by an operation may interact with each other. The general allocation strategies will increase system’s communication cost during operations execution over sites. In order to improve those applications’ performance and enable NoSQL database systems to work efficiently, these applications’ fragments have to be allocated in a reasonable way that can reduce the communication cost i.e., to minimize the total volume of data transmitted during operations execution over sites. A strategy of clustering fragments based on hypergraph is proposed, which can cluster fragments which were accessed together in most operations to the same cluster. The method uses a weighted hypergraph to represent the fragments’ access pattern of operations. A hypergraph partitioning algorithm is used to cluster fragments in our strategy. This method can reduce the amount of sites that an operation has to span. So it can reduce the communication cost over sites. Experimental results confirm that the proposed technique will effectively contribute in solving fragments re-allocation problem in a specific application environment of NoSQL database system.

Keywords fragment allocation      NoSQL database      hypergraph partition      clustering fragments      fragment correlation     
Corresponding Author(s): Zhikun CHEN   
Issue Date: 09 February 2015
 Cite this article:   
Zhikun CHEN,Shuqiang YANG,Shuang TAN, et al. A new fragment re-allocation strategy for NoSQL database systems[J]. Front. Comput. Sci., 2015, 9(1): 111-127.
 URL:  
https://academic.hep.com.cn/fcs/EN/10.1007/s11704-014-3480-4
https://academic.hep.com.cn/fcs/EN/Y2015/V9/I1/111
1 R Bryant, R H Katz, E D Lazowska. Big-data computing: creating revolutionary breakthroughs in commerce, science and society. Computing Community Consortium, 2008: 1−15
2 J Manyika, M Chui, B Brown, J Bughin, R Dobbs, C Roxburgh, A H Byers. Big data: The Next Frontier For Innovation, Competition, and Productivity. MacKinsey Global Institute. 2011
3 S Lohr. The age of big data (2012).
4 Y Noguchi. Following digital breadcrumbs to big data gold (2011)
5 Y Noguchi. The search for analysts to make sense of big data (2011)
6 K A Kumar, A Deshpande, S Khuller. Data placement and replica selection for improving co-location in distributed environments.
7 X Li. Research of data allocation strategy in distributed database. Dissertation for the Master Degree. Dalian: Dalian University of Technology, 2009
8 S B Navathe, M Ra. Vertical partitioning for database design: a graphical algorithm. ACM SIGMOD Record, 1989, 18(2): 440−450
https://doi.org/10.1145/66926.66966
9 B Andrew. Mlpart.
10 A E Caldwell, A B Kahng, I L Markov. Design and implementation of move-based heuristics for VLSI hypergraph partitioning. Journal of Experimental Algorithmics, 2000, 5: 5
https://doi.org/10.1145/351827.384247
11 G Karypis, R Aggarwal, V Kumar, S Shekhar. Multilevel hypergraph partitioning: applications in VLSI domain. IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 1999, 7(1): 69−79
12 C J Alpert, A B Kahng. Recent directions in netlist partitioning: a survey, Integration, the VLSI Journal, 1995, 19(1): 1−81
13 G Karypis, V Kumar, Multilevel k-way hypergraph partitioning. VLSI Design, 2000, 11(3): 285−300
https://doi.org/10.1155/2000/19436
14 N Selvakkumaran, G Karypis. Multiobjective hypergraph partitioning algorithms for cut and maximum subdomain-degree minimization. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 2006, 25(3): 504−517
https://doi.org/10.1109/TCAD.2005.854637
15 D R Liu, S Shekhar. Partitioning similarity graphs: a framework for declustering problems. Information Systems, 1996, 21(6): 475−496
https://doi.org/10.1016/0306-4379(96)00024-5
16 P L Yu, Y R Lee, A Stam. Multiple-criteria Decision Making: Concepts, Techniques, and Extensions. Plenum Press New York, 1985
https://doi.org/10.1007/978-1-4684-8395-6
17 R Cooley, B Mobasher, J Srivastava. Web mining: information and pattern discovery on the world wide web. In: Proceedings of the 9th IEEE International Conference on Tools with Artificial Intelligence. 1997, 558−567
https://doi.org/10.1109/TAI.1997.632303
18 G Karypis, E H Han, V Kumar. Chameleon: hierarchical clustering using dynamic modeling. Computer, 1999, 32(8): 68−75
https://doi.org/10.1109/2.781637
19 A Lakshman, P Malik. Cassandra: structured storage system on a P2P network. In: Proceedings of the 28th ACM Symposium on Principles of Distributed Computing. 2009
https://doi.org/10.1145/1582716.1582722
20 A Lakshman, P Malik. Cassandra: a decentralized structured storage system. ACM SIGOPS Operating Systems Review, 2010, 44(2): 35−40
https://doi.org/10.1145/1773912.1773922
21 R Sarathy, B Shetty, A Sen. A constrained nonlinear 0−1 program for data allocation. European Journal of Operational Research, 1997, 102(3): 626−647
https://doi.org/10.1016/S0377-2217(96)00234-2
22 S Menon. Allocating fragments in distributed databases. IEEE Transactions on Parallel and Distributed Systems, 2005, 16(7): 577−585
https://doi.org/10.1109/TPDS.2005.77
23 S Ram, S Narasimhan. Database allocation in a distributed environment: incorporating a concurrency control mechanism and queuing costs. Management Science, 1994, 40(8): 969−983
https://doi.org/10.1287/mnsc.40.8.969
24 K Karlapalem, N M Pun. Query-driven data allocation algorithms for distributed database systems. In: Proceeding of the Database and Expert Systems Applications. 1997, 347−356
https://doi.org/10.1007/BFb0022044
25 A R Chaturvedi, A K Choubey, J Roan. Scheduling the allocation of data fragments in a distributed database environment: a machine learning approach. IEEE Transactions on Engineering Management, 1994, 41 (2): 194−207
https://doi.org/10.1109/17.293386
26 A L Corcoran, J Hale. A genetic algorithm for fragment allocation in a distributed database system. In: Proceedings of the 1994 ACM Symposium on Applied Computing. 1994, 247−250
https://doi.org/10.1145/326619.326738
27 S T March, S Rho. Allocating data and operations to nodes in distributed database design. IEEE Transactions on Knowledge and Data Engineering, 1995, 7(2): 305−317
https://doi.org/10.1109/69.382299
28 P M Apers. Data allocation in distributed database systems. ACM Transactions on Database Systems (TODS), 1988, 13(3): 263−304
https://doi.org/10.1145/44498.45063
29 T Ulus, M Uysal. Heuristic approach to dynamic data allocation in distributed database systems. Pakistan Journal of Information and Technology, 2003, 2(3): 231−239
https://doi.org/10.3923/itj.2003.231.239
30 A G Chin. Incremental data allocation and reallocation in distributed database systems. In: Proceedings of the Data Warehousing and Web Engineering. 2002, 137−160
https://doi.org/10.4018/978-1-931777-02-5.ch007
31 H I Abdalla. An efficient approach for data placement in distributed systems. In: Proceedings of the 5th IEEE FTRA International Conference on Multimedia and Ubiquitous Engineering. 2011, 297−301
32 T Wang, Z Lin, B Yang, J Gao, A Huang, D Yang, Q Zhang, S Tang, J Niu. Mba: A market-based approach to data allocation and dynamic migration for cloud database. Science China Information Sciences, 2012, 55(9): 1935−1948
https://doi.org/10.1007/s11432-011-4432-3
33 J Du, K Barker, R Alhajj. Attraction-a global affinity measure for database vertical partitioning. In: Proceedings of International Conference WWW/Internet. 2003, 538−548
34 C Curino, E Jones, Y Zhang, S Madden. Schism: a workload-driven approach to database replication and partitioning. Proceedings of the VLDB Endowment, 2010, 3(1−2): 48−57
https://doi.org/10.14778/1920841.1920853
35 A Quamar, K A Kumar, A Deshpande. Sword: scalable workloadaware data placement for transactional workloads. In: Proceedings of the 16th International Conference on Extending Database Technology. 2013, 430−441
36 A Quamar. Scaling Transactional Workloads on the Cloud. Technical report MD. 2013
[1] Supplementary Material Download
Viewed
Full text


Abstract

Cited

  Shared   
  Discussed