|
|
A new fragment re-allocation strategy for NoSQL database systems |
Zhikun CHEN1,2( ), Shuqiang YANG1, Shuang TAN1, Li HE1, Hong YIN1, Ge ZHANG3 |
1. Department of Computer Science, National University of Defense Technology, Changsha 410073, China 2. Unit 91655, People’s Liberation Army, Beijing 100036, China 3. Beijing Aeronautics Engineering Technology Research Center, Beijing 100076, China |
|
|
Abstract NoSQL databases are famed for the characteristics of high scalability, high availability, and high faulttolerance. So NoSQL databases are used in a lot of applications. The data partitioning strategy and fragment allocation strategy directly affect NoSQL database systems’ performance. The data partition strategy of large, global databases is performed by horizontally, vertically partitioning or combination of both. In the general way the system scatters the related fragments as possible to improve operations’ parallel degree. But the operations are usually not very complicated in some applications, and an operation may access to more than one fragment. At the same time, those fragments which have to be accessed by an operation may interact with each other. The general allocation strategies will increase system’s communication cost during operations execution over sites. In order to improve those applications’ performance and enable NoSQL database systems to work efficiently, these applications’ fragments have to be allocated in a reasonable way that can reduce the communication cost i.e., to minimize the total volume of data transmitted during operations execution over sites. A strategy of clustering fragments based on hypergraph is proposed, which can cluster fragments which were accessed together in most operations to the same cluster. The method uses a weighted hypergraph to represent the fragments’ access pattern of operations. A hypergraph partitioning algorithm is used to cluster fragments in our strategy. This method can reduce the amount of sites that an operation has to span. So it can reduce the communication cost over sites. Experimental results confirm that the proposed technique will effectively contribute in solving fragments re-allocation problem in a specific application environment of NoSQL database system.
|
Keywords
fragment allocation
NoSQL database
hypergraph partition
clustering fragments
fragment correlation
|
Corresponding Author(s):
Zhikun CHEN
|
Issue Date: 09 February 2015
|
|
1 |
R Bryant, R H Katz, E D Lazowska. Big-data computing: creating revolutionary breakthroughs in commerce, science and society. Computing Community Consortium, 2008: 1−15
|
2 |
J Manyika, M Chui, B Brown, J Bughin, R Dobbs, C Roxburgh, A H Byers. Big data: The Next Frontier For Innovation, Competition, and Productivity. MacKinsey Global Institute. 2011
|
3 |
S Lohr. The age of big data (2012).
|
4 |
Y Noguchi. Following digital breadcrumbs to big data gold (2011)
|
5 |
Y Noguchi. The search for analysts to make sense of big data (2011)
|
6 |
K A Kumar, A Deshpande, S Khuller. Data placement and replica selection for improving co-location in distributed environments.
|
7 |
X Li. Research of data allocation strategy in distributed database. Dissertation for the Master Degree. Dalian: Dalian University of Technology, 2009
|
8 |
S B Navathe, M Ra. Vertical partitioning for database design: a graphical algorithm. ACM SIGMOD Record, 1989, 18(2): 440−450
https://doi.org/10.1145/66926.66966
|
9 |
B Andrew. Mlpart.
|
10 |
A E Caldwell, A B Kahng, I L Markov. Design and implementation of move-based heuristics for VLSI hypergraph partitioning. Journal of Experimental Algorithmics, 2000, 5: 5
https://doi.org/10.1145/351827.384247
|
11 |
G Karypis, R Aggarwal, V Kumar, S Shekhar. Multilevel hypergraph partitioning: applications in VLSI domain. IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 1999, 7(1): 69−79
|
12 |
C J Alpert, A B Kahng. Recent directions in netlist partitioning: a survey, Integration, the VLSI Journal, 1995, 19(1): 1−81
|
13 |
G Karypis, V Kumar, Multilevel k-way hypergraph partitioning. VLSI Design, 2000, 11(3): 285−300
https://doi.org/10.1155/2000/19436
|
14 |
N Selvakkumaran, G Karypis. Multiobjective hypergraph partitioning algorithms for cut and maximum subdomain-degree minimization. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 2006, 25(3): 504−517
https://doi.org/10.1109/TCAD.2005.854637
|
15 |
D R Liu, S Shekhar. Partitioning similarity graphs: a framework for declustering problems. Information Systems, 1996, 21(6): 475−496
https://doi.org/10.1016/0306-4379(96)00024-5
|
16 |
P L Yu, Y R Lee, A Stam. Multiple-criteria Decision Making: Concepts, Techniques, and Extensions. Plenum Press New York, 1985
https://doi.org/10.1007/978-1-4684-8395-6
|
17 |
R Cooley, B Mobasher, J Srivastava. Web mining: information and pattern discovery on the world wide web. In: Proceedings of the 9th IEEE International Conference on Tools with Artificial Intelligence. 1997, 558−567
https://doi.org/10.1109/TAI.1997.632303
|
18 |
G Karypis, E H Han, V Kumar. Chameleon: hierarchical clustering using dynamic modeling. Computer, 1999, 32(8): 68−75
https://doi.org/10.1109/2.781637
|
19 |
A Lakshman, P Malik. Cassandra: structured storage system on a P2P network. In: Proceedings of the 28th ACM Symposium on Principles of Distributed Computing. 2009
https://doi.org/10.1145/1582716.1582722
|
20 |
A Lakshman, P Malik. Cassandra: a decentralized structured storage system. ACM SIGOPS Operating Systems Review, 2010, 44(2): 35−40
https://doi.org/10.1145/1773912.1773922
|
21 |
R Sarathy, B Shetty, A Sen. A constrained nonlinear 0−1 program for data allocation. European Journal of Operational Research, 1997, 102(3): 626−647
https://doi.org/10.1016/S0377-2217(96)00234-2
|
22 |
S Menon. Allocating fragments in distributed databases. IEEE Transactions on Parallel and Distributed Systems, 2005, 16(7): 577−585
https://doi.org/10.1109/TPDS.2005.77
|
23 |
S Ram, S Narasimhan. Database allocation in a distributed environment: incorporating a concurrency control mechanism and queuing costs. Management Science, 1994, 40(8): 969−983
https://doi.org/10.1287/mnsc.40.8.969
|
24 |
K Karlapalem, N M Pun. Query-driven data allocation algorithms for distributed database systems. In: Proceeding of the Database and Expert Systems Applications. 1997, 347−356
https://doi.org/10.1007/BFb0022044
|
25 |
A R Chaturvedi, A K Choubey, J Roan. Scheduling the allocation of data fragments in a distributed database environment: a machine learning approach. IEEE Transactions on Engineering Management, 1994, 41 (2): 194−207
https://doi.org/10.1109/17.293386
|
26 |
A L Corcoran, J Hale. A genetic algorithm for fragment allocation in a distributed database system. In: Proceedings of the 1994 ACM Symposium on Applied Computing. 1994, 247−250
https://doi.org/10.1145/326619.326738
|
27 |
S T March, S Rho. Allocating data and operations to nodes in distributed database design. IEEE Transactions on Knowledge and Data Engineering, 1995, 7(2): 305−317
https://doi.org/10.1109/69.382299
|
28 |
P M Apers. Data allocation in distributed database systems. ACM Transactions on Database Systems (TODS), 1988, 13(3): 263−304
https://doi.org/10.1145/44498.45063
|
29 |
T Ulus, M Uysal. Heuristic approach to dynamic data allocation in distributed database systems. Pakistan Journal of Information and Technology, 2003, 2(3): 231−239
https://doi.org/10.3923/itj.2003.231.239
|
30 |
A G Chin. Incremental data allocation and reallocation in distributed database systems. In: Proceedings of the Data Warehousing and Web Engineering. 2002, 137−160
https://doi.org/10.4018/978-1-931777-02-5.ch007
|
31 |
H I Abdalla. An efficient approach for data placement in distributed systems. In: Proceedings of the 5th IEEE FTRA International Conference on Multimedia and Ubiquitous Engineering. 2011, 297−301
|
32 |
T Wang, Z Lin, B Yang, J Gao, A Huang, D Yang, Q Zhang, S Tang, J Niu. Mba: A market-based approach to data allocation and dynamic migration for cloud database. Science China Information Sciences, 2012, 55(9): 1935−1948
https://doi.org/10.1007/s11432-011-4432-3
|
33 |
J Du, K Barker, R Alhajj. Attraction-a global affinity measure for database vertical partitioning. In: Proceedings of International Conference WWW/Internet. 2003, 538−548
|
34 |
C Curino, E Jones, Y Zhang, S Madden. Schism: a workload-driven approach to database replication and partitioning. Proceedings of the VLDB Endowment, 2010, 3(1−2): 48−57
https://doi.org/10.14778/1920841.1920853
|
35 |
A Quamar, K A Kumar, A Deshpande. Sword: scalable workloadaware data placement for transactional workloads. In: Proceedings of the 16th International Conference on Extending Database Technology. 2013, 430−441
|
36 |
A Quamar. Scaling Transactional Workloads on the Cloud. Technical report MD. 2013
|
|
Viewed |
|
|
|
Full text
|
|
|
|
|
Abstract
|
|
|
|
|
Cited |
|
|
|
|
|
Shared |
|
|
|
|
|
Discussed |
|
|
|
|