A Data Grid integrates geographically distributed resources for solving data-intensive scientific applications. Effective scheduling in a Grid can reduce the amount of data transferred among nodes by submitting a job to a node where most of the requested data files are available. Scheduling is a traditional problem in parallel and distributed systems. However, because of the special issues and goals of the Grid, traditional approaches are no longer effective in this environment, so methods specialized for this kind of parallel and distributed system are needed. Another solution is to use a data replication strategy to create multiple copies of files and store them in convenient locations to shorten file access times. To utilize these two concepts, in this paper we develop a job scheduling policy, called hierarchical job scheduling strategy (HJSS), and a dynamic data replication strategy, called advanced dynamic hierarchical replication strategy (ADHRS), to improve data access efficiency in a hierarchical Data Grid. HJSS uses hierarchical scheduling to reduce the search time for an appropriate computing node. It considers network characteristics, the number of jobs waiting in the queue, file locations, and the disk read speed of the storage drive at data sources. Moreover, because storage capacity is limited, a good replica replacement algorithm is needed. We present a novel replacement strategy that deletes files in two steps when there is not enough free space for a new replica: first, it deletes the files with the minimum transfer time; second, if space is still insufficient, it considers the last time the replica was requested, the number of accesses, the size of the replica, and the file transfer time. The simulation results show that our proposed algorithm performs better than other algorithms in terms of job execution time, number of intercommunications, number of replications, hit ratio, computing resource usage, and storage usage.
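The two-step replacement idea can be illustrated with a minimal Python sketch. It is not the ADHRS algorithm itself: the abstract names the factors but gives no formula, so the `Replica` record, the `cheap_threshold` cutoff that decides which files count as having "minimum" transfer time, and the `retention_score` weighting are all hypothetical choices made only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Replica:
    """Hypothetical record for a locally stored replica."""
    name: str
    size: float           # storage the replica occupies
    transfer_time: float  # estimated time to fetch the file again
    last_access: float    # time of the most recent request
    access_count: int     # number of requests so far


def retention_score(r, now):
    """Assumed scoring rule (not from the paper): higher means more worth keeping.

    Frequently and recently used files that are slow to re-fetch score high;
    large files score lower because deleting them frees more space.
    """
    age = max(now - r.last_access, 1e-9)  # avoid division by zero
    return r.access_count * r.transfer_time / (age * r.size)


def select_victims(replicas, free_space, needed, now, cheap_threshold):
    """Return replicas to delete so that `needed` space fits, or None if impossible."""
    victims = []

    # Step 1: delete files that are cheap to transfer again, cheapest first.
    cheap = sorted(
        (r for r in replicas if r.transfer_time <= cheap_threshold),
        key=lambda r: r.transfer_time,
    )
    for r in cheap:
        if free_space >= needed:
            return victims
        victims.append(r)
        free_space += r.size

    # Step 2: if space is still short, rank the remaining files by the
    # combined score and delete the least valuable ones first.
    rest = sorted(
        (r for r in replicas if r.transfer_time > cheap_threshold),
        key=lambda r: retention_score(r, now),
    )
    for r in rest:
        if free_space >= needed:
            break
        victims.append(r)
        free_space += r.size

    return victims if free_space >= needed else None


if __name__ == "__main__":
    store = [
        Replica("a", 10, 1, 90, 5),
        Replica("b", 20, 2, 95, 1),
        Replica("c", 30, 50, 10, 2),
        Replica("d", 30, 40, 99, 20),
    ]
    chosen = select_victims(store, free_space=5, needed=60, now=100, cheap_threshold=5)
    print([r.name for r in chosen])
```

In this sketch the second step runs only when deleting every cheap file still leaves too little room, which mirrors the "if space is still insufficient" condition in the abstract. A different weighting of the four factors would change which files the second step removes first.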
Najme MANSOURI. Network and data location aware approach for simultaneous job scheduling and data replication in large-scale data grid environments. Front. Comput. Sci., 2014, 8(3): 391-408.