Frontiers of Computer Science

ISSN 2095-2228

ISSN 2095-2236(Online)

CN 10-1014/TP

Postal Subscription Code 80-970

2018 Impact Factor: 1.129

Front. Comput. Sci.    2014, Vol. 8 Issue (3) : 391-408    https://doi.org/10.1007/s11704-014-3146-2
RESEARCH ARTICLE
Network and data location aware approach for simultaneous job scheduling and data replication in large-scale data grid environments
Najme MANSOURI
Department of Computer Science, Shahid Bahonar University of Kerman, Kerman 97175-569, Iran
Abstract

A Data Grid integrates geographically distributed resources for solving data-intensive scientific applications. Effective scheduling in a Grid can reduce the amount of data transferred among nodes by submitting a job to a node where most of the requested data files are available. Scheduling is a traditional problem in parallel and distributed systems. However, because of the special issues and goals of the Grid, traditional approaches are no longer effective in this environment. Therefore, it is necessary to propose methods specialized for this kind of parallel and distributed system. Another solution is to use a data replication strategy to create multiple copies of files and store them in convenient locations to shorten file access times. To utilize these two concepts, in this paper we develop a job scheduling policy, called hierarchical job scheduling strategy (HJSS), and a dynamic data replication strategy, called advanced dynamic hierarchical replication strategy (ADHRS), to improve data access efficiency in a hierarchical Data Grid. HJSS uses hierarchical scheduling to reduce the search time for an appropriate computing node. It considers network characteristics, the number of jobs waiting in the queue, file locations, and the disk read speed of the storage drive at data sources. Moreover, because storage capacity is limited, a good replica replacement algorithm is needed. We present a novel replacement strategy that deletes files in two steps when free space is insufficient for the new replica. First, it deletes those files with the minimum transfer time. Second, if space is still insufficient, it considers the last time the replica was requested, the number of accesses, the size of the replica, and the file transfer time. The simulation results show that our proposed algorithm performs better than other algorithms in terms of job execution time, number of intercommunications, number of replications, hit ratio, computing resource usage, and storage usage.
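
The two-step replacement idea described above can be illustrated with a small Python sketch. It is not the ADHRS implementation: the abstract does not give the selection rule or the weighting of the four factors, so the `Replica` fields, the `retention_value` formula, and the reading of step one as "replicas tied for the lowest transfer time" are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Replica:
    name: str
    size: float           # storage units occupied by the replica
    transfer_time: float  # estimated time to fetch the file again
    last_access: float    # time of the most recent request
    access_count: int     # number of requests observed so far


def retention_value(r: Replica, now: float) -> float:
    """Hypothetical score (an assumption): higher means more worth keeping."""
    age = now - r.last_access + 1.0  # +1 avoids division by zero
    return r.access_count * r.transfer_time / (r.size * age)


def choose_victims(stored, needed, free, now):
    """Return the replicas to delete so that `needed` units fit, or None
    if deleting every stored replica would still not free enough space."""
    if free + sum(r.size for r in stored) < needed:
        return None
    victims = []
    remaining = list(stored)

    # Step 1: delete the replicas that are cheapest to transfer again
    # (assumed here to mean those tied for the minimum transfer time).
    if free < needed and remaining:
        t_min = min(r.transfer_time for r in remaining)
        for r in [x for x in remaining if x.transfer_time == t_min]:
            if free >= needed:
                break
            victims.append(r)
            remaining.remove(r)
            free += r.size

    # Step 2: if space is still short, delete in ascending order of the
    # assumed retention value (recency, access count, size, transfer time).
    for r in sorted(remaining, key=lambda x: retention_value(x, now)):
        if free >= needed:
            break
        victims.append(r)
        free += r.size
    return victims
```

Calling `choose_victims(stored, needed, free, now)` with a list of `Replica` objects returns the deletion candidates in the order they would be removed. Replicas that are cheap to fetch again go first, and the combined score is consulted only when that does not free enough space.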

Keywords: data replication, data grid, OptorSim, job scheduling, simulation
Corresponding Author(s): Najme MANSOURI   
Issue Date: 24 June 2014
 Cite this article:   
Najme MANSOURI. Network and data location aware approach for simultaneous job scheduling and data replication in large-scale data grid environments[J]. Front. Comput. Sci., 2014, 8(3): 391-408.
 URL:  
https://academic.hep.com.cn/fcs/EN/10.1007/s11704-014-3146-2
https://academic.hep.com.cn/fcs/EN/Y2014/V8/I3/391