Please wait a minute...
Frontiers of Computer Science

ISSN 2095-2228

ISSN 2095-2236(Online)

CN 10-1014/TP

Postal Subscription Code 80-970

2018 Impact Factor: 1.129

Front Comput Sci Chin    2011, Vol. 5 Issue (4) : 513-525    https://doi.org/10.1007/s11704-011-0970-5
RESEARCH ARTICLE
A PTS-PGATS based approach for data-intensive scheduling in data grids
Kenli LI1(), Zhao TONG1, Dan LIU1, Teklay TESFAZGHI1, Xiangke LIAO2
1. College of Information Science and Engineering, Hunan University, Changsha 410082, China; 2. Computer School, National University of Defense Technology, Changsha 410072, China
 Download: PDF(521 KB)   HTML
 Export: BibTeX | EndNote | Reference Manager | ProCite | RefWorks
Abstract

Grid computing is the combination of computer resources in a loosely coupled, heterogeneous, and geographically dispersed environment. Grid data are the data used in grid computing, which consists of large-scale data-intensive applications, producing and consuming huge amounts of data, distributed across a large number of machines. Data grid computing composes sets of independent tasks each of which require massive distributed data sets that may each be replicated on different resources. To reduce the completion time of the application and improve the performance of the grid, appropriate computing resources should be selected to execute the tasks and appropriate storage resources selected to serve the files required by the tasks. So the problem can be broken into two sub-problems: selection of storage resources and assignment of tasks to computing resources. This paper proposes a scheduler, which is broken into three parts that can run in parallel and uses both parallel tabu search and a parallel genetic algorithm. Finally, the proposed algorithm is evaluated by comparing it with other related algorithms, which target minimizing makespan. Simulation results show that the proposed approach can be a good choice for scheduling large data grid applications.

Keywords data grid      task scheduling      tabu search      genetic algorithms      parallelism     
Corresponding Author(s): LI Kenli,Email:lkl510@263.net   
Issue Date: 05 December 2011
 Cite this article:   
Kenli LI,Zhao TONG,Dan LIU, et al. A PTS-PGATS based approach for data-intensive scheduling in data grids[J]. Front Comput Sci Chin, 2011, 5(4): 513-525.
 URL:  
https://academic.hep.com.cn/fcs/EN/10.1007/s11704-011-0970-5
https://academic.hep.com.cn/fcs/EN/Y2011/V5/I4/513
Fig.1  A simple scheduling framework
Fig.2  An example of a logical network topology
Fig.3  An application model
SymbolMeaning
AApplication
NNumber of tasks
tiith task
CComputing resources
ciith computing resource
MNumber of computing resources
SStorage resources
siith storage resource
LNumber of storage resources
BW(s,c)Bandwidth between s and c
DtiThe set of files required by ti
KiNumber of files required by ti
djtijth file required by ti
Tab.1  Notations
Fig.4  A simple sketch map of the algorithm
Fig.5  Tabu search procedure
Fig.6  Chromosome representation
Fig.7  Evaluation with different number of files per task ( = 50, = 300 000 MI)
Fig.8  Evaluation with increasing number of tasks ( = 3, = 300 000 MI)
Fig.9  Finding maximum using TS-GATS
Fig.10  Finding maximum using PTS-PGATS
Fig.11  Evaluation with different size of tasks ( = 50, = 3)
1 Foster I, Kesselman C, eds. The Grid: Blueprint for a New Computing Infrastructure. San Francisco: Morgan Kaufmann Publishers, 1999
2 Kenli L, Tianfang T, Feng W. Parallelization methods for implementation of discharge simulation along resin insulator surfaces. Computers & Electrical Engineering , 2011, 37(1): 30-40
3 Chervenak A, Foster I, Kesselman C, Salisbury C, Tuecke S. The data grid: towards an architecture for the distributed management and analysis of large scientific datasets. Journal of Network and Computer Applications , 2000, 23(3): 187-200
4 Kim S, Weissman J B. A genetic algorithm based approach for scheduling decomposable data grid applications. In: Proceedings of the 2004 International Conference on Parallel Processing . 2004, 406-413
5 Maheswaran M, Ali S, Sieel H J, Hensgen D, Freund R F. Dynamic matching and scheduling of a class of independent tasks onto heterogeneous computing systems. In: Proceedings of 8th Heterogeneous Computing Workshop . 1999, 30-44
6 Tang X, Li K. A novel security-driven scheduling algorithm for precedence constrained tasks in heterogeneous distributed systems. IEEE Transactions on Computers , 2011, 60(7): 1017-1029
7 Dauzere-Peres S, Paulli J. An integrated approach for modeling and solving the general multiprocessor job-shop scheduling problem using tabu search. Annals of Operations Research , 1997, 70(0): 281-306
8 Abdelaziz A Y, Mohamed F M, Mekhamer S F. Distribution system reconfiguration using a modified tabu search algorithm. Electric Power Systems Research , 2010, 80(8): 943-953
9 Etminani K, Naghibzadeh M. A min-min max-min selective algorithm for grid task scheduling. In: Proceedings of 3rd IEEE/IFIP International Conference on Internet . 2007, 1-7
10 Schwiegelshohn U, Tchernykh A, Yahyapour R. Online scheduling in grids. In Proc. of IEEE International Symposium on Parallel and Distributed Processing . Los Alamitos: IEEE Computer Society, 2008, 1-10
11 Noriguki F, Kenichi H. A comparison among grid scheduling algorithms for independent coarse-grained tasks. In: Proceedings of the 2004 International Symposium on Applications and the Internet Workshops . 2004, 674-680
12 Casanova H, Legrand A, Zagorodnov D, Berman F. Heuristics for scheduling parameter sweep applications in grid environments. In: Proceedings of the 9th Heterogeneous Computing Workshop . 2000, 349-363
13 Elghirani A, Subrata R, Zomaya A Y, Mazari A A. Performance enhancement through hybrid replication and genetic algorithm co-scheduling in data grids. In: Proceedings of IEEE/ACS International Conference on Computer System and Applications . 2008, 436-443
14 Elghirani A, Subrata R, Zomaya A Y. Intelligent scheduling and replication in data grids: a synergistic approach. In: Proceedings of 7th IEEE International Symposium on Cluster Computing and the Grid . 2007, 179-182
15 Dang N N, Hwang S, Lim S B. Improvement of data grid’s performance by combining job scheduling with dynamic replication strategy. In: Proceedings of 6th International Conference on Grid and Cooperative Computing . 2007, 513-520
16 Venugopal S, Buyya R. An scp-based heuristic approach for scheduling distributed data-intensive applications on global grids. Journal of Parallel and Distributed Computing , 2008, 68(4): 471-487
17 Wang Zhixin and Ju Gang. A parallel genetic algorithm in multi-objective optimization. In: Proceedings of Control and Decision Conference . 2009, 3497-3501
18 Guangyuan L, Jingjun Z, Ruizhen G, Yanmin S. An improved parallel adaptive genetic algorithm based on pareto front for multi-objective problems. In: Proceedings of 2nd International Symposium on Knowledge Acquisition and Modeling . 2009, 212-215
19 Yi H, Yuhui Q, Guangyuan L, Kaiyou L. A parallel tabu search approach based on genetic crossover operation. In: Proceedings of 19th International Conference on Advanced Information Networking and Application . 2005, 467-470
20 Czajkowski K, Fitzgerald S, Foster I, Kesselman C. Grid information services for distributed resource sharing. In: Proceedings of 10th IEEE International Symposium on High Performance Distributed Computing . 2001, 181-194
21 Rajasekar A, Moore R, Wan M. Mysrb & srb: components of a data grid. In: Proceedings of 11th IEEE International Symposium on High Performance Distributed Computing . 2002, 301-310
22 Wolski R, Spring N, Hayes J. The network weather services: a distributed resource performance forcasting service for metacomputing. Journal of Future Generation Computer Systems , 1999, 15(5-6): 757-768
23 Kakarontzas G, Savvas I K. Agent-based resource discovery and selection for dynamic grids. In: Proceedings of 15th IEEE International Workshops on Enabling Technologies: Infrastructure for Collaborative Enterprises . 2006, 195-200
24 Chapman C, Musolesi M, Emmerich W, Mascolo C. Predictive resource scheduling in computational grids. In: Proceedings of IEEE International Parallel and Distributed Processing Symposium . 2007, 1-10
25 Ranganathan K, Foster I. Decoupling computation and data scheduling in distributed data-intensive applications. In: Proceedings of 11th IEEE Symposium on High Performance Distributed Computing . 2002, 352-358
26 Lee W, Mcgough S, Newhouse S, Darlington J. A standard based approach to job submission through web services. In: Proceedings of the UK e-Science All Hands Meeting . 2004, 901-905
27 Srinivas M, Patnaik L M. Genetic algorithms: A survey. Computer , 1994, 27(4): 17-26
28 Schengjun X, Shaoyong G, Dongling B. The analysis and research of parallel genetic algorithm. In: Proceedings of 4th International Conference on Wireless Communications, Networking and Mobile Computing . 2008, 1-4
29 Zhang J, Lee B S, Tang X, Yeo C K. Impact of parallel download on job scheduling in data grid environment. In: Proceedings of 7th International Conference on Grid and Cooperative Computing . 2008, 102-109
30 Buyya R, Murshed M. Gridsim: a toolkit for the modeling and simulation of distributed resource management and scheduling for grid computing. Concurrency and Computation: Practice and Experience , 2002, 14(13-15): 1175-1220
31 Li J, Pan Q, Liang Y. An effective hybrid tabu search algorithm for multi-objective flexible job-shop scheduling problems. Computers & Industrial Engineering , 2010, 59(4): 647-662
[1] Neng HOU, Fazhi HE, Yi ZHOU, Yilin CHEN. An efficient GPU-based parallel tabu search algorithm for hardware/software co-design[J]. Front. Comput. Sci., 2020, 14(5): 145316-.
[2] Diming ZHANG, Fei XUE, Hao HUANG, Shaodi YOU. VBMq: pursuit baremetal performance by embracing block I/O parallelism in virtualization[J]. Front. Comput. Sci., 2018, 12(5): 873-886.
[3] Najme MANSOURI. Network and data location aware approach for simultaneous job scheduling and data replication in large-scale data grid environments[J]. Front. Comput. Sci., 2014, 8(3): 391-408.
[4] Dunwei GONG, Yan ZHANG. Generating test data for both path coverage and fault detection using genetic algorithms[J]. Front Comput Sci, 2013, 7(6): 822-837.
[5] Bo YUAN, Wenhuang LIU. Measure oriented training: a targeted approach to imbalanced classification problems[J]. Front Comput Sci, 2012, 6(5): 489-497.
[6] LONG Guilu, LIU Yang. Duality quantum computing[J]. Front. Comput. Sci., 2008, 2(2): 167-178.
[7] WANG Yuanzhuo, LIN Chuang, YANG Yang, SHAN Zhiguang. Performance analysis of a dependable scheduling strategy based on a fault-tolerant grid model[J]. Front. Comput. Sci., 2007, 1(3): 329-337.
[8] JIANG Jianjin, YANG Guangwen. An optimal replication strategy for data grid systems[J]. Front. Comput. Sci., 2007, 1(3): 338-348.
[9] ZHANG Yunquan, CHEN Guoliang, SUN Guangzhong, MIAO Qiankun. Models of parallel computation: a survey and classification[J]. Front. Comput. Sci., 2007, 1(2): 156-165.
Viewed
Full text


Abstract

Cited

  Shared   
  Discussed