Frontiers of Computer Science

ISSN 2095-2228

ISSN 2095-2236(Online)

CN 10-1014/TP

Postal Subscription Code 80-970

2018 Impact Factor: 1.129

Front. Comput. Sci. 2016, Vol. 10, Issue 5: 797-811. https://doi.org/10.1007/s11704-016-5035-3
RESEARCH ARTICLE
Detailed and clock-driven simulation for HPC interconnection network
Wenhao ZHOU¹, Juan CHEN¹, Chen CUI¹, Qian WANG¹, Dezun DONG², Yuhua TANG¹
1. State Key Laboratory of High Performance Computing, School of Computer, National University of Defense Technology, Changsha 410073, China
2. Science and Technology on Parallel and Distributed Processing Laboratory, National University of Defense Technology, Changsha 410073, China
Abstract

The performance and energy consumption of high performance computing (HPC) interconnection networks are of great significance to the supercomputer as a whole, and building an HPC interconnection network simulation platform is important for research on HPC software and hardware technologies. To evaluate the performance and energy consumption of HPC interconnection networks effectively, this article designs and implements a detailed, clock-driven HPC interconnection network simulation platform called HPC-NetSim. HPC-NetSim uses application-driven workloads and inherits the characteristics of the detailed and flexible cycle-accurate network simulator. In addition, it offers a large set of configurable network parameters for topology and routing, and supports router on/off states. We compare the simulated execution time with the real execution time on a Tianhe-2 subsystem; the mean error is only 2.7%. We also simulate network behavior under different network structures and low-power modes, and the results are consistent with the theoretical analyses.

Keywords: high performance computing, clock-driven simulation, interconnection network, BookSim
Corresponding Author(s): Juan CHEN   
Just Accepted Date: 14 January 2016   Online First Date: 18 July 2016    Issue Date: 07 September 2016
 Cite this article:   
Wenhao ZHOU,Juan CHEN,Chen CUI, et al. Detailed and clock-driven simulation for HPC interconnection network[J]. Front. Comput. Sci., 2016, 10(5): 797-811.
 URL:  
https://academic.hep.com.cn/fcs/EN/10.1007/s11704-016-5035-3
https://academic.hep.com.cn/fcs/EN/Y2016/V10/I5/797