Frontiers of Computer Science

ISSN 2095-2228

ISSN 2095-2236(Online)

CN 10-1014/TP

Front. Comput. Sci. 2010, Vol. 4, Issue 4: 445-455. https://doi.org/10.1007/s11704-010-0383-x
Research articles
TH-1: China’s first petaflop supercomputer
Xuejun YANG, Xiangke LIAO, Weixia XU, Junqiang SONG, Qingfeng HU, Jinshu SU, Liquan XIAO, Kai LU, Qiang DOU, Juping JIANG, Canqun YANG
School of Computer Science, National University of Defense Technology, Changsha 410073, China
Abstract: In recent years, heterogeneous systems and cooperative computing have become popular research directions in high performance computing. As high performance computer systems scale rapidly in size, problems such as power consumption and reliability have come to the forefront. The aim of high performance computing has thus shifted from merely seeking peak performance to comprehensively pursuing high efficiency, which takes into account many factors, including performance, cost, power, and reliability. A heterogeneous computing system consisting of general-purpose CPU(s) and special-purpose accelerator(s) offers high performance, low power consumption, and low cost, and has therefore become mainstream in high performance computing. However, such systems still face many challenges, for example in programmability and reliability. In this paper, we first analyze the main challenges facing heterogeneous computing systems. We then introduce the architecture of the first petaflop computing system in China, the Tianhe-1 (TH-1) heterogeneous system, including its hardware/software interface and interconnect network. Several challenges were encountered during the development of the TH-1 system; we subsequently present our research into solutions to these challenges.
Keywords: heterogeneous systems, cooperative computing, Tianhe-1 (TH-1) system, load balancing, programming models, low power consumption, fault tolerance
Issue Date: 05 December 2010
Cite this article: Qingfeng HU, Xuejun YANG, Weixia XU, et al. TH-1: China’s first petaflop supercomputer[J]. Front. Comput. Sci., 2010, 4(4): 445-455.
URL: https://academic.hep.com.cn/fcs/EN/10.1007/s11704-010-0383-x
https://academic.hep.com.cn/fcs/EN/Y2010/V4/I4/445