Exploring high-performance processor architecture beyond the exascale
Xiang-hui XIE, Xun JIA
State Key Laboratory of Mathematical Engineering and Advanced Computing, Wuxi 214125, China
Abstract The ever-increasing need for high performance in scientific computation and engineering applications will push high-performance computing beyond the exascale. As an integral part of a supercomputing system, high-performance processors and their architecture designs are crucial to improving system performance. This paper introduces three architecture design goals for high-performance processors beyond the exascale: effective performance scaling, efficient resource utilization, and adaptation to diverse applications. It then proposes a high-performance many-core processor architecture with scalar processing and application-specific acceleration (Massa), which aims to achieve these three goals by employing distributed computational resources and application-customized hardware. Finally, some future research directions for the Massa architecture are discussed.
Keywords
High-performance computing
Beyond the exascale
Processor architecture
Application-customized hardware
Distributed computational resources
Corresponding Author(s):
Xiang-hui XIE, Xun JIA
Issue Date: 03 December 2018