|
MilkyWay-2 supercomputer: system and application
Xiangke LIAO,Liquan XIAO,Canqun YANG,Yutong LU
Front. Comput. Sci.. 2014, 8 (3): 345-356.
https://doi.org/10.1007/s11704-014-3501-3
On June 17, 2013, MilkyWay-2 (Tianhe-2) supercomputer was crowned as the fastest supercomputer in the world on the 41th TOP500 list. This paper provides an overview of the MilkyWay-2 project and describes the design of hardware and software systems. The key architecture features of MilkyWay-2 are highlighted, including neo-heterogeneous compute nodes integrating commodity-off-the-shelf processors and accelerators that share similar instruction set architecture, powerful networks that employ proprietary interconnection chips to support the massively parallel message-passing communications, proprietary 16-core processor designed for scientific computing, efficient software stacks that provide high performance file system, emerging programming model for heterogeneous systems, and intelligent system administration. We perform extensive evaluation with wide-ranging applications from LINPACK and Graph500 benchmarks to massively parallel software deployed in the system.
References |
Related Articles |
Metrics
|
|
The TH Express high performance interconnect networks
Zhengbin PANG,Min XIE,Jun ZHANG,Yi ZHENG,Guibin WANG,Dezun DONG,Guang SUO
Front. Comput. Sci.. 2014, 8 (3): 357-366.
https://doi.org/10.1007/s11704-014-3500-9
Interconnection network plays an important role in scalable high performance computer (HPC) systems. The TH Express-2 interconnect has been used in MilkyWay-2 system to provide high-bandwidth and low-latency interprocessor communications, and continuous efforts are devoted to the development of our proprietary interconnect. This paper describes the state-of-the-art of our proprietary interconnect, especially emphasizing on the design of network interface. Several key features are introduced, such as user-level communication, remote direct memory access, offload collective operation, and hardware reliable end-to-end communication, etc. The design of a low level message passing infrastructures and an uppermessage passing services are also proposed. The preliminary performance results demonstrate the efficiency of the TH interconnect interface.
References |
Related Articles |
Metrics
|
|
Hybrid hierarchy storage system in MilkyWay-2 supercomputer
Weixia XU,Yutong LU,Qiong LI,Enqiang ZHOU,Zhenlong SONG,Yong DONG,Wei ZHANG,Dengping WEI,Xiaoming ZHANG,Haitao CHEN,Jianying XING,Yuan YUAN
Front. Comput. Sci.. 2014, 8 (3): 367-377.
https://doi.org/10.1007/s11704-014-3499-6
With the rapid improvement of computation capability in high performance supercomputer system, the imbalance of performance between computation subsystem and storage subsystem has become more and more serious, especially when various big data are produced ranging from tens of gigabytes up to terabytes. To reduce this gap, large-scale storage systems need to be designed and implemented with high performance and scalability. MilkyWay-2 (TH-2) supercomputer system with peak performance 54.9 Pflops, definitely has this kind of requirement for storage system. This paper mainly introduces the storage system in MilkyWay-2 supercomputer, including the hardware architecture and the parallel file system. The storage system in MilkyWay-2 supercomputer exploits a novel hybrid hierarchy storage architecture to enable high scalability of I/O clients, I/O bandwidth and storage capacity. To fit this architecture, a user level virtualized file system, named H2FS, is designed and implemented which can cooperate local storage and shared storage together into a dynamic single namespace to optimize I/O performance in IO-intensive applications. The evaluation results show that the storage system in MilkyWay-2 supercomputer can satisfy the critical requirements in large scale supercomputer, such as performance and scalability.
References |
Related Articles |
Metrics
|
|
Iaso: an autonomous fault-tolerant management system for supercomputers
Kai LU,Xiaoping WANG,Gen LI,Ruibo WANG,Wanqing CHI,Yongpeng LIU,Hongwei TANG,Hua FENG,Yinghui GAO
Front. Comput. Sci.. 2014, 8 (3): 378-390.
https://doi.org/10.1007/s11704-014-3503-1
With the increase of system scale, the inherent reliability of supercomputers becomes lower and lower. The cost of fault handling and task recovery increases so rapidly that the reliability issue will soon harm the usability of supercomputers. This issue is referred to as the “reliability wall”, which is regarded as a critical problem for current and future supercomputers. To address this problem, we propose an autonomous fault-tolerant system, named Iaso, in MilkyWay-2 system. Iaso introduces the concept of autonomous management in supercomputers. By autonomous management, the computer itself, rather than manpower, takes charge of the fault management work. Iaso automatically manage the whole lifecycle of faults, including fault detection, fault diagnosis, fault isolation, and task recovery. Iaso endows the autonomous features with MilkyWay-2 system, such as self-awareness, self-diagnosis, self-healing, and self-protection. With the help of Iaso, the cost of fault handling in supercomputers reduces from several hours to a few seconds. Iaso greatly improves the usability and reliability of MilkyWay-2 system.
References |
Related Articles |
Metrics
|
|
Network and data location aware approach for simultaneous job scheduling and data replication in large-scale data grid environments
Najme MANSOURI
Front. Comput. Sci.. 2014, 8 (3): 391-408.
https://doi.org/10.1007/s11704-014-3146-2
Data Grid integrates graphically distributed resources for solving data intensive scientific applications. Effective scheduling in Grid can reduce the amount of data transferred among nodes by submitting a job to a node, where most of the requested data files are available. Scheduling is a traditional problem in parallel and distributed system. However, due to special issues and goals of Grid, traditional approach is not effective in this environment any more. Therefore, it is necessary to propose methods specialized for this kind of parallel and distributed system. Another solution is to use a data replication strategy to createmultiple copies of files and store them in convenient locations to shorten file access times. To utilize the above two concepts, in this paper we develop a job scheduling policy, called hierarchical job scheduling strategy (HJSS), and a dynamic data replication strategy, called advanced dynamic hierarchical replication strategy (ADHRS), to improve the data access efficiencies in a hierarchical Data Grid. HJSS uses hierarchical scheduling to reduce the search time for an appropriate computing node. It considers network characteristics, number of jobs waiting in queue, file locations, and disk read speed of storage drive at data sources. Moreover, due to the limited storage capacity, a good replica replacement algorithm is needed. We present a novel replacement strategy which deletes files in two steps when free space is not enough for the new replica: first, it deletes those files with minimum time for transferring. Second, if space is still insufficient then it considers the last time the replica was requested, number of access, size of replica and file transfer time. The simulation results show that our proposed algorithm has better performance in comparison with other algorithms in terms of job execution time, number of intercommunications, number of replications, hit ratio, computing resource usage and storage usage.
References |
Related Articles |
Metrics
|
|
Achieving high throughput and TCP Reno fairness in delay-based TCP over large networks
Jingyuan WANG,Jiangtao WEN,Yuxing HAN,Jun ZHANG,Chao LI,Zhang XIONG
Front. Comput. Sci.. 2014, 8 (3): 426-439.
https://doi.org/10.1007/s11704-014-3443-9
The transport control protocol (TCP) has been widely used in wired and wireless Internet applications such as FTP, email and HTTP. Numerous congestion avoidance algorithms have been proposed to improve the performance of TCP in various scenarios, especially for high speed and wireless networks. Although different algorithms may achieve different performance improvements under different network conditions, designing a congestion algorithm that can perform well across a wide spectrum of network conditions remains a great challenge. Delay-based TCP has a potential to overcome above challenges. However, the unfairness problem of delay-based TCP with TCP Reno blocks widely the eployment of delay-based TCP over wide area networks. In this paper, we proposed a novel delay-based congestion control algorithm, named FAST-FIT, which could perform gracefully in both ultra high speed networks and wide area networks, as well as keep graceful fairness with widely deployed TCP Reno hosts. FAST-FIT uses queuing delay as a primary input for controlling TCP congestion window. Packet loss is used as a secondary signal to adaptively adjust parameters of primary control process. Theoretical analysis and experimental results show that the performance of the algorithm is significantly improved as compared to other state-of-the-art algorithms, while maintaining good fairness.
References |
Related Articles |
Metrics
|
|
HAT: an efficient buffer management method for flash-based hybrid storage systems
Yanfei LV, Bin CUI, Xuexuan CHEN, Jing LI
Front. Comput. Sci.. 2014, 8 (3): 440-455.
https://doi.org/10.1007/s11704-014-3364-7
Flash solid-state drives (SSDs) provide much faster access to data compared with traditional hard disk drives (HDDs). The current price and performance of SSD suggest it can be adopted as a data buffer between main memory and HDD, and buffer management policy in such hybrid systems has attracted more and more interest from research community recently. In this paper, we propose a novel approach to manage the buffer in flash-based hybrid storage systems, named hotness aware hit (HAT). HAT exploits a page reference queue to record the access history as well as the status of accessed pages, i.e., hot, warm, and cold. Additionally, the page reference queue is further split into hot and warm regions which correspond to the memory and flash in general. The HAT approach updates the page status and deals with the page migration in the memory hierarchy according to the current page status and hit position in the page reference queue. Compared with the existing hybrid storage approaches, the proposed HAT can manage the memory and flash cache layers more effectively. Our empirical evaluation on benchmark traces demonstrates the superiority of the proposed strategy against the state-of-the-art competitors.
References |
Related Articles |
Metrics
|
|
Construction of a key-dependent message secure symmetric encryption scheme in the ideal cipher model
Qiqi LAI,Yuan CHEN,Yupu HU,Baocang WANG,Mingming JIANG
Front. Comput. Sci.. 2014, 8 (3): 469-477.
https://doi.org/10.1007/s11704-014-3062-5
Key-dependent message (KDM) security is an important security issue that has attracted much research in recent years. In this paper, we present a new construction of the symmetric encryption scheme in the the ideal cipher model (ICM); we prove that our scheme is KDM secure against active attacks with respect to arbitrary polynomialtime challenge functions. Our main idea is to introduce a universal hash function (UHF) h as a random value for each encryption, and then use s = h(sk) as the key of the ideal cipher F, where sk is the private key of our symmetric encryption scheme. Although many other schemes that are secure against KDM attacks have already been proposed, in both the ideal standard models, the much more significance of our paper is the simplicity in which we implement KDM security against active attacks.
References |
Related Articles |
Metrics
|
|
Known-key distinguishers on type-1 Feistel scheme and near-collision attacks on its hashing modes
Le DONG,Wenling WU,Shuang WU,Jian ZOU
Front. Comput. Sci.. 2014, 8 (3): 513-525.
https://doi.org/10.1007/s11704-014-2412-7
We present some known-key distinguishers for a type-1 Feistel scheme with a permutation as the round function. To be more specific, the 29-round known-key truncated differential distinguishers are given for the 256-bit type-1 Feistel scheme with an SP (substitution-permutation) round function by using the rebound attack, where the S–boxes have perfect differential and linear properties and the linear diffusion layer has a maximum branch number. For two 128-bit versions, the distinguishers can be applied on 25-round structures. Based on these distinguishers, we construct near-collision attacks on these schemes with MMO (Matyas-Meyer-Oseas) and MP (Miyaguchi-Preneel) hashing modes, and propose the 26-round and 22-round near-collision attacks for two 256-bit schemes and two 128-bit schemes, respectively. We apply the near-collision attack on MAME and obtain a 26-round near-collision attack. Using the algebraic degree and some integral properties, we prove the correctness of the 31-round known-key integral distinguisher proposed by Sasaki et al. We show that if the round function is a permutation, the integral distinguisher is suitable for a type-1 Feistel scheme of any size.
References |
Related Articles |
Metrics
|
15 articles
|