Abstract This paper introduces the design of the hyper parallel processing (HPP) controller, a system controller for heterogeneous high-performance computing systems. It connects several heterogeneous processors via HyperTransport (HT) interfaces, a commercial InfiniBand HCA card via a PCI Express interface, and a customized global synchronization network via a self-defined high-speed interface. To accelerate intra-node communication and synchronization, the HPP controller supports a global address space and integrates dedicated hardware that enables sharing of intra-node memory and I/O resources. Evaluation on a prototype system built around the HPP controller shows that the proposed design achieves high communication efficiency and noticeably accelerates synchronization operations.