Frontiers of Computer Science

Front. Comput. Sci.    2022, Vol. 16 Issue (4) : 164321    https://doi.org/10.1007/s11704-021-0445-2
RESEARCH ARTICLE
DRPS: efficient disk-resident parameter servers for distributed machine learning
Zhen SONG1, Yu GU1, Zhigang WANG2, Ge YU1
1. School of Computer Science and Engineering, Northeastern University, Shenyang 110819, China
2. College of Information Science and Engineering, Ocean University of China, Qingdao 266100, China
Abstract

Parameter server (PS), the state-of-the-art distributed framework for large-scale iterative machine learning tasks, has been extensively studied. However, existing PS-based systems typically rely on in-memory implementations. Under such memory constraints, machine learning (ML) developers cannot train large-scale ML models on their rather small local clusters, and renting large-scale cloud servers is often economically infeasible for research teams and small companies. In this paper, we propose a disk-resident parameter server system named DRPS, which reduces the hardware requirements of large-scale machine learning tasks by storing high-dimensional models on disk. To further improve the performance of DRPS, we build an efficient index structure for parameters to reduce the disk I/O cost. Based on this index structure, we propose a novel multi-objective partitioning algorithm for the parameters. Finally, a flexible worker-selection parallel model of computation (WSP) is proposed to strike the right balance between inconsistent parameter versions (staleness) and inconsistent execution progress (stragglers). Extensive experiments on many typical machine learning applications with real and synthetic datasets validate the effectiveness of DRPS.
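To make the storage design concrete, here is a minimal, self-contained sketch of a disk-resident parameter store with an in-memory index, in the spirit of the description above. It is not the DRPS implementation; all names (DiskParameterStore, block_size, push, pull) are illustrative assumptions. Parameters are grouped into fixed-size blocks on disk, and the index maps each block id to its byte offset, so a pull or push costs one seek plus one block-sized read or write instead of a scan over the whole model file.

```python
# A minimal sketch of a disk-resident parameter store (illustrative only, not the DRPS API).
# Parameters are grouped into fixed-size blocks of float64 values; a small in-memory
# index maps block id -> byte offset so each pull/push touches a single disk block.
import os
import struct
import tempfile


class DiskParameterStore:
    def __init__(self, path, block_size=1024):
        self.block_size = block_size            # number of parameters per block
        self.index = {}                         # block id -> byte offset on disk
        self._file = open(path, "w+b")

    def push(self, block_id, values):
        """Write (or overwrite) one parameter block."""
        assert len(values) == self.block_size
        data = struct.pack(f"{self.block_size}d", *values)
        if block_id not in self.index:          # new block: append and record its offset
            self._file.seek(0, os.SEEK_END)
            self.index[block_id] = self._file.tell()
        else:                                   # existing block: overwrite in place
            self._file.seek(self.index[block_id])
        self._file.write(data)
        self._file.flush()

    def pull(self, block_id):
        """Read one parameter block via the index: a single seek plus one read."""
        self._file.seek(self.index[block_id])
        data = self._file.read(8 * self.block_size)
        return list(struct.unpack(f"{self.block_size}d", data))

    def close(self):
        self._file.close()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        store = DiskParameterStore(os.path.join(d, "model.bin"), block_size=4)
        store.push(0, [0.0, 0.0, 0.0, 0.0])
        store.push(7, [1.0, 2.0, 3.0, 4.0])
        print(store.pull(7))                    # -> [1.0, 2.0, 3.0, 4.0]
        store.close()
```

In DRPS the model is spread over multiple server nodes, so block placement also affects communication cost; this is what motivates the multi-objective partitioning of parameters described in the abstract.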

Keywords: parameter servers; machine learning; disk resident; parallel model
Corresponding Author(s): Yu GU   
Just Accepted Date: 14 April 2021   Issue Date: 01 December 2021
 Cite this article:   
Zhen SONG, Yu GU, Zhigang WANG, et al. DRPS: efficient disk-resident parameter servers for distributed machine learning[J]. Front. Comput. Sci., 2022, 16(4): 164321.
 URL:  
https://academic.hep.com.cn/fcs/EN/10.1007/s11704-021-0445-2
https://academic.hep.com.cn/fcs/EN/Y2022/V16/I4/164321
Fig.1  Model index example
Fig.2  Parameter partition example
Fig.5  Distributed index establishment
Fig.6  Parameter partitioning method based on communication cost and disk I/O cost
Fig.8  The drawback of SSP
Fig.9  The execution process of WSP
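To illustrate the parallel model behind Fig. 8 and Fig. 9, below is a minimal, single-process sketch of the worker-selection idea described in the abstract. It is not the DRPS code; the class name WSPBarrier and the select_ratio threshold are assumptions made for illustration. The server closes a round as soon as a chosen fraction of workers has reported, so it neither waits for every straggler (as BSP does) nor lets parameter versions drift without bound (as ASP does).

```python
# A single-process sketch of a worker-selection (WSP-style) barrier (illustrative only).
# Each round is closed once `quota` workers have reported; slower workers simply
# join the next round, which bounds both staleness and straggler waiting time.
import random
import threading
import time


class WSPBarrier:
    def __init__(self, num_workers, select_ratio=0.75):
        self.quota = max(1, int(num_workers * select_ratio))  # workers needed to close a round
        self.arrived = 0
        self.clock = 0                                        # global round counter
        self.cond = threading.Condition()

    def report(self):
        """Called by a worker after pushing its updates; returns the round to proceed with."""
        with self.cond:
            round_at_arrival = self.clock
            self.arrived += 1
            if self.arrived >= self.quota:
                self.clock += 1            # enough workers have arrived: advance the clock
                self.arrived = 0
                self.cond.notify_all()     # release everyone still waiting on this round
            else:
                while self.clock == round_at_arrival:
                    self.cond.wait()       # wait until the selected set closes the round
            return self.clock


if __name__ == "__main__":
    barrier = WSPBarrier(num_workers=4, select_ratio=0.75)

    def worker(wid):
        for _ in range(3):
            time.sleep(random.uniform(0.01, 0.05))   # simulate one local iteration
            clk = barrier.report()
            print(f"worker {wid} enters round {clk}")

    threads = [threading.Thread(target=worker, args=(w,)) for w in range(4)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
```

With select_ratio = 1.0 this sketch behaves like BSP, while a quota of one worker approaches ASP; the trade-off between these two extremes is exactly what the WSP model is designed to balance.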
Dataset         #Dims            MSize     #Examples   ESize
Matrix          10M × 10K        298GB     1.3B        36.2GB
MovieLens       0.26M × 0.16M    7.8GB     24M         663MB
Classification  10B              186GB     10M         35GB
Avazu-CTR       9M               85MB      40M         6.3GB
Tab.1  Detailed information of the datasets (MSize: size of the model; ESize: size of the training examples)
Fig.11  Indexing and training time. In each graph, the first bar represents our indexing approach, the second the basic method, and the last the algorithm with no index. (a) LoR on Avazu-CTR; (b) LoR on classification; (c) LMF on MovieLens; (d) LMF on matrix
Fig.12  Multiple indicators for the compared strategies. Note that DRPS is an extension of the existing PS, and state-of-the-art PS optimizations can be applied to DRPS for further performance improvement; therefore, we choose DRPS as the benchmark. (a) LMF on matrix; (b) LMF on MovieLens; (c) LoR on classification; (d) LoR on Avazu-CTR
Fig.13  Convergence comparison of parallel models. (a) LMF on matrix; (b) LMF on MovieLens; (c) LoR on classification; (d) LoR on Avazu-CTR
DRPS   Indexing   Partitioning   BSP   ASP   SSP   WSP      TMat/min   TMov/min   TClas/min   TAza/min
                                                             1468       93         260         273
                                                             1268       82         232         238
                                                             1125       75         214         227
                                                             1135       78         232         236
                                                              979       62         212         219
                                                              864       50         182         212
Tab.2  Convergence time of the optimization methods for DRPS (TMat, TMov, TClas and TAza denote the convergence time on the matrix, MovieLens, classification and Avazu-CTR datasets, respectively)