Frontiers of Computer Science

ISSN 2095-2228

ISSN 2095-2236(Online)

CN 10-1014/TP

Postal distribution code 80-970

2019 Impact Factor: 1.275

Frontiers of Computer Science  2022, Vol. 16 Issue (4): 164321   https://doi.org/10.1007/s11704-021-0445-2
DRPS: efficient disk-resident parameter servers for distributed machine learning
Zhen SONG1, Yu GU1, Zhigang WANG2, Ge YU1
1. School of Computer Science and Engineering, Northeastern University, Shenyang 110819, China
2. College of Information Science and Engineering, Ocean University of China, Qingdao 266100, China
Full text: PDF (14299 KB) | HTML
Abstract

The parameter server (PS), the state-of-the-art distributed framework for large-scale iterative machine learning tasks, has been extensively studied. However, existing PS-based systems typically rely on in-memory implementations. Under such memory constraints, machine learning (ML) developers cannot train large-scale ML models on their relatively small local clusters, and renting large-scale cloud servers is usually economically infeasible for research teams and small companies. In this paper, we propose a disk-resident parameter server system named DRPS, which reduces the hardware requirements of large-scale machine learning tasks by storing high-dimensional models on disk. To further improve the performance of DRPS, we build an efficient index structure for parameters to reduce the disk I/O cost. Based on this index structure, we propose a novel multi-objective partitioning algorithm for the parameters. Finally, a flexible worker-selection parallel model of computation (WSP) is proposed to strike the right balance between inconsistent parameter versions (staleness) and inconsistent execution progress (stragglers). Extensive experiments on many typical machine learning applications with real and synthetic datasets validate the effectiveness of DRPS.
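To make the disk-resident idea concrete, the minimal Python sketch below stores a model in a pre-allocated file and uses a fixed-width, block-based index to turn a parameter key into a file offset, so pull/push requests read and write only the bytes they need. It is an illustration under stated assumptions, not the DRPS implementation: the class DiskParameterStore, its pull/push interface, and the block size are hypothetical, and the paper's actual index structure, partitioning algorithm, and WSP parallel model are not reproduced here.

# Illustrative sketch only (not the DRPS code): a disk-resident parameter
# store with a fixed-width, block-based index. Class/method names and the
# block size are hypothetical assumptions made for this example.
import os
import struct

VALUE_FMT = "d"                      # one 8-byte float per parameter
VALUE_SIZE = struct.calcsize(VALUE_FMT)
BLOCK_SIZE = 1024                    # parameters per on-disk block (assumed)

class DiskParameterStore:
    """Keeps the full model in a pre-allocated file on disk; only a tiny
    arithmetic index (key -> block, offset) lives in memory."""

    def __init__(self, path, num_params):
        self.path = path
        self.num_params = num_params
        if not os.path.exists(path):
            # Pre-allocate a fixed slot for every parameter key.
            with open(path, "wb") as f:
                f.write(b"\x00" * num_params * VALUE_SIZE)
        self.file = open(path, "r+b")

    def _locate(self, key):
        # Block-based index: with fixed-width values this reduces to
        # arithmetic; a real server would also map blocks to partitions.
        block_id, slot = divmod(key, BLOCK_SIZE)
        return (block_id * BLOCK_SIZE + slot) * VALUE_SIZE

    def pull(self, keys):
        """Read the requested parameters from disk."""
        keys = sorted(keys)          # sorted access keeps the I/O sequential
        values = {}
        for k in keys:
            self.file.seek(self._locate(k))
            values[k] = struct.unpack(VALUE_FMT, self.file.read(VALUE_SIZE))[0]
        return values

    def push(self, grads, lr=0.01):
        """Apply sparse gradient updates in place on the on-disk model."""
        for k in sorted(grads):
            pos = self._locate(k)
            self.file.seek(pos)
            (old,) = struct.unpack(VALUE_FMT, self.file.read(VALUE_SIZE))
            self.file.seek(pos)
            self.file.write(struct.pack(VALUE_FMT, old - lr * grads[k]))
        self.file.flush()

if __name__ == "__main__":
    store = DiskParameterStore("model.bin", num_params=1_000_000)
    print(store.pull([3, 17, 42]))                # a worker pulls what it needs
    store.push({3: 0.5, 17: -1.2, 42: 0.05})      # ...and pushes gradients back

Sorting keys before each disk access keeps the toy store's I/O largely sequential; DRPS's parameter index and multi-objective partitioning target the same disk I/O cost at the system level, and its WSP model additionally coordinates when workers synchronize, neither of which this sketch attempts to model.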

Keywords: parameter servers, machine learning, disk resident, parallel model
Received: 2020-09-06      Published: 2021-12-01
Corresponding Author(s): Yu GU   
Cite this article:
Zhen SONG, Yu GU, Zhigang WANG, Ge YU. DRPS: efficient disk-resident parameter servers for distributed machine learning. Front. Comput. Sci., 2022, 16(4): 164321.
Link to this article:
https://academic.hep.com.cn/fcs/CN/10.1007/s11704-021-0445-2
https://academic.hep.com.cn/fcs/CN/Y2022/V16/I4/164321
Tab.1  Dataset statistics (MSize: model size; ESize: training-data size)

Dataset         #Dims            MSize     #Examples   ESize
Matrix          10M × 10K        298 GB    1.3B        36.2 GB
MovieLens       0.26M × 0.16M    7.8 GB    24M         663 MB
Classification  10B              186 GB    10M         35 GB
Avazu-CTR       9M               85 MB     40M         6.3 GB
Tab.2  Runtime (minutes) on the four datasets under different combinations of DRPS, Indexing, Partitioning, and the BSP/ASP/SSP/WSP parallel models

TMat/min   TMov/min   TClas/min   TAza/min
1468       93         260         273
1268       82         232         238
1125       75         214         227
1135       78         232         236
979        62         212         219
864        50         182         212