Frontiers of Computer Science

ISSN 2095-2228

ISSN 2095-2236(Online)

CN 10-1014/TP

Postal Distribution Code: 80-970

2019 Impact Factor: 1.275

Frontiers of Computer Science  2020, Vol. 14 Issue (4): 144304   https://doi.org/10.1007/s11704-019-8390-z
A revisit to MacKay algorithm and its application to deep network compression
Chune LI1, Yongyi MAO2, Richong ZHANG1, Jinpeng HUAI1
1. School of Computer Science and Engineering, Beihang University, Beijing 100191, China
2. School of Electrical Engineering and Computer Science, University of Ottawa, Ottawa K1N6N5, Canada
Abstract

An iterative procedure introduced in MacKay’s evidence framework is often used for estimating the hyperparameter in empirical Bayes. Together with the use of a particular form of prior, the estimation of the hyperparameter reduces to an automatic relevance determination model, which provides a soft way of pruning model parameters. Despite the effectiveness of this estimation procedure, it has remained primarily a heuristic to date, and its application to deep neural networks has not yet been explored. This paper formally investigates the mathematical nature of this procedure and justifies it as a well-principled algorithmic framework, which we call the MacKay algorithm. As an application, we demonstrate its use in deep neural networks, which typically have complicated structures with millions of parameters and can be pruned to reduce memory requirements and boost computational efficiency. In our experiments, we apply the MacKay algorithm to prune the parameters of simple networks such as LeNet, deep convolutional VGG-like networks, and residual networks for large-scale image classification tasks. Experimental results show that the algorithm can compress neural networks to a high level of sparsity with little loss of prediction accuracy, which is comparable with the state of the art.
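For readers unfamiliar with the re-estimation step the abstract refers to, the following is a minimal sketch of MacKay-style automatic relevance determination for Bayesian linear regression, where each weight gets its own prior precision that is iteratively re-estimated and weights whose precision diverges are pruned. The function name, toy data, and all constants are illustrative assumptions only, not the authors' implementation for deep networks.

```python
import numpy as np

def mackay_ard(Phi, t, beta=100.0, n_iter=200, prune_thresh=1e6):
    """Illustrative MacKay-style ARD re-estimation for Bayesian linear
    regression t ~ N(Phi @ w, 1/beta) with prior w_i ~ N(0, 1/alpha_i).
    Weights whose precision alpha_i grows past prune_thresh are pruned."""
    N, D = Phi.shape
    alpha = np.ones(D)                       # per-weight prior precisions
    for _ in range(n_iter):
        # Posterior over weights given the current hyperparameters
        S = np.linalg.inv(np.diag(alpha) + beta * Phi.T @ Phi)
        m = beta * S @ Phi.T @ t
        # MacKay's "effective number of well-determined parameters"
        gamma = 1.0 - alpha * np.diag(S)
        # Fixed-point re-estimation of the hyperparameters
        alpha = gamma / (m ** 2 + 1e-12)
        beta = (N - gamma.sum()) / (np.sum((t - Phi @ m) ** 2) + 1e-12)
    keep = alpha < prune_thresh              # surviving (relevant) weights
    return m * keep, alpha, keep

# Toy usage (hypothetical data): only the first two of five features matter
rng = np.random.default_rng(0)
Phi = rng.normal(size=(100, 5))
t = 2.0 * Phi[:, 0] - 3.0 * Phi[:, 1] + 0.05 * rng.normal(size=100)
w, alpha, keep = mackay_ard(Phi, t)
print(np.round(w, 2), keep)
```

In this sketch the irrelevant weights are driven toward zero as their precisions grow, which is the "soft pruning" behavior the abstract attributes to the automatic relevance determination model.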

Keywords: deep learning, MacKay algorithm, model compression, neural network
Received: 2018-11-12      Published Online: 2020-03-11
Corresponding Author(s): Richong ZHANG   
Cite this article:
Chune LI, Yongyi MAO, Richong ZHANG, Jinpeng HUAI. A revisit to MacKay algorithm and its application to deep network compression. Front. Comput. Sci., 2020, 14(4): 144304.
Link to this article:
https://academic.hep.com.cn/fcs/CN/10.1007/s11704-019-8390-z
https://academic.hep.com.cn/fcs/CN/Y2020/V14/I4/144304