Frontiers of Computer Science

ISSN 2095-2228

ISSN 2095-2236(Online)

CN 10-1014/TP

Postal Subscription Code 80-970

2018 Impact Factor: 1.129

Front. Comput. Sci. 2020, Vol. 14, Issue 1: 67-83. https://doi.org/10.1007/s11704-019-7290-6
RESEARCH ARTICLE
Ordinal factorization machine with hierarchical sparsity
Shaocheng GUO1, Songcan CHEN1, Qing TIAN2
1. College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing 210000, China
2. School of Computer and Software, Nanjing University of Information Science and Technology, Nanjing 210044, China
Abstract

Ordinal regression (OR), or ordinal classification, is a machine learning paradigm for predicting ordinal labels. A variety of methods have been proposed to date, including kernel-based and neural-network-based methods, with strong performance. However, existing OR methods rarely consider the latent structure of the given data, particularly the interactions among covariates, and thus lose interpretability to some extent. To compensate for this, we present a new OR method, the ordinal factorization machine with hierarchical sparsity (OFMHS), which combines a factorization machine with hierarchical sparsity to explore the hierarchical structure behind the input variables. For ease of optimization, we formulate OFMHS as a convex optimization problem and solve it with the efficient alternating direction method of multipliers (ADMM). Experimental results on synthetic and real datasets demonstrate the superiority of our method in both predictive performance and the selection of significant variables.
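To make the two ingredients named in the abstract concrete, the following is a minimal sketch, not the authors' OFMHS formulation. It assumes the standard second-order factorization machine score (linear terms plus pairwise interactions weighted by inner products of latent factors) and a generic threshold model that maps a real-valued score to an ordinal rank. The function names `fm_score` and `ordinal_label`, the parameter layout, and the threshold rule are illustrative assumptions; the hierarchical sparsity regularizer and the ADMM solver are not shown.

```python
from bisect import bisect_left


def fm_score(x, w0, w, V):
    """Second-order factorization machine score for one sample.

    Assumed layout (hypothetical, for illustration only):
    x  : list of d feature values
    w0 : bias term
    w  : list of d linear weights
    V  : d rows of k latent factors; the weight of the feature pair
         (i, j) is the inner product <V[i], V[j]>
    """
    linear = sum(wi * xi for wi, xi in zip(w, x))
    k = len(V[0]) if V else 0
    pairwise = 0.0
    for f in range(k):
        # Identity: sum_{i<j} v_if v_jf x_i x_j
        #         = 0.5 * ((sum_i v_if x_i)^2 - sum_i (v_if x_i)^2)
        s = sum(V[i][f] * x[i] for i in range(len(x)))
        s_sq = sum((V[i][f] * x[i]) ** 2 for i in range(len(x)))
        pairwise += 0.5 * (s * s - s_sq)
    return w0 + linear + pairwise


def ordinal_label(score, thresholds):
    """Map a score to a rank in 1..K using K-1 sorted thresholds.

    The rank is one plus the number of thresholds strictly below the score.
    """
    return 1 + bisect_left(thresholds, score)


# Toy example with three features and two latent factors.
x = [1.0, 2.0, 0.0]
w0, w = 0.5, [1.0, -1.0, 2.0]
V = [[1.0, 0.0], [0.5, 1.0], [2.0, 2.0]]
score = fm_score(x, w0, w, V)
rank = ordinal_label(score, [-1.0, 0.0, 1.0])
```

In this toy example only the first two features are active, so the single contributing interaction is the pair (0, 1). A hierarchical sparsity constraint, as described in the abstract, would additionally tie the selection of such an interaction to the selection of its main effects.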

Keywords: ordinal regression, factorization machine, hierarchical sparsity, interaction modelling
Corresponding Author(s): Songcan CHEN   
Just Accepted Date: 19 December 2018   Online First Date: 26 March 2019    Issue Date: 24 September 2019
 Cite this article:   
Shaocheng GUO,Songcan CHEN,Qing TIAN. Ordinal factorization machine with hierarchical sparsity[J]. Front. Comput. Sci., 2020, 14(1): 67-83.
 URL:  
https://academic.hep.com.cn/fcs/EN/10.1007/s11704-019-7290-6
https://academic.hep.com.cn/fcs/EN/Y2020/V14/I1/67