Please wait a minute...
Frontiers of Electrical and Electronic Engineering

ISSN 2095-2732

ISSN 2095-2740(Online)

CN 10-1028/TM

Front Elect Electr Eng Chin    2010, Vol. 5 Issue (3) : 261-273    https://doi.org/10.1007/s11460-010-0106-y
RESEARCH ARTICLE
Orthogonal nonnegative learning for sparse feature extraction and approximate combinatorial optimization
Erkki OJA(), Zhirong YANG()
Department of Information and Computer Science, Aalto Univer-sity, FI-00076 Aalto, Espoo, Finland
 Download: PDF(320 KB)   HTML
 Export: BibTeX | EndNote | Reference Manager | ProCite | RefWorks
Abstract

Nonnegativity has been shown to be a powerful principle in linear matrix decompositions, leading to sparse component matrices in feature analysis and data compression. The classical method is Lee and Seung’s Nonnegative Matrix Factorization. A standard way to form learning rules is by multiplicative updates, maintaining nonnegativity. Here, a generic principle is presented for forming multiplicative update rules, which integrate an orthonormality constraint into nonnegative learning. The principle, called Orthogonal Nonnegative Learning (ONL), is rigorously derived from the Lagrangian technique. As examples, the proposed method is applied for transforming Nonnegative Matrix Factorization (NMF) and its variant, Projective Nonnegative Matrix Factorization (PNMF), into their orthogonal versions. In general, it is well-known that orthogonal nonnegative learning can give very useful approximative solutions for problems involving non-vectorial data, for example, binary solutions. Combinatorial optimization is replaced by continuous-space gradient optimization which is often computationally lighter. It is shown how the multiplicative updates rules obtained by using the proposed ONL principle can find a nonnegative and highly orthogonal matrix for an approximated graph partitioning problem. The empirical results on various graphs indicate that our nonnegative learning algorithms not only outperform those without the orthogonality condition, but also surpass other existing partitioning approaches.

Keywords Keywords nonnegative factorization      sparse feature extraction      orthogonal learning      clustering     
Corresponding Author(s): OJA Erkki,Email:erkki.oja@tkk.fi; YANG Zhirong,Email:zhirong.yang@tkk.fi   
Issue Date: 05 September 2010
 Cite this article:   
Erkki OJA,Zhirong YANG. Orthogonal nonnegative learning for sparse feature extraction and approximate combinatorial optimization[J]. Front Elect Electr Eng Chin, 2010, 5(3): 261-273.
 URL:  
https://academic.hep.com.cn/fee/EN/10.1007/s11460-010-0106-y
https://academic.hep.com.cn/fee/EN/Y2010/V5/I3/261
1 Paatero P, Tapper U. Positive matrix factorization: A nonnegative factor model with optimal utilization of error estimates of data values. Environmetrics , 1994, 5(2): 111-126
doi: 10.1002/env.3170050203
Paatero P, Tapper U. Positive matrix factorization:A nonnegative factor model with optimal utilization of error estimatesof data values. Environmetrics, 1994, 5(2): 111―126

doi: 10.1002/env.3170050203
2 Lee D D, Seung H S. Learning the parts of objects by nonnegative matrix factorization. Nature , 1999, 401(6755): 788-791
doi: 10.1038/44565
Lee D D, Seung H S. Learning the parts of objectsby nonnegative matrix factorization. Nature, 1999, 401(6755): 788―791

doi: 10.1038/44565
3 Cichocki A, Zdunek R, Phan A H, Amari S-I. Nonnegative Matrix and Tensor Factorizations. Singapore: Wiley, 2009
doi: 10.1002/9780470747278
Cichocki A, Zdunek R, Phan A H, Amari S-I. NonnegativeMatrix and Tensor Factorizations. Singapore: Wiley, 2009

doi: 10.1002/9780470747278
4 Ding C, Li T, Peng W, Park H. Orthogonal nonnegative matrix t-factorizations for clustering. In: Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining . 2006, 126-135
Ding C, Li T, Peng W, Park H. Orthogonalnonnegative matrix t-factorizations for clustering. In: Proceedings of the 12th ACM SIGKDD International Conference on KnowledgeDiscovery and Data Mining. 2006, 126―135
Yang Z, Laaksonen J. Multiplicative updates fornonnegative projections. Neurocomputing, 2007, 71(1―3): 363―373

doi: 10.1016/j.neucom.2006.11.023
5 Yang Z, Laaksonen J. Multiplicative updates for nonnegative projections. Neurocomputing , 2007, 71(1-3): 363-373
doi: 10.1016/j.neucom.2006.11.023
Choi S. Algorithmsfor orthogonal nonnegative matrix factorization. In: Proceedings of IEEE International Joint Conference on Neural Networks. 2008, 1828―1832
6 Choi S. Algorithms for orthogonal nonnegative matrix factorization. In: Proceedings of IEEE International Joint Conference on Neural Networks . 2008, 1828-1832
Ding C, He X, Simon H D. On the equivalence of nonnegative matrix factorizationand spectral clustering. In: Proceedingsof SIAM International Conference of Data Mining. 2005, 606―610
7 Ding C, He X, Simon H D. On the equivalence of nonnegative matrix factorization and spectral clustering. In: Proceedings of SIAM International Conference of Data Mining . 2005, 606-610
Yuan Z, Oja E. Projective nonnegative matrixfactorization for image compression and feature extraction. In: Proceedings of the 14th Scandinavian Conferenceon Image Analysis (SCIA 2005). Joensuu, Finland, 2005, 333―342
8 Yuan Z, Oja E. Projective nonnegative matrix factorization for image compression and feature extraction. In: Proceedings of the 14th Scandinavian Conference on Image Analysis (SCIA 2005). Joensuu, Finland , 2005, 333-342
Oja E. Principalcomponents, minor components, and linear neural networks. Neural Networks, 1992, 5(6): 927―935

doi: 10.1016/S0893-6080(05)80089-9
9 Oja E. Principal components, minor components, and linear neural networks. Neural Networks , 1992, 5(6): 927-935
doi: 10.1016/S0893-6080(05)80089-9
Dhillon I, Guan Y, Kulis B. Kernel k-means, spectral clustering and normalized cuts. In: Proceedings of the 10th ACM SIGKDD InternationalConference on Knowledge Discovery and Data Mining. Seattle, WA, USA, 2004, 551―556
10 Dhillon I, Guan Y, Kulis B. Kernel k-means, spectral clustering and normalized cuts. In: Proceedings of the 10th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining . Seattle, WA, USA, 2004, 551-556
11 Yu S X, Shi J. Multiclass spectral clustering. In: Proceedings of the 9th IEEE International Conference on Computer Vision . 2003, 2: 313-319
Yu S X, Shi J. Multiclass spectral clustering. In: Proceedings of the 9th IEEE International Conferenceon Computer Vision. 2003, 2: 313―319
Jolliffe I T. Principal Component Analysis. Berlin: Springer-Verlag, 2002
12 Jolliffe I T. Principal Component Analysis. Berlin: Springer-Verlag, 2002
Diamantaras K, Kung S Y. Principal Component NeuralNetworks. New York: Wiley, 1996
13 Diamantaras K, Kung S Y. Principal Component Neural Networks. New York: Wiley, 1996
Oja E. SubspaceMethods of Pattern Recognition. Letchworth: Research Studies Press, 1983
14 Oja E. Subspace Methods of Pattern Recognition. Letchworth: Research Studies Press, 1983
15 Yang Z, Oja E. Linear and nonlinear projective nonnegative matrix factorization. IEEE Transactions on Neural Networks , 2009, accepted, to appear
Yang Z, Oja E. Linear and nonlinear projectivenonnegative matrix factorization. IEEETransactions on Neural Networks, 2009, accepted, to appear
Lloyd S P. Least square quantization in PCM. IEEETransactions on Information Theory, 1982, 28(2): 129―137

doi: 10.1109/TIT.1982.1056489
16 Lloyd S P. Least square quantization in PCM. IEEE Transactions on Information Theory , 1982, 28(2): 129-137
doi: 10.1109/TIT.1982.1056489
Lee D D, Seung H S. Algorithms for non-negativematrix factorization. In: Proceedings ofthe Conference on Advances in Neural information Processing Systems. 2000, 556―562
17 Lee D D, Seung H S. Algorithms for non-negative matrix factorization. In: Proceedings of the Conference on Advances in Neural information Processing Systems . 2000, 556-562
18 Hoyer P O. Non-negative matrix factorization with sparseness constraints. Journal of Machine Learning Research , 2004, 5: 1457-1469
Hoyer P O. Non-negative matrix factorization with sparseness constraints. Journal of Machine Learning Research, 2004, 5: 1457―1469
Xu W, Liu X, Gong Y. Document clustering based on nonnegative matrix factorization. In: Proceedings of the 26th Annual InternationalACM SIGIR Conference on Research and Development in Informaion Retrieval. 2003, 267―273
19 Xu W, Liu X, Gong Y. Document clustering based on nonnegative matrix factorization. In: Proceedings of the 26th Annual International ACM SIGIR Conference on Research and Development in Informaion Retrieval . 2003, 267-273
20 Févotte C, Bertin N, Durrieu J-L. Nonnegative matrix factorization with the itakura-saito divergence: With application to music analysis. Neural Computation , 2009, 21(3):793-830
doi: 10.1162/neco.2008.04-08-771
Févotte C, Bertin N, Durrieu J-L. Nonnegative matrix factorization with the itakura-saitodivergence: With application to music analysis. Neural Computation, 2009, 21(3):793―830

doi: 10.1162/neco.2008.04-08-771
21 Drakakis K, Rickard S, de Fréin R, Cichocki A. Analysis of financial data using non-negative matrix factorization. International Mathematical Forum , 2008, 3: 1853-1870
Drakakis K, Rickard S, de Fréin R, Cichocki A. Analysisof financial data using non-negative matrix factorization. International Mathematical Forum, 2008, 3: 1853―1870
22 Young S S, Fogel P, HawkinsD M. Clustering scotch whiskies using non-negative matrix factorization. Joint Newsletter for the Section on Physical and Engineering Sciences and the Quality and Productivity Section of the American Statistical Association , 2006, 14(1): 11-13
Young S S, Fogel P, Hawkins D M. Clustering scotch whiskies using non-negative matrixfactorization. Joint Newsletter for theSection on Physical and Engineering Sciences and the Quality and ProductivitySection of the American Statistical Association, 2006, 14(1): 11―13
23 Brunet J-P, Tamayo P, Golub T R, Mesirov J P. Metagenes and molecular pattern discovery using matrix factorization. Proceedings of the National Academy of Sciences of the United States of America , 2004, 101(12): 4164-4169
doi: 10.1073/pnas.0308531101
Brunet J-P, Tamayo P, Golub T R, Mesirov J P. Metagenes and molecular pattern discovery using matrix factorization. Proceedings of the National Academy of Sciencesof the United States of America, 2004, 101(12): 4164―4169

doi: 10.1073/pnas.0308531101
24 Ding C, Li T, Jordan M I. Convex and semi-nonnegative matrix factorizations. IEEE Transactions on Pattern Analysis and Machine Intelligence , 2010, 32(1): 45-55
doi: 10.1109/TPAMI.2008.277
Ding C, Li T, Jordan M I. Convex and semi-nonnegative matrix factorizations. IEEE Transactions on Pattern Analysis and MachineIntelligence, 2010, 32(1): 45―55

doi: 10.1109/TPAMI.2008.277
25 Sha F, Saul L K, Lee D D. Multiplicative updates for large margin classifiers. In: Proceedings of the 16th Annual Conference on Learning Theory and the 7th Kernel Workshop, COLT . 2003, 188-202
Sha F, Saul L K, Lee D D. Multiplicative updates for large margin classifiers. In: Proceedings of the 16th Annual Conference onLearning Theory and the 7th Kernel Workshop, COLT. 2003, 188―202
26 Cichocki A, Lee H, Kim Y-D, Choi S. Non-negative matrix factorization with α-divergence. Pattern Recognition Letters , 2008, 29(9): 1433-1440
doi: 10.1016/j.patrec.2008.02.016
Cichocki A, Lee H, Kim Y-D, Choi S. Non-negativematrix factorization with α-divergence. Pattern Recognition Letters, 2008, 29(9): 1433―1440

doi: 10.1016/j.patrec.2008.02.016
27 Kivinen J, Warmuth M K. Exponentiated gradient versus gradient descent for linear predictors. Information and Computation , 1997, 132(1): 1-63
doi: 10.1006/inco.1996.2612
Kivinen J, Warmuth M K. Exponentiated gradient versusgradient descent for linear predictors. Information and Computation, 1997, 132(1): 1―63

doi: 10.1006/inco.1996.2612
28 Yang Z, Yuan Z, Laaksonen J. Projective non-negative matrix factorization with applications to facial image processing. International Journal of Pattern Recognition and Artificial Intelligence , 2007, 21(8): 1353-1362
doi: 10.1142/S0218001407005983
Yang Z, Yuan Z, Laaksonen J. Projective non-negative matrix factorization with applicationsto facial image processing. InternationalJournal of Pattern Recognition and Artificial Intelligence, 2007, 21(8): 1353―1362

doi: 10.1142/S0218001407005983
29 Chung F R K. Spectral Graph Theory. American Methematical Society , 1997
Chung F R K. Spectral Graph Theory. American MethematicalSociety, 1997
30 Newman M E J. Finding community structure in networks using the eigenvectors of matrices. Physical Review E , 2006, 74(3): 036104
doi: 10.1103/PhysRevE.74.036104
Newman M E J. Finding community structure in networks using the eigenvectors ofmatrices. Physical Review E, 2006, 74(3): 036104

doi: 10.1103/PhysRevE.74.036104
31 Leicht E A, Holme P, Newman M E J. Vertex similarity in networks. Physical Review E , 2006, 73(2): 026120
doi: 10.1103/PhysRevE.73.026120
Leicht E A, Holme P, Newman M E J. Vertex similarity in networks. Physical Review E, 2006, 73(2): 026120

doi: 10.1103/PhysRevE.73.026120
32 Ye J, Zhao Z, Wu M. Discriminative K-means for clustering. In: Proceedings of the Conference on Advances in Neural Information Processing Systems . 2007, 1649-1656
Ye J, Zhao Z, Wu M. Discriminative K-means for clustering. In: Proceedings of the Conference on Advances in Neural Information ProcessingSystems. 2007, 1649―1656
[1] Jianhuang LAI, Changdong WANG. Kernel and graph: Two approaches for nonlinear competitive learning clustering[J]. Front Elect Electr Eng, 2012, 7(1): 134-146.
[2] Liang LIAO, Yanning ZHANG. MRI image segmentation based on fast kernel clustering analysis[J]. Front Elect Electr Eng Chin, 2011, 6(2): 363-373.
[3] Lei XU. Codimensional matrix pairing perspective of BYY harmony learning: hierarchy of bilinear systems, joint decomposition of data-covariance, and applications of network biology[J]. Front Elect Electr Eng Chin, 2011, 6(1): 86-119.
[4] Rongyan WANG, Gang LIU, Jun GUO, Yu FANG, . Multi-class classifier of non-speech audio based on Fisher kernel[J]. Front. Electr. Electron. Eng., 2010, 5(1): 72-76.
[5] Yun ZHANG, Boqin FENG, Shouqiang MA, Lianmeng LIU. Text clustering based on fusion of ant colony and genetic algorithms[J]. Front Elect Electr Eng Chin, 2009, 4(1): 15-19.
[6] NIU Kun, CHEN Junliang, ZHANG Shubo. Subspace clustering through attribute clustering[J]. Front. Electr. Electron. Eng., 2008, 3(1): 44-48.
[7] GUO Qing-lin, LIU Chang-an, FAN Xiao-zhong. The research and realization about automatic abstracting based on text clustering and natural language understanding[J]. Front. Electr. Electron. Eng., 2006, 1(4): 460-464.
[8] ZHAO Hai, SU Wei-ji, XU Ye, ZHANG Xin. The effect of Internet separation degree time sensitivity on transmission[J]. Front. Electr. Electron. Eng., 2006, 1(3): 325-329.
Viewed
Full text


Abstract

Cited

  Shared   
  Discussed