Frontiers of Electrical and Electronic Engineering

ISSN 2095-2732

ISSN 2095-2740(Online)

CN 10-1028/TM

Front. Electr. Electron. Eng.    2010, Vol. 5 Issue (3) : 241-260    https://doi.org/10.1007/s11460-010-0101-3
Research articles
Information geometry in optimization, machine learning and statistical inference
Shun-ichi AMARI
RIKEN Brain Science Institute, Saitama 351-0198, Japan
Abstract The present article gives an introduction to information geometry and surveys its applications in machine learning, optimization, and statistical inference. Information geometry is explained intuitively through divergence functions defined on a manifold of probability distributions and on other general manifolds. A divergence function induces a Riemannian structure together with a pair of dually coupled flatness criteria, and many useful manifolds are dually flat. When a manifold is dually flat, a generalized Pythagorean theorem and a related projection theorem hold; they provide useful tools for various approximation and optimization problems. We apply them to alternating minimization problems, Ying-Yang machines, and the belief propagation algorithm in machine learning.
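As a concrete numerical illustration of the divergence functions discussed in the abstract (this sketch is not from the article itself), the Kullback-Leibler divergence on the probability simplex arises as the Bregman divergence generated by the negative-entropy function ψ(p) = Σ p_i log p_i, since D_ψ(p, q) = ψ(p) − ψ(q) − ⟨∇ψ(q), p − q⟩ reduces to Σ p_i log(p_i/q_i) for normalized distributions. Function names here are illustrative:

```python
import numpy as np

def neg_entropy(p):
    # Convex generator psi(p) = sum_i p_i log p_i
    return np.sum(p * np.log(p))

def grad_neg_entropy(q):
    # Gradient of psi: (log q_i + 1) componentwise
    return np.log(q) + 1.0

def bregman(p, q, psi, grad_psi):
    # Bregman divergence D_psi(p, q) = psi(p) - psi(q) - <grad psi(q), p - q>
    return psi(p) - psi(q) - np.dot(grad_psi(q), p - q)

def kl(p, q):
    # Kullback-Leibler divergence sum_i p_i log(p_i / q_i)
    return np.sum(p * np.log(p / q))

p = np.array([0.2, 0.3, 0.5])
q = np.array([0.4, 0.4, 0.2])

# For normalized p and q, the two quantities coincide
assert np.isclose(bregman(p, q, neg_entropy, grad_neg_entropy), kl(p, q))
```

The linear terms in the Bregman expansion cancel exactly when both distributions sum to one, which is why the same divergence belongs to both the Bregman and the f-divergence families discussed in the survey.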
Keywords: information geometry; machine learning; optimization; statistical inference; divergence; graphical model; Ying-Yang machine
Issue Date: 05 September 2010
 Cite this article:   
Shun-ichi AMARI. Information geometry in optimization, machine learning and statistical inference[J]. Front. Electr. Electron. Eng., 2010, 5(3): 241-260.
 URL:  
https://academic.hep.com.cn/fee/EN/10.1007/s11460-010-0101-3
https://academic.hep.com.cn/fee/EN/Y2010/V5/I3/241