|
|
The new AI is general and mathematically rigorous |
Jürgen SCHMIDHUBER, |
IDSIA, University
of Lugano and SUPSI, Galleria 2, 6928 Manno-Lugano, Switzerland; |
|
|
Abstract Most traditional artificial intelligence (AI) systems of the past decades are either very limited, or based on heuristics, or both. The new millennium, however, has brought substantial progress in the field of theoretically optimal and practically feasible algorithms for prediction, search, inductive inference based on Occam’s razor, problem solving, decision making, and reinforcement learning in environments of a very general type. Since inductive inference is at the heart of all inductive sciences, some of the results are relevant not only for AI and computer science but also for physics, provoking nontraditional predictions based on Zuse’s thesis of the computer-generated universe. We first briefly review the history of AI since Gödel’s 1931 paper, then discuss recent post-2000 approaches that are currently transforming general AI research into a formal science.
|
Keywords
prediction
search
inductive inference
Occam’s razor
Speed Prior
super-Omega
limitcomputability
generalizations of Kolmogorov complexity
|
Issue Date: 05 September 2010
|
|
|
Solomonoff R J. A formal theory of inductive inference. Part I. Information and Control, 1964, 7(1): 1―22
doi: 10.1016/S0019-9958(64)90223-2
|
|
Kolmogorov A N. Three approaches to the quantitative definition of information. Problems of Information Transmission, 1965, 1(1): 1―11
|
|
Newell A, Simon H. GPS, a program that simulateshuman thought. In: Feigenbaum E, Feldman J, eds. Computers and Thought. New York: McGraw-Hill, 1963, 279―293
|
|
Rosenbloom P S, Laird J E, Newell A. The Soar Papers: Research on Integrated Intelligence. MIT Press, 1993
|
|
Mitchell T. MachineLearning. New York: McGraw Hill, 1997
|
|
Kaelbling L P, Littman M L, Moore A W. Reinforcement learning: A survey. Journal of Artificial Intelligence Research, 1996, 4: 237―285
|
|
Schmidhuber J. Ultimatecognition à la G?del. CognitiveComputation, 2009, 1(2): 177―193
doi: 10.1007/s12559-009-9014-y
|
|
Schmidhuber J. Towardssolving the grand problem of AI. In: Quaresma P, Dourado A, Costa E, Costa J F, eds. Soft Computing and Complex Systems. Coimbra: Centro Internacionalde Mathematica, 2003, 77―97
|
|
G?del K. überformal unentscheidbare S?tze der Principia Mathematica und verwandter. Systeme I. Monatshefte für Mathematik undPhysik, 1931, 38: 173―198
|
|
Schmidhuber J. Newmillennium AI and the convergence of history. In: Duch W, Mandziuk J, eds. Challengesto Computational Intelligence. Studiesin Computational Intelligence. Berlin: Springer, 2007, 63: 15―36
|
|
Schmidhuber J. 2006:Celebrating 75 years of AI — history and outlook: The next 25years. In: Lungarella M, Iida F, Bongard J, Pfeifer R, eds. 50 Years of Artificial Intelligence. Lecture Notes in ArtificialIntelligence, 2007, 4850: 29―41
|
|
Turing A M. On computable numbers, with an application to the Entscheidungsproblem. Proceedings of the London Mathematical Society,Series 2, 1936, 42: 230―265
doi: 10.1112/plms/s2-42.1.230
|
|
Shannon C E. A mathematical theory of communication (Parts I and II). Bell System Technical Journal, 1948, 27(3&4): 379―423, 623―656
|
|
Solomonoff R J. Complexity-based induction systems. IEEETransactions on Information Theory, 1978, IT-24(4): 422―432
doi: 10.1109/TIT.1978.1055913
|
|
Hutter M. UniversalArtificial Intelligence: Sequential Decisions Based on AlgorithmicProbability. Berlin: Springer, 2004
|
|
Rechenberg I. Evolutionsstrategie:Optimierung technischer Systeme nach Prinzipien der biologischen Evolution. Stuttgart Fromman-Holzboog, 1973
|
|
Holland J H. Adaptation in Natural and Artificial Systems. Ann Arbor: University of MichiganPress, 1975
|
|
Minsky M, Papert S. Perceptrons. Cambridge: MIT Press, 1969
|
|
Kohonen T. Self-Organizationand Associative Memory. 2nd ed. Berlin: Springer, 1988
|
|
Werbos P J. Beyond regression: New tools for prediction and analysis in the behavioralsciences. . Cambridge: Harvard University, 1974
|
|
Rissanen J. Modelingby shortest data description. Automatica, 1978, 14(5): 465―471
doi: 10.1016/0005-1098(78)90005-5
|
|
Brooks R A. Intelligence without reason. In: Proceedingsof the Twelfth International Joint Conference on Artificial Intelligence. 1991, 569―595
|
|
Dorigo M, Di Caro G, Gambardella L M. Ant algorithms for discrete optimization. Artificial Life, 1999, 5(2): 137―172
doi: 10.1162/106454699568728
|
|
Vapnik V. TheNature of Statistical Learning Theory. New York: Springer-Verlag, 1995
|
|
Dickmanns E D, Behringer R, Dickmanns D, Hildebrandt T, Maurer M, Thomanek F, Schiehlen J. Theseeing passenger car ‘VaMoRs-P’. In: Proceedings of the International Symposium on Intelligent Vehicles1994. 1994, 68―73
|
|
Nilsson N J. Principles of Artificial Intelligence. San Francisco: Morgan Kaufmann, 1980
|
|
Pfeifer R, Scheier C. Understanding Intelligence. Cambridge: MIT Press, 2001
|
|
Zvonkin A K, Levin L A. The complexity of finiteobjects and the algorithmic concepts of information and randomness. Russian Mathematical Surveys, 1970, 25(6): 83―124
doi: 10.1070/RM1970v025n06ABEH001269
|
|
Gács P. Onthe relation between descriptional complexity and algorithmic probability. Theoretical Computer Science, 1983, 22(1―2): 71―93
|
|
Li M, Vitányi PM B. An Introductionto Kolmogorov Complexity and Its Applications. 2nd ed. Berlin: Springer-Verlag, 1997
|
|
Merhav N, Feder M. Universal prediction. IEEE Transactions on Information Theory, 1998, 44(6): 2124―2147
doi: 10.1109/18.720534
|
|
Schmidhuber J. Algorithmictheories of everything. Technical ReportIDSIA-20-00, quant-ph/0011122, 2000
|
|
Schmidhuber J. Hierarchiesof generalized Kolmogorov complexities and nonenumerable universalmeasures computable in the limit. InternationalJournal of Foundations of Computer Science, 2002, 13(4): 587―612
doi: 10.1142/S0129054102001291
|
|
Brouwer L E J. Over de Grondslagen der Wiskunde. . Amsterdam: University of Amsterdam, 1907
|
|
Beeson M. Foundationsof Constructive Mathematics. Heidelberg: Springer-Verlag, 1985
|
|
Gold E M. Limiting recursion. Journal of SymbolicLogic, 1965, 30(1): 28―48
doi: 10.2307/2270580
|
|
Putnam H. Trialand error predicates and the solution to a problem of Mostowski. Journal of Symbolic Logic, 1965, 30(1): 49―57
doi: 10.2307/2270581
|
|
Freyvald R V. Functions and functionals computable in the limit. Transactions of Latvijas Vlasts Univ. Zinatn. Raksti, 1977, 210: 6―19
|
|
Rogers H Jr. Theory of Recursive Functions and Effective Computability. New York: McGraw-Hill, 1967
|
|
Chaitin G J. Algorithmic Information Theory. Cambridge: Cambridge University Press, 1987
doi: 10.1017/CBO9780511608858
|
|
L?wenheim L. überM?glichkeiten im Relativkalkül. Mathematische Annalen, 1915, 76: 447―470
|
|
Skolem T. Logisch-kombinatorischeUntersuchungen über die Erfüllbarkeit oder Beweisbarkeitmathematischer S?tze nebst einem Theorem über dichte Mengen. Skrifter utgit av Videnskapsselskapet in KristianiaI, Matematicsk-Naturvidenskabelig, 1920, 4: 1―36
|
|
Cajori F. Historyof Mathematics. 2nd ed. New York: Macmillan, 1919
|
|
Cantor G. übereine Eigenschaft des Inbegriffes aller reellen algebraischen Zahlen. Crelle’s Journal für Mathematik, 1874, 77: 258―262
|
|
Wallace C S, Boulton D M. An information theoreticmeasure for classification. The ComputerJournal, 1968, 11(2): 185―194
|
|
Schmidhuber J. TheSpeed Prior: A new simplicity measure yielding near-optimal computablepredictions. In: Kivinen J, Sloan R H, eds. Proceedings of the 15th AnnualConference on Computational Learning Theory. Lecture Notes in ArtificialIntelligence, 2002, 2375: 216―228
|
|
Levin L A. Laws of information (nongrowth) and aspects of the foundation ofprobability theory. Problems of InformationTransmission, 1974, 10(3): 206―210
|
|
Schmidhuber J. Discoveringsolutions with low Kolmogorov complexity and high generalization capability. In: Prieditis A, Russell S, eds. Proceedings of Machine Learning: Proceedings of the Twelfth InternationalConference. 1995, 488―496
|
|
Schmidhuber J. Discoveringneural nets with low Kolmogorov complexity and high generalizationcapability. Neural Networks, 1997, 10(5): 857―873
doi: 10.1016/S0893-6080(96)00127-X
|
|
Popper K R. The Logic of Scientific Discovery. London: Hutchinson, 1934
|
|
Green M B, Schwarz J H, Witten E. Superstring Theory. Cambridge: Cambridge University Press, 1987
|
|
Everett H III. ‘Relative State’ formulation of quantum mechanics. Reviews of Modern Physics, 1957, 29(3): 454―462
doi: 10.1103/RevModPhys.29.454
|
|
Zuse K. Rechnenderraum. Elektronische Datenverarbeitung, 1967, 8: 336―344
|
|
Zuse K. RechnenderRaum. Braunschweig: Friedrich Vieweg &Sohn, 1969. English translation: Calculating Space. MIT TechnicalTranslation AZT-70–164-GEMIT. Cambridge: Massachusetts Instituteof Technology (Proj. MAC), 1970
|
|
Schmidhuber J. Randomnessin physics. Nature, 2006, 439(7075): 392
doi: 10.1038/439392d
|
|
Ulam S. Randomprocesses and transformations. In: Proceedingsof the International Congress on Mathematics. 1950, 2: 264―275
|
|
von Neumann J. Theoryof Self-Reproducing Automata. Champaign: University of Illinois Press, 1966
|
|
Bennett C H, DiVicenzo D P. Quantum information and computation. Nature, 2000, 404(6775): 256―259
doi: 10.1038/35005001
|
|
Deutsch D. TheFabric of Reality. New York: Allen Lane, 1997
|
|
Penrose R. TheEmperor’s New Mind. Oxford: Oxford University Press, 1989
|
|
Bell J S. On the problem of hidden variables in quantum mechanics. Reviews of Modern Physics, 1966, 38(3): 447―452
doi: 10.1103/RevModPhys.38.447
|
|
xxx.lanl.gov/abs/gr-qc/9903084
|
|
Schmidhuber J. Acomputer scientist’s view of life, the universe, and everything. In: Freksa C, Jantzen M, Valk R, eds. Foundations of Computer Science: Potential-Theory-Cognition. LectureNotes in Computer Science, 1997, 1337: 201―208
|
|
Erber T, Putterman S. Randomness in quantum mechanics— nature’s ultimate cryptogram? Nature, 1985, 318(6041): 41―43
doi: 10.1038/318041a0
|
|
Fredkin E F, Toffoli T. Conservative logic. International Journal of Theoretical Physics, 1982, 21(3/4): 219―253
doi: 10.1007/BF01857727
|
|
Levin L A. Universal sequential search problems. Problems of Information Transmission, 1973, 9(3): 265―266
|
|
Hutter M. Thefastest and shortest algorithm for all welldefined problems. International Journal of Foundations of ComputerScience, 2002, 13(3): 431―443
doi: 10.1142/S0129054102001199
|
|
Schmidhuber J, Zhao J, Wiering M. Shifting inductive bias with success-story algorithm,adaptive Levin search, and incremental self-improvement. Machine Learning, 1997, 28(1): 105―130
doi: 10.1023/A:1007383707642
|
|
Solomonoff R J. An application of algorithmic probability to problems in artificialintelligence. In: Kanal L N, Lemmer J F, eds. Uncertainty in ArtificialIntelligence. Amsterdam: Elsevier Science Publishers, 1986, 473―491
|
|
Solomonoff R J. A system for incremental learning based on algorithmic probability. In: Proceedings of the Sixth Israeli Conferenceon Artificial Intelligence, Computer Vision and Pattern Recognition. 1989, 515―527
|
|
Wiering M A, Schmidhuber J. Solving POMDPs with Levinsearch and EIRA. In: Saitta L, ed. Machine Learning: Proceedings of the ThirteenthInternational Conference. SanFrancisco: Morgan Kaufmann Publishers, 1996, 534―542
|
|
Schmidhuber J. Optimalordered problem solver. Machine Learning, 2004, 54(3): 211―254
doi: 10.1023/B:MACH.0000015880.99707.b2
|
|
Schmidhuber J. Bias-optimalincremental problem solving. In: Becker S, Thrun S, Obermayer K, eds. Advances in Neural Information Processing Systems 15. Cambridge: MIT Press, 2003, 1571―1578
|
|
Chaitin G J. A theory of program size formally identical to information theory. Journal of the Association for Computing Machinery, 1975, 22(3): 329―340
|
|
Moore C H, Leach G C. FORTH — A Languagefor Interactive Computing. Amsterdam: Mohasco Industries Inc, 1970
|
|
Rumelhart D E, Hinton G E, Williams R J. Learning internal representations by error propagation. In: Rumelhart D E, McClelland J L, eds. Parallel Distributed Processing. Camridge: MIT Press, 1986, 1: 318―362
|
|
Bishop C M. Neural Networks for Pattern Recognition. Oxford: Oxford University Press, 1995
|
|
Hochreiter S, Younger A S, Conwell P R. Learning to learn using gradient descent.In: Proceedings of the International Conference onArtificial Neural Networks. 2001, 87―94
|
|
Schmidhuber J. G?delmachines: Fully self-referential optimal universal self-improvers. In: Goertzel B, Pennachin C, eds. Artificial General Intelligence. Berlin: Springer-Verlag, 2006, 199―226
|
|
Schmidhuber J. Completelyself-referential optimal reinforcement learners. In: Duch W, Kacprzyk J, Oja E, Zadrozny S, eds. Proceedings of the International Conference on Artificial NeuralNetworks. 2005, 223―233
|
|
Schmidhuber J. Dynamischeneuronale Netze und das fundamentale raumzeitliche Lernproblem. . München: Institut für Informatik, Technische Universit?tMünchen, 1990
|
|
Schmidhuber J. Apossibility for implementing curiosity and boredom in model-buildingneural controllers. In: Meyer J A, Wilson S W, eds. Proceedings ofthe International Conference on Simulation of Adaptive Behavior: FromAnimals to Animats. 1991, 222―227
|
|
Schmidhuber J. Adaptivecuriosity and adaptive confidence. TechnicalReport FKI-149-91, Institut fürInformatik, Technische Universit?t München, 1991
|
|
Schmidhuber J. Curiousmodel-building control systems. In: Proceedingsof the International Joint Conference on Neural Networks. 1991, 2: 1458―1463
|
|
Storck J, Hochreiter S, Schmidhuber J. Reinforcement driven information acquisition in non-deterministicenvironments. In: Proceedings of the InternationalConference on Artificial Neural Networks. 1995, 2: 159―164
|
|
|
Schmidhuber J. Exploringthe predictable. In: Ghosh A, Tsuitsui S, eds. Advances in Evolutionary Computing. Berlin: Springer, 2002, 579―612
|
|
Schmidhuber J. Developmentalrobotics, optimal artificial curiosity, creativity, music, and thefine arts. Connection Science, 2006, 18(2): 173―187
doi: 10.1080/09540090600768658
|
|
Schmidhuber J. Simplealgorithmic principles of discovery, subjective beauty, selectiveattention, curiosity & creativity. In: Proceedings of the 10th International Conference on Discovery Science.Lecture Notes in Computer Science, 2007, 4755: 26―38
doi: 10.1007/978-3-540-75488-6_3
|
|
Schmidhuber J. Drivenby compression progress. In: Lovrek I, Howlett R J, Jain L C, eds. Proceedings of the 12th International Conference on Knowledge-BasedIntelligent Information and Engineering Systems. Lecture Notes inComputer Science, 2008, 5177: 11
|
|
Schmidhuber J. Simplealgorithmic theory of subjective beauty, novelty, surprise, interestingness,attention, curiosity, creativity, art, science, music, jokes. SICE Journal of the Society of Instrument and ControlEngineers, 2009, 48(1): 21―32
|
|
Schmidhuber J. Drivenby compression progress: A simple principle explains essential aspectsof subjective beauty, novelty, surprise, interestingness, attention,curiosity, creativity, art, science, music, jokes. In: Pezzulo G, Butz M V, Sigaud O, Baldassarre G, eds. Anticipatory Behavior in Adaptive Learning Systems. From PsychologicalTheories to Artificial Cognitive Systems. Lecture Notes in ComputerScience, 2009, 5499: 48―76
|
|
Schmidhuber J. Art& science as by-products of the search for novel patterns, ordata compressible in unknown yet learnable ways. In: Botta M, ed. Multiple Ways toDesign Research. Research Cases That Reshape the Design Discipline.Swiss Design Network — Et al. Edizioni, 2009, 98―112
|
|
Schmidhuber J. Artificialscientists & artists based on the formal theory of creativity. In: Hutter M, Servedio R A, Takimoto E, eds. Proceedings of the Third Conference on Artificial General Intelligence. 2010
|
|
Schmidhuber J. Anon-line algorithm for dynamic reinforcement learning and planningin reactive environments. In: Proceedingsof IEEE/INNS International Joint Conference on Neural Networks. 1990, 2: 253―258
|
|
Sutton R S, Barto A G. Reinforcement Learning: AnIntroduction. Cambridge: MIT Press, 1998
|
|
Schmidhuber J. Low-complexityart. Leonardo, Journal of the InternationalSociety for the Arts, Sciences, and Technology, 1997, 30(2): 97―103
|
|
Kolmogorov A N. Grundbegriffe der Wahrscheinlichkeitsrechnung. Berlin: Springer-Verlag, 1933
|
|
Utgoff P. Shiftof bias for inductive concept learning. In: Michalski R, Carbonell J, Mitchell T, eds. Machine Learning. Los Altos: Morgan Kaufmann, 1986, 2: 163―190
|
|
Schmidhuber J. Thenew AI: General & sound & relevant for physics.In: Goertzel B, Pennachin C, eds. Artificial General Intelligence. Berlin: Springer-Verlag, 2006, 175―198
|
|
CERN-TH/2000-316,CERN, Theory Division, 2000. xxx.lanl.gov/abs/hep-th/0011065
|
|
Viewed |
|
|
|
Full text
|
|
|
|
|
Abstract
|
|
|
|
|
Cited |
|
|
|
|
|
Shared |
|
|
|
|
|
Discussed |
|
|
|
|