Frontiers of Electrical and Electronic Engineering

ISSN 2095-2732

ISSN 2095-2740(Online)

CN 10-1028/TM

Front Elect Electr Eng    2012, Vol. 7 Issue (1) : 94-106    https://doi.org/10.1007/s11460-012-0170-6
RESEARCH ARTICLE
Probabilistic models of vision and max-margin methods
Alan YUILLE 1,2, Xuming HE 1,3
1. Department of Statistics, University of California at Los Angeles, Los Angeles, CA 90095, USA; 2. Department of Brain and Cognitive Engineering, Korea University, Seoul 136-701, Korea; 3. NICTA Canberra Research Laboratory, Canberra, ACT 2601, Australia
Abstract

It is attractive to formulate problems in computer vision and related fields in terms of probabilistic estimation where the probability models are defined over graphs, such as grammars. The graphical structures, and the state variables defined over them, give a rich knowledge representation which can describe the complex structures of objects and images. The probability distributions defined over the graphs capture the statistical variability of these structures. These probability models can be learnt from training data with limited amounts of supervision. But learning these models suffers from the difficulty of evaluating the normalization constant, or partition function, of the probability distributions, which can be extremely computationally demanding. This paper shows that by placing bounds on the normalization constant we can obtain computationally tractable approximations. Surprisingly, for certain choices of loss functions, we obtain many of the standard max-margin criteria used in support vector machines (SVMs) and hence we reduce the learning to standard machine learning methods. We show that many machine learning methods can be obtained in this way as approximations to probabilistic methods, including multi-class max-margin, ordinal regression, max-margin Markov networks and parsers, multiple-instance learning, and latent SVM. We illustrate this work by computer vision applications including image labeling, object detection and localization, and motion estimation. We speculate that better results can be obtained by using better bounds and approximations.
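The bound the abstract alludes to can be illustrated with a toy multi-class example (a sketch of the general idea only, not the paper's exact construction; the labels and scores below are invented for illustration). For a model P(y|x) proportional to exp(w·φ(x,y)), the negative log-likelihood involves the log partition function log Σ_y exp(w·φ(x,y)); since the max over terms lower-bounds the log-sum-exp, replacing the sum by its largest loss-augmented term turns the objective into the familiar multi-class hinge loss:

```python
import math

def log_sum_exp(scores):
    """Numerically stable log(sum(exp(s))) -- the log partition function."""
    m = max(scores)
    return m + math.log(sum(math.exp(s - m) for s in scores))

# Hypothetical per-label scores w.phi(x, y) for one training example.
scores = {"cat": 2.0, "dog": 1.1, "car": -0.5}
true_label = "cat"

# Exact (likelihood-based) negative log-likelihood: log Z minus the true score.
nll = log_sum_exp(list(scores.values())) - scores[true_label]

# Loss-augmented scores with a 0/1 margin, as in multi-class max-margin learning.
augmented = {y: s + (0.0 if y == true_label else 1.0) for y, s in scores.items()}

# Replacing log-sum-exp by max yields the multi-class hinge loss.
hinge = max(augmented.values()) - scores[true_label]

# Because max <= log-sum-exp, the hinge loss lower-bounds the corresponding
# loss-augmented likelihood objective.
print(round(nll, 3), round(hinge, 3))
```

The max-based objective needs only the single best (loss-augmented) label rather than a sum over all labels, which is what makes max-margin criteria tractable on graphical models where the partition function itself is not.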

Keywords: structured prediction, max-margin learning, probabilistic models, loss function
Corresponding Author(s): Alan YUILLE, Email: yuille@stat.ucla.edu
Issue Date: 05 March 2012
Cite this article:
Alan YUILLE, Xuming HE. Probabilistic models of vision and max-margin methods[J]. Front Elect Electr Eng, 2012, 7(1): 94-106.
 URL:  
https://academic.hep.com.cn/fee/EN/10.1007/s11460-012-0170-6
https://academic.hep.com.cn/fee/EN/Y2012/V7/I1/94