Frontiers of Computer Science

ISSN 2095-2228

ISSN 2095-2236(Online)

CN 10-1014/TP

Postal Subscription Code 80-970

2018 Impact Factor: 1.129

Front. Comput. Sci.    2020, Vol. 14 Issue (6) : 146315    https://doi.org/10.1007/s11704-019-8249-3
RESEARCH ARTICLE
Extreme vocabulary learning
Hanze DONG1, Zhenfeng SUN2, Yanwei FU1, Shi ZHONG2, Zhengjun ZHANG3, Yu-Gang JIANG2
1. School of Data Science, Fudan University, Shanghai 200433, China
2. School of Computer Science, Fudan University, Shanghai 201203, China
3. Department of Statistics, University of Wisconsin, Madison 53706, USA
Abstract

From the perspective of extreme value theory, the unseen novel classes in open-set recognition can be viewed as extreme values of the training classes. Following this idea, we introduce margin and coverage distributions to model the training classes. We propose a novel visual-semantic embedding framework, extreme vocabulary learning (EVoL), which embeds visual features into the semantic space in a probabilistic way. Notably, we use the vast open vocabulary in the semantic space to further constrain the margin and coverage of the training classes. The learned embedding can be used directly for supervised learning, zero-shot learning, and open-set recognition simultaneously. Experiments on two benchmark datasets demonstrate the effectiveness of the proposed framework compared with conventional approaches.
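
As a rough illustration of the idea that novel classes behave like extreme values of the training classes, the sketch below scores an embedded sample against class prototypes in a semantic space using a Weibull-style inclusion probability, and rejects it as unknown when no class is probable enough. This is not the EVoL model itself: the functional form follows the style of extreme value machines, and the prototypes, the per-class `(scale, shape)` parameters, the `threshold`, and all function names are hypothetical values chosen only for the example.

```python
import math


def inclusion_probability(distance, scale, shape):
    """Weibull-style survival function: probability that a point at
    `distance` from a class prototype still belongs to that class."""
    return math.exp(-((distance / scale) ** shape))


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def classify(embedded, prototypes, weibull_params, threshold=0.5):
    """Return (label, probability); label is None when the sample is
    rejected as belonging to no known class (open-set case)."""
    best_label, best_prob = None, 0.0
    for label, proto in prototypes.items():
        scale, shape = weibull_params[label]
        prob = inclusion_probability(euclidean(embedded, proto), scale, shape)
        if prob > best_prob:
            best_label, best_prob = label, prob
    return (best_label if best_prob >= threshold else None), best_prob


# Hypothetical 2-D semantic prototypes and per-class (scale, shape) values.
prototypes = {"cat": (0.0, 0.0), "truck": (4.0, 0.0)}
weibull_params = {"cat": (1.0, 2.0), "truck": (1.0, 2.0)}

near_cat = classify((0.1, 0.0), prototypes, weibull_params)  # close to "cat"
far_away = classify((2.0, 3.0), prototypes, weibull_params)  # far from both
```

In this toy setup, `near_cat` is assigned to a known class while `far_away` falls below the threshold for every class and is treated as novel. In practice the per-class parameters would be fitted from training data rather than fixed by hand.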

Keywords: vocabulary-informed learning, zero-shot learning, extreme value theory
Corresponding Author(s): Yanwei FU   
Just Accepted Date: 27 August 2019   Issue Date: 20 January 2020
 Cite this article:   
Hanze DONG,Zhenfeng SUN,Yanwei FU, et al. Extreme vocabulary learning[J]. Front. Comput. Sci., 2020, 14(6): 146315.
 URL:  
https://academic.hep.com.cn/fcs/EN/10.1007/s11704-019-8249-3
https://academic.hep.com.cn/fcs/EN/Y2020/V14/I6/146315