Extreme vocabulary learning

Hanze DONG 1, Zhenfeng SUN 2, Yanwei FU 1, Shi ZHONG 2, Zhengjun ZHANG 3, Yu-Gang JIANG 2

1. School of Data Science, Fudan University, Shanghai 200433, China
2. School of Computer Science, Fudan University, Shanghai 201203, China
3. Department of Statistics, University of Wisconsin, Madison 53706, USA
Abstract  From the perspective of extreme value theory, the unseen novel classes in open-set recognition can be viewed as extreme values of the training classes. Following this idea, we introduce margin and coverage distributions to model the training classes. We propose a novel visual-semantic embedding framework, extreme vocabulary learning (EVoL), which embeds visual features into the semantic space in a probabilistic way. Notably, we exploit the vast open vocabulary of the semantic space to further constrain the margin and coverage of the training classes. The learned embedding can be used directly to solve supervised learning, zero-shot learning, and open-set recognition simultaneously. Experiments on two benchmark datasets demonstrate the effectiveness of the proposed framework against conventional approaches.
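The abstract only sketches the extreme-value idea; the minimal Python sketch below illustrates one way such reasoning can be instantiated. It is not the authors' EVoL implementation: the helper names (fit_margin_tail, inclusion_score), the Weibull tail model fitted with scipy, and the tail size are all assumptions chosen for clarity. The sketch fits a Weibull distribution to the smallest inter-class margins around each training class in the semantic embedding space and scores embedded test samples, so that a low score under every known class suggests an unseen, open-set category.

```python
# Illustrative sketch only: EVT-style open-set scoring in a semantic
# (word-vector) embedding space. Not the authors' EVoL implementation;
# helper names, the Weibull tail model, and tail_size are assumptions.
import numpy as np
from scipy.stats import weibull_min

def fit_margin_tail(class_center, other_class_centers, tail_size=20):
    """Fit a Weibull model to the smallest margins between one training
    class and its closest rival classes (the 'extreme' margins)."""
    margins = np.linalg.norm(other_class_centers - class_center, axis=1)
    tail = np.sort(margins)[:tail_size]                 # closest rivals
    shape, _, scale = weibull_min.fit(tail, floc=0.0)   # fix location at 0
    return shape, scale

def inclusion_score(x_embedded, class_center, shape, scale):
    """Probability-like score that an embedded sample belongs to the class;
    it decays as the sample drifts beyond the fitted margin scale."""
    d = np.linalg.norm(x_embedded - class_center)
    return float(np.exp(-(d / scale) ** shape))         # Weibull survival fn

# Usage idea: score a test sample against every training class; if its best
# score falls below a validation-chosen threshold, flag it as an unseen
# (open-set) category instead of forcing a known label.
```

The design choice here is the standard EVT one: only the tail of the margin distribution matters for deciding where a class ends, so the fit deliberately ignores distant classes.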
Keywords
vocabulary-informed learning
zero-shot learning
extreme value theory

Corresponding Author(s):
Yanwei FU

Just Accepted Date: 27 August 2019
Issue Date: 20 January 2020