Extreme vocabulary learning

Hanze DONG 1, Zhenfeng SUN 2, Yanwei FU 1, Shi ZHONG 2, Zhengjun ZHANG 3, Yu-Gang JIANG 2

1. School of Data Science, Fudan University, Shanghai 200433, China
2. School of Computer Science, Fudan University, Shanghai 201203, China
3. Department of Statistics, University of Wisconsin, Madison 53706, USA
Abstract  From the perspective of extreme value theory, the unseen novel classes in open-set recognition can be viewed as extreme values of the training classes. Following this idea, we introduce margin and coverage distributions to model the training classes. We propose a novel visual-semantic embedding framework, extreme vocabulary learning (EVoL), which embeds visual features into the semantic space in a probabilistic way. Notably, we exploit the vast open vocabulary of the semantic space to further constrain the margin and coverage of the training classes. The learned embedding can be used directly to solve supervised learning, zero-shot learning, and open-set recognition simultaneously. Experiments on two benchmark datasets demonstrate the effectiveness of the proposed framework against conventional approaches.
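The abstract only sketches the extreme-value idea; the minimal Python sketch below illustrates one way such reasoning can be instantiated. It is not the authors' EVoL implementation: the helper names (fit_margin_tail, inclusion_score), the Weibull tail model fitted with scipy, and the tail size are all assumptions chosen for clarity. The sketch fits a Weibull distribution to the smallest inter-class margins around each training class in the semantic embedding space and scores embedded test samples, so that a low score under every known class suggests an unseen, open-set category.

```python
# Illustrative sketch only: EVT-style open-set scoring in a semantic
# (word-vector) embedding space. Not the authors' EVoL implementation;
# helper names, the Weibull tail model, and tail_size are assumptions.
import numpy as np
from scipy.stats import weibull_min

def fit_margin_tail(class_center, other_class_centers, tail_size=20):
    """Fit a Weibull model to the smallest margins between one training
    class and its closest rival classes (the 'extreme' margins)."""
    margins = np.linalg.norm(other_class_centers - class_center, axis=1)
    tail = np.sort(margins)[:tail_size]                 # closest rivals
    shape, _, scale = weibull_min.fit(tail, floc=0.0)   # fix location at 0
    return shape, scale

def inclusion_score(x_embedded, class_center, shape, scale):
    """Probability-like score that an embedded sample belongs to the class;
    it decays as the sample drifts beyond the fitted margin scale."""
    d = np.linalg.norm(x_embedded - class_center)
    return float(np.exp(-(d / scale) ** shape))         # Weibull survival fn

# Usage idea: score a test sample against every training class; if its best
# score falls below a validation-chosen threshold, flag it as an unseen
# (open-set) category instead of forcing a known label.
```

The design choice here is the standard EVT one: only the tail of the margin distribution matters for deciding where a class ends, so the fit deliberately ignores distant classes.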
Keywords
vocabulary-informed learning
zero-shot learning
extreme value theory

Corresponding Author(s):
Yanwei FU

Just Accepted Date: 27 August 2019
Issue Date: 20 January 2020