Frontiers of Computer Science

ISSN 2095-2228

ISSN 2095-2236(Online)

CN 10-1014/TP

Postal Subscription Code 80-970

2018 Impact Factor: 1.129

Front. Comput. Sci.    2018, Vol. 12 Issue (4) : 725-735    https://doi.org/10.1007/s11704-017-6543-5
RESEARCH ARTICLE
Instance selection method for improving graph-based semi-supervised learning
Hai WANG1,2, Shao-Bo WANG1,2, Yu-Feng LI1,2
1. National Key Laboratory for Novel Software Technology, Nanjing University, Nanjing 210023, China
2. Collaborative Innovation Center of Novel Software Technology and Industrialization, Nanjing 210023, China
Abstract

Graph-based semi-supervised learning is an important semi-supervised learning paradigm. Although graph-based semi-supervised learning methods have been shown to be helpful in various situations, exploiting unlabeled data may in some cases adversely affect their performance. In this paper, we propose a new graph-based semi-supervised learning method based on instance selection in order to reduce the chances of performance degeneration. Our basic idea is that, given a set of unlabeled instances, exploiting all of them is not the best approach; instead, we should exploit the unlabeled instances that are highly likely to help improve performance and leave out the ones that carry high risk. We develop both transductive and inductive variants of our method. Experiments on a broad range of data sets show that the chances of performance degeneration for our proposed method are much smaller than those for many state-of-the-art graph-based semi-supervised learning methods.
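
The selection idea described in the abstract can be illustrated with a minimal sketch. It is not the paper's algorithm; it only shows one plausible reading of "exploit the helpful unlabeled instances, skip the risky ones". The sketch assumes binary labels in {0, 1}, a dense RBF similarity graph, harmonic-function label propagation as the graph-based learner, a 1-nearest-neighbor classifier as the supervised fallback, and a fixed confidence margin as the selection rule. All names (`rbf_graph`, `harmonic_scores`, `one_nn`, `select_and_predict`) and the parameters `sigma` and `margin` are hypothetical.

```python
import numpy as np

def rbf_graph(X, sigma=1.0):
    """Dense RBF similarity matrix with a zero diagonal (assumed graph construction)."""
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    W = np.exp(-d2 / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)
    return W

def harmonic_scores(W, y_l, n_l):
    """Harmonic-function scores for the unlabeled nodes.

    The first n_l rows/columns of W belong to labeled instances.
    Solves L_uu f_u = W_ul y_l, where L = D - W is the graph Laplacian.
    """
    L = np.diag(W.sum(axis=1)) - W
    L_uu = L[n_l:, n_l:]
    W_ul = W[n_l:, :n_l]
    return np.linalg.solve(L_uu, W_ul @ y_l)

def one_nn(X_l, y_l, X_u):
    """Supervised baseline: copy the label of the nearest labeled instance."""
    d2 = ((X_u[:, None, :] - X_l[None, :, :]) ** 2).sum(-1)
    return y_l[d2.argmin(axis=1)]

def select_and_predict(X_l, y_l, X_u, sigma=1.0, margin=0.2):
    """Use the graph prediction only where it is confident (assumed rule).

    An unlabeled instance counts as 'selected' when its harmonic score is at
    least `margin` away from the 0.5 decision boundary; all other instances
    fall back to the 1-NN baseline prediction.
    """
    X = np.vstack([X_l, X_u])
    W = rbf_graph(X, sigma)
    f = harmonic_scores(W, y_l.astype(float), len(X_l))
    graph_pred = (f > 0.5).astype(int)
    baseline = one_nn(X_l, y_l, X_u)
    selected = np.abs(f - 0.5) >= margin
    pred = np.where(selected, graph_pred, baseline)
    return pred, selected
```

Calling `select_and_predict(X_l, y_l, X_u)` returns the combined predictions together with a boolean mask marking which unlabeled instances were handled by the graph-based learner. Falling back to a supervised baseline on low-confidence instances is one simple way to limit the damage a poor graph can do; the actual selection criterion in the paper may differ.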

Keywords: graph-based semi-supervised learning; performance degeneration; instance selection
Corresponding Author(s): Yu-Feng LI   
Just Accepted Date: 28 March 2017   Online First Date: 06 March 2018    Issue Date: 14 June 2018
 Cite this article:   
Hai WANG,Shao-Bo WANG,Yu-Feng LI. Instance selection method for improving graph-based semi-supervised learning[J]. Front. Comput. Sci., 2018, 12(4): 725-735.
 URL:  
https://academic.hep.com.cn/fcs/EN/10.1007/s11704-017-6543-5
https://academic.hep.com.cn/fcs/EN/Y2018/V12/I4/725