Frontiers of Computer Science

Front. Comput. Sci.    2019, Vol. 13 Issue (4) : 669-676    https://doi.org/10.1007/s11704-019-8452-2
REVIEW ARTICLE
Safe semi-supervised learning: a brief introduction
Yu-Feng LI 1,2, De-Ming LIANG 1,2
1. National Key Laboratory for Novel Software Technology, Nanjing University, Nanjing 210023, China
2. Collaborative Innovation Center of Novel Software Technology and Industrialization, Nanjing 210023, China
Abstract

Semi-supervised learning constructs a predictive model by learning from a few labeled training examples and a large pool of unlabeled ones. It has a wide range of application scenarios and has attracted much attention over the past decades. However, although exploiting unlabeled data is expected to improve learning performance, empirical studies show that there are situations where using unlabeled data may instead degrade performance. It is therefore advisable to exploit unlabeled data safely. This article reviews research progress on safe semi-supervised learning, focusing on three types of safeness issues: data quality, where the training data are risky or of low quality; model uncertainty, where the learning algorithm fails to handle the uncertainty arising during training; and measure diversity, where safe performance should be adaptable to diverse measures.
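As a rough illustration of the safeness principle sketched above (a generic sketch, not a method from this article: the synthetic dataset, the self-training learner, and the hold-out rule are all illustrative assumptions), one can train a supervised baseline on the labeled data alone, train a semi-supervised learner on the labeled plus unlabeled data, and fall back to the baseline whenever the semi-supervised learner does not help on held-out labeled examples:

```python
# Minimal sketch of a "safe" fall-back rule for semi-supervised learning.
# Dataset, models and split sizes are illustrative assumptions only.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.semi_supervised import SelfTrainingClassifier

rng = np.random.RandomState(0)
X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.5, random_state=0)

# Keep only ~5% of the training labels; scikit-learn marks unlabeled points with -1.
labeled = rng.rand(len(y_train)) < 0.05
y_partial = np.where(labeled, y_train, -1)

# Hold out part of the labeled data to judge whether unlabeled data actually helps.
lab_idx = np.flatnonzero(labeled)
fit_idx, val_idx = train_test_split(lab_idx, test_size=0.3, random_state=0)

# Supervised baseline: trained on the non-held-out labeled examples only.
baseline = LogisticRegression(max_iter=1000).fit(X_train[fit_idx], y_train[fit_idx])

# Semi-supervised learner: self-training over the same labels plus all unlabeled data.
ssl_mask = np.ones(len(y_train), dtype=bool)
ssl_mask[val_idx] = False
ssl = SelfTrainingClassifier(LogisticRegression(max_iter=1000))
ssl.fit(X_train[ssl_mask], y_partial[ssl_mask])

# "Safe" fall-back: keep the semi-supervised model only if it is at least as
# accurate as the supervised baseline on the held-out labeled examples.
val_acc_ssl = accuracy_score(y_train[val_idx], ssl.predict(X_train[val_idx]))
val_acc_sup = accuracy_score(y_train[val_idx], baseline.predict(X_train[val_idx]))
chosen, name = (ssl, "semi-supervised") if val_acc_ssl >= val_acc_sup else (baseline, "supervised baseline")
print(f"chosen: {name}, test accuracy: {accuracy_score(y_test, chosen.predict(X_test)):.3f}")
```

This fall-back rule only conveys the general intuition that a safe semi-supervised learner should never perform worse than its purely supervised counterpart; the methods reviewed in the article pursue this goal with more principled mechanisms.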

Keywords: machine learning; semi-supervised learning; safe
Corresponding Author(s): Yu-Feng LI   
Just Accepted Date: 20 March 2019   Online First Date: 07 May 2019    Issue Date: 29 May 2019
 Cite this article:   
Yu-Feng LI, De-Ming LIANG. Safe semi-supervised learning: a brief introduction[J]. Front. Comput. Sci., 2019, 13(4): 669-676.
 URL:  
https://academic.hep.com.cn/fcs/EN/10.1007/s11704-019-8452-2
https://academic.hep.com.cn/fcs/EN/Y2019/V13/I4/669