Frontiers of Computer Science

ISSN 2095-2228

ISSN 2095-2236(Online)

CN 10-1014/TP

Postal Subscription Code 80-970

2018 Impact Factor: 1.129

Front. Comput. Sci.    2019, Vol. 13 Issue (1) : 99-105    https://doi.org/10.1007/s11704-018-7138-5
RESEARCH ARTICLE
Towards making co-training suffer less from insufficient views
Xiangyu GUO, Wei WANG
National Key Laboratory for Novel Software Technology, Nanjing University, Nanjing 210023, China
Abstract

Co-training is a well-known semi-supervised learning algorithm that exploits unlabeled data to improve learning performance. Generally, it works under a two-view setting, in which each input example has two naturally disjoint feature sets, with the assumption that each view is sufficient to predict the label. In real-world applications, however, feature corruption or feature noise may render both views insufficient, and co-training suffers from such insufficient views. In this paper, we propose a novel algorithm named Weighted Co-training to deal with this problem. It identifies the newly labeled examples that are likely to be harmful to the other view and decreases their weights in the training set to avoid the risk. The experimental results show that Weighted Co-training performs better than the state-of-the-art co-training algorithms on several benchmarks.
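The weighting idea above can be sketched in code. The following is a minimal, hypothetical illustration, not the paper's actual algorithm: it uses a toy weighted-centroid classifier, and the function name `weighted_co_training`, the cross-view agreement heuristic, and the 0.2 down-weight are all assumptions made for the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

class CentroidClassifier:
    """Toy binary classifier: class centroids computed with sample weights."""
    def fit(self, X, y, w):
        self.c0 = np.average(X[y == 0], axis=0, weights=w[y == 0])
        self.c1 = np.average(X[y == 1], axis=0, weights=w[y == 1])
        return self

    def margin(self, X):
        # Positive margin -> class 1; magnitude serves as a crude confidence.
        return np.linalg.norm(X - self.c0, axis=1) - np.linalg.norm(X - self.c1, axis=1)

    def predict(self, X):
        return (self.margin(X) > 0).astype(int)

def weighted_co_training(X1, X2, y, labeled, rounds=5, pool=10):
    """Sketch of weight-adjusted co-training on two views X1, X2.

    `labeled` is a boolean mask; labels of unlabeled examples are ignored.
    """
    n = len(y)
    lab = labeled.copy()
    yhat = np.where(lab, y, -1)   # -1 marks "no label yet"
    w = np.ones(n)                # per-example training weights
    for _ in range(rounds):
        h1 = CentroidClassifier().fit(X1[lab], yhat[lab], w[lab])
        h2 = CentroidClassifier().fit(X2[lab], yhat[lab], w[lab])
        unl = np.flatnonzero(~lab)
        if unl.size == 0:
            break
        for h_self, h_other, X_self, X_other in ((h1, h2, X1, X2),
                                                 (h2, h1, X2, X1)):
            if unl.size == 0:
                break
            # Pseudo-label the unlabeled examples this view is most confident on.
            m = h_self.margin(X_self[unl])
            pick = unl[np.argsort(-np.abs(m))[:pool]]
            yhat[pick] = h_self.predict(X_self[pick])
            # Down-weight pseudo-labels the *other* view disagrees with,
            # since under insufficient views they may be harmful to it.
            agree = h_other.predict(X_other[pick]) == yhat[pick]
            w[pick] = np.where(agree, 1.0, 0.2)
            lab[pick] = True
            unl = np.flatnonzero(~lab)
    return h1, h2
```

On synthetic two-view data (two Gaussian classes per view, a small labeled seed set), the two returned classifiers recover the labels of most examples; the disagreement-based down-weighting is just one plausible way to realize the weighting scheme described in the abstract.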

Keywords: semi-supervised learning; co-training; insufficient views
Corresponding Author(s): Wei WANG   
Just Accepted Date: 04 September 2017   Online First Date: 04 September 2018    Issue Date: 31 January 2019
 Cite this article:   
Xiangyu GUO, Wei WANG. Towards making co-training suffer less from insufficient views[J]. Front. Comput. Sci., 2019, 13(1): 99-105.
 URL:  
https://academic.hep.com.cn/fcs/EN/10.1007/s11704-018-7138-5
https://academic.hep.com.cn/fcs/EN/Y2019/V13/I1/99