Frontiers of Computer Science

ISSN 2095-2228

ISSN 2095-2236(Online)

CN 10-1014/TP

Postal Subscription Code 80-970

2018 Impact Factor: 1.129

Front Comput Sci    2012, Vol. 6 Issue (5) : 489-497    https://doi.org/10.1007/s11704-012-2943-8
RESEARCH ARTICLE
Measure oriented training: a targeted approach to imbalanced classification problems
Bo YUAN, Wenhuang LIU
Division of Informatics, Graduate School at Shenzhen, Tsinghua University, Shenzhen 518055, China
Abstract

Since the overall prediction error of a classifier on imbalanced problems can be misleading and biased, alternative performance measures such as G-mean and F-measure have been widely adopted. Various techniques, including sampling and cost-sensitive learning, are often employed to improve the performance of classifiers in such situations. However, the training process of classifiers is still largely driven by traditional error-based objective functions. As a result, there is clearly a gap between the measure by which the classifier is evaluated and the way in which the classifier is trained. This paper investigates the prospect of explicitly using the appropriate measure itself to search the hypothesis space in order to bridge this gap. In the case studies, a standard three-layer neural network is used as the classifier, which is evolved by genetic algorithms (GAs) with G-mean as the objective function. Experimental results on eight benchmark problems show that the proposed method can achieve consistently favorable outcomes in comparison with a commonly used sampling technique. The effectiveness of multi-objective optimization in handling imbalanced problems is also demonstrated.
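
The gap between overall error and measures such as G-mean can be illustrated with a small sketch. The abstract does not state the formulas, so the code below assumes the commonly used definitions: G-mean as the geometric mean of sensitivity and specificity, and F-measure as the harmonic mean of precision and recall. The function names, the 5-versus-95 class split, and both sets of predictions are hypothetical and are not taken from the paper's experiments.

```python
import math

def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fn, tn, fp), treating `positive` as the minority class."""
    tp = fn = tn = fp = 0
    for t, p in zip(y_true, y_pred):
        if t == positive:
            if p == positive:
                tp += 1
            else:
                fn += 1
        else:
            if p == positive:
                fp += 1
            else:
                tn += 1
    return tp, fn, tn, fp

def accuracy(y_true, y_pred):
    """Fraction of correct predictions (1 minus the overall error)."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def g_mean(y_true, y_pred, positive=1):
    """Assumed definition: sqrt(sensitivity * specificity)."""
    tp, fn, tn, fp = confusion_counts(y_true, y_pred, positive)
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return math.sqrt(sensitivity * specificity)

def f_measure(y_true, y_pred, positive=1):
    """Assumed definition: harmonic mean of precision and recall."""
    tp, fn, tn, fp = confusion_counts(y_true, y_pred, positive)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0.0:
        return 0.0
    return 2.0 * precision * recall / (precision + recall)

# Hypothetical imbalanced problem: 5 minority (1) and 95 majority (0) samples.
y_true = [1] * 5 + [0] * 95

# Classifier A ignores the minority class entirely.
always_negative = [0] * 100

# Classifier B finds 4 of 5 minority samples at the cost of 10 false alarms.
targeted = [1, 1, 1, 1, 0] + [1] * 10 + [0] * 85

for name, y_pred in [("always negative", always_negative), ("targeted", targeted)]:
    print(name,
          "accuracy=%.3f" % accuracy(y_true, y_pred),
          "G-mean=%.3f" % g_mean(y_true, y_pred),
          "F-measure=%.3f" % f_measure(y_true, y_pred))
```

In this toy setting, classifier A has the lower overall error yet a G-mean of zero, whereas classifier B has a higher error but a much better G-mean. An error-driven training objective would prefer A, which is the kind of mismatch that motivates using the evaluation measure itself as the training objective.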

Keywords: imbalanced datasets, genetic algorithms (GAs), neural networks, G-mean, synthetic minority over-sampling technique (SMOTE)
Corresponding Author(s): YUAN Bo, Email: yuanb@sz.tsinghua.edu.cn
Issue Date: 01 October 2012
 Cite this article:   
Bo YUAN, Wenhuang LIU. Measure oriented training: a targeted approach to imbalanced classification problems[J]. Front Comput Sci, 2012, 6(5): 489-497.
 URL:  
https://academic.hep.com.cn/fcs/EN/10.1007/s11704-012-2943-8
https://academic.hep.com.cn/fcs/EN/Y2012/V6/I5/489