Frontiers of Electrical and Electronic Engineering

ISSN 2095-2732

ISSN 2095-2740(Online)

CN 10-1028/TM

Front Elect Electr Eng Chin, 2011: 56-71    https://doi.org/10.1007/s11460-011-0127-1
RESEARCH ARTICLE
Learning from imbalanced data sets with a Min-Max modular support vector machine
Bao-Liang LU1,2(), Xiao-Lin WANG1, Yang YANG3, Hai ZHAO1,2
1. Center for Brain-Like Computing and Machine Intelligence, Department of Computer Science and Engineering, Shanghai Jiao Tong University, Shanghai 200240, China; 2. MOE-Microsoft Key Laboratory for Intelligent Computing and Intelligent Systems, Shanghai Jiao Tong University, Shanghai 200240, China; 3. Department of Computer Science and Engineering, Shanghai Maritime University, Shanghai 201306, China
Abstract

Imbalanced data sets have significantly unequal distributions between classes. This between-class imbalance causes conventional classification methods to favor the majority classes, resulting in very low or even zero detection of the minority classes. The Min-Max modular support vector machine (M3-SVM) addresses this problem by decomposing the training input sets of the majority classes into subsets of roughly equal size and pairing them with the minority classes to form balanced two-class classification subproblems. This approach has three merits: it uses general-purpose classifiers, incorporates prior knowledge into task decomposition, and supports parallel learning. Experiments on two real-world pattern classification problems, international patent classification and protein subcellular localization, demonstrate the effectiveness of the proposed approach.
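The decompose-pair-combine scheme described in the abstract can be sketched in a few lines. This is a minimal illustration only, not the authors' implementation: a toy centroid classifier stands in for the SVM base learner so the sketch stays dependency-free, and the function names (`m3_train`, `m3_predict`) are hypothetical. The structure is the point: the majority class is split into k roughly equal subsets, each subset is paired with the full minority set to train one balanced module, and the module outputs are combined with the MIN rule (a sample is labeled minority only if every module votes for it).

```python
import random
from statistics import mean

def train_centroid(pos, neg):
    """Toy base learner standing in for an SVM: score is the signed
    difference of squared distances to the two class centroids."""
    cp = [mean(c) for c in zip(*pos)]
    cn = [mean(c) for c in zip(*neg)]
    def score(x):
        dp = sum((a - b) ** 2 for a, b in zip(x, cp))
        dn = sum((a - b) ** 2 for a, b in zip(x, cn))
        return dn - dp  # > 0 means closer to the positive (minority) centroid
    return score

def m3_train(minority, majority, k):
    """Split the majority class into k roughly equal subsets and train one
    balanced module per (minority, majority-subset) pair."""
    random.shuffle(majority)
    subsets = [majority[i::k] for i in range(k)]
    return [train_centroid(minority, s) for s in subsets]

def m3_predict(modules, x):
    """MIN combination over the majority-side modules: x is labeled minority
    only if every module scores it positive."""
    return min(m(x) for m in modules) > 0
```

Because the minority class is left undivided here, only the MIN step of the Min-Max combination is visible; with both classes decomposed, a MAX would additionally be taken over the minority-side subsets. Each module sees a balanced subproblem and can be trained on a separate processor, which is the source of the parallelism claimed above.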

Keywords imbalanced data      Min-Max modular network (M3-network)      prior knowledge      parallel learning      support vector machine (SVM)     
Corresponding Author(s): LU Bao-Liang, Email: bllu@sjtu.edu.cn
Issue Date: 05 March 2011
 Cite this article:   
Bao-Liang LU, Xiao-Lin WANG, Yang YANG, et al. Learning from imbalanced data sets with a Min-Max modular support vector machine[J]. Front Elect Electr Eng Chin, 2011: 56-71.
 URL:  
https://academic.hep.com.cn/fee/EN/10.1007/s11460-011-0127-1