Frontiers of Computer Science

ISSN 2095-2228

ISSN 2095-2236(Online)

CN 10-1014/TP

Postal Subscription Code 80-970

2018 Impact Factor: 1.129

Front. Comput. Sci.    2023, Vol. 17 Issue (1) : 171337    https://doi.org/10.1007/s11704-022-2154-x
LETTER
Max-difference maximization criterion: a feature selection method for text categorization
Lingbin JIN, Li ZHANG, Lei ZHAO
School of Computer Science and Technology & Advanced Data Analysis Research Center, Soochow University, Suzhou 215006, China
Corresponding Author(s): Li ZHANG   
Just Accepted Date: 28 October 2022   Issue Date: 11 January 2023
Cite this article: Lingbin JIN, Li ZHANG, Lei ZHAO. Max-difference maximization criterion: a feature selection method for text categorization[J]. Front. Comput. Sci., 2023, 17(1): 171337.
URL: https://academic.hep.com.cn/fcs/EN/10.1007/s11704-022-2154-x
URL: https://academic.hep.com.cn/fcs/EN/Y2023/V17/I1/171337
| Classifier | Dataset | OR | MI | MMR | NDM | RDC | EFS | TCM | TRDC | TNDM | TMMR | TTCM | MDMC |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| SVM | K1a | 47.28 | 58.23 | 72.35 | 68.12 | 64.25 | 69.32 | 70.44 | 68.40 | 73.38 | 74.65 | 74.33 | 75.94 |
| SVM | K1b | 79.41 | 82.49 | 92.47 | 87.26 | 86.80 | 93.62 | 92.39 | 89.03 | 92.70 | 93.80 | 93.85 | 95.15 |
| SVM | La2 | 53.49 | 43.48 | 71.13 | 63.69 | 68.30 | 76.26 | 69.33 | 70.82 | 71.74 | 76.42 | 74.43 | 78.31 |
| LR | K1a | 48.11 | 58.54 | 72.50 | 68.79 | 64.31 | 69.33 | 71.08 | 68.32 | 73.65 | 74.64 | 74.63 | 75.67 |
| LR | K1b | 81.17 | 82.06 | 92.91 | 88.02 | 87.15 | 93.78 | 93.03 | 89.24 | 93.10 | 93.90 | 94.23 | 95.46 |
| LR | La2 | 54.93 | 44.85 | 72.26 | 65.93 | 68.88 | 76.66 | 70.78 | 71.31 | 73.00 | 77.41 | 76.15 | 78.73 |

Tab.1 Comparison of average Micro-F1 (%) for filter algorithms
| Classifier | Dataset | OR | MI | MMR | NDM | RDC | EFS | TCM | TRDC | TNDM | TMMR | TTCM | MDMC |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| SVM | K1a | 28.94 | 42.15 | 55.39 | 49.97 | 47.64 | 52.94 | 52.68 | 52.24 | 56.22 | 57.53 | 57.19 | 60.54 |
| SVM | K1b | 52.97 | 68.83 | 84.89 | 73.41 | 76.48 | 86.83 | 79.84 | 80.25 | 80.76 | 84.15 | 83.71 | 90.62 |
| SVM | La2 | 37.57 | 32.66 | 61.29 | 52.81 | 62.85 | 69.56 | 58.97 | 65.68 | 63.23 | 70.50 | 68.30 | 73.63 |
| LR | K1a | 28.95 | 41.85 | 55.29 | 50.16 | 47.49 | 52.56 | 52.97 | 52.07 | 55.90 | 57.27 | 57.62 | 59.83 |
| LR | K1b | 53.38 | 68.69 | 85.33 | 74.34 | 76.97 | 87.18 | 80.61 | 81.02 | 81.57 | 84.48 | 83.96 | 91.06 |
| LR | La2 | 38.49 | 35.17 | 62.77 | 55.24 | 63.84 | 70.36 | 60.92 | 66.34 | 64.83 | 71.95 | 70.49 | 74.26 |

Tab.2 Comparison of average Macro-F1 (%) for filter algorithms
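The tables report both Micro-F1 and Macro-F1. As a minimal sketch of how the two averages differ, the following computes both from predicted and true labels. The function names and the toy labels are hypothetical, and the letter's exact evaluation protocol is not shown on this page, so treat this as an illustration of the standard definitions rather than the authors' code.

```python
from collections import Counter


def f1(tp, fp, fn):
    """F1 from raw counts; returns 0.0 when there is nothing to score."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def micro_macro_f1(y_true, y_pred):
    """Return (micro_f1, macro_f1) for single-label multi-class predictions."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class gains a false positive
            fn[t] += 1  # true class gains a false negative
    # Micro: pool the counts over all classes, then compute one F1.
    micro = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    # Macro: compute F1 per class, then take the unweighted mean.
    macro = sum(f1(tp[c], fp[c], fn[c]) for c in labels) / len(labels)
    return micro, macro


# Hypothetical imbalanced toy data: the classifier always predicts "a".
y_true = ["a"] * 8 + ["b"] * 2
y_pred = ["a"] * 10
micro, macro = micro_macro_f1(y_true, y_pred)
print(f"Micro-F1 = {micro:.4f}, Macro-F1 = {macro:.4f}")
```

Because Macro-F1 weights every class equally, a class that is never predicted correctly pulls it down sharply, which is one common reason Macro-F1 scores sit below Micro-F1 scores on imbalanced text collections.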
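| Classifier | Dataset | ISCA | PCA | NMF | MDMC |
|---|---|---|---|---|---|
| SVM | K1a | 81.67 | 82.52 | 74.87 | 87.14 |
| SVM | K1b | 94.94 | 94.44 | 92.65 | 98.29 |
| SVM | La2 | 83.35 | 85.79 | 77.88 | 87.77 |
| LR | K1a | 82.95 | 84.66 | 73.93 | 86.84 |
| LR | K1b | 97.61 | 96.15 | 92.31 | 98.42 |
| LR | La2 | 84.68 | 87.61 | 80.03 | 89.40 |

Tab.3 Comparison of max Micro-F1 (%) for more methods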
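| Classifier | Dataset | ISCA | PCA | NMF | MDMC |
|---|---|---|---|---|---|
| SVM | K1a | 66.68 | 70.48 | 51.55 | 76.08 |
| SVM | K1b | 96.71 | 87.97 | 76.17 | 97.27 |
| SVM | La2 | 80.52 | 83.56 | 71.86 | 85.58 |
| LR | K1a | 67.81 | 72.21 | 47.51 | 74.53 |
| LR | K1b | 96.01 | 90.49 | 74.14 | 97.13 |
| LR | La2 | 82.04 | 85.51 | 73.99 | 87.50 |

Tab.4 Comparison of max Macro-F1 (%) for more methods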
1 G Forman. An extensive empirical study of feature selection metrics for text classification. The Journal of Machine Learning Research, 2003, 3: 1289–1305
2 A Rehman, K Javed, H A Babri. Feature selection based on a normalized difference measure for text classification. Information Processing & Management, 2017, 53(2): 473–489
3 A Rehman, K Javed, H A Babri, M N Asim. Selection of the most relevant terms based on a max-min ratio metric for text classification. Expert Systems with Applications, 2018, 114: 78–96
4 K Kim, S Y Zzang. Trigonometric comparison measure: a feature selection method for text categorization. Data & Knowledge Engineering, 2019, 119: 1–21
5 H Zhou, Y Ma, X Li. Feature selection based on term frequency deviation rate for text classification. Applied Intelligence, 2021, 51(6): 3255–3274
6 Y Zhao, G Karypis, U Fayyad. Hierarchical clustering algorithms for document datasets. Data Mining and Knowledge Discovery, 2005, 10(2): 141–168
7 A Rehman, K Javed, H A Babri, M Saeed. Relative discrimination criterion - A novel feature ranking method for text data. Expert Systems with Applications, 2015, 42(7): 3670–3681
8 B Parlak, A K Uysal. A novel filter feature selection method for text classification: extensive feature selector. Journal of Information Science, 2021, 47(2): 1–20
9 M Belazzoug, M Touahria, F Nouioua, M Brahimi. An improved sine cosine algorithm to select features for text categorization. Journal of King Saud University-Computer and Information Sciences, 2020, 32(4): 454–464
[1] FCS-22154-of-LJ_suppl_1 Download
[2] FCS-22154-of-LJ_suppl_2 Download