Frontiers of Optoelectronics

ISSN 2095-2759

ISSN 2095-2767(Online)

CN 10-1029/TN

Postal Subscription Code 80-976

Front. Optoelectron.    2017, Vol. 10 Issue (3) : 273-279    https://doi.org/10.1007/s12200-017-0726-4
RESEARCH ARTICLE
Recursive feature elimination in Raman spectra with support vector machines
Bernd KAMPE (1), Sandra KLOß (1, 2), Thomas BOCKLITZ (1, 2), Petra RÖSCH (1, 2), Jürgen POPP (1, 2, 3)
1. Institute of Physical Chemistry and Abbe Center of Photonics, University of Jena, Helmholtzweg 4, D-07743 Jena, Germany
2. InfectoGnostics Research Campus Jena, Center for Applied Research, Philosophenweg 7, 07743 Jena, Germany
3. Leibniz-Institute of Photonic Technology, Albert-Einstein-Straße 9, D-07745 Jena, Germany
Abstract

The presence of irrelevant and correlated data points in a Raman spectrum can degrade classifier performance. We introduce support vector machine (SVM)-based recursive feature elimination into the field of Raman spectroscopy and demonstrate its performance on a data set of spectra of clinically relevant microorganisms in urine samples, as well as on patient samples. Because the original technique is suitable only for two-class problems, we adapt it to the multi-class setting. We show that a large number of spectral points can be removed without notably degrading the prediction accuracy of the resulting model.
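The abstract does not spell out how the two-class ranking is extended to several classes. As a rough illustration only, the following sketch implements SVM-RFE in the form introduced by Guyon et al.: train a linear SVM, rank each spectral point by its squared weight, drop the lowest-ranked points, and retrain. It extends the ranking to multiple classes by summing the squared weights over one-vs-rest binary SVMs. The one-vs-rest decomposition, the toy subgradient-descent trainer, its hyperparameters (`lam`, `lr`, `epochs`) and all function names are assumptions made for illustration, not the authors' implementation; a real analysis would use a dedicated SVM library instead of the toy trainer.

```python
# Hypothetical sketch of multi-class SVM-RFE. Assumptions (not from the article):
# one-vs-rest decomposition, a plain subgradient-descent linear SVM, and a
# ranking score equal to the sum of squared weights over all binary models.


def train_linear_svm(X, y, lam=0.01, epochs=200, lr=0.1):
    """Binary linear SVM (labels +1/-1) fitted by full-batch subgradient descent on
    lam/2 * ||w||^2 + mean(max(0, 1 - y * (w.x + b)))."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w = [lam * wj for wj in w]  # gradient of the L2 penalty
        grad_b = 0.0
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if margin < 1:  # hinge loss is active: add its subgradient
                for j in range(d):
                    grad_w[j] -= yi * xi[j] / n
                grad_b -= yi / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def feature_scores(X, labels):
    """Multi-class ranking score: sum of squared one-vs-rest weights per feature."""
    d = len(X[0])
    scores = [0.0] * d
    for cls in sorted(set(labels)):
        y = [1 if lab == cls else -1 for lab in labels]
        w, _ = train_linear_svm(X, y)
        for j in range(d):
            scores[j] += w[j] ** 2
    return scores


def svm_rfe(X, labels, n_keep=1, step=1):
    """Recursively drop the `step` lowest-scoring features until `n_keep` remain.

    Returns (surviving feature indices, elimination order, earliest removed first)."""
    remaining = list(range(len(X[0])))
    eliminated = []
    while len(remaining) > n_keep:
        # retrain on the surviving features only
        X_sub = [[row[j] for j in remaining] for row in X]
        scores = feature_scores(X_sub, labels)
        k = min(step, len(remaining) - n_keep)
        # positions (within `remaining`) of the k smallest scores
        worst = sorted(range(len(remaining)), key=lambda p: scores[p])[:k]
        removed = [remaining[p] for p in worst]  # lowest score first
        eliminated.extend(removed)
        remaining = [f for f in remaining if f not in removed]
    return remaining, eliminated


# Toy data (hypothetical): features 0-2 each mark one class; features 3 and 4
# are class-independent noise, so they receive zero weight and go first.
X = [
    [1.0, 0.0, 0.0, 0.1, -0.1],
    [1.0, 0.0, 0.0, -0.1, 0.1],
    [0.0, 1.0, 0.0, 0.1, -0.1],
    [0.0, 1.0, 0.0, -0.1, 0.1],
    [0.0, 0.0, 1.0, 0.1, -0.1],
    [0.0, 0.0, 1.0, -0.1, 0.1],
]
labels = ["a", "a", "b", "b", "c", "c"]

kept, eliminated = svm_rfe(X, labels, n_keep=1)
print("kept:", kept, "elimination order:", eliminated)
```

For a real spectral data set, each row of `X` would hold the preprocessed intensities of one spectrum (one column per wavenumber), and `step` can be set larger than 1 so that several spectral points are removed per round, which reduces the number of retraining rounds.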

Keywords: feature selection, Raman spectroscopy, pattern recognition, chemometrics
Corresponding Author(s): Jürgen POPP   
Just Accepted Date: 21 June 2017   Online First Date: 14 July 2017    Issue Date: 26 September 2017
 Cite this article:   
Bernd KAMPE, Sandra KLOß, Thomas BOCKLITZ, et al. Recursive feature elimination in Raman spectra with support vector machines[J]. Front. Optoelectron., 2017, 10(3): 273-279.
 URL:  
https://academic.hep.com.cn/foe/EN/10.1007/s12200-017-0726-4
https://academic.hep.com.cn/foe/EN/Y2017/V10/I3/273
Tab.1  Numbers of spectra per data set
Bacterial species    Training set    Validation set
E. faecalis          429             53
E. faecium           256             50
S. epidermidis       227             47
S. haemolyticus      225             52
S. hominis           207             49
S. saprophyticus     237             30
S. aureus            285             50
E. coli              360             37
K. pneumoniae        233             40
P. aeruginosa        249             39
P. mirabilis         244             67
Fig.1  Sample preprocessed Raman spectrum of E. faecalis showing the induced ranking of the method
Fig.2  Development of the cross-validation error rate on the training data set over the course of feature removal
Fig.3  Development of the classification error rate on the independent validation data set over the course of feature removal
Fig.4  Development of the classification error rate on the data set of patient samples over the course of feature removal
Fig.5  Trend of the accuracy of the model discriminating E. faecalis from E. coli based on cross-validation