Recursive feature elimination in Raman spectra with support vector machines

Bernd KAMPE (1), Sandra KLOß (1,2), Thomas BOCKLITZ (1,2), Petra RÖSCH (1,2), Jürgen POPP (1,2,3)

1. Institute of Physical Chemistry and Abbe Center of Photonics, University of Jena, Helmholtzweg 4, D-07743 Jena, Germany
2. InfectoGnostics Research Campus Jena, Center for Applied Research, Philosophenweg 7, 07743 Jena, Germany
3. Leibniz-Institute of Photonic Technology, Albert-Einstein-Straße 9, D-07745 Jena, Germany

Abstract Irrelevant and correlated data points in a Raman spectrum can degrade classifier performance. We introduce support vector machine (SVM)-based recursive feature elimination to the field of Raman spectroscopy and demonstrate its performance on a data set of spectra of clinically relevant microorganisms in urine samples, along with patient samples. Because the original technique is suitable only for two-class problems, we adapt it to the multi-class setting. We show that a large number of spectral points can be removed without notably degrading the prediction accuracy of the resulting model.
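
The elimination loop described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the one-vs-rest decomposition, the summing of squared weights across the binary classifiers, the plain subgradient-descent SVM trainer, and all names and parameters (`train_linear_svm`, `svm_rfe`, `epochs`, `lr`, `lam`) are assumptions chosen for illustration. Only the ranking of features by squared weight of a linear SVM follows the original two-class formulation of recursive feature elimination.

```python
def train_linear_svm(X, y, epochs=50, lr=0.1, lam=0.01):
    """Fit a linear soft-margin SVM by subgradient descent on the hinge loss.

    Illustrative trainer only (assumption); a real analysis would use an
    established SVM library instead.
    """
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            # L2 regularisation: shrink the weights a little at every step.
            w = [wj * (1.0 - lr * lam) for wj in w]
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if margin < 1.0:
                # Sample violates the margin: move the hyperplane towards it.
                w = [wj + lr * yi * xj for wj, xj in zip(w, xi)]
                b += lr * yi
    return w


def svm_rfe(X, labels):
    """Return feature indices ordered from most to least important."""
    classes = sorted(set(labels))
    remaining = list(range(len(X[0])))
    eliminated = []
    while len(remaining) > 1:
        X_sub = [[row[j] for j in remaining] for row in X]
        scores = [0.0] * len(remaining)
        # Assumed multi-class adaptation: one-vs-rest, summing w_j^2 per class.
        for c in classes:
            y = [1.0 if lab == c else -1.0 for lab in labels]
            w = train_linear_svm(X_sub, y)
            scores = [s + wj * wj for s, wj in zip(scores, w)]
        # Drop the feature with the smallest accumulated squared weight.
        worst = scores.index(min(scores))
        eliminated.append(remaining.pop(worst))
    return remaining + eliminated[::-1]


# Toy data (hypothetical): feature 0 separates the classes, feature 1 is
# constant, feature 2 is a weakly scaled copy of feature 0.
X = [[1.0, 0.0, 0.1], [-1.0, 0.0, -0.1], [1.0, 0.0, 0.1], [-1.0, 0.0, -0.1]]
labels = ["a", "b", "a", "b"]
ranking = svm_rfe(X, labels)
print(ranking)
```

On this toy data the constant feature is removed first, since its weight never leaves zero, and the informative feature survives longest. Removing one feature per iteration, as here, is the simplest schedule; for spectra with many points it is common to remove several per step.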

Keywords: feature selection, Raman spectroscopy, pattern recognition, chemometrics

Corresponding Author: Jürgen POPP

Just Accepted Date: 21 June 2017
Online First Date: 14 July 2017
Issue Date: 26 September 2017