Frontiers of Computer Science

ISSN 2095-2228

ISSN 2095-2236(Online)

CN 10-1014/TP

Postal Subscription Code 80-970

2018 Impact Factor: 1.129

Front. Comput. Sci.    2022, Vol. 16 Issue (2) : 162601    https://doi.org/10.1007/s11704-020-0025-x
RESEARCH ARTICLE
Cancer classification with data augmentation based on generative adversarial networks
Kaimin WEI1,2, Tianqi LI1,2, Feiran HUANG1,2, Jinpeng CHEN3, Zefan HE1,2
1. College of Information Science and Technology, Jinan University, Guangzhou 510632, China
2. Guangdong Key Laboratory of Data Security and Privacy Protection, Guangzhou 510632, China
3. School of Software Engineering, Beijing University of Posts and Telecommunications, Beijing 100876, China
Abstract

Accurate diagnosis is a significant step in cancer treatment. Machine learning can support doctors in prognostic decision-making, but its performance is often weakened by the high dimensionality and small sample size of genetic data. Deep learning can process high-dimensional data effectively. However, the shortage of training data remains unsolved and limits the performance of deep learning. To address this problem, we propose a generative adversarial model that uses non-target cancer data to help train the target generator. We use a reconstruction loss to further stabilize model training and improve the quality of the generated samples. We also present a cancer classification model that optimizes classification performance. Experimental results show that the mean absolute error of the cancer gene data generated by our model is 19.3% lower than that of DCGAN, and that classifiers trained on our generated data are more accurate than those trained on data generated by GAN. The classification accuracy of our model reaches 92.6%, which is 7.6 percentage points higher than that of the model trained without any generated data.
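The abstract says a reconstruction loss is added to the adversarial objective to stabilize generator training. The exact formulation is not given in this section. The following Python sketch makes three assumptions:

- The adversarial part is the common non-saturating GAN generator loss.
- The reconstruction part is a mean absolute (L1) error between a real expression profile and its reconstruction.
- The two parts are combined with a weight `lam`.

The names `generator_objective` and `lam` are hypothetical, and the use of non-target cancer data is not modeled here.

```python
import math
from typing import Sequence


def adversarial_loss(d_fake: Sequence[float], eps: float = 1e-12) -> float:
    """Non-saturating generator loss: -mean(log D(G(z))), with D outputs clipped away from 0."""
    return -sum(math.log(max(p, eps)) for p in d_fake) / len(d_fake)


def reconstruction_loss(x_real: Sequence[float], x_recon: Sequence[float]) -> float:
    """Assumed form: mean absolute (L1) error between a real profile and its reconstruction."""
    return sum(abs(a - b) for a, b in zip(x_real, x_recon)) / len(x_real)


def generator_objective(d_fake: Sequence[float], x_real: Sequence[float],
                        x_recon: Sequence[float], lam: float = 10.0) -> float:
    """Adversarial term plus a weighted reconstruction term (lam is a hypothetical weight)."""
    return adversarial_loss(d_fake) + lam * reconstruction_loss(x_real, x_recon)


# Discriminator outputs of 0.5 give log(2); an L1 error of 0.1 weighted by 10 adds 1.0.
loss = generator_objective([0.5, 0.5], [0.0] * 4, [0.1] * 4)
print(round(loss, 4))  # 1.6931
```

The weight trades realism (fooling the discriminator) against fidelity to real profiles. Its actual value in the paper is not stated here.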

Keywords: data mining, cancer data analysis, deep learning, generative adversarial networks
Corresponding Author(s): Feiran HUANG   
Just Accepted Date: 27 April 2020   Online First Date: 31 August 2021    Issue Date: 03 September 2021
 Cite this article:   
Kaimin WEI, Tianqi LI, Feiran HUANG, et al. Cancer classification with data augmentation based on generative adversarial networks[J]. Front. Comput. Sci., 2022, 16(2): 162601.
 URL:  
https://academic.hep.com.cn/fcs/EN/10.1007/s11704-020-0025-x
https://academic.hep.com.cn/fcs/EN/Y2022/V16/I2/162601
Fig.1  The structure of the generative adversarial model for cancer gene expression data
Fig.2  The structure of cancer classification model based on generated data
Fig.3  The difference between real samples and generated ones
Fig.4  An image and its corresponding label
Fig.5  A generated image sample with its label vector
Cancer | Normal samples | Ill samples
Lung | 59 | 78
Breast | 113 | 150
Prostate | 52 | 71
Colon | 41 | 57
Gastric | 32 | 47
Liver | 50 | 68
Rectal | 10 | 22
Esophageal | 11 | 23
Thyroid | 58 | 77
CCRCC | 72 | 91
Uterine | 35 | 50
HNSCC | 44 | 61
Tab.1  The statistics of cancer datasets
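To make the scale of the data concrete, the following Python sketch totals the counts transcribed from Tab.1. It only does arithmetic on the table. `COUNTS` and `summarize` are illustrative names, not part of the authors' code.

```python
# (normal, ill) sample counts transcribed from Tab.1
COUNTS = {
    "Lung": (59, 78), "Breast": (113, 150), "Prostate": (52, 71),
    "Colon": (41, 57), "Gastric": (32, 47), "Liver": (50, 68),
    "Rectal": (10, 22), "Esophageal": (11, 23), "Thyroid": (58, 77),
    "CCRCC": (72, 91), "Uterine": (35, 50), "HNSCC": (44, 61),
}


def summarize(counts):
    """Return (total normal, total ill, grand total, smallest dataset by sample count)."""
    normal = sum(n for n, _ in counts.values())
    ill = sum(i for _, i in counts.values())
    smallest = min(counts, key=lambda name: sum(counts[name]))
    return normal, ill, normal + ill, smallest


print(summarize(COUNTS))  # (577, 795, 1372, 'Rectal')
```

Across all twelve cancers there are 1,372 samples. The largest single dataset (Breast) has 263 and the smallest (Rectal) only 32. This is the small-sample setting that motivates data augmentation in the abstract.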
Generative model | Normal samples: MAE mean | Normal samples: MAE SD | Ill samples: MAE mean | Ill samples: MAE SD
GAN | 0.238 | 0.008 | 0.225 | 0.014
DCGAN | 0.344 | 0.011 | 0.276 | 0.008
VAE | 0.323 | 0.021 | 0.238 | 0.004
Gene-GAN | 0.151 | 0.002 | 0.103 | 0.003
Tab.2  The MAE of different generative models
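The abstract's "19.3% lower than DCGAN" appears to match the absolute MAE gap on normal samples in Tab.2 (0.344 - 0.151 = 0.193) rather than a relative reduction. The Python sketch below computes both readings from the transcribed means so the two can be compared. `MAE` and `mae_gap` are illustrative names.

```python
# Mean MAE (normal samples, ill samples) transcribed from Tab.2; lower is better
MAE = {
    "GAN": (0.238, 0.225),
    "DCGAN": (0.344, 0.276),
    "VAE": (0.323, 0.238),
    "Gene-GAN": (0.151, 0.103),
}


def mae_gap(baseline: str, model: str):
    """For each sample group, return (absolute MAE reduction, relative MAE reduction)."""
    gaps = []
    for base, ours in zip(MAE[baseline], MAE[model]):
        absolute = base - ours
        gaps.append((round(absolute, 3), round(absolute / base, 3)))
    return gaps


print(mae_gap("DCGAN", "Gene-GAN"))  # [(0.193, 0.561), (0.173, 0.627)]
```

Read as a relative reduction, Gene-GAN's MAE is about 56% lower than DCGAN's on normal samples and about 63% lower on ill samples.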
Fig.6  The convergence of different generative models
Generative model | Accuracy: mean | Accuracy: SD
GAN | 0.843 | 0.025
DCGAN | 0.877 | 0.044
VAE | 0.872 | 0.065
Gene-GAN | 0.892 | 0.046
Tab.3  Classification results of different generative models
Classifier | Accuracy: mean | Accuracy: SD
Decision tree | 0.608 | 0
KNN (k=3) | 0.864 | 0
Support vector machine | 0.84 | 0
VGG | 0.781 | 0.144
ResNet | 0.849 | 0.012
Gene-GAN (non-amplified) | 0.85 | 0.027
Gene-GAN (first fake then real) | 0.872 | 0.047
Gene-GAN (mixed) | 0.892 | 0.046
Tab.4  Classification results for different classifiers
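Tab.4 compares two ways of using generated data: "first fake then real" and "mixed". This section does not define them precisely. The Python sketch below encodes one plausible reading of each as a per-epoch training schedule:

- **First fake then real:** train on generated samples only, then fine-tune on real samples only.
- **Mixed:** shuffle real and generated samples into one pool every epoch.

The stage lengths and the names `first_fake_then_real` and `mixed` are hypothetical.

```python
import random
from typing import List, Sequence, Tuple

Sample = Tuple[List[float], int]  # (gene-expression vector, class label)


def first_fake_then_real(real: Sequence[Sample], fake: Sequence[Sample],
                         epochs_per_stage: int = 2) -> List[List[Sample]]:
    """Assumed reading: generated-only epochs, followed by real-only epochs."""
    stage1 = [list(fake) for _ in range(epochs_per_stage)]
    stage2 = [list(real) for _ in range(epochs_per_stage)]
    return stage1 + stage2


def mixed(real: Sequence[Sample], fake: Sequence[Sample],
          epochs: int = 4, seed: int = 0) -> List[List[Sample]]:
    """Assumed reading: each epoch sees real and generated samples shuffled together."""
    rng = random.Random(seed)
    schedule = []
    for _ in range(epochs):
        pool = list(real) + list(fake)
        rng.shuffle(pool)
        schedule.append(pool)
    return schedule


real = [([1.0, 0.2], 1), ([0.1, 0.9], 0)]
fake = [([0.9, 0.3], 1)]
print([len(epoch) for epoch in first_fake_then_real(real, fake)])  # [1, 1, 2, 2]
print([len(epoch) for epoch in mixed(real, fake)])                 # [3, 3, 3, 3]
```

Under this reading, "mixed" lets real samples correct generator artifacts in every epoch, whereas the staged schedule relies on the final real-only epochs to do so.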
Fig.7  The ROC curves of different classifiers
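Fig.7 to Fig.9 compare classifiers by ROC curves. As a reminder of how such a curve is built, here is a minimal Python sketch of the standard construction: sweep the decision threshold over the scores and integrate with the trapezoidal rule. It assumes binary labels (1 = ill, 0 = normal) and is not the authors' evaluation code.

```python
from typing import List, Sequence, Tuple


def roc_points(scores: Sequence[float], labels: Sequence[int]) -> List[Tuple[float, float]]:
    """(FPR, TPR) points obtained by lowering the threshold one distinct score at a time.

    Assumes both classes are present, otherwise FPR or TPR is undefined.
    """
    pos = sum(labels)
    neg = len(labels) - pos
    ranked = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    tp = fp = 0
    points = [(0.0, 0.0)]
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Samples with tied scores cross the threshold together.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / neg, tp / pos))
    return points


def auc(points: List[Tuple[float, float]]) -> float:
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


print(auc(roc_points([0.9, 0.8, 0.7, 0.6], [1, 0, 1, 0])))  # 0.75
```

In practice a library routine such as scikit-learn's `roc_curve` does the same job. The hand-written version only makes the threshold sweep explicit.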
Label smoothing rate | Accuracy: mean | Accuracy: SD
1 | 0.889 | 0.025
0.8 | 0.899 | 0.022
0.6 | 0.912 | 0.021
0.4 | 0.926 | 0.008
0.2 | 0.919 | 0.017
0 | 0.892 | 0.046
Tab.5  Accuracy results under different label smoothing rates
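In Tab.5 the rate-0 row (0.892, SD 0.046) matches Gene-GAN (mixed) in Tab.4, which is consistent with rate 0 meaning "no smoothing". The section does not give the smoothing formula or say which labels it is applied to (real, generated, or discriminator targets). The Python sketch below therefore uses the standard label-smoothing rule y' = (1 - eps) * y + eps / K for K classes, with `eps` as a hypothetical name for the rate.

```python
from typing import List, Sequence


def smooth_labels(one_hot: Sequence[float], eps: float) -> List[float]:
    """Standard label smoothing: move a fraction eps of the mass to a uniform distribution."""
    k = len(one_hot)
    return [(1.0 - eps) * y + eps / k for y in one_hot]


# Binary case (normal, ill) with rate 0.4
print(smooth_labels([0.0, 1.0], 0.4))  # approximately [0.2, 0.8]
```

Under this rule, rate 1 turns every label into the uniform distribution. If smoothing were applied to all training labels, that would remove the class signal entirely, so the 0.889 accuracy at rate 1 suggests the paper smooths only part of the targets (for example, the generated samples). This is an inference, not something the section states.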
Fig.8  The ROC curves under different label smoothing rates
Amplified (generated) samples N | Accuracy: mean | Accuracy: SD
100 | 0.913 | 0.016
300 | 0.917 | 0.012
600 | 0.930 | 0.016
1000 | 0.926 | 0.008
1500 | 0.918 | 0.012
Tab.6  Results with different quantities of amplified data
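Which row of Tab.6 is "best" depends on the selection criterion. The highest mean accuracy is at N=600. The 92.6% reported in the abstract matches the N=1000 row, which also has the smallest SD. The section does not state which criterion the authors used. The Python sketch below simply evaluates three candidate criteria on the transcribed values. `RESULTS` is an illustrative name.

```python
# (mean accuracy, SD) per number of amplified samples N, transcribed from Tab.6
RESULTS = {
    100: (0.913, 0.016),
    300: (0.917, 0.012),
    600: (0.930, 0.016),
    1000: (0.926, 0.008),
    1500: (0.918, 0.012),
}

best_mean = max(RESULTS, key=lambda n: RESULTS[n][0])                    # highest mean
lowest_sd = min(RESULTS, key=lambda n: RESULTS[n][1])                    # most stable
best_lower = max(RESULTS, key=lambda n: RESULTS[n][0] - RESULTS[n][1])   # best mean - SD

print(best_mean, lowest_sd, best_lower)  # 600 1000 1000
```

A conservative "mean minus one SD" rule favors N=1000 (0.918 versus 0.914 for N=600). This is one possible reading of why that setting is reported.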
Fig.9  The ROC curves with different quantities of amplified data