Quantitative Biology

ISSN 2095-4689

ISSN 2095-4697(Online)

CN 10-1028/TM

Postal distribution code 80-971

Quantitative Biology  2024, Vol. 12 Issue (2): 173-181   https://doi.org/10.1002/qub2.40
A feature extraction framework for discovering pan-cancer driver genes based on multi-omics data
Xiaomeng Xue, Feng Li, Junliang Shang, Lingyun Dai, Daohui Ge, Qianqian Ren
School of Computer Science, Qufu Normal University, Rizhao, China
Abstract

Identifying tumor driver genes facilitates accurate cancer diagnosis and treatment and plays a key role in precision oncology, alongside gene signaling, regulation, and interactions with protein complexes. To tackle the challenge of distinguishing driver genes within large volumes of genomic data, we construct a feature extraction framework for discovering pan-cancer driver genes based on multi-omics data (mutations, gene expression, copy number variants, and DNA methylation) combined with protein–protein interaction (PPI) networks. Using a network propagation algorithm, we mine functional information among nodes in the PPI network, focusing on genes with weak node information to represent cancer-specific information. From these functional features, we extract three kinds of pan-cancer features: distribution features of the pan-cancer data; TOPSIS features, computed from the functional features with the ideal solution method; and SetExpan features, obtained by ranking the pan-cancer data according to the average inverse (reciprocal) rank. Together, these features capture the information shared across cancer types. Finally, we use the LightGBM classification algorithm for gene prediction. Experimental results show that our method outperforms existing methods in terms of the area under the precision-recall curve (AUPRC) and performs better across different PPI networks. This indicates that the framework is effective at predicting potential cancer genes and offers valuable insights for the diagnosis and treatment of tumors.
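
The three feature-building steps named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the propagation variant, the TOPSIS weighting, or how ranks are formed, so this sketch assumes a random walk with restart on a column-normalized adjacency matrix, unweighted TOPSIS with every column treated as a benefit criterion, and a rank of 1 for the largest value in each column. The function names, the restart parameter `alpha`, and the toy data are all hypothetical.

```python
import math


def propagate(adj, scores, alpha=0.5, tol=1e-10, max_iter=1000):
    """Random walk with restart over a PPI adjacency matrix (assumed variant).

    Iterates f <- alpha * W f + (1 - alpha) * scores, where W is the
    column-normalized adjacency matrix.
    """
    n = len(scores)
    # Column sums; isolated nodes get 1.0 to avoid division by zero.
    deg = [sum(adj[i][j] for i in range(n)) or 1.0 for j in range(n)]
    f = list(scores)
    for _ in range(max_iter):
        nxt = [
            alpha * sum(adj[i][j] / deg[j] * f[j] for j in range(n))
            + (1 - alpha) * scores[i]
            for i in range(n)
        ]
        if sum(abs(a - b) for a, b in zip(nxt, f)) < tol:
            return nxt
        f = nxt
    return f


def topsis(rows):
    """Relative closeness of each row (gene) to the ideal solution.

    Unweighted; every column is treated as a benefit criterion (assumption).
    """
    cols = list(zip(*rows))
    norms = [math.sqrt(sum(v * v for v in c)) or 1.0 for c in cols]
    z = [[v / n for v, n in zip(r, norms)] for r in rows]
    z_cols = list(zip(*z))
    best = [max(c) for c in z_cols]
    worst = [min(c) for c in z_cols]
    closeness = []
    for r in z:
        d_best = math.dist(r, best)
        d_worst = math.dist(r, worst)
        total = d_best + d_worst
        closeness.append(d_worst / total if total else 0.0)
    return closeness


def mean_reciprocal_rank(rows):
    """Average of 1/rank over columns; rank 1 is the largest value per column."""
    n, m = len(rows), len(rows[0])
    totals = [0.0] * n
    for j in range(m):
        # sorted() is stable, so tied genes keep their input order.
        order = sorted(range(n), key=lambda i: -rows[i][j])
        for rank, i in enumerate(order, start=1):
            totals[i] += 1.0 / rank
    return [t / m for t in totals]


# Toy example: a 4-gene path network and one score column per cancer type.
demo_adj = [[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]]
demo_cancers = [[1.0, 0.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0]]
demo_cols = [propagate(demo_adj, s) for s in demo_cancers]
demo_rows = [list(r) for r in zip(*demo_cols)]  # one row per gene
demo_topsis = topsis(demo_rows)
demo_mrr = mean_reciprocal_rank(demo_rows)
```

In a pipeline of this shape, per-gene values such as `demo_topsis` and `demo_mrr` would be concatenated with other features and passed to a gradient-boosting classifier; that step is omitted here because the abstract gives no model settings.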

Key words: cancer driver genes; feature extraction; multi-omics data; network propagation; pan-cancer
Received: 2023-06-15      Published: 2024-07-26
Corresponding Author(s): Feng Li   
Cite this article:
Xiaomeng Xue, Feng Li, Junliang Shang, Lingyun Dai, Daohui Ge, Qianqian Ren. A feature extraction framework for discovering pan-cancer driver genes based on multi-omics data. Quant. Biol., 2024, 12(2): 173-181.
Link to this article:
https://academic.hep.com.cn/qb/CN/10.1002/qub2.40
https://academic.hep.com.cn/qb/CN/Y2024/V12/I2/173