Frontiers of Computer Science

ISSN 2095-2228

ISSN 2095-2236(Online)

CN 10-1014/TP

Postal Subscription Code 80-970

2018 Impact Factor: 1.129

Front. Comput. Sci.    2022, Vol. 16 Issue (1) : 161901    https://doi.org/10.1007/s11704-021-0476-8
RESEARCH ARTICLE
A framework combines supervised learning and dense subgraphs discovery to predict protein complexes
Suyu MEI
Software College, Shenyang Normal University, Shenyang 110034, China
Abstract

Rapid identification of protein complexes is important for elucidating the mechanisms of macromolecular interactions and for investigating the overlapping clinical manifestations of diseases. To date, computational methods have mainly focused on unsupervised graph clustering algorithms, sometimes combined with prior biological insights, to detect protein complexes from protein-protein interaction (PPI) networks. However, the outputs of these methods may be merely structural or functional modules within PPI networks, and such modules do not necessarily correspond to actual protein complexes, which form via spatiotemporal aggregation of subunits. In this study, we propose a computational framework that combines supervised learning and dense subgraph discovery to predict protein complexes. The framework consists of two steps. The first step reconstructs genome-scale protein co-complex networks by training an L2-regularized logistic regression model on experimentally derived co-complexed protein pairs. The second step infers hierarchical and balanced clusters as complexes from the co-complex networks, using either the effective but computationally intensive k-clique graph clustering method or the efficient maximum modularity clustering (MMC) algorithm. Cross-validation and independent tests show that both steps achieve encouraging performance. The proposed framework is novel and improves on existing methods in that complexes inferred from protein co-complex networks are more biologically relevant than those inferred from PPI networks, providing a new avenue for identifying novel protein complexes.
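The two steps described in the abstract can be illustrated with a minimal, pure-Python sketch. It is not the author's implementation: the two-dimensional pair features, the protein names P1 to P7, the hyperparameters, the 0.5 probability cutoff, and the brute-force clique enumeration are all assumptions chosen for a toy example. Step 1 fits an L2-regularized logistic regression by plain gradient descent; step 2 applies k-clique percolation (k = 3) to the predicted co-complex network. The MMC alternative is not shown.

```python
import math
from itertools import combinations

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def train_l2_logreg(X, y, lam=0.01, lr=0.5, epochs=2000):
    """Gradient descent on mean log-loss + (lam / 2) * ||w||^2 (bias unpenalized)."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                gw[j] += err * xi[j]
            gb += err
        w = [wj - lr * (gj / n + lam * wj) for wj, gj in zip(w, gw)]
        b -= lr * gb / n
    return w, b

def predict_proba(w, b, x):
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)

def k_clique_communities(edges, k=3):
    """Clique percolation: merge k-cliques that share at least k - 1 nodes."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    # Brute-force k-clique enumeration; acceptable only for toy-sized graphs.
    cliques = [frozenset(c) for c in combinations(sorted(adj), k)
               if all(v in adj[u] for u, v in combinations(c, 2))]
    parent = list(range(len(cliques)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(cliques)), 2):
        if len(cliques[i] & cliques[j]) >= k - 1:
            parent[find(i)] = find(j)
    groups = {}
    for i, c in enumerate(cliques):
        groups.setdefault(find(i), set()).update(c)
    return sorted(sorted(g) for g in groups.values())

# Step 1: train on labeled pairs (1 = co-complexed); features are hypothetical.
train_X = [(0.9, 0.8), (0.8, 0.9), (0.7, 0.8), (0.1, 0.2), (0.2, 0.1), (0.3, 0.2)]
train_y = [1, 1, 1, 0, 0, 0]
w, b = train_l2_logreg(train_X, train_y)

# Score unlabeled candidate pairs and keep likely co-complex edges.
candidates = {
    ("P1", "P2"): (0.9, 0.9), ("P1", "P3"): (0.8, 0.7), ("P2", "P3"): (0.9, 0.8),
    ("P2", "P4"): (0.8, 0.8), ("P3", "P4"): (0.7, 0.9), ("P1", "P4"): (0.2, 0.3),
    ("P5", "P6"): (0.9, 0.7), ("P5", "P7"): (0.8, 0.8), ("P6", "P7"): (0.7, 0.9),
    ("P4", "P5"): (0.2, 0.1), ("P3", "P5"): (0.3, 0.1),
}
edges = [pair for pair, x in candidates.items() if predict_proba(w, b, x) >= 0.5]

# Step 2: dense-subgraph discovery on the predicted co-complex network.
complexes = k_clique_communities(edges, k=3)
print(complexes)
```

In this toy network the triangles P1-P2-P3 and P2-P3-P4 share two proteins and therefore percolate into a single predicted complex, while P5-P6-P7 forms a second one. A real pipeline would replace the brute-force enumeration with a dedicated clique-finding routine, since enumerating all k-subsets does not scale to genome-wide networks.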

Keywords: protein complexes, protein co-complex networks, machine learning, L2-regularized logistic regression, graph clustering
Corresponding Author(s): Suyu MEI   
Just Accepted Date: 16 March 2021   Issue Date: 19 November 2021
 Cite this article:   
Suyu MEI. A framework combines supervised learning and dense subgraphs discovery to predict protein complexes[J]. Front. Comput. Sci., 2022, 16(1): 161901.
 URL:  
https://academic.hep.com.cn/fcs/EN/10.1007/s11704-021-0476-8
https://academic.hep.com.cn/fcs/EN/Y2022/V16/I1/161901