Proteome-wide prediction of protein-protein interactions from high-throughput data
Zhi-Ping Liu1, Luonan Chen1,2
1. Key Laboratory of Systems Biology, SIBS-Novo Nordisk Translational Research Centre for PreDiabetes, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai 200031, China; 2. Institute of Industrial Science, The University of Tokyo, Tokyo 153-8505, Japan
In this paper, we present a brief review of existing computational methods for predicting proteome-wide protein-protein interaction networks from high-throughput data. The availability of diverse types of omics data offers both a great opportunity and an unprecedented challenge for inferring the interactome in cells. Reconstructing the interactome, or interaction network, is a crucial step toward studying the functional relationships among proteins and the biological processes they take part in. The protein interaction network provides a valuable resource, and an alternative means, for deciphering the mechanisms of these functionally interacting elements and how cellular operations run as a system. We describe the main steps of predicting protein-protein interaction networks and categorize the available approaches for coupling physical and functional linkages. Future topics and analyses beyond prediction are also discussed.
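As one concrete illustration of how several high-throughput evidence sources can be coupled into a single network, the Python sketch below uses naive Bayes integration: each source contributes a likelihood ratio, and the ratios are multiplied with a prior odds of interaction. This is an assumed example of a common scheme, not necessarily the approach this review favors. The function names (`posterior_odds`, `score_pairs`), the prior odds, the likelihood ratios and the threshold are all hypothetical, and the product rule holds only if the evidence sources are conditionally independent.

```python
from __future__ import annotations

from math import prod


def posterior_odds(prior_odds: float, likelihood_ratios: list[float]) -> float:
    """Naive Bayes integration: posterior odds = prior odds x product of likelihood ratios.

    Valid only under the assumed conditional independence of the evidence sources.
    """
    return prior_odds * prod(likelihood_ratios)


def odds_to_probability(odds: float) -> float:
    """Convert odds O into a probability O / (1 + O)."""
    return odds / (1.0 + odds)


def score_pairs(
    pair_evidence: dict[tuple[str, str], list[float]],
    prior_odds: float,
    threshold: float,
) -> list[tuple[str, str, float]]:
    """Keep candidate pairs whose posterior odds reach a (hypothetical) cutoff.

    pair_evidence maps a protein pair to one likelihood ratio per evidence
    source, for example co-expression, phylogenetic profiles or interologs.
    Returns (protein_a, protein_b, posterior probability) edges.
    """
    edges = []
    for (a, b), ratios in pair_evidence.items():
        odds = posterior_odds(prior_odds, ratios)
        if odds >= threshold:
            edges.append((a, b, odds_to_probability(odds)))
    return edges


if __name__ == "__main__":
    # Illustrative numbers only; not taken from any dataset.
    evidence = {
        ("A", "B"): [20.0, 15.0, 10.0],
        ("A", "C"): [2.0, 0.5],
    }
    for a, b, p in score_pairs(evidence, prior_odds=1 / 600, threshold=1.0):
        print(f"{a}-{b}: P(interaction) = {p:.2f}")
```

With a prior odds of 1/600 and three supporting sources (likelihood ratios 20, 15 and 10), the pair A-B reaches posterior odds of 5, a probability of about 0.83, and is kept as an edge; A-C, whose ratios 2 and 0.5 cancel out, stays at the prior and is dropped. In practice the likelihood ratios would be estimated against gold-standard positive and negative pair sets, and correlated sources would need a model that does not assume independence.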