Quantitative Biology

ISSN 2095-4689

ISSN 2095-4697(Online)

CN 10-1028/TM

Postal distribution code 80-971

Quantitative Biology  2015, Vol. 3 Issue (3): 115-123   https://doi.org/10.1007/s40484-015-0045-y
RESEARCH ARTICLE
Determination of specificity influencing residues for key transcription factor families
Ronak Y. Patel1,*, Christian Garde2, Gary D. Stormo1,*
1. Department of Genetics, School of Medicine, Washington University, St. Louis, MO 63108, USA
2. Center for Biological Sequence Analysis, Department of Systems Biology, Technical University of Denmark, Kgs. Lyngby, DK 2800, Denmark
Abstract

Transcription factors (TFs) are major modulators of transcription and of the cellular processes that depend on it. Which regulatory elements a TF binds is governed by its DNA-binding specificity. Given the gap between the number of TFs with known sequence and the number with known specificity, frameworks that predict specificity are highly desirable. A key input to such frameworks is the set of protein residues that modulate the specificity of the TF under consideration. Simple measures such as mutual information (MI) fail to delineate these specificity influencing residues (SIRs) from alignments, because the three-dimensional structure of the protein constrains how the amino-acid sequence evolves, and these structural constraints lead to the identification of false SIRs. In this manuscript we extend three methods (direct information, PSICOV and adjusted mutual information), previously used to disentangle direct protein residue-residue contacts from spurious indirect ones, to identify SIRs from joint alignments of amino acids and specificity. We predicted SIRs for the homeodomain (HD), helix-loop-helix, LacI and GntR families of TFs using these methods and compared the results with those from MI. Using various measures, we show that the three methods perform comparably to one another and better than MI. The implications of these methods for specificity prediction frameworks are discussed. The methods are implemented as an R package, available along with the alignments at http://stormo.wustl.edu/SpecPred.
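As a rough illustration of the MI baseline mentioned above, the following Python sketch scores each column of a protein alignment against one specificity column of a joint alignment. It is not the authors' R package: the function names, the toy sequences, and the choice to represent specificity as a single preferred base per TF are all assumptions made for the example.

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """Mutual information (in bits) between two paired symbol sequences."""
    if len(xs) != len(ys):
        raise ValueError("columns must have the same number of rows")
    n = len(xs)
    count_x = Counter(xs)
    count_y = Counter(ys)
    count_xy = Counter(zip(xs, ys))
    # Sum p(x,y) * log2( p(x,y) / (p(x) p(y)) ) over observed pairs.
    return sum(
        (c / n) * log2(c * n / (count_x[x] * count_y[y]))
        for (x, y), c in count_xy.items()
    )


def rank_columns(protein_rows, specificity_column):
    """Rank protein alignment columns by MI with one specificity column."""
    width = len(protein_rows[0])
    scores = []
    for j in range(width):
        column = [row[j] for row in protein_rows]
        scores.append((j, mutual_information(column, specificity_column)))
    # Highest MI first; ties keep their original column order.
    return sorted(scores, key=lambda item: item[1], reverse=True)


# Hypothetical toy joint alignment: four TFs, three aligned residues each,
# paired with the preferred base at one position of each TF's motif.
toy_proteins = ["KQA", "KTA", "RQA", "RTA"]
toy_bases = ["A", "A", "G", "G"]
ranking = rank_columns(toy_proteins, toy_bases)
```

In this toy case only the first column covaries with the preferred base. On real alignments, a plain MI ranking like this would also reward columns that covary for structural reasons, which is the confounding that the three extended methods are meant to reduce.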

Key words: specificity determinants; direct information; protein-DNA interactions; residue co-variance; motifs; co-evolution; feature selection
Received: 2015-03-01      Published: 2015-11-04
Corresponding Author(s): Ronak Y. Patel, Gary D. Stormo
Cite this article:
Ronak Y. Patel, Christian Garde, Gary D. Stormo. Determination of specificity influencing residues for key transcription factor families. Quant. Biol., 2015, 3(3): 115-123.
Link to this article:
https://academic.hep.com.cn/qb/CN/10.1007/s40484-015-0045-y
https://academic.hep.com.cn/qb/CN/Y2015/V3/I3/115
Fig.1  
Fig.2  
Fig.3  
Family   Source                 Number of instances
HD       FlyFactorSurvey        85
HD       UniProbe (BEEML-PBM)   127
HD       Jolma et al., 2013     84
bHLH     FlyFactorSurvey        31
bHLH     UniProbe               21
bHLH     Jolma et al., 2013     35
LacI     RegPrecise             404
GntR     RegPrecise             977
Tab.1  
Fig.4  
1 Balwierz, P. J., Pachkov, M., Arnold, P., Gruber, A. J., Zavolan, M. and van Nimwegen, E. (2014) ISMARA: automated modeling of genomic signals as a democracy of regulatory motifs. Genome Res., 24, 869−884
https://doi.org/10.1101/gr.169508.113 pmid: 24515121
2 Khurana, E., Fu, Y., Colonna, V., Mu, X. J., Kang, H. M., Lappalainen, T., Sboner, A., Lochovsky, L., Chen, J., Harmanci, A., et al. (2013) Integrative annotation of variants from 1092 humans: application to cancer genomics. Science, 342, 1235587
https://doi.org/10.1126/science.1235587 pmid: 24092746
3 Wright, D. A., Li, T., Yang, B. and Spalding, M. H. (2014) TALEN-mediated genome editing: prospects and perspectives. Biochem. J., 462, 15−24
https://doi.org/10.1042/BJ20140295 pmid: 25057889
4 Mendenhall, E. M., Williamson, K. E., Reyon, D., Zou, J. Y., Ram, O., Joung, J. K. and Bernstein, B. E. (2013) Locus-specific editing of histone modifications at endogenous enhancers. Nat. Biotechnol., 31, 1133−1136
https://doi.org/10.1038/nbt.2701 pmid: 24013198
5 Lin, Y., Chomvong, K., Acosta-Sampson, L., Estrela, R., Galazka, J. M., Kim, S. R., Jin, Y. S. and Cate, J. H. (2014) Leveraging transcription factors to speed cellobiose fermentation by Saccharomyces cerevisiae. Biotechnol. Biofuels, 7, 126
pmid: 25435910
6 Cheng, C., Alexander, R., Min, R., Leng, J., Yip, K. Y., Rozowsky, J., Yan, K. K., Dong, X., Djebali, S., Ruan, Y., et al. (2012) Understanding transcriptional regulation by integrative analysis of transcription factor binding data. Genome Res., 22, 1658−1667
https://doi.org/10.1101/gr.136838.111 pmid: 22955978
7 Haynes, B. C., Maier, E. J., Kramer, M. H., Wang, P. I., Brown, H. and Brent, M. R. (2013) Mapping functional transcription factor networks from gene expression data. Genome Res., 23, 1319−1328
https://doi.org/10.1101/gr.150904.112 pmid: 23636944
8 Vaquerizas, J. M., Kummerfeld, S. K., Teichmann, S. A. and Luscombe, N. M. (2009) A census of human transcription factors: function, expression and evolution. Nat. Rev. Genet., 10, 252−263
https://doi.org/10.1038/nrg2538 pmid: 19274049
9 Matthews, B. W. (1988) No code for recognition. Nature, 335, 294−295
https://doi.org/10.1038/335294a0 pmid: 3419498
10 Benos, P. V., Lapedes, A. S. and Stormo, G. D. (2002) Probabilistic code for DNA recognition by proteins of the EGR family. J. Mol. Biol., 323, 701−727
https://doi.org/10.1016/S0022-2836(02)00917-8 pmid: 12419259
11 Gupta, A., Christensen, R. G., Bell, H. A., Goodwin, M., Patel, R. Y., Pandey, M., Enuameh, M. S., Rayla, A. L., Zhu, C., Thibodeau-Beganny, S., et al. (2014) An improved predictive recognition model for Cys2-His2 zinc finger proteins. Nucleic Acids Res., 42, 4800−4812
https://doi.org/10.1093/nar/gku132 pmid: 24523353
12 Kaplan, T., Friedman, N. and Margalit, H. (2005) Ab initio prediction of transcription factor targets using structural knowledge. PLoS Comput. Biol., 1, e1
https://doi.org/10.1371/journal.pcbi.0010001 pmid: 16103898
13 Liu, J. and Stormo, G. D. (2008) Context-dependent DNA recognition code for C2H2 zinc-finger transcription factors. Bioinformatics, 24, 1850−1857
https://doi.org/10.1093/bioinformatics/btn331 pmid: 18586699
14 Persikov, A. V., Osada, R. and Singh, M. (2009) Predicting DNA recognition by Cys2His2 zinc finger proteins. Bioinformatics, 25, 22−29
https://doi.org/10.1093/bioinformatics/btn580 pmid: 19008249
15 Persikov, A. V. and Singh, M. (2014) De novo prediction of DNA-binding specificities for Cys2His2 zinc finger proteins. Nucleic Acids Res., 42, 97−108
https://doi.org/10.1093/nar/gkt890 pmid: 24097433
16 Wolfe, S. A., Nekludova, L. and Pabo, C. O. (2000) DNA recognition by Cys2His2 zinc finger proteins. Annu. Rev. Biophys. Biomol. Struct., 29, 183−212
https://doi.org/10.1146/annurev.biophys.29.1.183 pmid: 10940247
17 Christensen, R. G., Enuameh, M. S., Noyes, M. B., Brodsky, M. H., Wolfe, S. A. and Stormo, G. D. (2012) Recognition models to predict DNA-binding specificities of homeodomain proteins. Bioinformatics, 28, i84−i89
https://doi.org/10.1093/bioinformatics/bts202 pmid: 22689783
18 Stormo, G. D. (2013) Introduction to protein-DNA interactions: structure, thermodynamics, and bioinformatics. New York: Cold Spring Harbor Laboratory Press
19 Giraud, B. G., Heumann, J. M. and Lapedes, A. S. (1999) Superadditive correlation. Phys. Rev. E, 59, 4983−4991
https://doi.org/10.1103/PhysRevE.59.4983 pmid: 11969452
20 Lapedes, A. S., Giraud, B., Liu, L. C. and Stormo, G. D. (1999) Correlated mutations in models of protein sequences: phylogenetic and structural effects. Institute of Mathematical Statistics Lecture Notes - Monograph Series, 33, 236−256
21 Lapedes, A., Giraud, B. and Jarzynski, C. (2002) Using sequence alignments to predict protein structure and stability with high accuracy. q-bio.QM, arXiv:1207.2484
22 Cocco, S., Monasson, R. and Weigt, M. (2013) From principal component to direct coupling analysis of coevolution in proteins: low-eigenvalue modes are needed for structure prediction. PLoS Comput. Biol., 9, e1003176
https://doi.org/10.1371/journal.pcbi.1003176 pmid: 23990764
23 Jones, D. T., Buchan, D. W., Cozzetto, D. and Pontil, M. (2012) PSICOV: precise structural contact prediction using sparse inverse covariance estimation on large multiple sequence alignments. Bioinformatics, 28, 184−190
https://doi.org/10.1093/bioinformatics/btr638 pmid: 22101153
24 Kamisetty, H., Ovchinnikov, S. and Baker, D. (2013) Assessing the utility of coevolution-based residue-residue contact predictions in a sequence- and structure-rich era. Proc. Natl. Acad. Sci. USA, 110, 15674−15679
https://doi.org/10.1073/pnas.1314045110 pmid: 24009338
25 Marks, D. S., Colwell, L. J., Sheridan, R., Hopf, T. A., Pagnani, A., Zecchina, R. and Sander, C. (2011) Protein 3D structure computed from evolutionary sequence variation. PLoS One, 6, e28766
https://doi.org/10.1371/journal.pone.0028766 pmid: 22163331
26 Morcos, F., Pagnani, A., Lunt, B., Bertolino, A., Marks, D. S., Sander, C., Zecchina, R., Onuchic, J. N., Hwa, T. and Weigt, M. (2011) Direct-coupling analysis of residue coevolution captures native contacts across many protein families. Proc. Natl. Acad. Sci. USA, 108, E1293−E1301
https://doi.org/10.1073/pnas.1111471108 pmid: 22106262
27 Weigt, M., White, R. A., Szurmant, H., Hoch, J. A. and Hwa, T. (2009) Identification of direct residue contacts in protein-protein interaction by message passing. Proc. Natl. Acad. Sci. USA, 106, 67−72
https://doi.org/10.1073/pnas.0805923106 pmid: 19116270
28 Burger, L. and van Nimwegen, E. (2008) Accurate prediction of protein-protein interactions from sequence alignments using a Bayesian method. Mol. Syst. Biol., 4, 165
https://doi.org/10.1038/msb4100203 pmid: 18277381
29 Ovchinnikov, S., Kamisetty, H. and Baker, D. (2014) Robust and accurate prediction of residue-residue interactions across protein interfaces using evolutionary information. eLife, 3, e02030
https://doi.org/10.7554/eLife.02030 pmid: 24842992
30 Feizi, S., Marbach, D., Médard, M. and Kellis, M. (2013) Network deconvolution as a general method to distinguish direct dependencies in networks. Nat. Biotechnol., 31, 726−733
https://doi.org/10.1038/nbt.2635 pmid: 23851448
31 Zhu, L. J., Christensen, R. G., Kazemian, M., Hull, C. J., Enuameh, M. S., Basciotta, M. D., Brasefield, J. A., Zhu, C., Asriyan, Y., Lapointe, D. S., et al. (2011) FlyFactorSurvey: a database of Drosophila transcription factor binding specificities determined using the bacterial one-hybrid system. Nucleic Acids Res., 39, D111−D117
https://doi.org/10.1093/nar/gkq858 pmid: 21097781
32 Robasky, K. and Bulyk, M. L. (2011) UniPROBE, update 2011: expanded content and search tools in the online database of protein-binding microarray data on protein-DNA interactions. Nucleic Acids Res., 39, D124−D128
https://doi.org/10.1093/nar/gkq992 pmid: 21037262
33 Jolma, A., Yan, J., Whitington, T., Toivonen, J., Nitta, K. R., Rastas, P., Morgunova, E., Enge, M., Taipale, M., Wei, G., et al. (2013) DNA-binding specificities of human transcription factors. Cell, 152, 327−339
https://doi.org/10.1016/j.cell.2012.12.009 pmid: 23332764
34 Novichkov, P. S., Kazakov, A. E., Ravcheev, D. A., Leyn, S. A., Kovaleva, G. Y., Sutormin, R. A., Kazanov, M. D., Riehl, W., Arkin, A. P., Dubchak, I., et al. (2013) RegPrecise 3.0—a resource for genome-scale exploration of transcriptional regulation in bacteria. BMC Genomics, 14, 745
https://doi.org/10.1186/1471-2164-14-745 pmid: 24175918
35 Magrane, M. and Consortium, U. (2011) UniProt Knowledgebase: a hub of integrated protein data. Database, 2011, bar009
https://doi.org/10.1093/database/bar009 pmid: 21447597
36 Dehal, P. S., Joachimiak, M. P., Price, M. N., Bates, J. T., Baumohl, J. K., Chivian, D., Friedland, G. D., Huang, K. H., Keller, K., Novichkov, P. S., et al. (2010) MicrobesOnline: an integrated portal for comparative and functional genomics. Nucleic Acids Res., 38, D396−D400
https://doi.org/10.1093/nar/gkp919 pmid: 19906701
37 Eddy, S. R. (2011) Accelerated profile HMM searches. PLoS Comput. Biol., 7, e1002195
https://doi.org/10.1371/journal.pcbi.1002195 pmid: 22039361
38 Finn, R. D., Bateman, A., Clements, J., Coggill, P., Eberhardt, R. Y., Eddy, S. R., Heger, A., Hetherington, K., Holm, L., Mistry, J., et al. (2014) Pfam: the protein families database. Nucleic Acids Res., 42, D222−D230
https://doi.org/10.1093/nar/gkt1223 pmid: 24288371
39 Wang, T. and Stormo, G. D. (2003) Combining phylogenetic data with co-regulated genes to identify regulatory motifs. Bioinformatics, 19, 2369−2380
https://doi.org/10.1093/bioinformatics/btg329 pmid: 14668220
40 Wang, T. and Stormo, G. D. (2005) Identifying the conserved network of cis-regulatory sites of a eukaryotic genome. Proc. Natl. Acad. Sci. USA, 102, 17400−17405
https://doi.org/10.1073/pnas.0505147102 pmid: 16301543
41 Mahony, S. and Benos, P. V. (2007) STAMP: a web tool for exploring DNA-binding motif similarities. Nucleic Acids Res., 35 (Web Server issue), W253−W258
42 Kwan, C. (2014) A regression-based interpretation of the inverse of the sample covariance matrix. Spreadsheets in Education (eJSiE), 7, Article 3
43 Dunn, S. D., Wahl, L. M. and Gloor, G. B. (2008) Mutual information without the influence of phylogeny or entropy dramatically improves residue contact prediction. Bioinformatics, 24, 333−340
https://doi.org/10.1093/bioinformatics/btm604 pmid: 18057019