Predicting enhancer-promoter interaction from genomic sequence with deep neural networks
Shashank Singh1, Yang Yang2, Barnabás Póczos1, Jian Ma2
1. Machine Learning Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15213, USA
2. Computational Biology Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15213, USA
Abstract

Background: In the human genome, distal enhancers are involved in regulating target genes through proximal promoters by forming enhancer-promoter interactions. Although recently developed high-throughput experimental approaches have allowed us to recognize potential enhancer-promoter interactions genome-wide, it is still largely unclear to what extent the sequence-level information encoded in our genome helps guide such interactions.

Methods: Here we report a new computational method (named “SPEID”) that uses deep learning models to predict enhancer-promoter interactions from sequence-based features only, given the locations of putative enhancers and promoters in a particular cell type.

Results: Our results across six different cell types demonstrate that SPEID is effective in predicting enhancer-promoter interactions compared with state-of-the-art methods that only use information from a single cell type. As a proof of principle, we also applied SPEID to identify somatic non-coding mutations in melanoma samples that may have reduced enhancer-promoter interactions in tumor genomes.

Conclusions: This work demonstrates that deep learning models can help reveal that sequence-based features alone are sufficient to reliably predict enhancer-promoter interactions genome-wide.
Keywords: chromatin interaction, enhancer-promoter interaction, deep neural network
Corresponding Author(s): Jian Ma
Online First Date: 23 April 2019
Issue Date: 30 May 2019