Quantitative Biology

ISSN 2095-4689

ISSN 2095-4697(Online)

CN 10-1028/TM

Postal Subscription Code 80-971

Quant. Biol.    2019, Vol. 7 Issue (2) : 122-137    https://doi.org/10.1007/s40484-019-0154-0
RESEARCH ARTICLE
Predicting enhancer-promoter interaction from genomic sequence with deep neural networks
Shashank Singh1, Yang Yang2, Barnabás Póczos1, Jian Ma2
1. Machine Learning Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15213, USA
2. Computational Biology Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15213, USA
Abstract

Background: In the human genome, distal enhancers regulate target genes through proximal promoters by forming enhancer-promoter interactions. Although recently developed high-throughput experimental approaches allow potential enhancer-promoter interactions to be identified genome-wide, it is still largely unclear to what extent the sequence-level information encoded in our genome helps guide such interactions.

Methods: Here we report a new computational method (named “SPEID”) using deep learning models to predict enhancer-promoter interactions based on sequence-based features only, when the locations of putative enhancers and promoters in a particular cell type are given.

Results: Our results across six different cell types demonstrate that SPEID is effective in predicting enhancer-promoter interactions as compared to state-of-the-art methods that only use information from a single cell type. As a proof-of-principle, we also applied SPEID to identify somatic non-coding mutations in melanoma samples that may have reduced enhancer-promoter interactions in tumor genomes.

Conclusions: This work demonstrates that deep learning models can help reveal that sequence-based features alone are sufficient to reliably predict enhancer-promoter interactions genome-wide.

Keywords: chromatin interaction; enhancer-promoter interaction; deep neural network
Corresponding Author(s): Jian Ma   
Online First Date: 23 April 2019    Issue Date: 30 May 2019
 Cite this article:   
Shashank Singh, Yang Yang, Barnabás Póczos, et al. Predicting enhancer-promoter interaction from genomic sequence with deep neural networks[J]. Quant. Biol., 2019, 7(2): 122-137.
 URL:  
https://academic.hep.com.cn/qb/EN/10.1007/s40484-019-0154-0
https://academic.hep.com.cn/qb/EN/Y2019/V7/I2/122
Fig.1  Diagram of our deep learning model SPEID to predict enhancer-promoter interactions based on sequences only.
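As an illustration of what "based on sequences only" can mean as model input, the sketch below one-hot encodes an enhancer sequence and a promoter sequence separately. This page does not state SPEID's actual input encoding, so the helper names (`one_hot`, `encode_pair`), the A/C/G/T column order, and the all-zero treatment of unknown bases are assumptions chosen for illustration.

```python
# Hypothetical helpers: the page does not specify SPEID's input encoding.
ALPHABET = "ACGT"  # assumed column order


def one_hot(seq):
    """Return one row of four 0/1 values per base.

    Bases outside A/C/G/T (for example N) become an all-zero row.
    """
    rows = []
    for base in seq.upper():
        rows.append([1 if base == letter else 0 for letter in ALPHABET])
    return rows


def encode_pair(enhancer_seq, promoter_seq):
    """Encode an enhancer-promoter pair as two separate one-hot matrices."""
    return one_hot(enhancer_seq), one_hot(promoter_seq)


# Toy sequences, not taken from the paper.
enh, pro = encode_pair("ACGTN", "ttga")
print(enh)
print(pro)
```

Keeping the two matrices separate, rather than concatenating them, lets a model process the enhancer and the promoter with different filters before combining them; whether SPEID does so is not confirmed by this page.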
Cell line    Positive pairs    Augmented positive pairs    Negative pairs
GM12878      2,113             42,260                      42,200
HeLa-S3      1,740             34,800                      34,800
HUVEC        1,524             30,480                      30,400
IMR90        1,254             25,080                      25,000
K562         1,977             39,540                      39,500
NHEK         1,291             25,820                      25,600
Total        9,899             197,980                     197,500
Tab.1  Numbers of positive, augmented positive, and negative enhancer-promoter pairs for each cell line
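The counts in Tab.1 follow a simple pattern that a short check makes explicit: each augmented positive count is 20 times the positive count, which brings the positive class roughly into balance with the negative class. The sketch below only re-derives these ratios from the numbers in the table; the names `TABLE1` and `summarize` are illustrative, and the augmentation procedure itself is not described on this page.

```python
# Counts copied from Tab.1: (positive, augmented positive, negative) pairs.
TABLE1 = {
    "GM12878": (2113, 42260, 42200),
    "HeLa-S3": (1740, 34800, 34800),
    "HUVEC": (1524, 30480, 30400),
    "IMR90": (1254, 25080, 25000),
    "K562": (1977, 39540, 39500),
    "NHEK": (1291, 25820, 25600),
}


def summarize(table):
    """Return (augmentation factor, augmented/negative ratio) per cell line."""
    summary = {}
    for cell, (pos, aug, neg) in table.items():
        summary[cell] = (aug / pos, aug / neg)
    return summary


summary = summarize(TABLE1)
# Column sums, to compare against the "Total" row of the table.
totals = tuple(sum(row[i] for row in TABLE1.values()) for i in range(3))

for cell, (factor, balance) in summary.items():
    print(f"{cell}: augmentation x{factor:.0f}, augmented/negative = {balance:.3f}")
print("Totals:", totals)
```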
Fig.2  Prediction results of SPEID, TargetFinder’s E/P/W model, and PEP’s integrated model in each cell line, as estimated by 10-fold cross-validation.
Cell line    Top 10% in both    Only in SPEID    Only in PEP
GM12878      22                 28               26
HeLa-S3      16                 34               22
HUVEC        20                 29               27
IMR90        23                 27               26
K562         18                 32               30
NHEK         14                 36               17
Chance       5                  45               45
Tab.2  Number of TF clusters (out of 503) predicted to be in the top 10% of feature importance in enhancers and promoters by both SPEID and PEP, only by SPEID, and only by PEP, in each cell line
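The "Chance" row of Tab.2 is consistent with a simple independence calculation. If two methods each selected the top 10% of 503 clusters (about 50) independently and uniformly at random, the overlap would be hypergeometric with mean k²/n. The sketch below computes that expectation; it assumes this is how the chance row was derived, which the page does not state, and the function name `chance_overlap` is illustrative.

```python
N_CLUSTERS = 503     # number of TF clusters, from the Tab.2 caption
TOP_FRACTION = 0.10  # top 10% by feature importance, from the caption


def chance_overlap(n, fraction):
    """Expected overlap of two independent random top-k selections from n items.

    Returns (k, expected count in both, expected count in only one method).
    """
    k = round(n * fraction)
    expected_both = k * k / n  # mean of the hypergeometric overlap
    expected_only = k - expected_both
    return k, expected_both, expected_only


k, both, only = chance_overlap(N_CLUSTERS, TOP_FRACTION)
print(f"k = {k}, expected in both = {both:.2f}, expected in only one = {only:.2f}")
```

Rounded to whole clusters, these values match the 5 / 45 / 45 entries of the chance row, and every observed overlap in the table (14 to 23) is well above that baseline.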
Fig.3  Feature importance in each cell line, for 100 features with highest average importance rank, sorted by average rank.
Cell line    Predicted important in both    Only in SPEID    Only in TargetFinder
GM12878      22                             9                53
HeLa-S3      13                             15               37
HUVEC        1                              14               7
IMR90        4                              31               16
K562         27                             26               85
NHEK         0                              16               5
Tab.3  Number of potentially important TFs in enhancers involved in EPIs identified by SPEID, TargetFinder, or both
Potentially important TFs involved in EPI
Also in TargetFinder (GM12878): SPI1, EBF1, SP1, IRF3, TCF12, BATF, PAX5, MEF2A, BCL11A, EGR1, SRF, IRF4, BHLHE40, PBX3, MEF2C, MAZ, NRF1, YY1, GABPA, ETS1, STAT1, NFYA
Unique in SPEID (GM12878): CPEB1, HXC10, ARI3A, IRF1, IRF8, MNT, TBX15, TBX2, TBX5
Also in TargetFinder (K562): CTCF, SRF, ATF3, MAZ, JUND, MEF2A, CEBPD, BHLHE40, NR2F2, EGR1, FOSL1, FOS, TAL1, JUNB, JUN, MAFK, E2F6, SP1, NFE2, NR4A1, GATA1, THAP1, SP2, RFX5, NRF1, USF2
Unique in SPEID (K562): SP4, SP3, TFDP1, ZFX, WT1, KLF15, TBX1, ETV1, ZNF148, KLF6, HEN1, KLF14, TBX15, CLOCK, ELF2, PLAL1, PURA, ZNF740, AP2D, CPEB1, EGR2, FOXJ3, HES1, NR1I3, SREBF2, THA
Tab.4  Predicted potentially important TFs in enhancers involved in EPIs from SPEID in GM12878 and K562
Fig.4  Example of a possibly reduced EPI (extended enhancer at chr19:6514600-6517600, promoter at chr19:6736700-6738700) with a mutation occurring within the SP1 motif in the promoter region shown.