Quantitative Biology

ISSN 2095-4689

ISSN 2095-4697 (Online)

CN 10-1028/TM

Postal Subscription Code 80-971

Quant. Biol. 2019, Vol. 7, Issue 2: 122-137
Predicting enhancer-promoter interaction from genomic sequence with deep neural networks
Shashank Singh1, Yang Yang2, Barnabás Póczos1, Jian Ma2
1. Machine Learning Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15213, USA
2. Computational Biology Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15213, USA
Background: In the human genome, distal enhancers are involved in regulating target genes through proximal promoters by forming enhancer-promoter interactions. Although recently developed high-throughput experimental approaches have allowed us to recognize potential enhancer-promoter interactions genome-wide, it is still largely unclear to what extent the sequence-level information encoded in our genome helps guide such interactions.

Methods: Here we report a new computational method (named “SPEID”) using deep learning models to predict enhancer-promoter interactions based on sequence-based features only, when the locations of putative enhancers and promoters in a particular cell type are given.
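Because SPEID works from sequence alone, each enhancer and promoter region must first be turned into numbers. A common choice for sequence-based deep learning models, assumed here rather than taken from this abstract, is one-hot encoding: one row per base and one column per nucleotide. In the sketch, the ACGT column order, the all-zero row for ambiguous bases such as N, and the function name `one_hot` are illustrative assumptions.

```python
from typing import List

# Assumed column order for the four nucleotides.
ALPHABET = "ACGT"


def one_hot(seq: str) -> List[List[int]]:
    """Encode a DNA string as a len(seq) x 4 one-hot matrix.

    Bases outside ACGT (e.g. 'N') become all-zero rows, a common
    but assumed convention for ambiguous positions.
    """
    rows = []
    for base in seq.upper():
        row = [0, 0, 0, 0]
        idx = ALPHABET.find(base)  # -1 if the base is not A, C, G or T
        if idx >= 0:
            row[idx] = 1
        rows.append(row)
    return rows


if __name__ == "__main__":
    for row in one_hot("ACgN"):
        print(row)
```

A candidate enhancer-promoter pair would then be represented by two such matrices, one per region.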

Results: Our results across six different cell types demonstrate that SPEID is effective in predicting enhancer-promoter interactions as compared to state-of-the-art methods that only use information from a single cell type. As a proof-of-principle, we also applied SPEID to identify somatic non-coding mutations in melanoma samples that may have reduced enhancer-promoter interactions in tumor genomes.

Conclusions: This work demonstrates that deep learning models can help reveal that sequence-based features alone are sufficient to reliably predict enhancer-promoter interactions genome-wide.

Keywords: chromatin interaction; enhancer-promoter interaction; deep neural network
Corresponding Author(s): Jian Ma
Online First Date: 23 April 2019    Issue Date: 30 May 2019
Cite this article:
Shashank Singh, Yang Yang, Barnabás Póczos, et al. Predicting enhancer-promoter interaction from genomic sequence with deep neural networks[J]. Quant. Biol., 2019, 7(2): 122-137.
Fig.1  Diagram of SPEID, our deep learning model for predicting enhancer-promoter interactions from sequence alone.
Cell line   Positive pairs   Augmented positive pairs   Negative pairs
GM12878     2,113            42,260                     42,200
HeLa-S3     1,740            34,800                     34,800
HUVEC       1,524            30,480                     30,400
IMR90       1,254            25,080                     25,000
K562        1,977            39,540                     39,500
NHEK        1,291            25,820                     25,600
Total       9,899            197,980                    197,500
Tab.1  Numbers of positive, augmented positive, and negative samples for each cell line
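In Tab.1 every augmented count is exactly 20 times the positive count, and the raw data have roughly 20 negatives per positive, so augmentation brings the two classes close to balance. The short script below checks that arithmetic against the table; it only re-derives numbers already shown and assumes nothing about how the augmented pairs were generated. `TABLE_1` and `summarize` are names introduced for this illustration.

```python
from typing import Dict, Tuple

# Per-cell-line counts copied from Tab.1:
# (positive pairs, augmented positive pairs, negative pairs).
TABLE_1: Dict[str, Tuple[int, int, int]] = {
    "GM12878": (2113, 42260, 42200),
    "HeLa-S3": (1740, 34800, 34800),
    "HUVEC": (1524, 30480, 30400),
    "IMR90": (1254, 25080, 25000),
    "K562": (1977, 39540, 39500),
    "NHEK": (1291, 25820, 25600),
}


def summarize(pos: int, aug: int, neg: int) -> Dict[str, float]:
    """Augmentation factor and class ratios for one cell line."""
    return {
        "factor": aug / pos,           # augmented samples per original positive
        "neg_per_pos_raw": neg / pos,  # class imbalance before augmentation
        "aug_over_neg": aug / neg,     # close to 1.0 means balanced afterwards
    }


if __name__ == "__main__":
    for line, counts in TABLE_1.items():
        s = summarize(*counts)
        print(f"{line:8s} factor={s['factor']:.0f} "
              f"raw 1:{s['neg_per_pos_raw']:.1f} "
              f"augmented/negative={s['aug_over_neg']:.3f}")
    totals = [sum(c[i] for c in TABLE_1.values()) for i in range(3)]
    print("Totals:", totals)  # should match the Total row of Tab.1
```

The output shows a raw imbalance of about 1:20 in every cell line and an augmented-positive-to-negative ratio close to 1.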
Fig.2  Prediction results of SPEID, TargetFinder’s E/P/W model, and PEP’s integrated model in each cell line, as estimated by 10-fold cross-validation.
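Because each positive pair in Tab.1 contributes 20 augmented copies, a purely random 10-fold split could place near-identical copies of one pair in both the training and the test folds. One way to avoid that is a grouped k-fold split that keeps all copies of an original pair in the same fold. This grouping is an assumption for illustration: the caption does not say how SPEID's folds were built, and `group_kfold` is a hypothetical helper.

```python
import random
from typing import Hashable, List, Sequence


def group_kfold(groups: Sequence[Hashable], k: int = 10, seed: int = 0) -> List[List[int]]:
    """Split sample indices into k folds, keeping every sample with the
    same group id (e.g. all augmented copies of one pair) in one fold."""
    # Fix the order before shuffling so the result depends only on `seed`.
    unique = sorted(set(groups), key=repr)
    random.Random(seed).shuffle(unique)
    # Deal groups to folds round-robin; folds stay balanced when groups
    # have equal size.
    fold_of = {g: i % k for i, g in enumerate(unique)}
    folds: List[List[int]] = [[] for _ in range(k)]
    for idx, g in enumerate(groups):
        folds[fold_of[g]].append(idx)
    return folds


if __name__ == "__main__":
    # Toy data: 30 original pairs, each with 20 augmented copies (600 samples).
    groups = [i // 20 for i in range(600)]
    folds = group_kfold(groups, k=10)
    print([len(f) for f in folds])  # 60 samples per fold
```

Negative pairs, which are not augmented, can each be given their own group id.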
Cell Line   Top 10% in both   Only in SPEID   Only in PEP
GM12878     22                28              26
HeLa-S3     16                34              22
HUVEC       20                29              27
IMR90       23                27              26
K562        18                32              30
NHEK        14                36              17
Chance      5                 45              45
Tab.2  Number of TF clusters (out of 503) whose feature importance in enhancers and promoters ranks in the top 10% for both SPEID and PEP, for SPEID only, or for PEP only, in each cell line
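The Chance row of Tab.2 follows from treating the two rankings as independent: if each method places k of the N = 503 clusters in its top 10% (k = 50), the expected overlap is k·k/N, the mean of a hypergeometric distribution, which gives about 5 clusters in both and 45 in each method alone. The sketch below reproduces that row and shows how the three per-cell-line counts could be derived from two importance rankings. The helper names and the toy scores are hypothetical, not data from the paper.

```python
from typing import Dict, Set, Tuple

N_CLUSTERS = 503      # TF clusters, from the Tab.2 caption
TOP_FRACTION = 0.10   # "top 10%" cut-off, from the Tab.2 caption


def chance_counts(n: int, frac: float) -> Tuple[float, float, float]:
    """Expected (both, only A, only B) if each method's top set were an
    independent, uniformly random subset of the same size."""
    k = round(n * frac)   # 50 clusters when n = 503 and frac = 0.10
    both = k * k / n      # hypergeometric mean of the overlap
    return both, k - both, k - both


def top_set(scores: Dict[str, float], frac: float) -> Set[str]:
    """Names of the top `frac` fraction of features by score."""
    k = round(len(scores) * frac)
    ranked = sorted(scores, key=scores.__getitem__, reverse=True)
    return set(ranked[:k])


def overlap_counts(a: Set[str], b: Set[str]) -> Tuple[int, int, int]:
    """(in both, only in a, only in b): the column layout of Tab.2 and Tab.3."""
    return len(a & b), len(a - b), len(b - a)


# Toy importance scores for ten made-up features (not from the paper).
TOY_A = {f"TF{i}": float(s) for i, s in enumerate([9, 8, 7, 6, 5, 4, 3, 2, 1, 0])}
TOY_B = {f"TF{i}": float(s) for i, s in enumerate([8, 9, 0, 7, 1, 2, 3, 4, 5, 6])}

if __name__ == "__main__":
    both, only_a, only_b = chance_counts(N_CLUSTERS, TOP_FRACTION)
    print(f"chance: both={both:.1f}, only={only_a:.1f}")  # chance: both=5.0, only=45.0
    print(overlap_counts(top_set(TOY_A, 0.3), top_set(TOY_B, 0.3)))  # (2, 1, 1)
```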
Fig.3  Feature importance in each cell line for the 100 features with the highest average importance rank, sorted by average rank.
Cell Line   Predicted important in both   Only in SPEID   Only in TargetFinder
GM12878     22                            9               53
HeLa-S3     13                            15              37
HUVEC       1                             14              7
IMR90       4                             31              16
K562        27                            26              85
NHEK        0                             16              5
Tab.3  Number of potentially important TFs in enhancers involved in enhancer-promoter interactions (EPIs), identified by both SPEID and TargetFinder, by SPEID only, or by TargetFinder only
Group                            Potentially important TFs involved in EPI
Also in TargetFinder (GM12878)   SPI1, EBF1, SP1, IRF3, TCF12, BATF, PAX5, MEF2A, BCL11A, EGR1, SRF, IRF4, BHLHE40, PBX3, MEF2C, ...
Unique in SPEID (GM12878)        CPEB1, HXC10, ARI3A, IRF1, IRF8, MNT, TBX15, TBX2, TBX5
Also in TargetFinder (K562)      CTCF, SRF, ATF3, MAZ, JUND, MEF2A, CEBPD, BHLHE40, ...
Unique in SPEID (K562)           SP4, SP3, TFDP1, ZFX, WT1, KLF15, TBX1, ETV1, ZNF148, ZNF740, AP2D, CPEB1, EGR2, FOXJ3, HES1, NR1I3, ...
Tab.4  Potentially important TFs in enhancers involved in EPIs, as predicted by SPEID in GM12878 and K562
Fig.4  Example of a possibly reduced EPI (extended enhancer at chr19:6514600-6517600, promoter at chr19:6736700-6738700), with a mutation occurring within an SP1 motif in the promoter region.
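The regions in Fig.4 are given as chrom:start-end coordinates. Treating them as half-open intervals (as in BED files, an assumption here), the extended enhancer spans 3,000 bp, the promoter 2,000 bp, and 219,100 bp separate the end of the enhancer from the start of the promoter. The sketch below parses such coordinates and asks whether a point mutation falls inside an exact match to a motif. It is a simplification: the GC-box core GGGCGG used to stand in for SP1, the toy sequence, and all function names are illustrative assumptions, and a real analysis would scan both strands with a position weight matrix rather than an exact string.

```python
import re
from typing import List, Tuple


def parse_region(region: str) -> Tuple[str, int, int]:
    """Split 'chr19:6514600-6517600' into ('chr19', 6514600, 6517600)."""
    chrom, span = region.split(":")
    start, end = (int(x.replace(",", "")) for x in span.split("-"))
    return chrom, start, end


def motif_hits(seq: str, motif: str) -> List[Tuple[int, int]]:
    """Half-open (start, end) offsets of every exact, possibly overlapping
    match of `motif` in `seq`. Forward strand only; in practice the
    reverse complement would be scanned too."""
    pattern = re.compile(f"(?={re.escape(motif.upper())})")  # lookahead allows overlaps
    return [(m.start(), m.start() + len(motif)) for m in pattern.finditer(seq.upper())]


def mutation_in_motif(offset: int, hits: List[Tuple[int, int]]) -> bool:
    """True if a 0-based offset falls inside any motif hit."""
    return any(start <= offset < end for start, end in hits)


if __name__ == "__main__":
    _, e_start, e_end = parse_region("chr19:6514600-6517600")  # enhancer, Fig.4
    _, p_start, p_end = parse_region("chr19:6736700-6738700")  # promoter, Fig.4
    print(e_end - e_start, p_end - p_start, p_start - e_end)  # 3000 2000 219100

    # Hypothetical promoter fragment and an assumed SP1 GC-box core.
    toy = "ATTGGGCGGTTA"
    hits = motif_hits(toy, "GGGCGG")
    print(hits, mutation_in_motif(5, hits), mutation_in_motif(10, hits))  # [(3, 9)] True False
```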
