1. School of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing 211106, China
2. School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing 210094, China
3. Department of Computer Science, Islamia College University Peshawar, Peshawar 25000, Pakistan
Enhancers are short cis-acting DNA elements that can be bound by proteins (activators) to increase the likelihood that a particular gene will be transcribed. Enhancers play a significant role in protein formation and in regulating gene transcription. Human diseases such as cancer, inflammatory bowel disease, Parkinson’s disease, addiction, and schizophrenia have been linked to genetic variation in enhancers. In the current study, we build a more robust, novel bi-layered computational model. The representative feature vector was constructed as a linear combination of six features, and the optimal hybrid feature vector was obtained via the novel Cascade Multi-Level Subset Feature Selection (CMSFS) algorithm. The first layer predicts enhancers, and the second layer predicts their subtypes. The baseline model obtained 87.88% accuracy, 95.29% sensitivity, 80.47% specificity, an MCC of 0.766, and an ROC value of 0.9603 on layer-1. Similarly, on layer-2 the model obtained 68.24% accuracy, 65.54% sensitivity, 70.95% specificity, an MCC of 0.3654, and an ROC value of 0.7568. On an independent dataset, piEnPred achieved 80.4% accuracy, 82.5% sensitivity, 78.4% specificity, and an MCC of 0.6099 on layer-1, and 72.5% accuracy, 70.0% sensitivity, 75% specificity, and an MCC of 0.4506 on layer-2. The proposed method compared favorably with other state-of-the-art predictors. For the convenience of experimental scientists, a user-friendly, freely accessible web server has been developed at bienhancer.pythonanywhere.com.