Quantitative Biology

ISSN 2095-4689

ISSN 2095-4697(Online)

CN 10-1028/TM

Postal Subscription Code 80-971

Quant. Biol.    2023, Vol. 11 Issue (2) : 155-162    https://doi.org/10.15302/J-QB-022-0315
RESEARCH ARTICLE
Prediction of chromatin looping using deep hybrid learning (DHL)
Mateusz Chiliński1,2, Anup Kumar Halder1,2, Dariusz Plewczynski1,2
1. Faculty of Mathematics and Information Sciences, Warsaw University of Technology, 00-662 Warsaw, Poland
2. Laboratory of Functional and Structural Genomics, Centre of New Technologies, University of Warsaw, 02-097 Warsaw, Poland
Abstract

Background: With the development of rapid and cheap sequencing techniques, the cost of whole-genome sequencing (WGS) has dropped significantly. However, the complexity of the human genome is not limited to the sequence alone, and additional experiments are required to learn how the genome influences complex traits. One of the most actively studied aspects today is the spatial organisation of the genome, which can be probed with experiments that capture chromatin contacts (e.g., Hi-C, ChIA-PET). Information about these spatial contacts supports the analysis and brings new insights into our understanding of disease development.

Methods: We used an ensemble of deep learning and classical machine learning algorithms. The deep learning network was DNABERT, which applies the transformer-based BERT language model to genomic sequences. The classical machine learning models were support vector machines (SVMs), random forests (RFs), and K-nearest neighbours (KNN). We refer to the combined approach as deep hybrid learning (DHL).
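As an illustration of how a DNA sequence can be turned into "words" for a BERT-style language model, the sketch below splits a sequence into overlapping k-mers, which is the token representation DNABERT is built on. It is a minimal sketch, not the authors' preprocessing code: the function name `kmer_tokens`, the stride of 1, and the example sequence are assumptions made here for illustration only.

```python
from typing import List


def kmer_tokens(sequence: str, k: int = 6) -> List[str]:
    """Split a DNA sequence into overlapping k-mers with a stride of 1.

    Hypothetical helper for illustration; not taken from the article.
    """
    if k < 1:
        raise ValueError("k must be a positive integer")
    seq = sequence.upper()
    # A sequence of length n yields n - k + 1 tokens (none if n < k).
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


# Example with an arbitrary 7-base sequence and k = 3.
tokens = kmer_tokens("ATGGCTA", k=3)
print(" ".join(tokens))  # ATG TGG GGC GCT CTA
```

Changing `k` changes both the vocabulary size (up to 4^k tokens) and the number of tokens per sequence, which is why results can be compared across k values.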

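Results: We found that DNABERT can predict chromatin loops detected in ChIA-PET experiments with high precision. Additionally, the DHL approach improved every reported metric on the CTCF and RNAPII sets, raising accuracy from 0.7808 to 0.8409 on CTCF and from 0.775 to 0.8448 on RNAPII (Tab.1).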

Conclusions: The DHL approach should be considered for models that rely on deep learning. Although conceptually straightforward, it can improve the results significantly.

Keywords: deep learning, 3D genomics, transformers, spatial organisation of nucleus, ChIA-PET, DNA-Seq
Corresponding Author(s): Dariusz Plewczynski   
About author:

* These authors contributed equally to this work.

Just Accepted Date: 10 February 2023   Online First Date: 13 March 2023    Issue Date: 21 June 2023
 Cite this article:   
Mateusz Chiliński,Anup Kumar Halder,Dariusz Plewczynski. Prediction of chromatin looping using deep hybrid learning (DHL)[J]. Quant. Biol., 2023, 11(2): 155-162.
 URL:  
https://academic.hep.com.cn/qb/EN/10.15302/J-QB-022-0315
https://academic.hep.com.cn/qb/EN/Y2023/V11/I2/155
Fig.1  The performance evaluation on the holdout datasets from (A) CTCF and (B) RNAPII.
Dataset  Classifier  Accuracy  Precision  Recall  F1      MCC
CTCF     DNABERT     0.7808    0.782      0.7808  0.7806  0.562
CTCF     ALL-ML      0.7113    0.7425     0.6468  0.6914  0.4261
CTCF     DHL         0.8409    0.8566     0.819   0.8374  0.6826
RNAPII   DNABERT     0.775     0.776      0.775   0.775   0.552
RNAPII   ALL-ML      0.7457    0.7959     0.6608  0.7221  0.4986
RNAPII   DHL         0.8448    0.8675     0.8139  0.8398  0.6909
Tab.1  The performance evaluation of the DHL model for both datasets CTCF and RNAPII
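For readers who want to relate the columns of Tab.1 to each other, the following sketch computes accuracy, precision, recall, F1 and the Matthews correlation coefficient (MCC) from binary confusion-matrix counts using their standard definitions. It is only a sketch: the function names and the example counts are hypothetical, and the page does not state whether the reported values are per-class or averaged, so it should not be read as the authors' evaluation code.

```python
import math


def f1_from(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0


def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Standard binary classification metrics from confusion-matrix counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "precision": precision,
        "recall": recall,
        "f1": f1_from(precision, recall),
        # MCC uses all four cells, so it stays informative on imbalanced data.
        "mcc": (tp * tn - fp * fn) / denom if denom else 0.0,
    }


# Hypothetical counts, chosen only to make the arithmetic easy to follow.
example = binary_metrics(tp=40, fp=10, tn=40, fn=10)
print(example)

# Consistency check against the CTCF DHL row of Tab.1:
# precision 0.8566 and recall 0.819 give the reported F1 of 0.8374.
print(round(f1_from(0.8566, 0.819), 4))  # 0.8374
```

The last line shows that the F1 column for the DHL and ALL-ML rows follows directly from the precision and recall columns.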
Fig.2  Workflow of deep hybrid learning for chromatin loop prediction.
Fig.3  The performance evaluation of CTCF data with varying k values on (A) training and (B) testing set.
Fig.4  The performance evaluation of RNAPII data with varying k values on (A) training and (B) testing set.
Fig.5  The performance evaluation of the CTCF and RNAPII dataset in five-fold cross-validation.