Quantitative Biology

ISSN 2095-4689

ISSN 2095-4697(Online)

CN 10-1028/TM

Postal Subscription Code 80-971

Quant. Biol.    2023, Vol. 11 Issue (3) : 287-296    https://doi.org/10.15302/J-QB-022-0323
RESEARCH ARTICLE
Transformer-based DNA methylation detection on ionic signals from Oxford Nanopore sequencing data
Xiuquan Wang1, Mian Umair Ahsan2, Yunyun Zhou2, Kai Wang2,3
1. Department of Mathematics and Computer Science, Tougaloo College, Jackson, MS 39174, USA
2. Raymond G. Perelman Center for Cellular and Molecular Therapeutics, Children’s Hospital of Philadelphia, Philadelphia, PA 19104, USA
3. Department of Pathology and Laboratory Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
Abstract

Background: Oxford Nanopore long-read sequencing addresses limitations in DNA methylation detection that are inherent to short-read bisulfite sequencing and methylation microarrays. Several analytical tools, such as Nanopolish, Guppy/Tombo, and DeepMod, have been developed to detect DNA methylation in Nanopore data. However, there is still room for improvement in computational efficiency, prediction accuracy, and contextual interpretation in complex genomic regions (such as repetitive regions and regions of low GC density).

Method: In the current study, we apply the Transformer architecture to detect DNA methylation from the ionic signals of Oxford Nanopore sequencing data. The Transformer is a neural network architecture built on self-attention and has been widely used in natural language processing.

Results: Compared with traditional deep-learning methods such as convolutional neural networks (CNNs) and recurrent neural networks (RNNs), the Transformer may have specific advantages in DNA methylation detection, because the self-attention mechanism can help detect relationships between bases that are far from each other and can pay more attention to important bases that carry characteristic methylation-specific signals within a specific sequence context.

Conclusion: We demonstrated the ability of Transformers to detect methylation on ionic signal data.
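The self-attention idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes identity query/key/value projections, a single attention head, and a hypothetical toy input (`signal_features`) standing in for per-base signal features. It only shows how scaled dot-product attention lets one position weight a distant position as strongly as itself when their features are similar.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(x):
    """Single-head scaled dot-product self-attention.

    Assumption: queries, keys and values are the inputs themselves
    (identity projections); a trained model would learn these projections.

    x: list of per-position feature vectors (seq_len x d_model).
    Returns (outputs, weights), where weights[i][j] is how strongly
    position i attends to position j.
    """
    d_model = len(x[0])
    scale = math.sqrt(d_model)
    weights = []
    for q in x:
        # Similarity of this query to every key, scaled by sqrt(d_model).
        scores = [sum(a * b for a, b in zip(q, k)) / scale for k in x]
        weights.append(softmax(scores))
    # Each output is the attention-weighted average of all value vectors.
    outputs = [
        [sum(w * v[c] for w, v in zip(w_row, x)) for c in range(d_model)]
        for w_row in weights
    ]
    return outputs, weights


# Hypothetical toy input: 5 positions, d_model = 4.
# Positions 0 and 4 are far apart but carry the same features.
signal_features = [
    [1.0, 0.0, 1.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
    [0.0, 1.0, 0.0, 1.0],
    [1.0, 0.0, 1.0, 0.0],
]
outputs, weights = self_attention(signal_features)
```

In this toy case, position 0 attends to the distant position 4 exactly as much as to itself, and more than to its immediate neighbor, which is the long-range behavior the abstract attributes to self-attention.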

Keywords: Nanopore; long-read sequencing; deep learning; Transformer model; DNA methylation
Corresponding Author(s): Yunyun Zhou, Kai Wang
Just Accepted Date: 14 April 2023   Online First Date: 13 July 2023    Issue Date: 08 October 2023
 Cite this article:   
Xiuquan Wang, Mian Umair Ahsan, Yunyun Zhou, et al. Transformer-based DNA methylation detection on ionic signals from Oxford Nanopore sequencing data[J]. Quant. Biol., 2023, 11(3): 287-296.
 URL:  
https://academic.hep.com.cn/qb/EN/10.15302/J-QB-022-0323
https://academic.hep.com.cn/qb/EN/Y2023/V11/I3/287
Fig.1  Architecture of the transformer-based DNA methylation detection on ONT long-read sequencing data.
Fig.2  Signal visualization.
Fig.3  Experiments for model optimization.
Parameter | Annotation of parameters | Type | Scope
d_model | Size of feature vector | Even integer | [32, 64]
Att. head | Number of attention heads | Integer | [27]
Att. layer | Number of attention layers | Integer | [27]
Batch size | Batch size | Integer | [256, 512]
Learning rate | Learning rate | Real | [0.001, 0.01]
Tab.1  Configuration of values for hyper-parameter tuning
Fig.4  Embedding pattern visualization.
Fig.5  Performance evaluation.
Dataset | Model | F1 score
NA12878 | Transformer method | 0.8600
NA12878 | DeepMod | 0.8706
NA12878 | DeepSignal | 0.8520
E. coli | Transformer method | 0.9410
E. coli | DeepMod | 0.9501
E. coli | DeepSignal | 0.9330
Tab.2  Performance of the Transformer-based method compared with DeepMod and DeepSignal on NA12878 and E. coli
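For readers unfamiliar with the metric reported in Tab.2, the F1 score is the harmonic mean of precision and recall. The sketch below shows the standard binary definition on hypothetical labels (1 = methylated, 0 = unmethylated); the label vectors are invented for illustration and are unrelated to the NA12878 or E. coli data, and the page does not state how the authors computed their scores.

```python
def f1_score(y_true, y_pred):
    """Binary F1 score: harmonic mean of precision and recall.

    y_true, y_pred: equal-length sequences of 0/1 labels,
    where 1 is treated as the positive (methylated) class.
    """
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        # No true positives: precision and recall are both zero or undefined.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical per-site labels, for illustration only.
y_true = [1, 1, 1, 0, 0, 1, 0, 0]
y_pred = [1, 1, 0, 0, 1, 1, 0, 0]
score = f1_score(y_true, y_pred)
```

Here there are 3 true positives, 1 false positive, and 1 false negative, so precision and recall are both 0.75 and the F1 score is 0.75.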