Quantitative Biology

ISSN 2095-4689

ISSN 2095-4697(Online)

CN 10-1028/TM

Postal Subscription Code 80-971

Quant. Biol.    2018, Vol. 6 Issue (3) : 275-286    https://doi.org/10.1007/s40484-018-0149-2
METHODOLOGY ARTICLE
A Bayesian hierarchical model for analyzing methylated RNA immunoprecipitation sequencing data
Minzhe Zhang1, Qiwei Li1, Yang Xie1,2,3
1. Quantitative Biomedical Research Center, Department of Clinical Sciences, U.T. Southwestern Medical Center, Dallas, TX 75390, USA
2. Department of Bioinformatics, U.T. Southwestern Medical Center, Dallas, TX 75390, USA
3. Simmons Comprehensive Cancer Center, U.T. Southwestern Medical Center, Dallas, TX 75390, USA
Abstract

Background: The recently developed technology of methylated RNA immunoprecipitation sequencing (MeRIP-seq) sheds light on the study of RNA epigenetics. Analyzing these data calls for effective and robust peak-calling algorithms to detect mRNA methylation sites from MeRIP-seq data.

Methods: We propose a Bayesian hierarchical model to detect methylation sites from MeRIP-seq data. Our modeling approach includes several important characteristics. First, it models the zero-inflated and over-dispersed counts by deploying a zero-inflated negative binomial model. Second, it incorporates a hidden Markov model (HMM) to account for the spatial dependency of neighboring read enrichment. Third, our Bayesian inference allows the proposed model to borrow strength in parameter estimation, which greatly improves the model stability when dealing with MeRIP-seq data with a small number of replicates. We use Markov chain Monte Carlo (MCMC) algorithms to simultaneously infer the model parameters in a de novo fashion. The R Shiny demo is available at the authors' website and the R/C++ code is available at https://github.com/liqiwei2000/BaySeqPeak.
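As an illustration of the first characteristic, the following minimal Python sketch evaluates a zero-inflated negative binomial (ZINB) probability mass function. It is not the authors' R/C++ implementation; the function names and the parameter names `pi_zero`, `mean`, and `dispersion` are assumptions made for this example, and the negative binomial is parameterized by its mean and a dispersion (size) parameter so that the variance is mean + mean^2 / dispersion.

```python
import math


def nb_log_pmf(y, mean, dispersion):
    """Log pmf of a negative binomial with the given mean and dispersion (size).

    Assumed parameterization: variance = mean + mean**2 / dispersion.
    """
    r = dispersion
    return (
        math.lgamma(y + r) - math.lgamma(r) - math.lgamma(y + 1)
        + r * math.log(r / (r + mean))
        + y * math.log(mean / (r + mean))
    )


def zinb_pmf(y, pi_zero, mean, dispersion):
    """Pmf of a ZINB: a point mass at zero mixed with a negative binomial.

    pi_zero is the probability of an extra (structural) zero.
    """
    nb = math.exp(nb_log_pmf(y, mean, dispersion))
    if y == 0:
        # A zero can come from the point mass or from the NB component.
        return pi_zero + (1.0 - pi_zero) * nb
    return (1.0 - pi_zero) * nb


# Toy values chosen for illustration only (not taken from the paper).
total = sum(zinb_pmf(y, 0.3, 5.0, 2.0) for y in range(2000))
p0_nb = math.exp(nb_log_pmf(0, 5.0, 2.0))
p0_zinb = zinb_pmf(0, 0.3, 5.0, 2.0)
```

With these toy values the ZINB assigns more probability to zero than the plain negative binomial does, which is the behavior the zero-inflation component is meant to capture.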

Results: In simulation studies, the proposed method outperformed the competing methods exomePeak and MeTPeak, especially when an excess of zeros was present in the data. In real MeRIP-seq data analysis, the proposed method identified methylation sites that were more consistent with biological knowledge and had better spatial resolution than the other methods.
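Method comparisons of this kind in simulation are commonly summarized by the area under the ROC curve (AUC). The sketch below is a generic rank-based AUC computation, assuming each bin has a score (for example, a posterior probability of methylation) and a known true label; it is not the authors' evaluation code, and the function name `auc` and the toy scores are hypothetical.

```python
def auc(scores, labels):
    """Rank-based AUC: the probability that a randomly chosen true positive
    scores higher than a randomly chosen true negative (ties count as 0.5)."""
    pos = [s for s, lab in zip(scores, labels) if lab == 1]
    neg = [s for s, lab in zip(scores, labels) if lab == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy scores for five bins (illustrative values, not results from the paper).
toy_scores = [0.9, 0.8, 0.3, 0.6, 0.2]
toy_labels = [1, 1, 0, 0, 1]
toy_auc = auc(toy_scores, toy_labels)
```

The pairwise form is quadratic in the number of bins, which is acceptable for a small illustration; a sort-based version would be preferable for genome-scale inputs.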

Conclusions: In this study, we develop a Bayesian hierarchical model to identify methylation peaks in MeRIP-seq data. The proposed method has a competitive edge over existing methods in terms of accuracy, robustness and spatial resolution.

Keywords: MeRIP-seq data; RNA epigenomics; Bayesian inference; hidden Markov model; zero-inflated negative binomial
Corresponding Author(s): Yang Xie   

Online First Date: 03 September 2018    Issue Date: 13 September 2018
 Cite this article:   
Minzhe Zhang, Qiwei Li, Yang Xie. A Bayesian hierarchical model for analyzing methylated RNA immunoprecipitation sequencing data[J]. Quant. Biol., 2018, 6(3): 275-286.
 URL:  
https://academic.hep.com.cn/qb/EN/10.1007/s40484-018-0149-2
https://academic.hep.com.cn/qb/EN/Y2018/V6/I3/275
Fig.1  Examples of the model input and output. (A) A simulated dataset generated from the ZINB kernel with fold change k = 3. Non-zero counts in the control samples are marked with black circles (o), while non-zero counts in the IP samples are marked with red crosses (x). The number of extra zeros for each bin and each sample group is given at the top. (B) The marginal posterior probabilities of inclusion $p(z_w = 2 \mid \cdot)$ inferred by BaySeqPeak-T with the plug-in size factors $\hat{s}_i^{\text{total}}$. The red dots indicate the true methylated bins. The true z and the z estimated by MeTPeak, exomePeak (at a 5% significance level cutoff), and BaySeqPeak (with a cutoff of c = 0.5) are shown at the top, where the red regions indicate methylated bins.
Fig.2  Examples of the MCMC outputs.
Fig.3  ROC curves produced by different methods.
Fig.4  AUCs produced by different methods.
Fig.5  A comparison of the real data results obtained by our method and exomePeak.
Fig.6  An exemplary methylation region detected by both exomePeak and BaySeqPeak-T shown in the IGV browser.
Fig.7  Distribution of log2 fold change in read counts of detected methylated sites by BaySeqPeak-T, -I and exomePeak.