Quantitative Biology

ISSN 2095-4689

ISSN 2095-4697 (Online)

CN 10-1028/TM

Postal Subscription Code 80-971

Quant. Biol. 2018, Vol. 6, Issue (3): 275-286
A Bayesian hierarchical model for analyzing methylated RNA immunoprecipitation sequencing data
Minzhe Zhang1, Qiwei Li1, Yang Xie1,2,3
1. Quantitative Biomedical Research Center, Department of Clinical Sciences, U.T. Southwestern Medical Center, Dallas, TX 75390, USA
2. Department of Bioinformatics, U.T. Southwestern Medical Center, Dallas, TX 75390, USA
3. Simmons Comprehensive Cancer Center, U.T. Southwestern Medical Center, Dallas, TX 75390, USA
Background: The recently developed technology of methylated RNA immunoprecipitation sequencing (MeRIP-seq) sheds light on the study of RNA epigenetics. Analyzing these data poses a new bioinformatics problem that calls for effective and robust peak calling algorithms to detect mRNA methylation sites from MeRIP-seq data.

Methods: We propose a Bayesian hierarchical model to detect methylation sites from MeRIP-seq data. Our modeling approach has several important features. First, it models zero-inflated and over-dispersed counts with a zero-inflated negative binomial (ZINB) model. Second, it incorporates a hidden Markov model (HMM) to account for the spatial dependency of read enrichment between neighboring bins. Third, Bayesian inference allows the proposed model to borrow strength in parameter estimation, which greatly improves model stability when MeRIP-seq data have only a small number of replicates. We use Markov chain Monte Carlo (MCMC) algorithms to infer all model parameters simultaneously in a de novo fashion. The R Shiny demo is available at the authors' website and the R/C++ code is available at
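The two modeling ingredients named in the Methods (ZINB emissions and an HMM over neighboring bins) can be illustrated with a small sketch. It is not the authors' R/C++ implementation and it does not run MCMC. Instead, it assumes every parameter is already known (a shared baseline mean mu, fold change k, NB size parameter, zero-inflation probability pi, a 2x2 transition matrix and an initial distribution), uses only IP counts in the emission, and computes the posterior probability of the methylated state (z_w = 2) for each bin exactly with the forward-backward algorithm. All function names and the [bin][replicate] data layout are hypothetical.

```python
import math

def log_nb(y, mean, size):
    """Log pmf of a negative binomial parameterized by mean and size (dispersion)."""
    p = size / (size + mean)
    return (math.lgamma(y + size) - math.lgamma(size) - math.lgamma(y + 1)
            + size * math.log(p) + y * math.log1p(-p))

def log_zinb(y, mean, size, pi):
    """Log pmf of a zero-inflated NB: an extra zero with probability pi."""
    if y == 0:
        return math.log(pi + (1.0 - pi) * math.exp(log_nb(0, mean, size)))
    return math.log(1.0 - pi) + log_nb(y, mean, size)

def logsumexp(xs):
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))

def posterior_methylated(ip_counts, size_factors, mu, k, size, pi, trans, init):
    """Forward-backward over bins; returns p(z_w = 2 | data) for every bin w.

    ip_counts[w][i] is the IP count in bin w for replicate i (hypothetical layout).
    State index 0 = unmethylated (z = 1); index 1 = methylated (z = 2, mean scaled by k).
    """
    n_bins = len(ip_counts)
    log_trans = [[math.log(a) for a in row] for row in trans]
    # Emission log-likelihoods: replicates are independent given the hidden state.
    emit = [[sum(log_zinb(y, s * mu * fold, size, pi)
                 for y, s in zip(counts, size_factors))
             for fold in (1.0, k)]
            for counts in ip_counts]
    # Forward pass: fwd[w][z] = log p(data in bins 0..w, z_w = z).
    fwd = [[math.log(init[z]) + emit[0][z] for z in range(2)]]
    for w in range(1, n_bins):
        fwd.append([emit[w][z] + logsumexp([fwd[w - 1][u] + log_trans[u][z]
                                            for u in range(2)])
                    for z in range(2)])
    # Backward pass: bwd[w][u] = log p(data in bins w+1.. | z_w = u).
    bwd = [[0.0, 0.0] for _ in range(n_bins)]
    for w in range(n_bins - 2, -1, -1):
        for u in range(2):
            bwd[w][u] = logsumexp([log_trans[u][z] + emit[w + 1][z] + bwd[w + 1][z]
                                   for z in range(2)])
    # Combine and normalize per bin.
    post = []
    for w in range(n_bins):
        joint = [fwd[w][z] + bwd[w][z] for z in range(2)]
        post.append(math.exp(joint[1] - logsumexp(joint)))
    return post
```

A bin can then be called methylated when its posterior probability exceeds a cutoff such as c = 0.5. In the full Bayesian model these parameters are sampled by MCMC rather than fixed, so this sketch only shows how the ZINB likelihood and the Markov dependence between bins combine.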

Results: In simulation studies, the proposed method outperformed the competing methods exomePeak and MeTPeak, especially when an excess of zeros was present in the data. In real MeRIP-seq data analysis, the proposed method identified methylation sites that were more consistent with biological knowledge and had better spatial resolution than those identified by the other methods.

Conclusions: In this study, we develop a Bayesian hierarchical model to identify methylation peaks in MeRIP-seq data. The proposed method has a competitive edge over existing methods in terms of accuracy, robustness and spatial resolution.

Keywords: MeRIP-seq data; RNA epigenomics; Bayesian inference; hidden Markov model; zero-inflated negative binomial
Corresponding Author(s): Yang Xie   
Online First Date: 03 September 2018; Issue Date: 13 September 2018
Cite this article:
Minzhe Zhang, Qiwei Li, Yang Xie. A Bayesian hierarchical model for analyzing methylated RNA immunoprecipitation sequencing data[J]. Quant. Biol., 2018, 6(3): 275-286.
Fig.1  Examples of the model input and output. (A) One simulated dataset generated from the ZINB kernel with fold change k = 3. Non-zero counts in the control samples are marked with black circles (o), while non-zero counts in the IP samples are marked with red crosses (x). The number of extra zeros for each bin and each sample group is given at the top. (B) The marginal posterior probabilities of inclusion $p(z_w = 2 \mid \cdot)$ inferred by BaySeqPeak-T with the plug-in size factors $\hat{s}_i^{\mathrm{total}}$. The red dots indicate the true methylated bins. The true z and the z estimated by MeTPeak, exomePeak (at a 5% significance-level cutoff), and BaySeqPeak (with a cutoff of c = 0.5) are shown at the top, where red regions indicate methylated bins.
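A dataset like the one in panel (A) can be sketched as follows. The sketch rests on assumptions that go beyond the caption: every bin shares one baseline mean mu, both groups share one NB size (dispersion) parameter and one zero-inflation probability pi, IP means are multiplied by k only in methylated bins, and "total" size factors are taken to be each sample's library size divided by the mean library size. The helper names are hypothetical, and this is not the authors' simulation code.

```python
import math
import random

def rpois(lam, rng):
    """Poisson draw by Knuth's multiplication method (fine for moderate means)."""
    threshold = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > threshold:
        k += 1
        prod *= rng.random()
    return k

def rzinb(mean, size, pi, rng):
    """One ZINB draw: an extra zero with probability pi, otherwise a
    negative binomial draw built as a gamma-Poisson mixture."""
    if rng.random() < pi:
        return 0
    return rpois(rng.gammavariate(size, mean / size), rng)

def simulate(n_bins, methylated, n_rep, mu, k, size, pi, seed=1):
    """Return (control, ip) count matrices indexed [bin][replicate].
    IP means in methylated bins are multiplied by the fold change k."""
    rng = random.Random(seed)
    control = [[rzinb(mu, size, pi, rng) for _ in range(n_rep)]
               for _ in range(n_bins)]
    ip = [[rzinb(mu * (k if w in methylated else 1.0), size, pi, rng)
           for _ in range(n_rep)] for w in range(n_bins)]
    return control, ip

def total_count_size_factors(counts):
    """Library-size factors: each sample's total count divided by the mean total."""
    n_samples = len(counts[0])
    totals = [sum(row[i] for row in counts) for i in range(n_samples)]
    mean_total = sum(totals) / n_samples
    return [t / mean_total for t in totals]
```

For example, `simulate(50, set(range(20, 30)), 3, mu=5.0, k=3.0, size=5.0, pi=0.3)` gives three replicates per group over 50 bins, with a methylated stretch in bins 20 to 29. `total_count_size_factors(ip)` then returns one factor per IP replicate. Drawing the NB counts as a gamma-Poisson mixture keeps the sketch within the Python standard library.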
Fig.2  Examples of the MCMC outputs.
Fig.3  ROC curves produced by different methods.
Fig.4  AUCs produced by different methods.
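Method comparisons like those in Figs. 3 and 4 score each bin (for example, by a posterior probability or 1 minus a p-value), then sweep a threshold against the known methylation labels. The following is a minimal, hypothetical sketch of that evaluation, not the authors' evaluation code. Tied scores are grouped into a single step, and the area is computed with the trapezoidal rule.

```python
def roc_curve(scores, labels):
    """Return (fpr, tpr) lists, sweeping the threshold from high to low scores.
    labels are 1/True for truly methylated bins and 0/False otherwise."""
    pairs = sorted(zip(scores, labels), key=lambda p: -p[0])
    n_pos = sum(1 for lab in labels if lab)
    n_neg = len(labels) - n_pos
    tp = fp = 0
    fpr, tpr = [0.0], [0.0]
    i = 0
    while i < len(pairs):
        # Consume all bins sharing this score so ties form one (diagonal) step.
        thr = pairs[i][0]
        while i < len(pairs) and pairs[i][0] == thr:
            if pairs[i][1]:
                tp += 1
            else:
                fp += 1
            i += 1
        fpr.append(fp / n_neg)
        tpr.append(tp / n_pos)
    return fpr, tpr

def auc(fpr, tpr):
    """Trapezoidal area under an ROC curve given as matching point lists."""
    return sum((fpr[j] - fpr[j - 1]) * (tpr[j] + tpr[j - 1]) / 2
               for j in range(1, len(fpr)))
```

With this construction, the AUC equals the fraction of (methylated, unmethylated) bin pairs in which the methylated bin receives the higher score, with ties counted as one half.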
Fig.5  A comparison of the real data results obtained by our method and exomePeak.
Fig.6  An exemplary methylation region detected by both exomePeak and BaySeqPeak-T shown in the IGV browser.
Fig.7  Distribution of log2 fold change in read counts of detected methylated sites by BaySeqPeak-T, -I and exomePeak.
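One common way to obtain a per-site log2 fold change like the one summarized in Fig. 7 is to compare size-factor-normalized mean read counts in the IP and control samples. The normalization by size factors and the pseudocount of 1 below are assumptions made for illustration; the caption does not state how the fold changes were computed, and the function name is hypothetical.

```python
import math

def log2_fold_change(ip_counts, ctrl_counts, ip_sf, ctrl_sf, pseudo=1.0):
    """log2 of the mean normalized IP count over the mean normalized control
    count for one site; the pseudocount avoids division by or log of zero."""
    ip_mean = sum(c / s for c, s in zip(ip_counts, ip_sf)) / len(ip_counts)
    ctrl_mean = sum(c / s for c, s in zip(ctrl_counts, ctrl_sf)) / len(ctrl_counts)
    return math.log2((ip_mean + pseudo) / (ctrl_mean + pseudo))
```

For instance, IP counts of 7 and 7 against control counts of 3 and 3 (all size factors 1) give log2(8 / 4) = 1.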
