Quant. Biol. 2016, Vol. 4, Issue 1: 22-35
Mapping and differential expression analysis from short-read RNA-Seq data in model organisms
Qiong-Yi Zhao1, Jacob Gratten1, Restuadi Restuadi1, Xuan Li2
1. The University of Queensland, Queensland Brain Institute, St Lucia, Qld 4072, Australia
2. Key Laboratory of Synthetic Biology, Institute of Plant Physiology and Ecology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai 200032, China
Recent advances in next-generation sequencing technology allow high-throughput RNA sequencing (RNA-Seq) to be widely applied in transcriptomic studies. For model organisms with a reference genome, the first step in analysis of RNA-Seq data involves mapping of short-read sequences to the reference genome. Reference-guided transcriptome assembly is an optional step, which is recommended if the aim is to identify novel transcripts. Following read mapping, the primary interest of biologists in many RNA-Seq studies is the investigation of differential expression between experimental groups. In this review, we discuss recent developments in RNA-Seq data analysis applied to model organisms, including methods and algorithms for direct mapping, reference-guided transcriptome assembly and differential expression analysis, and provide insights for the future direction of RNA-Seq.

Author Summary   

RNA-Seq is a revolutionary methodology that employs high-throughput sequencing technologies to enable highly sensitive detection and quantification of RNA in biological samples. Mapping of RNA-Seq data to a reference is a fundamental step for all forms of RNA-Seq data analysis in model organisms, and differential expression analysis is the primary interest of biologists in many RNA-Seq studies. In this review we discuss recent developments in these two fields and provide insights for the future direction of RNA-Seq. We see our review as a resource for the community that will enable researchers to select the most appropriate tools for RNA-Seq data analysis.

Keywords: RNA-Seq; mapping; reference-guided transcriptome assembly; differential expression analysis
Corresponding Author(s): Qiong-Yi Zhao, Xuan Li
Online First Date: 16 March 2016; Issue Date: 16 March 2016
Qiong-Yi Zhao, Jacob Gratten, Restuadi Restuadi, et al. Mapping and differential expression analysis from short-read RNA-Seq data in model organisms[J]. Quant. Biol., 2016, 4(1): 22-35.
Fig.1  The general concept and data structures for three broad categories of mapping algorithms.
| Aligner | Spliced (Y/N) | Supported NGS data |
| --- | --- | --- |
| Aligners based on hash tables | | |
| BWA-MEM | N | Illumina (>70 bp), 454 and long-read data (e.g., PacBio). The developer of BWA recommends BWA-MEM over BWA-SW as it is faster and more accurate |
| CLC genomics | N | Almost all NGS data (commercial software package) |
| Eland | N | Illumina (implemented by Illumina) |
| MAQ | N | Illumina. MAQ has not been maintained since 2008. The developer of MAQ recommends using other tools (such as BWA) instead |
| Novoalign | N | Illumina (commercial software package) |
| RazerS 3 | N | Illumina, 454 and long-read platforms (e.g., PacBio). RazerS 3 is a successor to RazerS |
| RMAP | N | Illumina and bisulfite-treated Illumina reads |
| SHRiMP | N | Illumina and SOLiD. SHRiMP has not been maintained since 2012 |
| SMALT | N | Illumina and 454 |
| ZOOM | N | Illumina and SOLiD |
| Aligners based on suffix trees | | |
| Bowtie | N | Illumina, 454, SOLiD. Works best when aligning short reads to large genomes |
| Bowtie2 | N | Illumina, 454 and long-read data. For reads >50 bp, Bowtie2 is generally faster, more sensitive, and uses less memory than Bowtie |
| BWA | N | Illumina (≤100 bp) |
| BWA-SW | N | Illumina (>70 bp), 454. BWA-SW has better sensitivity when alignment gaps are frequent |
| HISAT | Y | Illumina, 454. HISAT is >50 times faster than TopHat2 with better alignment quality |
| HISAT2 | Y | Illumina, 454. HISAT2 is a successor to both HISAT and TopHat2 |
| Segemehl | N | Illumina and bisulfite-treated Illumina data, 454 and long-read data |
| TopHat | Y | Illumina, 454, SOLiD. It uses Bowtie or Bowtie2 as the underlying mapping engine |
| TopHat2 | Y | Illumina, 454, SOLiD. TopHat2 is a successor to TopHat |
| Aligners based on merge sorting | | |
| Slider | N | Data from Illumina Genome Analyzer |
| SliderII | N | Data from Illumina Genome Analyzer |
Tab.1  Popular short-read aligners.
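To make the hash-table category in Tab. 1 more concrete, the following is a minimal sketch of seed lookup followed by verification. It is a toy illustration only, not the algorithm of any aligner listed in the table: the function names (`build_kmer_index`, `map_read`), the seed length `k` and the `max_mismatches` threshold are all assumptions introduced here, and gaps, base qualities and splicing are ignored.

```python
from collections import defaultdict


def build_kmer_index(reference, k):
    """Map every k-mer in the reference to the list of positions where it starts."""
    index = defaultdict(list)
    for pos in range(len(reference) - k + 1):
        index[reference[pos:pos + k]].append(pos)
    return index


def map_read(read, reference, index, k, max_mismatches=1):
    """Return a list of (start, mismatches) for ungapped placements of the read.

    Each k-mer of the read (a seed) is looked up in the hash table; every hit
    proposes a candidate start position, which is then verified base by base.
    """
    candidates = set()
    for offset in range(len(read) - k + 1):
        seed = read[offset:offset + k]
        for pos in index.get(seed, []):
            start = pos - offset
            # Keep only candidates where the whole read fits on the reference.
            if 0 <= start <= len(reference) - len(read):
                candidates.add(start)

    hits = []
    for start in sorted(candidates):
        window = reference[start:start + len(read)]
        mismatches = sum(a != b for a, b in zip(read, window))
        if mismatches <= max_mismatches:
            hits.append((start, mismatches))
    return hits


# Hypothetical toy reference and reads, chosen only for illustration.
reference = "ACGTACGTTAGCCGATAGGCTTAACG"
index = build_kmer_index(reference, k=4)
print(map_read("TAGCCGAT", reference, index, k=4))  # [(8, 0)]
print(map_read("TAGCGGAT", reference, index, k=4))  # [(8, 1)]
```

The second read carries one substitution, yet it is still placed because its first seed matches the reference exactly. This is why the seed length trades sensitivity against the number of candidate positions that must be verified.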
Fig.2  The workflow for RNA-Seq differential expression analysis.
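As a rough illustration of the comparison between experimental groups that such a workflow ends with, the sketch below normalizes raw read counts to counts per million (CPM) and computes a log2 fold change per gene. It is a descriptive toy under stated assumptions, not a differential expression test: the function names, the `pseudocount` parameter and the example counts are hypothetical, and no dispersion modeling or significance testing is attempted.

```python
import math


def counts_per_million(counts):
    """Scale one sample's raw read counts so that they sum to one million."""
    total = sum(counts.values())
    return {gene: 1e6 * n / total for gene, n in counts.items()}


def mean_cpm(samples):
    """Average the CPM of each gene across the samples of one group."""
    cpms = [counts_per_million(sample) for sample in samples]
    return {gene: sum(c[gene] for c in cpms) / len(cpms) for gene in cpms[0]}


def log2_fold_change(group_a, group_b, pseudocount=1.0):
    """Return log2(mean CPM in group_b / mean CPM in group_a) for each gene.

    The pseudocount (an assumption of this sketch) avoids division by zero
    for genes with no reads in one group.
    """
    a = mean_cpm(group_a)
    b = mean_cpm(group_b)
    return {
        gene: math.log2((b[gene] + pseudocount) / (a[gene] + pseudocount))
        for gene in a
    }


# Hypothetical counts for two genes, one sample per group.
control = [{"geneA": 100, "geneB": 900}]
treated = [{"geneA": 400, "geneB": 600}]
lfc = log2_fold_change(control, treated)
print({gene: round(value, 2) for gene, value in lfc.items()})
```

With a single sample per group, a fold change says nothing about statistical significance. Replicates and a count-based statistical model are needed before any gene can be called differentially expressed.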
