Whole genome sequencing and its applications in medical genetics
Jiaxin Wu, Mengmeng Wu, Ting Chen, Rui Jiang
MOE Key Laboratory of Bioinformatics, Bioinformatics Division and Center for Synthetic & Systems Biology, TNLIST; Department of Automation, Tsinghua University, Beijing 100084, China
Genome sequencing has improved fundamentally since next-generation sequencing (NGS) emerged in the 2000s. The newer technologies combine massively parallel short-read DNA sequencing with genome alignment and assembly methods to interrogate genomes digitally and rapidly at an unprecedented scale, making large-scale whole genome sequencing (WGS) accessible and practical for researchers. Whole genome sequencing is now increasingly used to detect the genetic basis of diseases, study causative relations with cancers, perform genome-level comparative analysis, reconstruct human population history, and inform clinical decisions. In this review, we first describe a typical whole genome sequencing pipeline, including laboratory template preparation, sequencing, genome assembly and quality control, and variant calling and annotation. We then compare whole genome sequencing with whole exome sequencing (WES) and explore a wide range of applications of whole genome sequencing to both Mendelian and complex diseases in medical genetics. We highlight the impact of whole genome sequencing on cancer studies, regulatory variant analysis, predictive medicine and precision medicine, and discuss the challenges that whole genome sequencing still faces.

Whole genome sequencing is widely used to detect the genetic basis of diseases, study causative relations with cancers, perform genome-level comparative analysis, and inform clinical decisions. In this review, we describe a typical whole genome sequencing pipeline, compare whole genome sequencing with whole exome sequencing, and explore a wide range of applications of whole genome sequencing in medical genetics. We highlight the impact of whole genome sequencing on cancer studies, regulatory variant analysis, predictive medicine and precision medicine, and discuss the challenges that whole genome sequencing still faces.

Keywords: whole genome sequencing; whole exome sequencing; next-generation sequencing; non-coding; regulatory variant
Just Accepted Date: 21 March 2016   Online First Date: 11 May 2016    Issue Date: 26 May 2016
Fig.1  A typical pipeline of whole genome sequencing.
Platform | Total output | Time | Read length | # of single reads | Run price
454 (Roche) GS FLX+ | 700 Mb | 23 h | <1 kb | 1 M | ~$6k
454 (Roche) GS Jr. | 35 Mb | 10 h | ~700 bp | 0.1 M | ~$1k
Illumina HiSeq X Ten | 1.8 Tb | 3 d | 2 × 150 bp | 6 B | ~$12k
Illumina HiSeq 2500 HT v4 | 1 Tb | 6 d | 2 × 125 bp | 4 B | ~$29k
Illumina HiSeq 2500 Rapid | 180 Gb | 40 h | 2 × 150 bp | 600 M | ~$8k
Illumina NextSeq 500 | 129 Gb | 29 h | 2 × 150 bp | 400 M | $4k
Illumina MiSeq | 15 Gb | ~65 h | 2 × 300 bp | 25 M | ~$1.4k
Life Technologies SOLiD 5500xl | 95 Gb | 6 d | 2 × 60 bp | 800 M | ~$10k
Life Technologies SOLiD 5500 | 48 Gb | 6 d | 2 × 60 bp | 400 M | ~$5k
Life Technologies Ion Torrent PI | ~10 Gb | 2–4 h | <200 bp | <82 M | >$1k
Tab.1  Comparison of whole genome sequencing platforms.
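As a rough worked example of how the "Total output" column relates to whole genome sequencing capacity, the sketch below converts run output into the number of human genomes that could be covered at 30× mean depth (the typical WGS depth given in Tab.6). The genome size of about 3.1 Gb and the function names are assumptions introduced here for illustration. The calculation also ignores duplicate, unmapped and low-quality reads, so real yields would be lower.

```python
# Back-of-the-envelope capacity estimate from the "Total output" column of Tab.1.
# Assumption (not from the table): haploid human genome size of ~3.1 Gb.
GENOME_SIZE_BP = 3.1e9
TARGET_DEPTH = 30  # typical WGS depth, as listed in Tab.6

# Total output per run in base pairs, transcribed from Tab.1.
RUN_OUTPUT_BP = {
    "Illumina HiSeq X Ten": 1.8e12,
    "Illumina HiSeq 2500 HT v4": 1.0e12,
    "Illumina NextSeq 500": 129e9,
    "Illumina MiSeq": 15e9,
}


def mean_depth(total_bases, genome_size=GENOME_SIZE_BP):
    """Mean depth if every sequenced base mapped uniformly to one genome."""
    return total_bases / genome_size


def genomes_per_run(total_bases, depth=TARGET_DEPTH, genome_size=GENOME_SIZE_BP):
    """Whole number of genomes that fit in one run at the requested mean depth."""
    return int(total_bases // (depth * genome_size))


for platform, output in RUN_OUTPUT_BP.items():
    print(f"{platform}: {mean_depth(output):.1f}x on one genome, "
          f"{genomes_per_run(output)} genome(s) at {TARGET_DEPTH}x")
```

Under these assumptions, only the highest-output instruments in the table can sequence several human genomes per run at 30×, while a benchtop instrument such as the MiSeq cannot reach 30× on even one.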
Name | Method | Platform
Bowtie2 | Alignment | Illumina, 454
BWA | Alignment | Illumina, ABI SOLiD
SOAP3-DP | Alignment | Illumina
MAQ | Reference | Illumina, SOLiD
RMAP | Reference | Illumina
SeqMan NGen | Reference/De novo | Illumina, SOLiD, 454, Ion Torrent, Sanger
ABySS | De novo | Illumina, SOLiD
ALLPATHS-LG | De novo | Illumina
Edena | De novo | Illumina
Euler-sr | De novo | Sanger, 454, Illumina
Forge | De novo | Sanger, 454, Illumina
Newbler | De novo | 454
SOAPdenovo | De novo | Illumina
SPAdes | De novo | Illumina, PacBio
SSAKE | De novo | Illumina
Velvet | De novo | Illumina, 454
Tab.2  Alignment and genome assembly tools.
Name | Platform
FastQC | Illumina, SOLiD, 454, PacBio
FASTX-Toolkit | Illumina
HTQC | Illumina
NGSQC | Illumina, SOLiD
NGS QC Toolkit | Illumina, 454
PRINSEQ | Illumina, 454
SolexaQA | Illumina, 454
TileQC | Illumina
Tab.3  Quality assessment tools used before assembly.
Name | Properties
ALE | Reference-independent, statistical measure
Picard | A set of tools for processing and analyzing Illumina sequence data
QUAST | With and without a reference genome
REAPR | Assemblers using paired-end reads, without a reference genome
Tab.4  Quality assessment tools for genome assemblers.
Name | Type
Bambino | SNVs, indels
CORTEX | SNVs, indels
GATK | SNVs, indels
glfTools | SNVs
SAMtools | SNVs, indels
SNVer | SNVs, indels
SomaticSniper | SNVs
VarScan 2 | SNVs, CNVs
cn.mops | CNVs
CNV-seq | CNVs
CopySeq | CNVs
modSaRa | CNVs
RDXplorer | CNVs
BreakDancer | SVs
ClipCrop | SVs
Pindel | SVs
VariationHunter | SVs
Tab.5  Variant calling tools for detecting SNVs, CNVs or SVs.
Feature | Whole genome sequencing | Whole exome sequencing
Time | 6–8 weeks | 1–3 weeks
Cost | $795–$4150/sample | $390–$1050/sample
Sequencing depth | Usually 30× | Usually >50×
Sequenced region | Coding and non-coding regions of the genome | Exomes, promoters and enhancers
Tab.6  Comparison between whole genome and whole exome sequencing.
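The depth and cost figures in Tab.6 can be combined into a rough estimate of how much raw sequence each approach produces per sample. The sketch below is illustrative only. The genome size (about 3.1 Gb) and the capture target size (50 Mb) are assumptions made here and do not come from the table. The WES depth of 50× is the lower bound of "usually >50×".

```python
# Rough per-sample data volume for WGS vs WES, using depths from Tab.6.
# Assumptions (not from the table): genome size and exome capture target size.
GENOME_SIZE_BP = 3.1e9   # assumed haploid human genome size
EXOME_TARGET_BP = 50e6   # hypothetical capture target size
WGS_DEPTH = 30           # "usually 30x"
WES_DEPTH = 50           # lower bound of "usually >50x"

# Per-sample cost ranges in US dollars, transcribed from Tab.6.
WGS_COST = (795, 4150)
WES_COST = (390, 1050)


def raw_bases(depth, target_bp):
    """Raw bases needed to reach the given mean depth over the target."""
    return depth * target_bp


wgs_bases = raw_bases(WGS_DEPTH, GENOME_SIZE_BP)
wes_bases = raw_bases(WES_DEPTH, EXOME_TARGET_BP)
data_ratio = wgs_bases / wes_bases

# Cost ratios at the low and high ends of the quoted ranges.
cost_ratio_low = WGS_COST[0] / WES_COST[0]
cost_ratio_high = WGS_COST[1] / WES_COST[1]

print(f"WGS: {wgs_bases / 1e9:.1f} Gb per sample")
print(f"WES: {wes_bases / 1e9:.1f} Gb per sample")
print(f"WGS/WES data ratio: {data_ratio:.1f}")
print(f"WGS/WES cost ratio: {cost_ratio_low:.1f} to {cost_ratio_high:.1f}")
```

Under these assumptions, WGS generates well over an order of magnitude more raw sequence per sample than WES, while the quoted per-sample cost is only about two to four times higher. The ratio depends on the capture kit, so the target size should be replaced with the actual value when one is known.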
Name | Molecular trait | Other techniques | Ref.
eQTL | Gene expression | RNA-seq or microarray | Rockman [108]
dsQTL | Open chromatin | DNaseI-seq | Degner [109]
sQTL | RNA splicing | RNA-seq | Monlong [110]
rtQTL | DNA replication timing | FACS-sorting | Amnon [111]
haQTL | Histone acetylation | ChIP-seq | Rosario [112]
metQTL | DNA methylation | Bisulfite-seq | Gibbs [113]
Tab.7  Various types of QTLs.
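All of the QTL types in Tab.7 associate genotype at a variant with a quantitative molecular trait measured by the listed technique. One common way to test such an association is to regress the trait on genotype dosage (0, 1 or 2 copies of the alternate allele). The sketch below fits that slope by ordinary least squares. The function name and the toy values are invented for illustration and are not data from the review. Real analyses also adjust for covariates and multiple testing.

```python
# Minimal sketch of a single-variant QTL effect estimate: ordinary least
# squares slope of a molecular trait on genotype dosage (0, 1 or 2).
# All values below are synthetic toy data, invented purely for illustration.

def qtl_slope(dosages, trait):
    """Return the least-squares slope of trait on genotype dosage."""
    if len(dosages) != len(trait):
        raise ValueError("dosages and trait must have the same length")
    n = len(dosages)
    mean_g = sum(dosages) / n
    mean_t = sum(trait) / n
    covariance = sum((g - mean_g) * (t - mean_t) for g, t in zip(dosages, trait))
    variance = sum((g - mean_g) ** 2 for g in dosages)
    if variance == 0:
        raise ValueError("variant is monomorphic in this sample")
    return covariance / variance


# Toy example: six individuals, trait rising with alternate-allele count.
toy_dosages = [0, 0, 1, 1, 2, 2]
toy_expression = [1.0, 1.2, 2.0, 2.2, 3.0, 3.2]

slope = qtl_slope(toy_dosages, toy_expression)
print(f"Estimated change in trait per alternate allele: {slope:.2f}")
```

The same calculation applies to any row of the table by swapping in the corresponding trait, for example a chromatin accessibility or methylation measurement in place of expression.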
