Frontiers of Medicine

ISSN 2095-0217

ISSN 2095-0225(Online)

CN 11-5983/R

Postal Subscription Code 80-967

2018 Impact Factor: 1.847

Front Med 2013, Vol. 7, Issue 3: 280-289. DOI: 10.1007/s11684-013-0265-3
Identification of cancer gene fusions based on advanced analysis of the human genome or transcriptome
Lu Wang
Department of Pathology, Memorial Sloan-Kettering Cancer Center, New York, NY 10065, USA
Many gene fusions have been recognized as important diagnostic and/or prognostic markers in human malignancies. In recent years, novel gene fusions have been identified in cases without prior knowledge of the genetic background. New genome-wide screening approaches, coupled with powerful computational data analysis methods, have been used to detect cryptic genomic aberrations. This review focuses on advanced genome-wide screening approaches to fusion gene identification, such as microarray-based approaches, next-generation sequencing, and the NanoString nCounter gene expression system. The fundamental rationale and strategy for fusion gene identification using each biotech platform are also discussed.

Keywords: gene fusion, cancer, microarray, next-generation sequencing, NanoString nCounter system
Corresponding Author: Lu Wang
Issue Date: 05 September 2013
Fig.1  Fundamental rationale for exon-level expression profiling (by exon-array, RNA-Seq, and NanoString nCounter Expression system) in the detection of fusion genes. Left panel: schematic diagrams of two fusion partners and the corresponding fusion genes—gene structures, transcripts, and exon-array probe coverage. The promoter of the 5′ gene (gene A) is often stronger than that of the 3′ gene (gene B). Upper right panel: in terms of the fusion transcript A-B, the strong promoter A (P-A) upregulates the expression of exons derived from gene B. In contrast, the expression of the reciprocal fusion transcript B-A is under the regulation of the weak promoter B (P-B). In many translocation cases, the reciprocal fusion transcript B-A cannot be detected. Therefore, in exon-level expression data, the expression of the exons of gene B shows a special pattern characterized by a change point between the exons flanking the fusion point, as illustrated in the lower right panel.
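The change-point idea in the Fig. 1 caption can be illustrated with a minimal sketch. It assumes one expression value per exon of the 3′ partner (gene B), ordered 5′ to 3′, and scores each candidate break by the difference in mean expression downstream versus upstream. The function name `find_change_point`, the `min_exons` parameter, and the example values are hypothetical and are not taken from the review or from any of the platforms it discusses.

```python
from statistics import mean


def find_change_point(exon_expr, min_exons=2):
    """Locate the most likely fusion break in exon-level expression data.

    exon_expr: expression values for the exons of the 3' partner gene,
               ordered from the 5' end to the 3' end (hypothetical input).
    Returns (k, score): the break lies between exon k and exon k+1
    (1-based), and score is mean(downstream) - mean(upstream).
    Returns (None, None) if there are too few exons to split.
    """
    n = len(exon_expr)
    if n < 2 * min_exons:
        return None, None

    best_k, best_score = None, float("-inf")
    # Try every split that leaves at least min_exons on each side.
    for k in range(min_exons, n - min_exons + 1):
        score = mean(exon_expr[k:]) - mean(exon_expr[:k])
        if score > best_score:
            best_k, best_score = k, score
    return best_k, best_score


# Illustrative values only: exons 5-8 are driven by the stronger promoter.
example = [1.0, 1.2, 0.9, 1.1, 6.0, 6.3, 5.8, 6.1]
break_after_exon, jump = find_change_point(example)
```

A real analysis would also need normalization, replicate handling, and a significance test for the jump; this sketch only shows the pattern being searched for.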
| | RNA-Seq | Whole-genome sequencing |
| --- | --- | --- |
| Target | Transcriptome (actively expressed at the RNA level) | Whole genome (considerably larger than the transcriptome) |
| Fusion point mapping | Splice exon-exon junctions | Actual fusion positions in the genome |
| Capability and limitation | Identifies only fusion events that produce fusion transcripts. Advantages: (1) focuses on events that have a higher likelihood of being functional or causal in biological or disease settings; (2) can detect fusion transcripts resulting from trans-splicing and read-through. Disadvantage: cannot detect fusions that involve only a non-transcribed promoter/enhancer element, e.g., IGH/MYC resulting from the t(8;14) chromosome rearrangement in lymphomas | Advantage: can identify all chromosome rearrangements in a given genome. Disadvantages: (1) the transcriptional status of many aberrations identified by gDNA-Seq is not established; (2) cannot identify fusion events that are due to non-genomic factors, such as trans-splicing and read-through events |
| Other issue | Technical issues lead to false positives in the identification of fusion events by RNA-Seq: (1) template switching during reverse transcription; (2) reverse transcriptases may synthesize cDNA in a primer-independent manner | Costly and time-consuming |
Tab.1  Comparison of RNA-Seq and whole-genome sequencing
| Tool | Data to be analyzed | Description | Comments | Reference |
| --- | --- | --- | --- | --- |
| FusionSeq | RNA-Seq, PE reads | Identification of potential fusions based on PE mapping and a sophisticated filtration cascade to filter out analysis artifacts | | Sboner et al., 2010 [37] |
| ShortFuse | RNA-Seq, PE reads | Uses both unique and ambiguously paired-end reads | No requirements of unique mappings or long single read sequencing (≥75 nt). Cannot identify transcripts that are produced by novel or aberrant splicing, e.g., fusions involving intra-exon breakpoints | Kinsella et al., 2011 [38] |
| DeFuse | RNA-Seq, PE reads | Uses clusters of discordant paired-end alignments to perform a split read alignment analysis for finding fusion boundaries | Can identify fusion boundaries in the middle of exons or involving intronic. Relies on the reference of an annotated set of genes used by the aligner | McPherson et al., 2011 [39] |
| FusionMap | RNA-Seq or gDNA-Seq, SE or PE reads | Detects and aligns fusion junction-spanning reads to the reference genome | Long reads (≥75 nt) are preferred | Ge et al., 2011 [40] |
| ChimeraScan | RNA-Seq, PE reads | Identifies potential fusion events by mining end sequences for alignments to all possible pairings of exons of the potentially fused gene pairs using discordantly aligned PE reads | | Iyer et al., 2011 [41] |
| TopHat-Fusion | RNA-Seq, SE or PE reads | Spliced alignment programs directly detect individual reads (as well as paired reads) that span a fusion event without relying on existing annotation | Can identify fusions that include novel genes and novel splice variants of known genes | Kim and Salzberg, 2011 [42] |
| FusionFinder | RNA-Seq, SE or PE reads | A Perl-based software suite designed for the discovery of fusion transcripts and their isoforms | Cannot typically detect fusions involving intra-exon breakpoints | Francis et al., 2012 [43] |
| EricScript | RNA-Seq, PE reads | Detects chimeric transcripts in PE RNA-Seq data using a transcriptome reference for mapping reads | Cannot discover gene fusions involving unannotated transcribed regions | Benelli et al., 2012 [44] |
Tab.2  Computational approaches for RNA-Seq data analysis that are freely available from public literature
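Several tools in Tab. 2 start from discordantly aligned paired-end (PE) reads, that is, read pairs whose two mates map to different genes. The sketch below illustrates only that first counting step under simplifying assumptions: each read pair has already been annotated with the gene hit by each mate, and gene order within a pair is ignored. The function name `candidate_fusions`, the `min_support` threshold, and the placeholder gene names are hypothetical; this is not the algorithm of any listed tool, each of which adds its own alignment and filtering stages.

```python
from collections import Counter


def candidate_fusions(read_pairs, min_support=2):
    """Count discordant read pairs per gene pair (toy illustration).

    read_pairs: iterable of (gene_of_mate1, gene_of_mate2); a mate that
                hits no annotated gene is given as None (assumed input).
    Returns a dict mapping an alphabetically sorted gene pair to the
    number of supporting read pairs, keeping pairs with >= min_support.
    """
    counts = Counter()
    for gene1, gene2 in read_pairs:
        # Skip unannotated mates and concordant pairs within one gene.
        if gene1 is None or gene2 is None or gene1 == gene2:
            continue
        # Ignore mate order so A/B and B/A support the same candidate.
        counts[tuple(sorted((gene1, gene2)))] += 1
    return {pair: n for pair, n in counts.items() if n >= min_support}


# Placeholder gene names, for illustration only.
pairs = [
    ("GENE_A", "GENE_B"),
    ("GENE_B", "GENE_A"),
    ("GENE_A", "GENE_A"),
    ("GENE_C", None),
    ("GENE_C", "GENE_D"),
]
candidates = candidate_fusions(pairs, min_support=2)
```

Candidates found this way would still need junction-spanning (split) reads and artifact filtering before being treated as fusions, which is where the tools in the table differ.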
| Platform | Target | RNA quality | Verification steps | Output data | Comments |
| --- | --- | --- | --- | --- | --- |
| Array-based gene/exon expression profiling | Whole transcriptome with reliance on the availability of probes | High-quality RNA required* | RACE → RT-PCR | Usually reveals the 3′ fusion partner only. RACE is required to identify the other fusion partner | Data analysis is time-consuming |
| RNA-Seq | Whole transcriptome | High-quality RNA required | RT-PCR | Reports the exact fusion sequence at base-pair resolution | Challenges in data alignment and data mining |
| NanoString nCounter gene expression | A limited number of genes defined by the design of a probe set | Less sensitive to RNA quality | RACE → RT-PCR | Usually reveals one fusion partner only. RACE is required to identify the other fusion partner | Relatively cost-efficient and less challenging in data analysis |
Tab.3  Advantages and disadvantages of three major RNA-based platforms in the discovery of gene fusions
Fig.2  Workflows of the identification of gene fusions based on high-throughput genome-wide screening.
