Quantitative Biology

ISSN 2095-4689

ISSN 2095-4697(Online)

CN 10-1028/TM

Postal Subscription Code 80-971

Quant. Biol.    2017, Vol. 5 Issue (4) : 280-290    https://doi.org/10.1007/s40484-017-0109-2
REVIEW
Transcriptome assembly strategies for precision medicine
Lu Wang1, Lipi Acharya2, Changxin Bai1, Dongxiao Zhu1
1. Department of Computer Science, Wayne State University, Detroit, MI 48202, USA
2. Dow AgroSciences, Indianapolis, IN 46268, USA
Abstract

Background: The precision medicine approach holds great promise for tailored diagnosis, treatment and prevention. Individuals can differ vastly in their genomic information and genetic mechanisms, and hence have unique transcriptomic signatures. The development of precision medicine has demanded moving beyond DNA sequencing (DNA-Seq) to the much more targeted RNA sequencing (RNA-Seq) [Cell, 2017, 168: 584–599].

Results: Here we briefly survey recent methodological developments in transcriptome assembly using RNA-Seq.

Conclusions: Because transcriptomes in human disease are highly complex, dynamic and diverse, transcriptome assembly is playing an increasingly important role in precision medicine research aimed at dissecting the molecular mechanisms of human diseases.

Author Summary  Precision medicine is an emerging area in healthcare that aims to provide personalized diagnosis, treatment, and prevention by taking into account an individual’s genetic information, environment, and lifestyle. Transcriptome assembly approaches enable an in-depth understanding of cellular mechanisms and processes and thus play a key role in precision medicine. In this review article, we survey a number of recently developed transcriptome assembly strategies, including de novo transcriptome assembly, reference-based transcriptome assembly, and combined de novo and reference-based strategies.
Keywords: precision medicine, transcriptome assembly, RNA-Seq, de novo, De Bruijn
Corresponding Author(s): Dongxiao Zhu   
Just Accepted Date: 08 June 2017   Online First Date: 31 October 2017    Issue Date: 04 December 2017
 Cite this article:   
Lu Wang, Lipi Acharya, Changxin Bai, et al. Transcriptome assembly strategies for precision medicine[J]. Quant. Biol., 2017, 5(4): 280-290.
 URL:  
https://academic.hep.com.cn/qb/EN/10.1007/s40484-017-0109-2
https://academic.hep.com.cn/qb/EN/Y2017/V5/I4/280
Fig.1  De novo transcriptome assembly strategy.
Fig.2  Reference-based transcriptome assembly strategy. We first splice-align reads to the reference genome to construct a splice graph that represents all possible isoforms at a locus. We then traverse the graph to obtain the isoforms.
Fig.3  Workflow of the Bridger algorithm.
Fig.4  A generative model of the RNA-sequencing process with the notation. Z is used to model the transcripts that have non-zero abundance; e+ is the relative abundance; and F is the joint distribution over n reads conditionally parameterized by m, s and dependent on a set S of m candidate transcripts and transcript index t.
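The reference-based strategy summarized in the Fig.2 caption (splice-align reads, build a splice graph, traverse it to list isoforms) can be illustrated with a minimal sketch. This is not the method of any tool listed in Tab.1. It assumes, for illustration only, that each spliced alignment has already been reduced to an ordered list of exon intervals, and the function names `build_splice_graph` and `enumerate_isoforms` are hypothetical. Real assemblers additionally weigh paths by read coverage rather than reporting every path.

```python
from collections import defaultdict


def build_splice_graph(spliced_alignments):
    """Build a toy splice graph.

    Assumption: each alignment is a list of exon intervals (start, end)
    in genomic order, already snapped to exon boundaries. Consecutive
    exons within one read imply a splice junction, stored as an edge.
    """
    graph = defaultdict(set)
    nodes = set()
    for exons in spliced_alignments:
        nodes.update(exons)
        for left, right in zip(exons, exons[1:]):
            graph[left].add(right)
    return nodes, graph


def enumerate_isoforms(nodes, graph):
    """List every source-to-sink path; each path is a candidate isoform."""
    # Exons with no incoming junction start a transcript.
    targets = {t for successors in graph.values() for t in successors}
    sources = sorted(n for n in nodes if n not in targets)
    isoforms = []

    def walk(node, path):
        successors = sorted(graph.get(node, ()))
        if not successors:
            # No outgoing junction: the path is a complete candidate.
            isoforms.append(path)
            return
        for nxt in successors:
            walk(nxt, path + [nxt])

    for source in sources:
        walk(source, [source])
    return isoforms


if __name__ == "__main__":
    # Hypothetical locus with three exons; one read skips the middle exon.
    exon_a, exon_b, exon_c = (100, 200), (300, 400), (500, 600)
    reads = [[exon_a, exon_b], [exon_b, exon_c], [exon_a, exon_c]]
    nodes, graph = build_splice_graph(reads)
    for isoform in enumerate_isoforms(nodes, graph):
        print(isoform)
```

On this toy locus the traversal yields two candidate isoforms: one containing all three exons and one that skips the middle exon. Exhaustive path enumeration grows quickly with the number of alternative junctions, which is one reason practical tools prune or score paths instead.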
| Software | Strategy | Link | Year of publication |
| --- | --- | --- | --- |
| Trans-ABySS [24] | De novo | https://github.com/bcgsc/transabyss | 2010 |
| Rnnotator [31] | De novo | https://sites.google.com/a/lbl.gov/rnnotator/ | 2010 |
| Oases [18] | De novo | https://github.com/dzerbino/oases | 2012 |
| Trinity [32] | De novo | https://github.com/trinityrnaseq/trinityrnaseq/wiki | 2013 |
| SOAPdenovo-Trans [19] | De novo | https://github.com/aquaskyline/SOAPdenovo-Trans | 2014 |
| Bridger [28] | De novo | https://github.com/fmaguire/Bridger_Assembler | 2015 |
| Cufflinks [33] | Reference-based | http://cole-trapnell-lab.github.io/cufflinks/ | 2010 |
| Scripture [34] | Reference-based | http://software.broadinstitute.org/software/scripture/ | 2010 |
| TransComb [35] | Reference-based | https://sourceforge.net/projects/transcriptomeassembly/files/ | 2016 |
| Bayesembler [29] | Probability model | https://github.com/bioinformatics-centre/bayesembler | 2014 |
Tab.1  Summary of software for transcriptome assembly.
1 J. S. Buguliskis (2015) Could RNA-seq become the workhorse of precision medicine? Genet. Eng. Biotech. N., 35, 8–9
2 R. Chen and M. Snyder (2013) Promise of personalized omics to precision medicine. Wiley Interdiscip. Rev. Syst. Biol. Med., 5, 73–82
https://doi.org/10.1002/wsbm.1198 pmid: 23184638
3 F. S. Collins and H. Varmus (2015) A new initiative on precision medicine. N. Engl. J. Med., 372, 793–795
https://doi.org/10.1056/NEJMp1500523 pmid: 25635347
4 F. Klauschen, M. Andreeff, U. Keilholz, M. Dietel, and A. Stenzinger (2014) The combinatorial complexity of cancer precision medicine. Oncoscience, 1, 504–509
https://doi.org/10.18632/oncoscience.66 pmid: 25594052
5 Ö. Çakır, N. Turgut-Kara, Ş. Arı, and B. Zhang (2015) De novo transcriptome assembly and comparative analysis elucidate complicated mechanism regulating Astragalus chrysochlorus response to selenium stimuli. PLoS One, 10, e0135677
https://doi.org/10.1371/journal.pone.0135677 pmid: 26431547
6 L. Nayak, I. Ray, and R. K. De (2016) Precision medicine with electronic medical records: from the patients and for the patients. Ann. Transl. Med., 4 (Suppl 1), S61
7 K. Kourou, T. P. Exarchos, K. P. Exarchos, M. V. Karamouzis, and D. I. Fotiadis (2015) Machine learning applications in cancer prognosis and prediction. Comput. Struct. Biotechnol. J., 13, 8–17
https://doi.org/10.1016/j.csbj.2014.11.005 pmid: 25750696
8 S. Vural, X. Wang, and C. Guda (2016) Classification of breast cancer patients using somatic mutation profiles and machine learning approaches. BMC Syst. Biol., 10, 62
https://doi.org/10.1186/s12918-016-0306-z pmid: 27587275
9 D. M. Hyman, B. S. Taylor, and J. Baselga (2017) Implementing genome-driven oncology. Cell, 168, 584–599
https://doi.org/10.1016/j.cell.2016.12.015 pmid: 28187282
10 A. Conesa, P. Madrigal, S. Tarazona, D. Gomez-Cabrero, A. Cervera, A. McPherson, M. W. Szcześniak, D. J. Gaffney, L. L. Elo, X. Zhang, et al. (2016) A survey of best practices for RNA-seq data analysis. Genome Biol., 17, 13
https://doi.org/10.1186/s13059-016-0881-8 pmid: 26813401
11 J. A. Martin and Z. Wang (2011) Next-generation transcriptome assembly. Nat. Rev. Genet., 12, 671–682
https://doi.org/10.1038/nrg3068 pmid: 21897427
12 D. R. Zerbino and E. Birney (2008) Velvet: algorithms for de novo short read assembly using de Bruijn graphs. Genome Res., 18, 821–829
https://doi.org/10.1101/gr.074492.107 pmid: 18349386
13 J. T. Simpson, K. Wong, S. D. Jackman, J. E. Schein, S. J. Jones, and I. Birol (2009) ABySS: a parallel assembler for short read sequence data. Genome Res., 19, 1117–1123
https://doi.org/10.1101/gr.089532.108 pmid: 19251739
14 P. A. Pevzner, H. Tang, and M. S. Waterman (2001) An Eulerian path approach to DNA fragment assembly. Proc. Natl. Acad. Sci. USA, 98, 9748–9753
https://doi.org/10.1073/pnas.171285098 pmid: 11504945
15 M. Fumagalli (2013) Assessing the effect of sequencing depth and sample size in population genetics inferences. PLoS One, 8, e79667
https://doi.org/10.1371/journal.pone.0079667 pmid: 24260275
16 M. G. Grabherr, B. J. Haas, M. Yassour, J. Z. Levin, D. A. Thompson, I. Amit, X. Adiconis, L. Fan, R. Raychowdhury, Q. Zeng, et al. (2011) Full-length transcriptome assembly from RNA-Seq data without a reference genome. Nat. Biotechnol., 29, 644–652
https://doi.org/10.1038/nbt.1883 pmid: 21572440
17 N. G. de Bruijn (1946) A combinatorial problem. In Proceedings of the Koninklijke Nederlandse Akademie van Wetenschappen, Series A, 49, 758–764
18 M. H. Schulz, D. R. Zerbino, M. Vingron, and E. Birney (2012) Oases: robust de novo RNA-seq assembly across the dynamic range of expression levels. Bioinformatics, 28, 1086–1092
https://doi.org/10.1093/bioinformatics/bts094 pmid: 22368243
19 Y. Xie, G. Wu, J. Tang, R. Luo, J. Patterson, S. Liu, W. Huang, G. He, S. Gu, S. Li, et al. (2014) SOAPdenovo-Trans: de novo transcriptome assembly with short RNA-Seq reads. Bioinformatics, 30, 1660–1666
https://doi.org/10.1093/bioinformatics/btu077 pmid: 24532719
20 R. Li, C. Yu, Y. Li, T.-W. Lam, S.-M. Yiu, K. Kristiansen, and J. Wang (2009) SOAP2: an improved ultrafast tool for short read alignment. Bioinformatics, 25, 1966–1967
https://doi.org/10.1093/bioinformatics/btp336 pmid: 19497933
21 C.-Y. Shi, H. Yang, C.-L. Wei, O. Yu, Z.-Z. Zhang, C.-J. Jiang, J. Sun, Y.-Y. Li, Q. Chen, T. Xia, et al. (2011) Deep sequencing of the Camellia sinensis transcriptome revealed candidate genes for major metabolic pathways of tea-specific compounds. BMC Genomics, 12, 131
https://doi.org/10.1186/1471-2164-12-131 pmid: 21356090
22 R. Garg, R. K. Patel, A. K. Tyagi, and M. Jain (2011) De novo assembly of chickpea transcriptome using short reads for gene discovery and marker identification. DNA Res., 18, 53–63
https://doi.org/10.1093/dnares/dsq028 pmid: 21217129
23 Q.-Y. Zhao, Y. Wang, Y.-M. Kong, D. Luo, X. Li, and P. Hao (2011) Optimizing de novo transcriptome assembly from short-read RNA-Seq data: a comparative study. BMC Bioinformatics, 12, S2
https://doi.org/10.1186/1471-2105-12-S14-S2 pmid: 22373417
24 G. Robertson, J. Schein, R. Chiu, R. Corbett, M. Field, S. D. Jackman, K. Mungall, S. Lee, H. M. Okada, J. Q. Qian, et al. (2010) De novo assembly and analysis of RNA-seq data. Nat. Methods, 7, 909–912
https://doi.org/10.1038/nmeth.1517 pmid: 20935650
25 C. Trapnell, A. Roberts, L. Goff, G. Pertea, D. Kim, D. R. Kelley, H. Pimentel, S. L. Salzberg, J. L. Rinn, and L. Pachter (2012) Differential gene and transcript expression analysis of RNA-seq experiments with TopHat and Cufflinks. Nat. Protoc., 7, 562–578
https://doi.org/10.1038/nprot.2012.016 pmid: 22383036
26 C. Trapnell, L. Pachter, and S. L. Salzberg (2009) TopHat: discovering splice junctions with RNA-Seq. Bioinformatics, 25, 1105–1111
https://doi.org/10.1093/bioinformatics/btp120 pmid: 19289445
27 M. Yandell and D. Ence (2012) A beginner’s guide to eukaryotic genome annotation. Nat. Rev. Genet., 13, 329–342
https://doi.org/10.1038/nrg3174 pmid: 22510764
28 Z. Chang, G. Li, J. Liu, Y. Zhang, C. Ashby, D. Liu, C. L. Cramer, and X. Huang (2015) Bridger: a new framework for de novo transcriptome assembly using RNA-seq data. Genome Biol., 16, 30
https://doi.org/10.1186/s13059-015-0596-2 pmid: 25723335
29 L. Maretty, J. A. Sibbesen, and A. Krogh (2014) Bayesian transcriptome assembly. Genome Biol., 15, 501
https://doi.org/10.1186/s13059-014-0501-4 pmid: 25367074
30 D. Kim, G. Pertea, C. Trapnell, H. Pimentel, R. Kelley, and S. L. Salzberg (2013) TopHat2: accurate alignment of transcriptomes in the presence of insertions, deletions and gene fusions. Genome Biol., 14, R36
https://doi.org/10.1186/gb-2013-14-4-r36 pmid: 23618408
31 J. Martin, V. M. Bruno, Z. Fang, X. Meng, M. Blow, T. Zhang, G. Sherlock, M. Snyder, and Z. Wang (2010) Rnnotator: an automated de novo transcriptome assembly pipeline from stranded RNA-Seq reads. BMC Genomics, 11, 663
https://doi.org/10.1186/1471-2164-11-663 pmid: 21106091
32 B. J. Haas, A. Papanicolaou, M. Yassour, M. Grabherr, P. D. Blood, J. Bowden, M. B. Couger, D. Eccles, B. Li, M. Lieber, et al. (2013) De novo transcript sequence reconstruction from RNA-seq using the Trinity platform for reference generation and analysis. Nat. Protoc., 8, 1494–1512
https://doi.org/10.1038/nprot.2013.084 pmid: 23845962
33 C. Trapnell, B. A. Williams, G. Pertea, A. Mortazavi, G. Kwan, M. J. van Baren, S. L. Salzberg, B. J. Wold, and L. Pachter (2010) Transcript assembly and quantification by RNA-Seq reveals unannotated transcripts and isoform switching during cell differentiation. Nat. Biotechnol., 28, 511–515
https://doi.org/10.1038/nbt.1621 pmid: 20436464
34 M. Guttman, M. Garber, J. Z. Levin, J. Donaghey, J. Robinson, X. Adiconis, L. Fan, M. J. Koziol, A. Gnirke, C. Nusbaum, et al. (2010) Ab initio reconstruction of cell type–specific transcriptomes in mouse reveals the conserved multi-exonic structure of lincRNAs. Nat. Biotechnol., 28, 503–510
https://doi.org/10.1038/nbt.1633 pmid: 20436462
35 J. Liu, T. Yu, T. Jiang, and G. Li (2016) TransComb: genome-guided transcriptome assembly via combing junctions in splicing graphs. Genome Biol., 17, 213
https://doi.org/10.1186/s13059-016-1074-1 pmid: 27760567
36 E. W. Myers (1995) Toward simplifying and accurately formulating fragment assembly. J. Comput. Biol., 2, 275–290
https://doi.org/10.1089/cmb.1995.2.275 pmid: 7497129
37 S. Kumar and M. L. Blaxter (2010) Comparing de novo assemblers for 454 transcriptome data. BMC Genomics, 11, 571
https://doi.org/10.1186/1471-2164-11-571 pmid: 20950480
38 V. Zeng, K. E. Villanueva, B. S. Ewen-Campen, F. Alwes, W. E. Browne, and C. G. Extavour (2011) De novo assembly and characterization of a maternal and developmental transcriptome for the emerging model crustacean Parhyale hawaiensis. BMC Genomics, 12, 581
https://doi.org/10.1186/1471-2164-12-581 pmid: 22118449
39 J. Zhu, F. He, J. Wang, and J. Yu (2008) Modeling transcriptome based on transcript-sampling data. PLoS One, 3, e1659
https://doi.org/10.1371/journal.pone.0001659 pmid: 18286206
40 B. Li, N. Fillmore, Y. Bai, M. Collins, J. A. Thomson, R. Stewart, and C. N. Dewey (2014) Evaluation of de novo transcriptome assemblies from RNA-Seq data. Genome Biol., 15, 553
https://doi.org/10.1186/s13059-014-0553-5 pmid: 25608678
41 M. Garber, M. G. Grabherr, M. Guttman, and C. Trapnell (2011) Computational methods for transcriptome annotation and quantification using RNA-seq. Nat. Methods, 8, 469–477
https://doi.org/10.1038/nmeth.1613 pmid: 21623353