Quantitative Biology

ISSN 2095-4689

ISSN 2095-4697(Online)

CN 10-1028/TM

Postal distribution code: 80-971

Quantitative Biology  2019, Vol. 7 Issue (2): 90-109   https://doi.org/10.1007/s40484-019-0166-9
Current challenges and solutions of de novo assembly
Xingyu Liao¹, Min Li¹, You Zou¹, Fang-Xiang Wu², Yi Pan³, Jianxin Wang¹
1. School of Computer Science and Engineering, Central South University, Changsha 410083, China
2. Division of Biomedical Engineering, University of Saskatchewan, Saskatchewan, S7N 5A9, Canada
3. Department of Computer Science, Georgia State University, Atlanta, GA 30302, USA
Abstract

Background: Next-generation sequencing (NGS) technologies have fostered an unprecedented proliferation of high-throughput sequencing projects and a concomitant development of novel algorithms for the assembly of short reads. Although many new ideas and solutions have been proposed in both experimental and computational settings, numerous technical and computational challenges in de novo assembly remain.

Results: In this review, we first briefly introduce the major challenges faced by NGS sequence assembly. We then analyze the characteristics of various sequencing platforms and their impact on assembly results. Next, we classify de novo assemblers according to their frameworks (overlap graph-based, de Bruijn graph-based and string graph-based) and describe the characteristics of each assembly tool and the scenarios to which it is suited. We then present in detail the solutions to the main challenges in de novo assembly of next-generation sequencing data, single-cell sequencing data and single-molecule sequencing (SMS) data. Finally, we discuss the application of SMS long reads to problems encountered in NGS assembly.

Conclusions: This review not only gives an overview of the latest methods and developments in assembly algorithms, but also provides guidelines to determine the optimal assembly algorithm for a given input sequencing data type.

Keywords: next-generation sequencing; single-cell sequencing; single-molecule sequencing; de novo assembly algorithms
Received: 2018-04-05      Published: 2019-05-30
Corresponding Author(s): Min Li,Jianxin Wang   
Cite this article:
Xingyu Liao, Min Li, You Zou, Fang-Xiang Wu, Yi Pan, Jianxin Wang. Current challenges and solutions of de novo assembly. Quant. Biol., 2019, 7(2): 90-109.
Link to this article:
https://academic.hep.com.cn/qb/CN/10.1007/s40484-019-0166-9
https://academic.hep.com.cn/qb/CN/Y2019/V7/I2/90
| Platform | Company | Error rate (%) | Read length (bp) | No. of reads/run | Time/run | Cost/Gb |
|---|---|---|---|---|---|---|
| GS FLX | 454 Life Sciences, Roche | 1 | 200–1000 | 0.4–0.5 Gb | ~23 h | $9.5 |
| SOLiD 5500xl | Applied Biosystems | 0.1 | 2×35–2×75 | 30–50 Gb | ~10 d | $70 |
| Illumina HiSeq 2500 | Solexa, Illumina | 0.2 | 2×50–2×150 | 750–1500 Gb | ~40 h | $45 |
| Illumina MiSeq | Solexa, Illumina | 0.2 | 2×50–2×300 | 7.5–13 Gb | 21–56 h | $110 |
| PacBio RS | Pacific Biosciences | 16 | ~20 × 10^3 | 500 Mb–1 Gb | ~4 h | $1000 |
| Nanopore MinION | Oxford Nanopore | 38 | ~200 × 10^3 | 500 Mb–1.5 Gb | ~50 h | $750 |

Tab.1
Fig.1  
Fig.2  
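| Basic framework | Assembler | Input | Speed | Memory | N50 |
|---|---|---|---|---|---|
| OLC graph | PCAP | SE/PE/Li/L | + | + | +++ |
| OLC graph | AMOS | SE/PE/Li | + | + | +++ |
| OLC graph | Arachne | SE/PE/Li | + | + | +++ |
| OLC graph | Celera | SE/PE/Li/L | + | + | +++ |
| de Bruijn graph | Velvet | SE/PE/Li | ++ | ++ | + |
| de Bruijn graph | ALLPATHS | SE/PE/Li | + | + | +++ |
| de Bruijn graph | ABySS | SE/PE/Li | ++ | +++ | ++ |
| de Bruijn graph | SOAPdenovo2 | SE/PE/Li | +++ | ++ | ++ |
| de Bruijn graph | SparseAssembler | SE/PE/Li | ++ | +++ | ++ |
| de Bruijn graph | JR-Assembler | SE/PE/Li | + | + | +++ |
| de Bruijn graph | MaSuRCA | SE/PE/Li/L | + | + | +++ |
| de Bruijn graph | EPGA | PE | + | +++ | ++ |
| de Bruijn graph | EPGA2 | PE | + | + | ++ |
| de Bruijn graph | SPAdes | SE/PE/Li | ++ | +++ | +++ |
| de Bruijn graph | IDBA-UD | SE/PE/Li | ++ | +++ | +++ |
| de Bruijn graph | Velvet-SC | SE/PE/Li | ++ | +++ | +++ |
| de Bruijn graph | ALLPATHS-LG | PE/Li/L | + | + | +++ |
| String graph | SGA | SE/PE/Li | + | + | +++ |
| String graph | Readjoiner | SE/PE/Li | + | + | +++ |
| String graph | FALCON | L | + | +++ | ++++ |

Tab.2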
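
The N50 column of Tab.2 is not defined on this page. Under the usual definition (the length of the contig at which the cumulative length of contigs, taken from longest to shortest, first reaches half of the total assembly length), it could be computed roughly as in the sketch below. The function name `n50` and the example contig lengths are illustrative assumptions and are not taken from the review.

```python
from typing import Iterable


def n50(contig_lengths: Iterable[int]) -> int:
    """Return the N50 of a set of contig lengths (0 if there are no contigs)."""
    # Sort from longest to shortest, ignoring non-positive lengths.
    lengths = sorted((n for n in contig_lengths if n > 0), reverse=True)
    total = sum(lengths)
    running = 0
    for length in lengths:
        running += length
        # The first contig at which the running sum reaches half of the
        # total assembled length defines N50.
        if 2 * running >= total:
            return length
    return 0


if __name__ == "__main__":
    # Hypothetical contig lengths (bp): total 290, half 145.
    # 80 + 70 = 150 >= 145, so N50 is 70.
    print(n50([80, 70, 50, 40, 30, 20]))
```

A larger N50 means that more of the assembly sits in long contigs, which is presumably why the table scores it alongside speed and memory; it says nothing about whether those contigs are correct.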
Fig.3  
Fig.4  
Fig.5  
Fig.6  
Fig.7  
Fig.8  
Fig.9  
Fig.10  
| Assembler | Genome | Memory consumption (GB) | Time consumption (day) | Ref. |
|---|---|---|---|---|
| ABySS | Human | ~16 | ~8 | [18] |
| ALLPATHS-LG | Human | ~512 | ~597 | [19] |
| SGA | Human | ~56 | ~1 | [20] |
| SOAPdenovo2 | Human | ~35 | ~4 | [21] |

Tab.3
1 J. R. Miller, S. Koren, and G. Sutton (2010) Assembly algorithms for next-generation sequencing data. Genomics, 95, 315–327
https://doi.org/10.1016/j.ygeno.2010.03.001 pmid: 20211242
2 N. Nagarajan and M. Pop (2013) Sequence assembly demystified. Nat. Rev. Genet., 14, 157–167
https://doi.org/10.1038/nrg3367 pmid: 23358380
3 J. F. Denton, J. Lugo-Martinez, A. E. Tucker, D. R. Schrider, W. C. Warren, and M. W. Hahn (2014) Extensive error in the number of genes inferred from draft genome assemblies. PLoS Comput. Biol., 10, e1003998
https://doi.org/10.1371/journal.pcbi.1003998 pmid: 25474019
4 S. R. Head, H. K. Komori, S. A. LaMere, T. Whisenant, F. Van Nieuwerburgh, D. R. Salomon, and P. Ordoukhanian (2014) Library construction for next-generation sequencing: overviews and challenges. Biotechniques, 56, 61–64
https://doi.org/10.2144/000114133 pmid: 24502796
5 X. Yang, S. P. Chockalingam, and S. Aluru (2013) A survey of error-correction methods for next-generation sequencing. Brief. Bioinform., 14, 56–66
https://doi.org/10.1093/bib/bbs015 pmid: 22492192
6 D. R. Kelley, M. C. Schatz, and S. L. Salzberg (2010) Quake: quality-aware detection and correction of sequencing errors. Genome Biol., 11, R116
https://doi.org/10.1186/gb-2010-11-11-r116 pmid: 21114842
7 S. Koren and A. M. Phillippy (2015) One chromosome, one contig: complete microbial genomes from long-read sequencing and assembly. Curr. Opin. Microbiol., 23, 110–120
https://doi.org/10.1016/j.mib.2014.11.014 pmid: 25461581
8 M. A. Madoui, S. Engelen, C. Cruaud, C. Belser, L. Bertrand, A. Alberti, A. Lemainque, P. Wincker, and J. M. Aury (2015) Genome assembly using Nanopore-guided long and error-free DNA reads. BMC Genomics, 16, 327
https://doi.org/10.1186/s12864-015-1519-z pmid: 25927464
9 D. Sims, I. Sudbery, N. E. Ilott, A. Heger, and C. P. Ponting (2014) Sequencing depth and coverage: key considerations in genomic analyses. Nat. Rev. Genet., 15, 121–132
https://doi.org/10.1038/nrg3642 pmid: 24434847
10 H. Chitsaz, J. L. Yee-Greenbaum, G. Tesler, M. J. Lombardo, C. L. Dupont, J. H. Badger, M. Novotny, D. B. Rusch, L. J. Fraser, N. A. Gormley, et al. (2011) Efficient de novo assembly of single-cell bacterial genomes from short-read data sets. Nat. Biotechnol., 29, 915–921
https://doi.org/10.1038/nbt.1966 pmid: 21926975
11 S. Rodrigue, R. R. Malmstrom, A. M. Berlin, B. W. Birren, M. R. Henn, and S. W. Chisholm (2009) Whole genome amplification and de novo assembly of single bacterial cells. PLoS One, 4, e6864
https://doi.org/10.1371/journal.pone.0006864 pmid: 19724646
12 X. Liao, M. Li, Y. Zou, F. Wu, Y. Pan, F. Luo, and J. Wang (2018) Improving de novo assembly based on read classification. IEEE/ACM Trans. Comput. Biol. Bioinformatics
13 M. Margulies, M. Egholm, W. E. Altman, S. Attiya, J. S. Bader, L. A. Bemben, J. Berka, M. S. Braverman, Y. J. Chen, Z. Chen, et al. (2005) Genome sequencing in microfabricated high-density picolitre reactors. Nature, 437, 376–380
https://doi.org/10.1038/nature03959 pmid: 16056220
14 H. H. Kazazian, Jr. (2004) Mobile elements: drivers of genome evolution. Science, 303, 1626–1632
https://doi.org/10.1126/science.1089670 pmid: 15016989
15 R. Cordaux and M. A. Batzer (2009) The impact of retrotransposons on human genome evolution. Nat. Rev. Genet., 10, 691–703
https://doi.org/10.1038/nrg2640 pmid: 19763152
16 S. Goodwin, J. Gurtowski, S. Ethe-Sayers, P. Deshpande, M. C. Schatz, and W. R. McCombie (2015) Oxford Nanopore sequencing, hybrid error correction, and de novo assembly of a eukaryotic genome. Genome Res., 25, 1750–1756
https://doi.org/10.1101/gr.191395.115 pmid: 26447147
17 S. Oikonomopoulos, Y. C. Wang, H. Djambazian, D. Badescu, and J. Ragoussis (2016) Benchmarking of the Oxford Nanopore MinION sequencing for quantitative and qualitative assessment of cDNA populations. Sci. Rep., 6, 31602
https://doi.org/10.1038/srep31602 pmid: 27554526
18 J. T. Simpson, K. Wong, S. D. Jackman, J. E. Schein, S. J. Jones, and I. Birol (2009) ABySS: a parallel assembler for short read sequence data. Genome Res., 19, 1117–1123
https://doi.org/10.1101/gr.089532.108 pmid: 19251739
19 S. Gnerre, I. MacCallum, D. Przybylski, F. J. Ribeiro, J. N. Burton, B. J. Walker, T. Sharpe, G. Hall, T. P. Shea, S. Sykes, et al. (2011) High-quality draft assemblies of mammalian genomes from massively parallel sequence data. Proc. Natl. Acad. Sci. USA, 108, 1513–1518
https://doi.org/10.1073/pnas.1017351108 pmid: 21187386
20 J. T. Simpson and R. Durbin (2012) Efficient de novo assembly of large genomes using compressed data structures. Genome Res., 22, 549–556
https://doi.org/10.1101/gr.126953.111 pmid: 22156294
21 R. Luo, B. Liu, Y. Xie, Z. Li, W. Huang, J. Yuan, G. He, Y. Chen, Q. Pan, Y. Liu, et al. (2012) SOAPdenovo2: an empirically improved memory-efficient short-read de novo assembler. Gigascience, 1, 18
https://doi.org/10.1186/2047-217X-1-18 pmid: 23587118
22 M. C. Schatz, J. Witkowski, and W. R. McCombie (2012) Current challenges in de novo plant genome sequencing and assembly. Genome Biol., 13, 243
https://doi.org/10.1186/gb-2012-13-4-243 pmid: 22546054
23 R. M. Idury and M. S. Waterman (1995) A new algorithm for DNA sequence assembly. J. Comput. Biol., 2, 291–306
https://doi.org/10.1089/cmb.1995.2.291 pmid: 7497130
24 P. E. C. Compeau, P. A. Pevzner, and G. Tesler (2011) How to apply de Bruijn graphs to genome assembly. Nat. Biotechnol., 29, 987–991
https://doi.org/10.1038/nbt.2023 pmid: 22068540
25 D. Hernandez, P. François, L. Farinelli, M. Osterås, and J. Schrenzel (2008) De novo bacterial genome sequencing: millions of very short reads assembled on a desktop computer. Genome Res., 18, 802–809
https://doi.org/10.1101/gr.072033.107 pmid: 18332092
26 E. W. Myers, G. G. Sutton, A. L. Delcher, I. M. Dew, D. P. Fasulo, M. J. Flanigan, S. A. Kravitz, C. M. Mobarry, K. H. Reinert, K. A. Remington, et al. (2000) A whole-genome assembly of Drosophila. Science, 287, 2196–2204
https://doi.org/10.1126/science.287.5461.2196 pmid: 10731133
27 D. B. Jaffe, J. Butler, S. Gnerre, E. Mauceli, K. Lindblad-Toh, J. P. Mesirov, M. C. Zody, and E. S. Lander (2003) Whole-genome sequence assembly for mammalian genomes: Arachne 2. Genome Res., 13, 91–96
https://doi.org/10.1101/gr.828403 pmid: 12529310
28 J. I. Sohn and J. W. Nam (2018) The present and future of de novo whole-genome assembly. Brief. Bioinform., 19, 23–40
pmid: 27742661
29 R. D. Mitra and G. M. Church (1999) In situ localized amplification and contact replication of many individual DNA molecules. Nucleic Acids Res., 27, e34–e39
https://doi.org/10.1093/nar/27.24.e34 pmid: 10572186
30 H. P. J. Buermans and J. T. den Dunnen (2014) Next generation sequencing technology: advances and applications. Biochim. Biophys. Acta, 1842, 1932–1941
https://doi.org/10.1016/j.bbadis.2014.06.015 pmid: 24995601
31 M. L. Metzker (2010) Sequencing technologies–the next generation. Nat. Rev. Genet., 11, 31–46
https://doi.org/10.1038/nrg2626 pmid: 19997069
32 D. Laehnemann, A. Borkhardt, and A. C. McHardy (2016) Denoising DNA deep sequencing data: high-throughput sequencing errors and their correction. Brief. Bioinform., 17, 154–179
https://doi.org/10.1093/bib/bbv029 pmid: 26026159
33 M. Schirmer, U. Z. Ijaz, R. D’Amore, N. Hall, W. T. Sloan, and C. Quince (2015) Insight into biases and sequencing errors for amplicon sequencing with the Illumina MiSeq platform. Nucleic Acids Res., 43, e37
https://doi.org/10.1093/nar/gku1341 pmid: 25586220
34 E. L. van Dijk, H. Auger, Y. Jaszczyszyn, and C. Thermes (2014) Ten years of next-generation sequencing technology. Trends Genet., 30, 418–426
https://doi.org/10.1016/j.tig.2014.07.001 pmid: 25108476
35 K. K. Mestan, L. Ilkhanoff, S. Mouli, and S. Lin (2011) Genomic sequencing in clinical trials. J. Transl. Med., 9, 222
https://doi.org/10.1186/1479-5876-9-222 pmid: 22206293
36 S. Goodwin, J. D. McPherson, and W. R. McCombie (2016) Coming of age: ten years of next-generation sequencing technologies. Nat. Rev. Genet., 17, 333–351
https://doi.org/10.1038/nrg.2016.49 pmid: 27184599
37 M. A. Quail, M. Smith, P. Coupland, T. D. Otto, S. R. Harris, T. R. Connor, A. Bertoni, H. P. Swerdlow, and Y. Gu (2012) A tale of three next generation sequencing platforms: comparison of Ion Torrent, Pacific Biosciences and Illumina MiSeq sequencers. BMC Genomics, 13, 341
https://doi.org/10.1186/1471-2164-13-341 pmid: 22827831
38 S. C. Schuster (2008) Next-generation sequencing transforms today’s biology. Nat. Methods, 5, 16–18
https://doi.org/10.1038/nmeth1156 pmid: 18165802
39 R. K. Patel and M. Jain (2012) NGS QC Toolkit: a toolkit for quality control of next generation sequencing data. PLoS One, 7, e30619
https://doi.org/10.1371/journal.pone.0030619 pmid: 22312429
40 L. Liu, Y. Li, S. Li, N. Hu, Y. He, R. Pong, D. Lin, L. Lu, and M. Law (2012) Comparison of next-generation sequencing systems. J. Biomed. Biotechnol., 2012, 251364
41 L. Liu, N. Hu, B. Wang, C. Min, W. Juan, Z. Tian, H. Yi, and L. Dan (2011) A brief utilization report on the Illumina HiSeq 2000 sequencer. Mycology, 2, 169–191
42 S. A. Simon, J. Zhai, R. S. Nandety, K. P. McCormick, J. Zeng, D. Mejia, and B. C. Meyers (2009) Short-read sequencing technologies for transcriptional analyses. Annu. Rev. Plant Biol., 60, 305–333
https://doi.org/10.1146/annurev.arplant.043008.092032 pmid: 19575585
43 M. Kircher and J. Kelso (2010) High-throughput DNA sequencing–concepts and limitations. BioEssays, 32, 524–536
https://doi.org/10.1002/bies.200900181 pmid: 20486139
44 D. G. Hert, C. P. Fredlake, and A. E. Barron (2008) Advantages and limitations of next-generation sequencing technologies: a comparison of electrophoresis and non-electrophoresis methods. Electrophoresis, 29, 4618–4626
https://doi.org/10.1002/elps.200800456 pmid: 19053153
45 J. Henson, G. Tischler, and Z. Ning (2012) Next-generation sequencing and large genome assemblies. Pharmacogenomics, 13, 901–915
https://doi.org/10.2217/pgs.12.72 pmid: 22676195
46 A. Rhoads and K. F. Au (2015) PacBio sequencing and its applications. Genomics Proteomics Bioinformatics, 13, 278–289
https://doi.org/10.1016/j.gpb.2015.08.002 pmid: 26542840
47 R. Logares, T. H. A. Haverkamp, S. Kumar, A. Lanzén, A. J. Nederbragt, C. Quince, and H. Kauserud (2012) Environmental microbiology through the lens of high-throughput DNA sequencing: synopsis of current platforms and bioinformatics approaches. J. Microbiol. Methods, 91, 106–113
https://doi.org/10.1016/j.mimet.2012.07.017 pmid: 22849829
48 T. J. Treangen and S. L. Salzberg (2011) Repetitive DNA and next-generation sequencing: computational challenges and solutions. Nat. Rev. Genet., 13, 36–46
https://doi.org/10.1038/nrg3117 pmid: 22124482
49 J. M. Heather and B. Chain (2016) The sequence of sequencers: The history of sequencing DNA. Genomics, 107, 1–8
https://doi.org/10.1016/j.ygeno.2015.11.003 pmid: 26554401
50 C. S. Chin, D. H. Alexander, P. Marks, A. A. Klammer, J. Drake, C. Heiner, A. Clum, A. Copeland, J. Huddleston, E. E. Eichler, et al. (2013) Nonhybrid, finished microbial genome assemblies from long-read SMRT sequencing data. Nat. Methods, 10, 563–569
https://doi.org/10.1038/nmeth.2474 pmid: 23644548
51 M. Ferrarini, M. Moretto, J. A. Ward, N. Šurbanovski, V. Stevanović, L. Giongo, R. Viola, D. Cavalieri, R. Velasco, A. Cestaro, et al. (2013) An evaluation of the PacBio RS platform for sequencing and de novo assembly of a chloroplast genome. BMC Genomics, 14, 670
https://doi.org/10.1186/1471-2164-14-670 pmid: 24083400
52 S. Goodwin, J. Gurtowski, S. Ethe-Sayers, P. Deshpande, M. C. Schatz, and W. R. McCombie (2015) Oxford Nanopore sequencing, hybrid error correction, and de novo assembly of a eukaryotic genome. Genome Res., 25, 1750–1756
https://doi.org/10.1101/gr.191395.115 pmid: 26447147
53 T. Laver, J. Harrison, P. A. O’Neill, K. Moore, A. Farbos, K. Paszkiewicz, and D. J. Studholme (2015) Assessing the performance of the Oxford Nanopore Technologies MinION. Biomol. Detect. Quantif., 3, 1–8
https://doi.org/10.1016/j.bdq.2015.02.001 pmid: 26753127
54 W. Turner (1890) The cell theory, past and present. J. Anat. Physiol., 24(Pt 2), 253–287
55 C. Gawad, W. Koh, and S. R. Quake (2016) Single-cell genome sequencing: current state of the science. Nat. Rev. Genet., 17, 175–188
https://doi.org/10.1038/nrg.2015.16 pmid: 26806412
56 H. Chitsaz, J. L. Yee-Greenbaum, G. Tesler, M. J. Lombardo, C. L. Dupont, J. H. Badger, M. Novotny, D. B. Rusch, L. J. Fraser, N. A. Gormley, et al. (2011) Efficient de novo assembly of single-cell bacterial genomes from short-read data sets. Nat. Biotechnol., 29, 915–921
https://doi.org/10.1038/nbt.1966 pmid: 21926975
57 S. Batzoglou, D. B. Jaffe, K. Stanley, J. Butler, S. Gnerre, E. Mauceli, B. Berger, J. P. Mesirov, and E. S. Lander (2002) ARACHNE: a whole-genome shotgun assembler. Genome Res., 12, 177–189
https://doi.org/10.1101/gr.208902 pmid: 11779843
58 P. E. C. Compeau, P. A. Pevzner, and G. Tesler (2011) How to apply de Bruijn graphs to genome assembly. Nat. Biotechnol., 29, 987–991
https://doi.org/10.1038/nbt.2023 pmid: 22068540
59 Z. Li, Y. Chen, D. Mu, J. Yuan, Y. Shi, H. Zhang, J. Gan, N. Li, X. Hu, B. Liu, et al. (2012) Comparison of the two major classes of assembly algorithms: overlap-layout-consensus and de-bruijn-graph. Brief. Funct. Genomics, 11, 25–37
https://doi.org/10.1093/bfgp/elr035 pmid: 22184334
60 M. J. P. Chaisson, R. K. Wilson, and E. E. Eichler (2015) Genetic variation and the de novo assembly of human genomes. Nat. Rev. Genet., 16, 627–640
https://doi.org/10.1038/nrg3933 pmid: 26442640
61 X. Huang, J. Wang, S. Aluru, S. P. Yang, and L. Hillier (2003) PCAP: a whole-genome assembly program. Genome Res., 13, 2164–2170
https://doi.org/10.1101/gr.1390403 pmid: 12952883
62 T. J. Treangen, D. D. Sommer, F. E. Angly, S. Koren, and M. Pop (2011) Next generation sequence assembly with AMOS. Curr. Protoc. Bioinformatics, 33, 11.8.1–11.8.18
63 J. Luo, J. Wang, Z. Zhang, F. X. Wu, M. Li, and Y. Pan (2015) EPGA: de novo assembly using the distributions of reads and insert size. Bioinformatics, 31, 825–833
https://doi.org/10.1093/bioinformatics/btu762 pmid: 25406329
64 T. C. Conway and A. J. Bromage (2011) Succinct data structures for assembling large genomes. Bioinformatics, 27, 479–486
https://doi.org/10.1093/bioinformatics/btq697 pmid: 21245053
65 P. Pevzner (2000) Computational Molecular Biology: An Algorithmic Approach. Cambridge: MIT Press
66 P. A. Pevzner, H. Tang, and M. S. Waterman (2001) An Eulerian path approach to DNA fragment assembly. Proc. Natl. Acad. Sci. USA, 98, 9748–9753
https://doi.org/10.1073/pnas.171285098 pmid: 11504945
67 D. R. Zerbino and E. Birney (2008) Velvet: algorithms for de novo short read assembly using de Bruijn graphs. Genome Res., 18, 821–829
https://doi.org/10.1101/gr.074492.107 pmid: 18349386
68 A. Bankevich, S. Nurk, D. Antipov, A. A. Gurevich, M. Dvorkin, A. S. Kulikov, V. M. Lesin, S. I. Nikolenko, S. Pham, A. D. Prjibelski, et al. (2012) SPAdes: a new genome assembly algorithm and its applications to single-cell sequencing. J. Comput. Biol., 19, 455–477
https://doi.org/10.1089/cmb.2012.0021 pmid: 22506599
69 Y. Peng, H. C. M. Leung, S. M. Yiu, and F. Y. Chin (2012) IDBA-UD: a de novo assembler for single-cell and metagenomic sequencing data with highly uneven depth. Bioinformatics, 28, 1420–1428
https://doi.org/10.1093/bioinformatics/bts174 pmid: 22495754
70 J. Luo, J. Wang, W. Li, Z. Zhang, F. X. Wu, M. Li, and Y. Pan (2015) EPGA2: memory-efficient de novo assembler. Bioinformatics, 31, 3988–3990
pmid: 26315905
71 A. V. Zimin, G. Marçais, D. Puiu, M. Roberts, S. L. Salzberg, and J. A. Yorke (2013) The MaSuRCA genome assembler. Bioinformatics, 29, 2669–2677
https://doi.org/10.1093/bioinformatics/btt476 pmid: 23990416
72 J. Butler, I. MacCallum, M. Kleber, I. A. Shlyakhter, M. K. Belmonte, E. S. Lander, C. Nusbaum, and D. B. Jaffe (2008) ALLPATHS: de novo assembly of whole-genome shotgun microreads. Genome Res., 18, 810–820
https://doi.org/10.1101/gr.7337908 pmid: 18340039
73 H. Li and R. Durbin (2009) Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics, 25, 1754–1760
https://doi.org/10.1093/bioinformatics/btp324 pmid: 19451168
74 J. T. Simpson and R. Durbin (2010) Efficient construction of an assembly string graph using the FM-index. Bioinformatics, 26, i367–i373
https://doi.org/10.1093/bioinformatics/btq217 pmid: 20529929
75 S. Koren and A. M. Phillippy (2015) One chromosome, one contig: complete microbial genomes from long-read sequencing and assembly. Curr. Opin. Microbiol., 23, 110–120
https://doi.org/10.1016/j.mib.2014.11.014 pmid: 25461581
76 C. L. Xiao, Y. Chen, S. Q. Xie, K.-N. Chen, Y. Wang, F. Luo, and Z. Xie (2016) MECAT: an ultra-fast mapping, error correction and de novo assembly tool for single-molecule sequencing reads. bioRxiv, 089250
77 Y. Heo, X. L. Wu, D. Chen, J. Ma, and W. M. Hwu (2014) BLESS: bloom filter-based error correction solution for high-throughput sequencing reads. Bioinformatics, 30, 1354–1362
https://doi.org/10.1093/bioinformatics/btu030 pmid: 24451628
78 X. Li and M. S. Waterman (2003) Estimating the repeat structure and length of DNA sequences using L-tuples. Genome Res., 13, 1916–1922
pmid: 12902383
79 D. R. Kelley, M. C. Schatz, and S. L. Salzberg (2010) Quake: quality-aware detection and correction of sequencing errors. Genome Biol., 11, R116
https://doi.org/10.1186/gb-2010-11-11-r116 pmid: 21114842
80 X. Yang, K. S. Dorman, and S. Aluru (2010) Reptile: representative tiling for short read error correction. Bioinformatics, 26, 2526–2533
https://doi.org/10.1093/bioinformatics/btq468 pmid: 20834037
81 R. Li, H. Zhu, J. Ruan, W. Qian, X. Fang, Z. Shi, Y. Li, S. Li, G. Shan, K. Kristiansen, et al. (2010) De novo assembly of human genomes with massively parallel short read sequencing. Genome Res., 20, 265–272
https://doi.org/10.1101/gr.097261.109 pmid: 20019144
82 X. Zhao, L. E. Palmer, R. Bolanos, C. Mircean, D. Fasulo, and G. M. Wittenberg (2010) EDAR: an efficient error detection and removal algorithm for next generation sequencing data. J. Comput. Biol., 17, 1549–1560
https://doi.org/10.1089/cmb.2010.0127 pmid: 20973743
83 L. Salmela and J. Schröder (2011) Correcting errors in short reads by multiple alignments. Bioinformatics, 27, 1455–1461
https://doi.org/10.1093/bioinformatics/btr170 pmid: 21471014
84 J. D. Thompson, J. C. Thierry, and O. Poch (2003) RASCAL: rapid scanning and correction of multiple sequence alignments. Bioinformatics, 19, 1155–1161
https://doi.org/10.1093/bioinformatics/btg133 pmid: 12801878
85 T. Lassmann and E. L. L. Sonnhammer (2005) Kalign–an accurate and fast multiple sequence alignment algorithm. BMC Bioinformatics, 6, 298
https://doi.org/10.1186/1471-2105-6-298 pmid: 16343337
86 A. Allam, P. Kalnis, and V. Solovyev (2015) Karect: accurate correction of substitution, insertion and deletion errors for next-generation sequencing data. Bioinformatics, 31, 3421–3428
https://doi.org/10.1093/bioinformatics/btv415 pmid: 26177965
87 L. Salmela and E. Rivals (2014) LoRDEC: accurate and efficient long read error correction. Bioinformatics, 30, 3506–3514
https://doi.org/10.1093/bioinformatics/btu538 pmid: 25165095
88 H. Li and R. Durbin (2009) Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics, 25, 1754–1760
https://doi.org/10.1093/bioinformatics/btp324 pmid: 19451168
89 B. Langmead, C. Trapnell, M. Pop, and S. L. Salzberg (2009) Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biol., 10, R25
https://doi.org/10.1186/gb-2009-10-3-r25 pmid: 19261174
90 S. Kurtz, A. Phillippy, A. L. Delcher, M. Smoot, M. Shumway, C. Antonescu, and S. L. Salzberg (2004) Versatile and open software for comparing large genomes. Genome Biol., 5, R12
https://doi.org/10.1186/gb-2004-5-2-r12 pmid: 14759262
91 Z. Ning, A. J. Cox, and J. C. Mullikin (2001) SSAHA: a fast search method for large DNA databases. Genome Res., 11, 1725–1729
https://doi.org/10.1101/gr.194201 pmid: 11591649
92 K. Berlin, S. Koren, C. S. Chin, J. P. Drake, J. M. Landolin, and A. M. Phillippy (2015) Assembling large genomes with single-molecule sequencing and locality-sensitive hashing. Nat. Biotechnol., 33, 623–630
https://doi.org/10.1038/nbt.3238 pmid: 26006009
93 H. Li (2016) Minimap and miniasm: fast mapping and de novo assembly for noisy long sequences. Bioinformatics, 32, 2103–2110
https://doi.org/10.1093/bioinformatics/btw152 pmid: 27153593
94 P. Medvedev, E. Scott, B. Kakaradov, and P. Pevzner (2011) Error correction of high-throughput sequencing datasets with non-uniform coverage. Bioinformatics, 27, i137–i141
https://doi.org/10.1093/bioinformatics/btr208 pmid: 21685062
95 C. B. Do, M. S. P. Mahabhashyam, M. Brudno, and S. Batzoglou (2005) ProbCons: probabilistic consistency-based multiple sequence alignment. Genome Res., 15, 330–340
https://doi.org/10.1101/gr.2821705 pmid: 15687296
96 S. I. Nikolenko, A. I. Korobeynikov, and M. A. Alekseyev (2013) BayesHammer: Bayesian clustering for error correction in single-cell sequencing. BMC Genomics, 14 (Suppl. 1), S7
https://doi.org/10.1186/1471-2164-14-S1-S7
97 W. C. Kao, A. H. Chan, and Y. S. Song (2011) ECHO: a reference-free short-read error correction algorithm. Genome Res., 21, 1181–1192
https://doi.org/10.1101/gr.111351.110 pmid: 21482625
98 M. J. Chaisson and P. A. Pevzner (2008) Short read fragment assembly of bacterial genomes. Genome Res., 18, 324–330
https://doi.org/10.1101/gr.7088808 pmid: 18083777
99 M. Li, Z. Liao, Y. He, J. Wang, J. Luo, and Y. Pan (2017) ISEA: iterative seed-extension algorithm for de novo assembly using paired-end information and insert size distribution. IEEE/ACM Trans. Comput. Biol. Bioinformatics, 14, 916–925
https://doi.org/10.1109/TCBB.2016.2550433 pmid: 27076460
100 J. Luo, J. Wang, Z. Zhang, M. Li, and F. X. Wu (2017) BOSS: a novel scaffolding algorithm based on an optimized scaffold graph. Bioinformatics, 33, 169–176
https://doi.org/10.1093/bioinformatics/btw597 pmid: 27634951
101 M. Li, L. Tang, F. X. Wu, Y. Pan, and J. Wang (2018) SCOP: a novel scaffolding algorithm based on contig classification and optimization. Bioinformatics, doi: 10.1093/bioinformatics/bty773
pmid: 30184046
102 J. Huddleston, S. Ranade, M. Malig, F. Antonacci, M. Chaisson, L. Hon, P. H. Sudmant, T. A. Graves, C. Alkan, M. Y. Dennis, et al. (2014) Reconstructing complex regions of genomes using long-read sequencing technology. Genome Res., 24, 688–696
https://doi.org/10.1101/gr.168450.113 pmid: 24418700
103 Y. Mostovoy, M. Levy-Sakin, J. Lam, E. T. Lam, A. R. Hastie, P. Marks, J. Lee, C. Chu, C. Lin, Ž. Džakula, et al. (2016) A hybrid approach for de novo human genome sequence assembly and phasing. Nat. Methods, 13, 587–590
https://doi.org/10.1038/nmeth.3865 pmid: 27159086
104 M. J. Chaisson and G. Tesler (2012) Mapping single molecule sequencing reads using basic local alignment with successive refinement (BLASR): application and theory. BMC Bioinformatics, 13, 238
https://doi.org/10.1186/1471-2105-13-238 pmid: 22988817
105 M. Boetzer and W. Pirovano (2014) SSPACE-LongRead: scaffolding bacterial draft genomes using long read sequence information. BMC Bioinformatics, 15, 211
https://doi.org/10.1186/1471-2105-15-211 pmid: 24950923
106 K. K. Lam, K. LaButti, A. Khalak, and D. Tse (2015) FinisherSC: a repeat-aware tool for upgrading de novo assembly using long reads. Bioinformatics, 31, 3207–3209
https://doi.org/10.1093/bioinformatics/btv280 pmid: 26040454
107 C. Ye, C. M. Hill, S. Wu, J. Ruan, and Z. S. Ma (2016) DBG2OLC: efficient assembly of large genomes using long erroneous reads of the third generation sequencing technologies. Sci. Rep., 6, 31900
https://doi.org/10.1038/srep31900 pmid: 27573208
108 M. D. Muggli, S. J. Puglisi, R. Ronen, and C. Boucher (2015) Misassembly detection using paired-end sequence reads and optical mapping data. Bioinformatics, 31, i80–i88
https://doi.org/10.1093/bioinformatics/btv262 pmid: 26072512
109 B. Wu, M. Li, X. Liao, J. Luo, F. Wu, Y. Pan, and J. Wang (2018) MEC: Misassembly Error Correction in contigs based on distribution of paired-end reads and statistics of GC-contents. IEEE/ACM Trans. Comput. Biol. Bioinformatics, 1
https://doi.org/10.1109/TCBB.2018.2876855 pmid: 30334805
110 M. Li, B. Wu, X. Yan, J. Luo, Y. Pan, F. X. Wu, and J. Wang (2017) PECC: Correcting contigs based on paired-end read distribution. Comput. Biol. Chem., 69, 178–184
https://doi.org/10.1016/j.compbiolchem.2017.03.012 pmid: 28545961
111 S. Boisvert, F. Raymond, E. Godzaridis, F. Laviolette, and J. Corbeil (2012) Ray Meta: scalable de novo metagenome assembly and profiling. Genome Biol., 13, R122
https://doi.org/10.1186/gb-2012-13-12-r122 pmid: 23259615
112 M. C. Schatz, D. Sommer, D. Kelley, and M. Pop (2010) De novo assembly of large genomes using cloud computing. In Proceedings of the Cold Spring Harbor Biology of Genomes Conference
113 Y. J. Chang, C. C. Chen, J. M. Ho, and C.-L. Chen (2012) De novo assembly of high-throughput sequencing data with cloud computing and new operations on string graphs. In Cloud Computing (CLOUD), 2012 IEEE 5th International Conference. pp. 155–161
114 X. Guo, N. Yu, X. Ding, J. Wang, and Y. Pan (2015) DIME: a novel framework for de novo metagenomic sequence assembly. J. Comput. Biol., 22, 159–177
https://doi.org/10.1089/cmb.2014.0251 pmid: 25684202
115 R. J. Roberts, M. O. Carneiro, and M. C. Schatz (2013) The advantages of SMRT sequencing. Genome Biol., 14, 405
https://doi.org/10.1186/gb-2013-14-6-405 pmid: 23822731
116 T. R. Sharma, B. N. Devanna, K. Kiran, P. K. Singh, K. Arora, P. Jain, I. M. Tiwari, H. Dubey, B. Saklani, M. Kumari, et al. (2018) Status and prospects of next generation sequencing technologies in crop plants. Curr. Issues Mol. Biol., 27, 1–36
https://doi.org/10.21775/cimb.027.001 pmid: 28885172
117 H. Lee, J. Gurtowski, S. Yoo, S. Marcus, W. McCombie, and M. Schatz (2014) Error correction and assembly complexity of single molecule sequencing reads. bioRxiv, 006395
118 A. Bashir, A. Klammer, W. P. Robins, C. S. Chin, D. Webster, E. Paxinos, D. Hsu, M. Ashby, S. Wang, P. Peluso, et al. (2012) A hybrid approach for the automated finishing of bacterial genomes. Nat. Biotechnol., 30, 701–707
https://doi.org/10.1038/nbt.2288 pmid: 22750883
119 R. L. Warren, C. Yang, B. P. Vandervalk, B. Behsaz, A. Lagman, S. J. Jones, and I. Birol (2015) LINKS: scalable, alignment-free scaffolding of draft genomes with long reads. Gigascience, 4, 35
https://doi.org/10.1186/s13742-015-0076-3 pmid: 26244089
120 S. Gao, D. Bertrand, B. K. H. Chia, and N. Nagarajan (2016) OPERA-LG: efficient and exact scaffolding of large, repeat-rich eukaryotic genomes with performance guarantees. Genome Biol., 17, 102
https://doi.org/10.1186/s13059-016-0951-y pmid: 27169502
121 D. Antipov, A. Korobeynikov, J. S. McLean, and P. A. Pevzner (2016) hybridSPAdes: an algorithm for hybrid assembly of short and long reads. Bioinformatics, 32, 1009–1015
https://doi.org/10.1093/bioinformatics/btv688 pmid: 26589280
122 J. Huddleston, S. Ranade, M. Malig, F. Antonacci, M. Chaisson, L. Hon, P. H. Sudmant, T. A. Graves, C. Alkan, M. Y. Dennis, et al. (2014) Reconstructing complex regions of genomes using long-read sequencing technology. Genome Res., 24, 688–696
https://doi.org/10.1101/gr.168450.113 pmid: 24418700
123 J. Luo, J. Wang, J. Shang, H. Luo, M. Li, F. Wu, and Y. Pan (2018) GapReduce: a gap filling algorithm based on partitioned read sets. IEEE/ACM Trans. Comput. Biol. Bioinformatics, 1
https://doi.org/10.1109/TCBB.2018.2789909 pmid: 29993951
124 M. Boetzer and W. Pirovano (2012) Toward almost closed genomes with GapFiller. Genome Biol., 13, R56
https://doi.org/10.1186/gb-2012-13-6-r56 pmid: 22731987
125 D. Paulino, R. L. Warren, B. P. Vandervalk, A. Raymond, S. D. Jackman, and I. Birol (2015) Sealer: a scalable gap-closing application for finishing draft genomes. BMC Bioinformatics, 16, 230
https://doi.org/10.1186/s12859-015-0663-4 pmid: 26209068
126 S. Kosugi, H. Hirakawa, and S. Tabata (2015) GMcloser: closing gaps in assemblies accurately with a likelihood-based selection of contig or long-read alignments. Bioinformatics, 31, 3733–3741
https://doi.org/10.1093/bioinformatics/btv465 pmid: 26261222
127 A. C. English, S. Richards, Y. Han, M. Wang, V. Vee, J. Qu, X. Qin, D. M. Muzny, J. G. Reid, K. C. Worley, et al. (2012) Mind the gap: upgrading genomes with Pacific Biosciences RS long-read sequencing technology. PLoS One, 7, e47768
https://doi.org/10.1371/journal.pone.0047768 pmid: 23185243