Current challenges and solutions of de novo assembly
Xingyu Liao1, Min Li1, You Zou1, Fang-Xiang Wu2, Yi Pan3, Jianxin Wang1
1. School of Computer Science and Engineering, Central South University, Changsha 410083, China
2. Division of Biomedical Engineering, University of Saskatchewan, Saskatchewan, S7N 5A9, Canada
3. Department of Computer Science, Georgia State University, Atlanta, GA 30302, USA
Abstract: Background: Next-generation sequencing (NGS) technologies have fostered an unprecedented proliferation of high-throughput sequencing projects and a concomitant development of novel algorithms for the assembly of short reads. However, numerous technical and computational challenges in de novo assembly remain, even though many new ideas and solutions have been proposed to tackle them in both experimental and computational settings. Results: In this review, we first briefly introduce the major challenges faced by NGS sequence assembly. We then analyze the characteristics of the various sequencing platforms and their impact on assembly results. Next, we classify de novo assemblers according to their frameworks (overlap graph-based, de Bruijn graph-based, and string graph-based) and describe the characteristics of each assembly tool and the scenarios to which it is suited. We then describe in detail the solutions to the main challenges in de novo assembly of next-generation sequencing data, single-cell sequencing data, and single-molecule sequencing data. Finally, we discuss the application of single-molecule sequencing (SMS) long reads to problems encountered in NGS assembly. Conclusions: This review gives an overview of the latest methods and developments in assembly algorithms and provides guidelines for choosing the optimal assembly algorithm for a given type of input sequencing data.
Keywords: next-generation sequencing, single-cell sequencing, single-molecule sequencing, de novo assembly algorithms
Received: 2018-04-05
Published: 2019-05-30
Corresponding authors: Min Li, Jianxin Wang