Protein & Cell

ISSN 1674-800X

ISSN 1674-8018(Online)

CN 11-5886/Q

Postal Subscription Code 80-984

2018 Impact Factor: 7.575

Prot Cell    2012, Vol. 3 Issue (2) : 148-152    https://doi.org/10.1007/s13238-012-2015-8      PMID: 22426983
RESEARCH ARTICLE
CloudLCA: finding the lowest common ancestor in metagenome analysis using cloud computing
Guoguang Zhao1,4, Dechao Bu1,4, Changning Liu1, Jing Li1, Jian Yang3, Zhiyong Liu1, Yi Zhao1, Runsheng Chen1,2
1. Bioinformatics Research Group, Key Laboratory of Intelligent Information Processing, Advanced Computing Research Laboratory, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China; 2. Bioinformatics Laboratory and National Laboratory of Biomacromolecules, Institute of Biophysics, Chinese Academy of Sciences, Beijing 100101, China; 3. State Key Laboratory for Molecular Virology and Genetic Engineering, National Institute for Viral Disease Control and Prevention, Chinese Center for Disease Control and Prevention, Beijing 100176, China; 4. Graduate School of the Chinese Academy of Sciences, Beijing 100190, China
Abstract

Estimating taxonomic content is a key problem in the analysis of metagenomic sequencing data. However, extracting such content from high-throughput next-generation sequencing data is very time-consuming with currently available software. Here, we present CloudLCA, a parallel LCA algorithm that significantly improves the efficiency of determining taxonomic composition in metagenomic data analysis. Results show that CloudLCA (1) has a running time that grows nearly linearly with dataset size, (2) displays linear speedup as the number of processors grows, especially for large datasets, and (3) reaches a speed of nearly 215 million reads per minute on a cluster with ten thin nodes. Compared with MEGAN, a well-known metagenome analyzer, CloudLCA is up to 5 times faster, and its peak memory usage is approximately 18.5% that of MEGAN when running on a fat node. CloudLCA can be run on a single multiprocessor node or on a cluster. It is expected to become part of MEGAN to accelerate read analysis: it generates the same output as MEGAN, which can be imported directly into MEGAN to complete the subsequent analysis. Moreover, CloudLCA is a universal solution for finding the lowest common ancestor, and it can be applied in other fields that require an LCA algorithm.
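To illustrate the lowest common ancestor (LCA) idea the abstract refers to, the following is a minimal sketch, not the CloudLCA implementation. It assumes a toy taxonomy stored as child-to-parent links and a hypothetical mapping from each read to the taxa of its database hits; the names `PARENT`, `lca_pair`, `assign_read`, and `assign_reads` are illustrative only. Each read is assigned to the deepest taxon that is an ancestor of all of its hits.

```python
from functools import reduce

# Hypothetical toy taxonomy: child -> parent ("root" has no parent).
PARENT = {
    "root": None,
    "Bacteria": "root",
    "Proteobacteria": "Bacteria",
    "Escherichia": "Proteobacteria",
    "Salmonella": "Proteobacteria",
    "Firmicutes": "Bacteria",
    "Bacillus": "Firmicutes",
}

def path_to_root(taxon):
    """Return the list of nodes from `taxon` up to the root, inclusive."""
    path = []
    while taxon is not None:
        path.append(taxon)
        taxon = PARENT[taxon]
    return path

def lca_pair(a, b):
    """Lowest common ancestor of two taxa in the toy taxonomy."""
    ancestors_of_a = set(path_to_root(a))
    # Walk upward from b; the first node also above a is the LCA.
    for node in path_to_root(b):
        if node in ancestors_of_a:
            return node
    raise ValueError("taxa share no common ancestor")

def assign_read(hit_taxa):
    """Fold the pairwise LCA over all taxa hit by one read."""
    return reduce(lca_pair, hit_taxa)

def assign_reads(read_hits):
    """Assign every read independently; reads do not depend on each other."""
    return {read: assign_read(taxa) for read, taxa in read_hits.items()}

if __name__ == "__main__":
    hits = {
        "read1": ["Escherichia", "Salmonella"],
        "read2": ["Escherichia", "Bacillus"],
        "read3": ["Bacillus"],
    }
    for read, taxon in assign_reads(hits).items():
        print(read, taxon)
```

Because each read is processed independently of the others, this per-read step is the kind of work that can be distributed across processors or nodes. How CloudLCA actually partitions the work is not described in the abstract.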

Keywords: CloudLCA, metagenome analysis, cloud computing
Corresponding Author(s): Zhao Yi, Email: biozy@ict.ac.cn; Chen Runsheng, Email: chenrs@sun5.ibp.ac.cn
Issue Date: 01 February 2012
 Cite this article:   
Guoguang Zhao, Dechao Bu, Changning Liu, et al. CloudLCA: finding the lowest common ancestor in metagenome analysis using cloud computing[J]. Prot Cell, 2012, 3(2): 148-152.
 URL:  
https://academic.hep.com.cn/pac/EN/10.1007/s13238-012-2015-8
https://academic.hep.com.cn/pac/EN/Y2012/V3/I2/148