TACO: Taxonomic prediction of unknown OTUs through OTU co-abundance networks
Zohreh Baharvand Irannia1, Ting Chen1,2
1. Program in Computational Biology and Bioinformatics, Department of Biological Sciences, University of Southern California, Los Angeles, CA 90089, USA
2. Bioinformatics Division, TNLIST, Tsinghua University, Beijing 100084, China
Abstract
Background: A main goal of metagenomics is the taxonomic characterization of microbial communities. Although sequence comparison has been the main method for taxonomic classification, there is no clear agreement on how similarity should be calculated or which similarity thresholds should be used, especially at higher taxonomic levels such as phylum and class. The taxonomic classification of novel metagenomic sequences that lack close homologs in biological databases therefore remains a challenge.
Methods: In this study, we propose using the co-abundant associations between taxa/operational taxonomic units (OTUs) across complex and diverse communities to assist taxonomic classification. We developed a Markov Random Field model that predicts the taxa of unknown microorganisms from these co-abundant associations.
Results: Although such associations are intrinsically functional, we demonstrate that they are strongly correlated with taxonomic associations and can be combined with sequence comparison methods to predict the taxonomic origins of unknown microorganisms at the phylum and class levels.
Conclusions: With the ever-increasing accumulation of sequence data from microbial communities, we take a first step toward exploring these associations for taxonomic identification beyond sequence similarity.
Availability and Implementation: The source code of TACO is freely available at https://github.com/baharvand/OTU-Taxonomy-Identification. TACO is implemented in C++ and is supported on Linux and MS Windows.
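The idea of using co-abundance to suggest a taxon can be illustrated with a minimal sketch. It is not the TACO implementation and does not reproduce the Markov Random Field model: it assumes Pearson correlation as the co-abundance measure, a hypothetical edge threshold of 0.8, and a plain majority vote among labeled neighbors as a simplified stand-in for the model. The OTU names, abundance values, and function names are invented for illustration.

```python
from collections import Counter
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length abundance profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant profile carries no co-abundance signal
    return cov / sqrt(var_x * var_y)


def build_network(abundance, threshold=0.8):
    """Link two OTUs when their profiles correlate at or above threshold.

    The threshold is an assumption chosen for this toy example.
    """
    otus = sorted(abundance)
    neighbors = {otu: set() for otu in otus}
    for i, u in enumerate(otus):
        for v in otus[i + 1:]:
            if pearson(abundance[u], abundance[v]) >= threshold:
                neighbors[u].add(v)
                neighbors[v].add(u)
    return neighbors


def predict_taxon(otu, neighbors, known_taxa):
    """Return the most common taxon among labeled neighbors, or None.

    Majority voting is a simplification, not the paper's MRF inference.
    """
    votes = Counter(known_taxa[n] for n in neighbors[otu] if n in known_taxa)
    return votes.most_common(1)[0][0] if votes else None


# Toy data: abundance of each OTU across four samples (invented values).
abundance = {
    "otu_a": [10, 20, 30, 40],
    "otu_b": [11, 19, 33, 41],
    "otu_c": [40, 30, 20, 10],
    "otu_x": [9, 22, 29, 43],  # the OTU with unknown taxonomy
}
known_taxa = {
    "otu_a": "Firmicutes",
    "otu_b": "Firmicutes",
    "otu_c": "Bacteroidetes",
}

network = build_network(abundance)
prediction = predict_taxon("otu_x", network, known_taxa)
print(prediction)
```

In this toy data, `otu_x` rises and falls with `otu_a` and `otu_b`, so it is linked to both and inherits their phylum, while `otu_c` follows the opposite trend and gets no edges. A full model would also weigh sequence similarity, which this sketch leaves out.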
Author Summary
This paper proposes a new approach to the taxonomic classification of novel metagenomic sequences. Combining sequence similarity information with co-abundant associations between taxa/operational taxonomic units (OTUs) across complex and diverse communities, we develop a statistical model to predict the taxonomic origins of unknown microorganisms at the phylum and class levels. The results demonstrate that OTU co-abundant associations are strongly correlated with taxonomic associations.
Keywords
metagenomics
16S rRNA gene
taxonomic profiling
taxonomic prediction
Markov Random Field
OTU co-abundance network
Corresponding Author(s):
Ting Chen
Just Accepted Date: 01 June 2016
Online First Date: 30 June 2016
Issue Date: 07 September 2016