TACO: Taxonomic prediction of unknown OTUs through OTU co-abundance networks

Quant. Biol., 2016, Vol. 4, Issue 3: 149-158
https://doi.org/10.1007/s40484-016-0073-2

RESEARCH ARTICLE

Zohreh Baharvand Irannia¹, Ting Chen¹,²

¹. Program in Computational Biology and Bioinformatics, Department of Biological Sciences, University of Southern California, Los Angeles, CA 90089, USA
². Bioinformatics Division, TNLIST, Tsinghua University, Beijing 100084, China

Abstract

Background: A main goal of metagenomics is the taxonomic characterization of microbial communities. Although sequence comparison has been the main method for taxonomic classification, there is no clear agreement on how to calculate similarity or on which similarity thresholds to use, especially at higher taxonomic levels such as phylum and class. As a result, taxonomic classification of novel metagenomic sequences without close homologs in the biological databases remains a challenge.

Methods: In this study, we propose to use the co-abundant associations between taxa/operational taxonomic units (OTU) across complex and diverse communities to assist taxonomic classification. We developed a Markov Random Field model to predict taxa of unknown microorganisms using co-abundant associations.
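
As a rough illustration of how a Markov Random Field can turn neighbor labels into a class probability, the sketch below uses a generic auto-logistic conditional for a single binary class (in the class vs. not in the class). The function name, the parameters alpha, beta, and gamma, their values, and the toy network are all assumptions made for this example; they are not taken from TACO's source code or from the parameter estimates reported in the paper.

```python
import math


def conditional_prob(node, graph, labels, alpha, beta, gamma):
    """Probability that `node` belongs to the target class, given the
    labels of its network neighbors, under a generic auto-logistic
    (pairwise binary MRF) conditional.

    labels maps a node to 1 (in the class) or 0 (not in the class);
    nodes missing from `labels` are unknown and are ignored here.
    """
    n_in = sum(1 for nb in graph[node] if labels.get(nb) == 1)
    n_out = sum(1 for nb in graph[node] if labels.get(nb) == 0)
    # alpha: baseline log-odds; beta rewards neighbors in the class;
    # gamma penalizes neighbors outside the class (all hypothetical).
    log_odds = alpha + beta * n_in - gamma * n_out
    return 1.0 / (1.0 + math.exp(-log_odds))


# Toy co-abundance network (invented for illustration only).
graph = {
    "otu_x": ["otu_a", "otu_b", "otu_c", "otu_d"],
    "otu_a": ["otu_x"],
    "otu_b": ["otu_x"],
    "otu_c": ["otu_x"],
    "otu_d": ["otu_x"],
    "otu_alone": [],
}
labels = {"otu_a": 1, "otu_b": 1, "otu_c": 1, "otu_d": 0}

p_x = conditional_prob("otu_x", graph, labels, alpha=-1.0, beta=1.0, gamma=0.5)
p_alone = conditional_prob("otu_alone", graph, labels, alpha=-1.0, beta=1.0, gamma=0.5)
print(f"P(otu_x in class) = {p_x:.3f}")
print(f"P(otu_alone in class) = {p_alone:.3f}")
```

In a full model the unknown labels would be updated jointly (for example by repeated sampling from this conditional) and the parameters would be estimated from the labeled part of the network; the sketch shows only the single-node step.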

Results: Although such associations are intrinsically functional associations, we demonstrate that they are strongly correlated with taxonomic associations and can be combined with sequence comparison methods to predict taxonomic origins of unknown microorganisms at phylum and class levels.

Conclusions: With the ever-increasing accumulation of sequence data from microbial communities, we now take the first step to explore these associations for taxonomic identification beyond sequence similarity.

Availability and Implementation: The source code of TACO is freely available at https://github.com/baharvand/OTU-Taxonomy-Identification. It is implemented in C++ and supported on Linux and MS Windows.

Author Summary

This paper proposes a new approach to taxonomic classification of novel metagenomic sequences. Combining sequence similarity information with co-abundant associations between taxa/operational taxonomic units (OTU) across complex and diverse communities, we develop a statistical model to predict taxonomic origins of unknown microorganisms at phylum and class levels. The results demonstrate that OTU co-abundant associations are strongly correlated with taxonomic associations.

Keywords: metagenomics; 16S rRNA gene; taxonomic profiling; taxonomic prediction; Markov Random Field; OTU co-abundance network

Corresponding Author(s): Ting Chen

Just Accepted Date: 01 June 2016; Online First Date: 30 June 2016; Issue Date: 07 September 2016

Cite this article:

Zohreh Baharvand Irannia, Ting Chen. TACO: Taxonomic prediction of unknown OTUs through OTU co-abundance networks[J]. Quant. Biol., 2016, 4(3): 149-158.

URL:

https://academic.hep.com.cn/qb/EN/10.1007/s40484-016-0073-2
https://academic.hep.com.cn/qb/EN/Y2016/V4/I3/149

Tab.1 Network statistics for microbial co-abundance networks from soil, human skin, human intestine, and for the random networks (-R).

Tab.2 Clustering coefficient of each class label in the soil network and the average random network.

Tab.3 Clustering coefficient of each class label in the human skin network and average random network.

Tab.4 Clustering coefficient for each class label in the human intestine network and the average random network.

Tab.5 Estimated parameters for the human skin dataset.

Tab.6 AUC values for major classes in soil, human skin, and human intestine datasets.

Fig.1 An unknown OTU (#3000) was predicted to be in the class of Bacilli according to the taxa of its neighbors in the network.

Fig.2 The neighborhood of node 9 in human intestine network has equal numbers of Bacteroidia (3) and Negativicutes (3).

Fig.3 The first and second level neighbors of node 9 in the human intestine network.

There is a high connectivity among nodes with class label Negativicutes (yellow). The probability of node 9 belonging to class Negativicutes is the highest.
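
The situation in Fig.2 and Fig.3 can be mimicked on a toy graph: a node whose direct neighbors are split evenly between two classes, where widening the view to second-level neighbors tips the balance. The sketch below only counts labels within one and two hops; it is a simplification, not the probability computed by the model, and the graph, node names, and function name are invented for illustration rather than taken from the human intestine network.

```python
from collections import Counter, defaultdict


def label_counts(graph, labels, node, depth):
    """Count class labels among labeled nodes within `depth` hops of `node`."""
    seen = {node}
    frontier = {node}
    for _ in range(depth):
        # Expand one hop, keeping only nodes not visited before.
        frontier = {nb for u in frontier for nb in graph[u]} - seen
        seen |= frontier
    return Counter(labels[v] for v in seen if v != node and v in labels)


# Toy undirected network (hypothetical): the query node has three
# neighbors of each class, and the "n" nodes are densely interconnected.
edges = [
    ("query", "b1"), ("query", "b2"), ("query", "b3"),
    ("query", "n1"), ("query", "n2"), ("query", "n3"),
    ("n1", "n2"), ("n2", "n3"), ("n1", "n3"),
    ("n1", "n4"), ("n2", "n4"), ("n3", "n5"),
    ("b1", "b4"),
]
graph = defaultdict(set)
for u, v in edges:
    graph[u].add(v)
    graph[v].add(u)

labels = {name: "Bacteroidia" for name in ("b1", "b2", "b3", "b4")}
labels.update({name: "Negativicutes" for name in ("n1", "n2", "n3", "n4", "n5")})

first_level = label_counts(graph, labels, "query", depth=1)
two_levels = label_counts(graph, labels, "query", depth=2)
print("first level:", dict(first_level))
print("first and second level:", dict(two_levels))
```

On this toy graph the first-level counts are tied, while the two-hop counts favor the densely connected class, which is the intuition behind looking beyond direct neighbors.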
