Quantitative Biology

ISSN 2095-4689

ISSN 2095-4697(Online)

CN 10-1028/TM

Postal Subscription Code 80-971

Quant. Biol. 2016, Vol. 4, Issue 3: 149-158. https://doi.org/10.1007/s40484-016-0073-2
RESEARCH ARTICLE
TACO: Taxonomic prediction of unknown OTUs through OTU co-abundance networks
Zohreh Baharvand Irannia¹, Ting Chen¹,²
1. Program in Computational Biology and Bioinformatics, Department of Biological Sciences, University of Southern California, Los Angeles, CA 90089, USA
2. Bioinformatics Division, TNLIST, Tsinghua University, Beijing 100084, China
Abstract

Background: A main goal of metagenomics is the taxonomic characterization of microbial communities. Although sequence comparison has been the main method for taxonomic classification, there is no clear agreement on how to calculate similarity or on which similarity thresholds to use, especially at higher taxonomic levels such as phylum and class. Taxonomic classification of novel metagenomic sequences that lack close homologs in the biological databases therefore remains a challenge.

Methods: In this study, we propose to use the co-abundant associations between taxa/operational taxonomic units (OTUs) across complex and diverse communities to assist taxonomic classification. We developed a Markov Random Field model that predicts the taxa of unknown microorganisms from these co-abundant associations.

Results: Although such associations are intrinsically functional associations, we demonstrate that they are strongly correlated with taxonomic associations and can be combined with sequence comparison methods to predict taxonomic origins of unknown microorganisms at phylum and class levels.

Conclusions: With the ever-increasing accumulation of sequence data from microbial communities, we now take the first step to explore these associations for taxonomic identification beyond sequence similarity.

Availability and Implementation: The source code of TACO is freely available at https://github.com/baharvand/OTU-Taxonomy-Identification. It is implemented in C++ and supported on Linux and MS Windows.

Author Summary   

This paper proposes a new approach to the taxonomic classification of novel metagenomic sequences. By combining sequence similarity information with co-abundant associations between taxa/operational taxonomic units (OTUs) across complex and diverse communities, we develop a statistical model to predict the taxonomic origins of unknown microorganisms at the phylum and class levels. The results demonstrate that OTU co-abundant associations are strongly correlated with taxonomic associations.

Keywords: metagenomics; 16S rRNA gene; taxonomic profiling; taxonomic prediction; Markov Random Field; OTU co-abundance network
Corresponding Author(s): Ting Chen   
Just Accepted Date: 01 June 2016   Online First Date: 30 June 2016    Issue Date: 07 September 2016
Cite this article:
Zohreh Baharvand Irannia, Ting Chen. TACO: Taxonomic prediction of unknown OTUs through OTU co-abundance networks[J]. Quant. Biol., 2016, 4(3): 149-158.
URL:
https://academic.hep.com.cn/qb/EN/10.1007/s40484-016-0073-2
https://academic.hep.com.cn/qb/EN/Y2016/V4/I3/149
Statistic Soil Skin Intestine Soil-R Skin-R Intestine-R
Number of nodes 572 414 541 572 414 541
Number of edges 4996 3982 4000 4996 3982 4000
Avg. Degree 11 16 14 11 16 14
Clustering coefficient 0.211 0.326 0.265 0.029±0.003 0.164±0.008 0.042±0.004
Modularity 0.827 0.647 0.767 0.298±0.01 0.192±0.005 0.256±0.005
Tab.1  Network statistics for microbial co-abundance networks from soil, human skin, human intestine, and for the random networks (-R).
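
The random networks (-R) in Table 1 have the same node and edge counts as the observed networks, which is what a degree-preserving randomization would produce. This page does not state how they were generated, so the following is only a minimal Python sketch of one common choice, repeated double-edge swaps. The function name `rewire`, the swap count, and the attempt cap are assumptions made for illustration and are not taken from TACO.

```python
import random

def rewire(edges, num_swaps, seed=0):
    """Degree-preserving randomization of an undirected simple graph.

    Repeatedly picks two edges (a, b) and (c, d) and replaces them with
    (a, d) and (c, b), rejecting swaps that would create a self-loop or a
    duplicate edge. Every node keeps its original degree.
    """
    rng = random.Random(seed)
    edge_list = [tuple(sorted(e)) for e in edges]
    edge_set = set(edge_list)
    done, attempts = 0, 0
    max_attempts = 100 * num_swaps + 100  # hypothetical cap to guarantee termination
    while done < num_swaps and attempts < max_attempts:
        attempts += 1
        i, j = rng.sample(range(len(edge_list)), 2)
        a, b = edge_list[i]
        c, d = edge_list[j]
        if rng.random() < 0.5:  # consider both orientations of the second edge
            c, d = d, c
        if a == d or c == b:  # would create a self-loop
            continue
        new1 = tuple(sorted((a, d)))
        new2 = tuple(sorted((c, b)))
        if new1 == new2 or new1 in edge_set or new2 in edge_set:
            continue  # would create a duplicate edge
        edge_set.discard(edge_list[i])
        edge_set.discard(edge_list[j])
        edge_set.update((new1, new2))
        edge_list[i], edge_list[j] = new1, new2
        done += 1
    return edge_list
```

Averaging a statistic such as the clustering coefficient or modularity over many rewired copies would give values with a spread, in the same spirit as the ± entries in the -R columns.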
Class label Soil-CC Soil-RN-CC
Bacilli 0.688677 0.001027
Clostridia 0.450079 0.002208
Gammaproteobacteria 0.303088 0.00425
Alphaproteobacteria 0.125 0.001857
Deltaproteobacteria 0.105263 0.001178
Tab.2  Clustering coefficient of each class label in the soil network and the average random network.
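
The per-class clustering coefficients in Tables 2 to 4 can be read as a measure of how tightly OTUs of the same class connect to one another. The exact definition used by the authors is not given on this page; the sketch below assumes the average local clustering coefficient computed inside the subgraph induced by the nodes of one class. The function name and the data layout (`adj` as a dict of neighbor sets, `labels` as a dict of class labels) are illustrative assumptions.

```python
from itertools import combinations

def class_clustering(adj, labels, target):
    """Average local clustering coefficient within one class.

    adj: dict mapping node -> set of neighboring nodes (undirected graph).
    labels: dict mapping node -> class label.
    Only edges between nodes labeled `target` are considered. Nodes with
    fewer than two same-class neighbors contribute 0 to the average.
    """
    members = {n for n, lab in labels.items() if lab == target}
    if not members:
        return 0.0
    total = 0.0
    for n in members:
        nbrs = adj.get(n, set()) & members
        k = len(nbrs)
        if k < 2:
            continue
        # Count edges among the same-class neighbors of n.
        links = sum(1 for u, v in combinations(nbrs, 2) if v in adj.get(u, set()))
        total += 2.0 * links / (k * (k - 1))
    return total / len(members)
```

Applying the same function to rewired networks and averaging the results would give a baseline comparable to the random-network (RN) columns.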
Class label Skin-CC Skin-RN-CC
Clostridia 0.647505 0.030902
Deinococci 0.375 0
Betaproteobacteria 0.352941 0.009074
Bacteroidia 0.301282 0.017062
Actinobacteria 0.293367 0.02438
Bacilli 0.269231 0.013244
Sphingobacteria 0.172414 0.019276
Tab.3  Clustering coefficient of each class label in the human skin network and average random network.
Class label Intestine-CC Intestine-RN-CC
Clostridia 0.444464 0.0201
Negativicutes 0.377273 0.00030303
Bacteroidia 0.282927 0.006898
Actinobacteria 0.000303 0
Tab.4  Clustering coefficient for each class label in the human intestine network and the average random network.
Class label α β−1 γ−β
Bacteroidia −2.33537 −0.04245 0.247362
Actinobacteria −1.83828 −0.0435 0.198456
Clostridia −1.65726 −0.02351 0.214375
Bacilli −2.37669 −0.04131 0.261569
Betaproteobacteria −2.77259 −0.04109 0.491779
Acidobacteria_Gp4 −4.09767 0.025845 −0.09265
Sphingobacteria −2.25672 −0.02481 0.134167
Acidobacteria_Gp3 −4.32413 −0.27638 −30.2856
Flavobacteria −3.38777 0.027906 −0.95404
Epsilonproteobacteria −4.32413 −0.04917 −29.7953
Deinococci −3.61765 −0.04201 0.651011
Cyanobacteria −3.61765 −0.12877 1.07966
Chloroplast −4.32413 −0.05814 −30.0519
Deltaproteobacteria −3.49651 0.074062 −0.26545
Alphaproteobacteria −2.96527 0.051567 −0.27868
Tab.5  Estimated parameters for the human skin dataset.
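
To make the parameters in Table 5 concrete, the sketch below reads the three columns as α, β − 1 and γ − β and assumes the logistic conditional form commonly used in Markov Random Field models for label prediction on networks: the log-odds that a node carries a class label equal α + (β − 1)·M0 + (γ − β)·M1, where M1 and M0 are the numbers of neighbors with and without that label. This functional form, the argument names, and the function name are assumptions, since the model equations are not shown on this page.

```python
import math

def label_probability(alpha, beta_minus_1, gamma_minus_beta, n_without, n_with):
    """Conditional probability that a node carries a label, given its neighbors.

    Assumed form: logit = alpha + (beta - 1) * n_without + (gamma - beta) * n_with,
    where n_with / n_without count neighbors with / without the label.
    """
    logit = alpha + beta_minus_1 * n_without + gamma_minus_beta * n_with
    # Numerically stable logistic function for large positive or negative logits.
    if logit >= 0:
        return 1.0 / (1.0 + math.exp(-logit))
    e = math.exp(logit)
    return e / (1.0 + e)

# Clostridia row of Table 5 (human skin dataset).
clostridia = (-1.65726, -0.02351, 0.214375)
p_no_neighbors = label_probability(*clostridia, n_without=0, n_with=0)
p_five_clostridia = label_probability(*clostridia, n_without=0, n_with=5)
print(p_no_neighbors, p_five_clostridia)
```

Under this reading, a positive third column means that each neighbor of the same class raises the probability of that class, while α alone sets the probability for a node with no neighbors.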
Soil AUC Skin AUC Intestine AUC
Bacilli 0.76 Clostridia 0.72 Negativicutes 0.70
Betaproteobacteria 0.65 Bacteroidia 0.67 Clostridia 0.62
Deltaproteobacteria 0.59 Betaproteobacteria 0.66 Bacteroidia 0.60
Alphaproteobacteria 0.59 Deinococci 0.59 Erysipelotrichia 0.59
Gammaproteobacteria 0.59 Actinobacteria 0.56
Actinobacteria 0.54
Alphaproteobacteria 0.58
Bacilli 0.57
Tab.6  AUC values for major classes in soil, human skin, and human intestine datasets.
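
For readers unfamiliar with the AUC values in Table 6, the sketch below computes the area under the ROC curve for a single class from predicted scores, using the rank-based definition: the probability that a randomly chosen OTU of the class scores higher than a randomly chosen OTU outside it, with ties counted as one half. How the authors obtained their scores and evaluation splits is not described on this page, so the function and its inputs are illustrative only.

```python
def auc(scores, is_positive):
    """Area under the ROC curve via pairwise comparison.

    scores: predicted score per item (higher means more likely positive).
    is_positive: True for items that truly belong to the class.
    """
    pos = [s for s, y in zip(scores, is_positive) if y]
    neg = [s for s, y in zip(scores, is_positive) if not y]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative item")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, which gives a scale for reading the per-class values in the table.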
Fig.1  An unknown OTU (#3000) was predicted to be in the class of Bacilli according to the taxa of its neighbors in the network.
Fig.2  The neighborhood of node 9 in human intestine network has equal numbers of Bacteroidia (3) and Negativicutes (3).
Fig.3  The first and second level neighbors of node 9 in the human intestine network.

There is a high connectivity among nodes with class label Negativicutes (yellow). The probability of node 9 belonging to class Negativicutes is the highest.
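
The situation in Figs. 2 and 3 can be illustrated with a simple label count: a tie among first-level neighbors may be resolved once second-level neighbors are included. The sketch below is not TACO's Markov Random Field inference, which works with probabilities rather than raw counts. The toy graph is invented for illustration, and only the 3-versus-3 tie around node 9 mirrors Fig. 2.

```python
from collections import Counter

def build_adjacency(edges):
    """Build an undirected adjacency dict from an edge list."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def neighbors_within(adj, node, depth):
    """All nodes reachable from `node` in at most `depth` steps, excluding `node`."""
    seen = {node}
    frontier = {node}
    for _ in range(depth):
        frontier = {m for n in frontier for m in adj.get(n, ()) if m not in seen}
        seen |= frontier
    return seen - {node}

def label_counts(adj, labels, node, depth):
    """Count class labels among the labeled nodes within `depth` steps of `node`."""
    return Counter(labels[n] for n in neighbors_within(adj, node, depth) if n in labels)

# Hypothetical toy network: node 9 is unlabeled; B* are Bacteroidia,
# N* are Negativicutes, C1 is Clostridia.
edges = [
    (9, "B1"), (9, "B2"), (9, "B3"),
    (9, "N1"), (9, "N2"), (9, "N3"),
    ("N1", "N4"), ("N3", "N4"), ("N2", "N5"), ("N4", "N5"),
    ("B1", "C1"),
]
labels = {"B1": "Bacteroidia", "B2": "Bacteroidia", "B3": "Bacteroidia",
          "N1": "Negativicutes", "N2": "Negativicutes", "N3": "Negativicutes",
          "N4": "Negativicutes", "N5": "Negativicutes", "C1": "Clostridia"}
adj = build_adjacency(edges)
print(label_counts(adj, labels, 9, depth=1))  # tie: 3 Bacteroidia, 3 Negativicutes
print(label_counts(adj, labels, 9, depth=2))  # Negativicutes leads with 5
```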
