Protein & Cell

ISSN 1674-800X

ISSN 1674-8018(Online)

CN 11-5886/Q

Postal Subscription Code 80-984

2018 Impact Factor: 7.575

Protein Cell 2023, Vol. 14, Issue 10: 713-725. https://doi.org/10.1093/procel/pwad024
REVIEW
The best practice for microbiome analysis using R
Tao Wen1,2, Guoqing Niu2, Tong Chen3, Qirong Shen2, Jun Yuan2, Yong-Xin Liu1
1. Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen 518120, China
2. The Key Laboratory of Plant Immunity Jiangsu Provincial Key Lab for Organic Solid Waste Utilization Jiangsu Collaborative Innovation Center for Solid Organic Waste Resource Utilization, National Engineering Research Center for Organic-based Fertilizers, Nanjing Agricultural University, Nanjing 210095, China
3. National Resource Center for Chinese Materia Medica, China Academy of Chinese Medical Sciences, Beijing 100700, China
Abstract

As sequencing technology has matured, many microbiome studies have been published, driving the emergence and advancement of related analysis tools. The R language is a widely used platform for microbiome data analysis because of its powerful functions. However, the tens of thousands of R packages and the many similar analysis tools make it challenging for researchers to explore microbiome data, and choosing suitable, efficient, convenient, and easy-to-learn tools from among them has become a common problem. We have organized 324 common R packages for microbiome analysis and classified them by application category (diversity, difference, biomarker, correlation and network, functional prediction, and others), which can help researchers quickly find relevant R packages. Furthermore, we systematically sorted the integrated R packages for microbiome analysis (phyloseq, microbiome, MicrobiomeAnalystR, Animalcules, microeco, and amplicon) and summarized their advantages and limitations, which will help researchers choose the appropriate tools. Finally, we thoroughly reviewed the R packages for microbiome analysis, summarized most of the common analysis content in microbiome research, and formed the most suitable pipeline for microbiome analysis. This paper is accompanied by hundreds of examples with 10,000 lines of code on GitHub, which can help beginners learn and help analysts compare and test different tools. This paper systematically sorts the applications of R in microbiome research, providing an important theoretical basis and practical reference for the development of better microbiome tools in the future. All the code is available on GitHub at github.com/taowenmicro/EasyMicrobiomeR.

Keywords: R package, microbiome, data analysis, visualization, amplicon, metagenome
Corresponding Author(s): Jun Yuan, Yong-Xin Liu
About author:

Peng Lei and Charity Ngina Mwangi contributed equally to this work.

Issue Date: 16 November 2023
Cite this article:
Tao Wen, Guoqing Niu, Tong Chen, et al. The best practice for microbiome analysis using R[J]. Protein Cell, 2023, 14(10): 713-725.
URL:
https://academic.hep.com.cn/pac/EN/10.1093/procel/pwad024
https://academic.hep.com.cn/pac/EN/Y2023/V14/I10/713