Differential expression analyses for single-cell RNA-Seq: old questions on new data
Zhun Miao1, Xuegong Zhang1,2
1. MOE Key Laboratory of Bioinformatics; Bioinformatics Division and Center for Synthetic & Systems Biology, TNLIST; Department of Automation, Tsinghua University, Beijing 100084, China
2. School of Life Sciences, Tsinghua University, Beijing 100084, China
Abstract: Background: Single-cell RNA sequencing (scRNA-seq) is an emerging technology that enables high-resolution detection of heterogeneity between cells. One important application of scRNA-seq data is detecting differential expression (DE) of genes. Some researchers still apply DE analysis methods developed for bulk RNA-seq data to single-cell data, while new methods designed for scRNA-seq data have also been developed. Because bulk and single-cell RNA-seq data have different characteristics, a systematic evaluation of both types of methods on scRNA-seq data is needed.
Results: In this study, we conducted a series of experiments on scRNA-seq data to quantitatively evaluate 14 popular DE analysis methods, including both traditional methods developed for bulk RNA-seq data and new methods specifically designed for scRNA-seq data. From these experiments we derived observations and recommendations for using the methods in different situations.
Conclusions: DE analysis methods for scRNA-seq data should be chosen with great caution, taking the characteristics of the data into account. Different strategies should be used for data with different sample sizes and/or different strengths of the expected signals. Several methods designed for scRNA-seq data show advantages in some aspects, and DEGseq tends to outperform other methods in the consistency, reproducibility and accuracy of its predictions on scRNA-seq data.
Author Summary: Differential expression (DE) analysis aims to find genes whose expression values differ significantly among the groups of samples being compared. Gene expression values can be measured by bulk RNA sequencing (RNA-seq) or by single-cell RNA sequencing (scRNA-seq), a recently emerging technology that measures expression in individual cells. DE analysis methods designed for bulk RNA-seq are still commonly used on scRNA-seq data. Because the characteristics of the two kinds of data are quite different, we found that DE analysis methods for scRNA-seq data should be chosen carefully according to the situation of the data.
Key words: single-cell; RNA-Seq; differential expression
Received: 2016-07-13
Published: 2016-12-01
Corresponding Author(s): Xuegong Zhang