Single-cell RNA sequencing (scRNA-seq) reveals the gene structure and gene expression status of individual cells, and can therefore capture the heterogeneity between cells. However, batch effects caused by non-biological factors can hinder data integration and downstream analysis. Batch effects can be evaluated by visualizing the data, but such visual assessment is subjective and imprecise. In this work, we propose cKBET, a quantitative method that considers batch and cell type information simultaneously. cKBET assesses batch effects by comparing, within each cell type, the global and local fractions of cells from different batches. We verify the performance of cKBET on simulated and real biological data sets. The experimental results show that cKBET outperforms existing methods in most cases. In general, cKBET can detect batch effects whether cell types are balanced or unbalanced across batches, and can therefore be used to evaluate batch correction methods.
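The abstract does not spell out how the global and local batch fractions are compared. The sketch below is one plausible reading and is not the authors' implementation. It assumes a kBET-style Pearson chi-square test. For each cell, the batch composition of its k nearest neighbours is compared with the batch composition of that cell's whole cell type. The following choices are all illustrative assumptions: the function name `ckbet_rejection_rate`, restricting neighbours to the same cell type, brute-force Euclidean distance, `alpha = 0.05`, and pooling the rejection rate over all tested cells. The chi-square p-value is computed with a standard-library regularized incomplete gamma function so that the block needs no third-party packages.

```python
import math
from collections import Counter


def _gamma_q(a, x):
    """Regularized upper incomplete gamma function Q(a, x)."""
    if x <= 0.0:
        return 1.0
    log_prefactor = -x + a * math.log(x) - math.lgamma(a)
    if x < a + 1.0:
        # Series expansion of the lower function P(a, x); return 1 - P.
        term = total = 1.0 / a
        ap = a
        for _ in range(1000):
            ap += 1.0
            term *= x / ap
            total += term
            if abs(term) < abs(total) * 1e-14:
                break
        return 1.0 - total * math.exp(log_prefactor)
    # Continued fraction for Q(a, x), evaluated with the modified Lentz method.
    tiny = 1e-300
    b = x + 1.0 - a
    c = 1.0 / tiny
    d = 1.0 / b
    h = d
    for i in range(1, 1000):
        an = -i * (i - a)
        b += 2.0
        d = an * d + b
        if abs(d) < tiny:
            d = tiny
        c = b + an / c
        if abs(c) < tiny:
            c = tiny
        d = 1.0 / d
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < 1e-14:
            break
    return math.exp(log_prefactor) * h


def chi2_sf(stat, df):
    """Upper-tail p-value of a chi-square statistic with df degrees of freedom."""
    return _gamma_q(df / 2.0, stat / 2.0)


def ckbet_rejection_rate(points, batches, cell_types, k=10, alpha=0.05):
    """Hypothetical cKBET-style score: the fraction of cells whose local batch
    mix (k nearest neighbours of the same cell type) deviates from that cell
    type's global batch mix under a Pearson chi-square test."""
    rejected = tested = 0
    for ct in sorted(set(cell_types)):
        idx = [i for i, c in enumerate(cell_types) if c == ct]
        global_counts = Counter(batches[i] for i in idx)
        if len(global_counts) < 2:
            continue  # only one batch holds this cell type: nothing to compare
        n, kk = len(idx), min(k, len(idx) - 1)
        for i in idx:
            # Brute-force Euclidean k-NN restricted to cells of the same type.
            nbrs = sorted((j for j in idx if j != i),
                          key=lambda j: math.dist(points[i], points[j]))[:kk]
            local = Counter(batches[j] for j in nbrs)
            # Expected count of batch b among kk neighbours is kk * g / n.
            stat = sum((local[b] - kk * g / n) ** 2 / (kk * g / n)
                       for b, g in global_counts.items())
            tested += 1
            rejected += chi2_sf(stat, len(global_counts) - 1) < alpha
    return rejected / tested if tested else 0.0
```

Suppose two batches alternate along a line within each cell type, so they are well mixed. This sketch then rejects no cells. If the two batches form separate clusters, every cell is rejected.

Under this sketch, you would score an embedding before and after a batch correction method, and a lower rejection rate would suggest better mixing. Two choices make the sketch tolerate unbalanced cell types. Expected fractions are taken per cell type, and a cell type present in only one batch is skipped. As a result, a cell type missing from some batches is not itself counted as a batch effect.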