Quantitative Biology

ISSN 2095-4689

ISSN 2095-4697(Online)

CN 10-1028/TM

Postal distribution code 80-971

Quantitative Biology  2020, Vol. 8 Issue (1): 78-94   https://doi.org/10.1007/s40484-019-0192-7
Imputation of single-cell gene expression with an autoencoder neural network
Md. Bahadur Badsha1, Rui Li1, Boxiang Liu2, Yang I. Li3, Min Xian4, Nicholas E. Banovich5, Audrey Qiuyan Fu1
1. Department of Statistical Science, Institute for Bioinformatics and Evolutionary Studies, Institute for Modeling Collaboration & Innovation, University of Idaho, Moscow, ID 83844, USA
2. Department of Biology, Stanford University, Stanford, CA 94305, USA
3. Section of Genetic Medicine, University of Chicago, Chicago, IL 60637, USA
4. Department of Computer Science, University of Idaho, Idaho Falls, ID 83401, USA
5. The Translational Genomics Research Institute, Phoenix, AZ 85004, USA
Abstract

Background: Single-cell RNA-sequencing (scRNA-seq) is a rapidly evolving technology that enables measurement of gene expression levels at an unprecedented resolution. Despite the explosive growth in the number of cells that can be assayed by a single experiment, scRNA-seq still has several limitations, including high rates of dropouts, which result in a large number of genes having zero read count in the scRNA-seq data, and complicate downstream analyses.

Methods: To overcome this problem, we treat zeros as missing values and develop nonparametric deep learning methods for imputation. Specifically, our LATE (Learning with AuToEncoder) method trains an autoencoder with random initial values of the parameters, whereas our TRANSLATE (TRANSfer learning with LATE) method further allows for the use of a reference gene expression data set to provide LATE with an initial set of parameter estimates.

Results: On both simulated and real data, LATE and TRANSLATE outperform existing scRNA-seq imputation methods, achieving lower mean squared error in most cases, recovering nonlinear gene-gene relationships, and better separating cell types. They are also highly scalable and can efficiently process over 1 million cells in just a few hours on a GPU.

Conclusions: We demonstrate that our nonparametric approach to imputation based on autoencoders is powerful and highly efficient.
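
The Methods paragraph above describes two training regimes: LATE starts an autoencoder from random parameter values, and TRANSLATE starts it from estimates obtained on a reference data set. The following is a minimal NumPy sketch of that idea, not the authors' implementation. It assumes a single ReLU hidden layer, a loss computed only over nonzero input entries (zeros are treated as missing), and plain gradient descent in place of an adaptive optimizer. The toy data, layer size, learning rate, and all function names are hypothetical.

```python
import numpy as np

def init_params(n_genes, n_hidden, rng):
    """Random starting values (the LATE setting)."""
    return {
        "W1": rng.normal(0.0, 0.1, size=(n_genes, n_hidden)),
        "b1": np.zeros(n_hidden),
        "W2": rng.normal(0.0, 0.1, size=(n_hidden, n_genes)),
        "b2": np.zeros(n_genes),
    }

def forward(X, p):
    H = np.maximum(0.0, X @ p["W1"] + p["b1"])   # ReLU hidden layer
    return H, H @ p["W2"] + p["b2"]              # linear reconstruction

def masked_mse(X, Y):
    """MSE over nonzero input entries only; zeros are treated as missing."""
    mask = X != 0
    return float(np.mean((Y[mask] - X[mask]) ** 2))

def train(X, p, epochs=300, lr=0.02):
    """Full-batch gradient descent on the masked MSE (assumed optimizer)."""
    p = {k: v.copy() for k, v in p.items()}      # leave the caller's dict untouched
    mask = (X != 0).astype(float)
    n_obs = mask.sum()
    for _ in range(epochs):
        H, Y = forward(X, p)
        dY = 2.0 * (Y - X) * mask / n_obs        # gradient of the masked MSE
        dZ = (dY @ p["W2"].T) * (H > 0)          # back through the ReLU
        grads = {"W1": X.T @ dZ, "b1": dZ.sum(axis=0),
                 "W2": H.T @ dY, "b2": dY.sum(axis=0)}
        for k in p:
            p[k] -= lr * grads[k]
    return p

def impute(X, p):
    """Sketch choice: keep observed values, fill zeros with the network output."""
    _, Y = forward(X, p)
    return np.where(X != 0, X, Y)

# Hypothetical toy data: cells x genes, with 30% of entries dropped to zero.
rng = np.random.default_rng(0)
n_genes, n_hidden = 20, 8
loadings = rng.random((3, n_genes))
truth = rng.random((200, 3)) @ loadings
observed = truth * (rng.random(truth.shape) > 0.3)
reference = rng.random((300, 3)) @ loadings      # stands in for a reference data set

start = init_params(n_genes, n_hidden, rng)
late = train(observed, start)                                        # random start
pretrained = train(reference, init_params(n_genes, n_hidden, rng))   # fit the reference first
translate = train(observed, pretrained)                              # then continue on the target data

loss_start = masked_mse(observed, forward(observed, start)[1])
loss_late = masked_mse(observed, forward(observed, late)[1])
imputed = impute(observed, translate)
print(loss_start, loss_late)
```

The only difference between the two regimes in this sketch is the parameter dictionary passed to `train`, which is why transfer learning adds no new machinery beyond a pretraining pass.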

Keywords: single-cell; gene expression; deep learning; autoencoder
Received: 2019-08-01      Published: 2020-03-23
Corresponding Author(s): Audrey Qiuyan Fu   
Cite this article:
Md. Bahadur Badsha, Rui Li, Boxiang Liu, Yang I. Li, Min Xian, Nicholas E. Banovich, Audrey Qiuyan Fu. Imputation of single-cell gene expression with an autoencoder neural network. Quant. Biol., 2020, 8(1): 78-94.
Link to this article:
https://academic.hep.com.cn/qb/CN/10.1007/s40484-019-0192-7
https://academic.hep.com.cn/qb/CN/Y2020/V8/I1/78
Adam adaptive moment estimation
ALRA adaptively-thresholded low-rank approximation
BCSS between-cluster sum of squares
CPU central processing unit
DCA deep count autoencoder
GB gigabyte
GPU graphics processing unit
GTEx genotype-tissue expression
gtMSE mean squared error comparing with the ground truth
gtMSEall mean squared error comparing with the ground truth on all values
gtMSEnz mean squared error comparing with the ground truth only on nonzero values
LATE Learning with AuToEncoder
MAGIC Markov affinity-based graph imputation of cells
MSE mean squared error
PBMC peripheral blood mononuclear cell
PC principal component
PCA principal component analysis
RAM random access memory
ReLU rectified linear unit
SAVER single-cell analysis via expression recovery
scRNA-seq single-cell RNA-sequencing
scVI single-cell variational inference
SVD singular value decomposition
TB terabyte
TRANSLATE TRANSfer learning with LATE
tSNE t-distributed stochastic neighbor embedding
TSS total sum of squares
WCSS within-cluster sum of squares
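
Several abbreviations above name evaluation quantities (gtMSEall, gtMSEnz, TSS, WCSS, BCSS). The sketch below shows one way to compute them in plain Python. The list does not say which matrix defines "nonzero" for gtMSEnz, so the helper takes that matrix as an explicit argument; this, the function names, and the toy numbers are assumptions for illustration only. The sum-of-squares helper relies on the standard identity TSS = WCSS + BCSS.

```python
def gt_mse(imputed, truth, nonzero_in=None):
    """MSE against the ground truth.

    If `nonzero_in` is given, only entries that are nonzero in that matrix
    are used (an assumed reading of gtMSEnz); otherwise all entries are used.
    """
    errors = []
    for i, (row_imp, row_true) in enumerate(zip(imputed, truth)):
        for j, (a, b) in enumerate(zip(row_imp, row_true)):
            if nonzero_in is None or nonzero_in[i][j] != 0:
                errors.append((a - b) ** 2)
    return sum(errors) / len(errors)

def centroid(points):
    n = len(points)
    return [sum(col) / n for col in zip(*points)]

def sq_dist(x, y):
    return sum((a - b) ** 2 for a, b in zip(x, y))

def sums_of_squares(points, labels):
    """Return (TSS, WCSS, BCSS) for points grouped by cluster label."""
    overall = centroid(points)
    tss = sum(sq_dist(x, overall) for x in points)
    wcss = bcss = 0.0
    for k in set(labels):
        members = [x for x, lab in zip(points, labels) if lab == k]
        c = centroid(members)
        wcss += sum(sq_dist(x, c) for x in members)   # spread inside cluster k
        bcss += len(members) * sq_dist(c, overall)    # cluster k versus overall mean
    return tss, wcss, bcss

# Hypothetical toy values.
imputed = [[1.0, 2.0], [0.5, 4.0]]
truth = [[1.0, 3.0], [0.0, 4.0]]
mse_all = gt_mse(imputed, truth)                      # all four entries
mse_nz = gt_mse(imputed, truth, nonzero_in=truth)     # three nonzero entries

points = [[0.0, 0.0], [0.0, 2.0], [4.0, 0.0], [4.0, 2.0]]
tss, wcss, bcss = sums_of_squares(points, [0, 0, 1, 1])
print(mse_all, mse_nz, tss, wcss, bcss)
```

A larger BCSS relative to TSS indicates that more of the total variation lies between clusters, which is how such ratios are typically used to judge how well cell types separate.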
  