Quantitative Biology

ISSN 2095-4689

ISSN 2095-4697(Online)

CN 10-1028/TM

Postal Subscription Code 80-971

Quant. Biol. 2021, Vol. 9, Issue 4: 451-462. https://doi.org/10.15302/J-QB-021-0261
METHOD
Adaptive total variation constraint hypergraph regularized NMF for single-cell RNA-seq data analysis
Ya-Li Zhu, Xiao-Ning Zhang, Chuan-Yuan Wang, Jin-Xing Liu, Xiang-Zhen Kong
School of Computer Science, Qufu Normal University, Rizhao 276826, China
Abstract

Background: Single-cell RNA sequencing (scRNA-seq) data provide a new view for studying disease and cell differentiation and development. With the explosive growth of scRNA-seq data, effective models are needed to mine the intrinsic biological information.

Methods: This paper proposes a novel non-negative matrix factorization (NMF) method for clustering and gene co-expression network analysis, termed Adaptive Total Variation Constraint Hypergraph Regularized NMF (ATV-HNMF). Based on gradient information, ATV-HNMF adaptively chooses between two schemes: denoising within clusters or preserving the boundary information between clusters. In addition, ATV-HNMF incorporates hypergraph regularization, which captures high-order relationships between cells and thereby preserves the intrinsic structure of the data space.

Results: Experiments show that ATV-HNMF outperforms the compared methods in clustering, and the network construction results are consistent with previous studies, which illustrates that the model is effective and useful.

Conclusion: The clustering results show that ATV-HNMF outperforms the other methods, which can help in understanding cell heterogeneity. Many disease-related genes can be identified from the constructed networks, and some merit further clinical exploration.

Keywords: adaptive total variation; single-cell RNA sequencing; network analysis; nonnegative matrix factorization; hypergraph
Corresponding Author(s): Xiang-Zhen Kong   
Just Accepted Date: 21 June 2021   Online First Date: 19 July 2021    Issue Date: 01 December 2021
 Cite this article:   
Ya-Li Zhu, Xiao-Ning Zhang, Chuan-Yuan Wang, et al. Adaptive total variation constraint hypergraph regularized NMF for single-cell RNA-seq data analysis[J]. Quant. Biol., 2021, 9(4): 451-462.
 URL:  
https://academic.hep.com.cn/qb/EN/10.15302/J-QB-021-0261
https://academic.hep.com.cn/qb/EN/Y2021/V9/I4/451
Dataset | Number of genes | Number of cells | Cell types | Species
Islet | 39851 | 1600 | 4 | Homo sapiens
Lung Epithelial | 34816 | 540 | 2 | Homo sapiens
Darmanis | 22085 | 420 | 8 | Homo sapiens
Goolam | 40315 | 124 | 5 | Mus musculus
Treutlein | 959 | 80 | 5 | Mus musculus
Grover | 14739 | 135 | 2 | Mus musculus
Breton | 20689 | 957 | 4 | Homo sapiens
Tab.1  Data information about single-cell datasets
Fig.1  The impact of parameter λ on seven datasets.
Dataset | t-SNE | PCA | K-means | SSC | SC | SIMLR | NMF | ATV-HNMF
Islet | 0.5975 | 0.6008 | 0.601 | 0.5115 | 0.7081 | 0.0534 | 0.68 | 0.7093
Lung Epithelial | 0.6031 | 0.5822 | 0.5757 | 0.6125 | 0.5449 | 0.0714 | 0.7137 | 0.7437
Darmanis | 0.5225 | 0.4594 | 0.4182 | 0.4441 | 0.4445 | 0.2991 | 0.7305 | 0.7617
Goolam | 0.5725 | 0.3278 | 0.3453 | 0.5202 | 0.5258 | 0.3982 | 0.5904 | 0.6593
Treutlein | 0.5473 | 0.5727 | 0.6172 | 0.5242 | 0.6191 | 0.5114 | 0.5297 | 0.6573
Grover | 0.2712 | 0.2712 | 0.2712 | 0.1849 | 0.2261 | 0.0946 | 0.2336 | 0.2749
Breton | 0.0902 | 0.0396 | 0.0411 | 0.0397 | 0.0686 | 0.0442 | 0.0801 | 0.0931
Average | 0.4577 | 0.4076 | 0.4099 | 0.4053 | 0.4481 | 0.2103 | 0.5082 | 0.5570
Tab.2  ARI results on single-cell datasets
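The ARI values in Tab.2 compare predicted cluster labels with the known cell types. As a minimal sketch of how such a score can be computed, the function below implements the standard adjusted Rand index from pair counts in plain Python. The function name `adjusted_rand_index` and the toy labels are illustrative assumptions; the evaluation code used for the paper is not shown on this page.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand index between two flat clusterings of the same cells."""
    if len(labels_true) != len(labels_pred):
        raise ValueError("label sequences must have the same length")
    n = len(labels_true)
    if n < 2:
        return 1.0  # no pairs to compare; treat as perfect agreement

    # Contingency counts: cells sharing the same (true, predicted) label pair.
    pair_counts = Counter(zip(labels_true, labels_pred))
    true_counts = Counter(labels_true)
    pred_counts = Counter(labels_pred)

    # Number of cell pairs grouped together in both, in truth, and in prediction.
    sum_both = sum(comb(c, 2) for c in pair_counts.values())
    sum_true = sum(comb(c, 2) for c in true_counts.values())
    sum_pred = sum(comb(c, 2) for c in pred_counts.values())

    expected = sum_true * sum_pred / comb(n, 2)  # chance-level agreement
    max_index = (sum_true + sum_pred) / 2
    if max_index == expected:
        return 1.0  # degenerate case, e.g. both partitions are trivial
    return (sum_both - expected) / (max_index - expected)


if __name__ == "__main__":
    truth = [0, 0, 1, 1]
    print(adjusted_rand_index(truth, [1, 1, 0, 0]))  # same partition, renamed labels
    print(adjusted_rand_index(truth, [0, 1, 0, 1]))  # partition that splits every true cluster
```

Because the index is corrected for chance, a relabelled copy of the true partition scores 1, while a partition that agrees less than chance scores below 0.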
Dataset | t-SNE | PCA | K-means | SSC | SC | SIMLR | NMF | ATV-HNMF
Islet | 0.3843 | 0.3864 | 0.3870 | 0.3174 | 0.5592 | 0.0678 | 0.7685 | 0.7247
Lung Epithelial | 0.7183 | 0.6741 | 0.6617 | 0.7115 | 0.6255 | 0.0567 | 0.5977 | 0.6320
Darmanis | 0.7043 | 0.6253 | 0.5686 | 0.584 | 0.591 | 0.5693 | 0.7765 | 0.795
Goolam | 0.6021 | 0.4445 | 0.4654 | 0.5871 | 0.5826 | 0.6071 | 0.7279 | 0.7403
Treutlein | 0.7346 | 0.7530 | 0.7157 | 0.7088 | 0.8196 | 0.6881 | 0.7018 | 0.7558
Grover | 0.2197 | 0.2125 | 0.2080 | 0.1382 | 0.1717 | 0.0711 | 0.1835 | 0.2108
Breton | 0.2370 | 0.1467 | 0.1486 | 0.1672 | 0.1771 | 0.2097 | 0.2034 | 0.1939
Average | 0.5143 | 0.4632 | 0.4507 | 0.4591 | 0.5038 | 0.3242 | 0.5656 | 0.5789
Tab.3  NMI results on single-cell datasets
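The NMI values in Tab.3 measure how much information the predicted clusters share with the known cell types. The sketch below computes normalized mutual information in plain Python. It assumes normalization by the arithmetic mean of the two label entropies; this page does not state which normalization the paper uses, so the choice here is an assumption, as are the function name `normalized_mutual_info` and the toy labels.

```python
from collections import Counter
from math import log


def normalized_mutual_info(labels_true, labels_pred):
    """NMI between two clusterings, normalized by the mean of their entropies."""
    if len(labels_true) != len(labels_pred):
        raise ValueError("label sequences must have the same length")
    n = len(labels_true)
    if n == 0:
        raise ValueError("label sequences must not be empty")

    joint = Counter(zip(labels_true, labels_pred))
    true_counts = Counter(labels_true)
    pred_counts = Counter(labels_pred)

    def entropy(counts):
        # Shannon entropy (natural log) of a label distribution.
        return -sum((c / n) * log(c / n) for c in counts.values())

    # Mutual information from the joint and marginal label frequencies.
    mi = sum(
        (c / n) * log(c * n / (true_counts[t] * pred_counts[p]))
        for (t, p), c in joint.items()
    )

    denom = (entropy(true_counts) + entropy(pred_counts)) / 2
    if denom == 0:
        return 1.0  # both clusterings put every cell in a single cluster
    return max(0.0, mi / denom)  # clamp tiny negative rounding errors


if __name__ == "__main__":
    truth = [0, 0, 1, 1]
    print(normalized_mutual_info(truth, [1, 1, 0, 0]))  # same partition, renamed labels
    print(normalized_mutual_info(truth, [0, 1, 0, 1]))  # statistically independent partition
```

Unlike ARI, NMI is not corrected for chance, which is one reason the two tables can rank methods differently on the same dataset.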
Fig.2  The clustering result graph of the five single-cell datasets.
Fig.3  Network construction based on human islet cells.
Fig.4  Network construction based on Lung Epithelial cells.
Gene (Islet) | Annotation | Gene (Lung Epithelial) | Annotation
HIF1A | This gene encodes the alpha subunit of transcription factor hypoxia-inducible factor-1 (HIF-1) | SDR16C5 | Diseases associated with SDR16C5 include psoriasis
TMEM14B | Protein coding gene | RPS16 | Protein coding gene
SELENOT | This gene encodes a selenoprotein containing a selenocysteine (Sec) residue at the active site | ARPP19 | Among its related pathways are cell cycle, mitotic and DNA damage
ATP5MC2 | Among its related pathways are respiratory electron transport, ATP synthesis by chemiosmotic coupling, and heat production by uncoupling proteins | DAPP1 | Protein coding gene
EIF6 | Diseases associated with EIF6 include pyloric atresia and Shwachman-Diamond syndrome 1 | CXCL5 | Diseases associated with CXCL5 include pediatric ulcerative colitis and pulmonary sarcoidosis
ATP6V1E1 | This gene encodes a component of vacuolar ATPase (V-ATPase) | RPL37A | Among its related pathways are viral mRNA translation and influenza viral RNA transcription and replication
MRPS24 | Among its related pathways are mitochondrial translation and organelle biogenesis and maintenance | EEF1B2 | This gene encodes a translation elongation factor
RPS19 | Protein coding gene | BANF1 | Among its related pathways are cell cycle, mitotic and HIV life cycle
RPL11 | Diseases associated with RPL11 include Diamond-Blackfan anemia 7 and Diamond-Blackfan anemia | BCL2A1 | This gene encodes a member of the BCL-2 protein family
RPL27 | Among its related pathways are viral mRNA translation and influenza viral RNA transcription and replication | USP22 | Protein coding gene
Tab.4  Detailed information on the top ten selected genes
Fig.5  The framework of the method ATV-HNMF.
Algorithm: ATV-HNMF
Input: X, parameter λ
Output: U ∈ ℝ^(m×k), F ∈ ℝ^(k×n)
Construct hypergraph Laplacian matrix L ∈ ℝ^(n×n)
Initialize: U0, F0
Repeat
Update U by Eq. (12)
Update F by Eq. (13)
Until convergence
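The loop above can be illustrated with a simplified sketch. Eqs. (12) and (13) are not reproduced on this page, so the code below does not implement ATV-HNMF: it omits the adaptive total variation term and uses the well-known multiplicative updates of graph-regularized NMF with a hypergraph Laplacian in place of the graph Laplacian. The k-nearest-neighbour hyperedge construction, the unit hyperedge weights, and all names (`knn_hypergraph`, `hnmf`, `lam`, `k`, `tol`) are assumptions made for illustration.

```python
import numpy as np


def knn_hypergraph(X, k=5):
    """Return (Dv, S) with hypergraph Laplacian L = Dv - S for the columns (cells) of X."""
    n = X.shape[1]
    sq = np.sum(X ** 2, axis=0)
    dist = sq[:, None] + sq[None, :] - 2.0 * (X.T @ X)  # squared Euclidean distances
    H = np.zeros((n, n))  # incidence matrix: rows are cells, columns are hyperedges
    for e in range(n):
        H[np.argsort(dist[e])[: k + 1], e] = 1.0  # nearest cells form hyperedge e
        H[e, e] = 1.0  # make sure the centre cell belongs to its own hyperedge
    w = np.ones(n)  # unit hyperedge weights (assumption)
    de = H.sum(axis=0)  # hyperedge degrees
    dv = H @ w  # vertex degrees
    S = (H * (w / de)) @ H.T  # H W De^-1 H^T
    return np.diag(dv), S


def objective(X, U, F, Dv, S, lam):
    """Reconstruction error plus hypergraph smoothness penalty on F."""
    return np.linalg.norm(X - U @ F) ** 2 + lam * np.trace(F @ (Dv - S) @ F.T)


def hnmf(X, rank, lam=1.0, k=5, n_iter=200, tol=1e-6, eps=1e-10, seed=0):
    """Hypergraph-regularized NMF, X (m x n) ~ U (m x rank) @ F (rank x n)."""
    rng = np.random.default_rng(seed)
    m, n = X.shape
    Dv, S = knn_hypergraph(X, k)
    U = rng.random((m, rank))  # non-negative random initialization
    F = rng.random((rank, n))
    history = [objective(X, U, F, Dv, S, lam)]
    for _ in range(n_iter):
        # Multiplicative updates keep U and F non-negative.
        U *= (X @ F.T) / (U @ (F @ F.T) + eps)
        F *= (U.T @ X + lam * F @ S) / (U.T @ U @ F + lam * F @ Dv + eps)
        history.append(objective(X, U, F, Dv, S, lam))
        # Stop when the relative change in the objective is small.
        if abs(history[-2] - history[-1]) <= tol * history[-2]:
            break
    return U, F, history


if __name__ == "__main__":
    X = np.random.default_rng(1).random((50, 40))  # synthetic genes x cells matrix
    U, F, history = hnmf(X, rank=3, lam=0.5)
    labels = F.argmax(axis=0)  # one simple way to read cluster labels off F
    print(U.shape, F.shape, len(history), labels[:10])
```

Reading cluster labels from the largest entry in each column of F is only one simple option; running a clustering method such as k-means on the columns of F is another.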
  