Frontiers of Computer Science

ISSN 2095-2228

ISSN 2095-2236(Online)

CN 10-1014/TP

Postal Subscription Code 80-970

2018 Impact Factor: 1.129

Front. Comput. Sci.    2024, Vol. 18 Issue (6) : 186912    https://doi.org/10.1007/s11704-023-3610-y
Interdisciplinary
DMFVAE: miRNA-disease associations prediction based on deep matrix factorization method with variational autoencoder
Pijing WEI1, Qianqian WANG2, Zhen GAO2, Ruifen CAO2, Chunhou ZHENG3
1. Information Materials and Intelligent Sensing Laboratory of Anhui Province, Institutes of Physical Science and Information Technology, Anhui University, Hefei 230601, China
2. Key Lab of Intelligent Computing and Signal Processing of Ministry of Education, School of Computer Science and Technology, Anhui University, Hefei 230601, China
3. Key Lab of Intelligent Computing and Signal Processing of Ministry of Education, School of Artificial Intelligence, Anhui University, Hefei 230601, China
Abstract

MicroRNAs (miRNAs) are closely related to numerous complex human diseases, so exploring miRNA-disease associations (MDAs) can help researchers better understand the mechanisms of complex diseases. An increasing number of computational methods have been developed to predict MDAs. However, the sparsity of known MDAs may hinder the performance of many methods. In addition, many methods fail to capture the nonlinear relationships in the miRNA-disease network and make inadequate use of the features of the network and of neighbor nodes. In this study, we propose a deep matrix factorization model with variational autoencoder (DMFVAE) to predict potential MDAs. DMFVAE first decomposes the original association matrix and the enhanced association matrix, the latter produced by the self-adjusted nearest neighbor method, to obtain sparse vectors and dense vectors, respectively. Then, the variational encoder is applied to the sparse vectors to obtain nonlinear latent vectors of miRNAs and diseases, while node2vec is applied to the dense vectors to obtain network structure embedding vectors of miRNAs and diseases. Finally, sample features are formed by combining the latent vectors and the network structure embedding vectors, and the final prediction is made by a convolutional neural network with channel attention. To evaluate DMFVAE, we conduct five-fold cross validation on the HMDD v2.0 and HMDD v3.2 datasets, and the results show that DMFVAE performs well (average AUC of 0.9662 and 0.9682 on the respective balanced datasets). Furthermore, case studies on lung neoplasms, colon neoplasms, and esophageal neoplasms confirm the ability of DMFVAE to identify potential miRNAs for human diseases.
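The first step of the pipeline can be illustrated with a minimal sketch. It assumes the convention common to deep matrix factorization models, in which each miRNA is represented by its row of the binary association matrix and each disease by its column; the abstract does not spell this out, and the toy pairs and helper names below are hypothetical.

```python
def build_association_matrix(pairs, n_mirnas, n_diseases):
    """Return a binary matrix A with A[i][j] = 1 for each known (miRNA i, disease j) pair."""
    matrix = [[0] * n_diseases for _ in range(n_mirnas)]
    for mirna, disease in pairs:
        matrix[mirna][disease] = 1
    return matrix


def sparse_vectors(matrix):
    """Rows serve as miRNA vectors and columns as disease vectors (assumed convention)."""
    mirna_vecs = [list(row) for row in matrix]
    disease_vecs = [list(col) for col in zip(*matrix)]
    return mirna_vecs, disease_vecs


def density(matrix):
    """Fraction of entries that are known associations."""
    total = sum(len(row) for row in matrix)
    return sum(map(sum, matrix)) / total


# Hypothetical toy data: 4 miRNAs, 3 diseases, 4 known associations.
toy_pairs = [(0, 1), (1, 0), (1, 2), (3, 2)]
A = build_association_matrix(toy_pairs, n_mirnas=4, n_diseases=3)
mirna_vecs, disease_vecs = sparse_vectors(A)
```

Even in this toy case most entries are zero, which is the sparsity problem the enhanced association matrix is meant to mitigate.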

Keywords: miRNA-disease association; deep matrix factorization; self-adjusted nearest neighbor; variational encoder; network structure
Corresponding Author(s): Chunhou ZHENG   
Just Accepted Date: 08 December 2023   Issue Date: 10 July 2024
 Cite this article:   
Pijing WEI,Qianqian WANG,Zhen GAO, et al. DMFVAE: miRNA-disease associations prediction based on deep matrix factorization method with variational autoencoder[J]. Front. Comput. Sci., 2024, 18(6): 186912.
 URL:  
https://academic.hep.com.cn/fcs/EN/10.1007/s11704-023-3610-y
https://academic.hep.com.cn/fcs/EN/Y2024/V18/I6/186912
Fig.1  The effects of different feature dimensions on HMDD v2.0 balanced dataset. (a) The dimensions of latent vector feature; (b) the dimensions of embedding feature
Dataset     Statistic  AUC     AUPR    F1      ACC     Precision  Recall
Balanced    Std        0.0012  0.0014  0.0031  0.0033  0.0057     0.0035
Balanced    Aver       0.9662  0.9619  0.9102  0.9089  0.8966     0.9247
Unbalanced  Std        0.0001  0.0031  0.0031  0.0001  0.0082     0.0086
Unbalanced  Aver       0.9678  0.6556  0.6055  0.9802  0.7053     0.5313
Tab.1  The 5CV results on HMDD v2.0 balanced and unbalanced datasets, where the Std and Aver represent standard deviation and average value, respectively
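As a quick consistency check on these numbers, F1 is the harmonic mean of precision and recall. The sketch below recomputes it from the averaged precision and recall reported for HMDD v2.0. The mean of per-fold F1 scores need not equal the F1 of the mean precision and recall, so only approximate agreement (here within about 0.001) should be expected.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Averaged precision and recall from Tab.1 (HMDD v2.0).
balanced_f1 = f1_score(0.8966, 0.9247)
unbalanced_f1 = f1_score(0.7053, 0.5313)

# Reported averaged F1 values from the same table, for comparison.
reported_balanced_f1 = 0.9102
reported_unbalanced_f1 = 0.6055
```

The low recall on the unbalanced dataset is what pulls its F1 down despite a high accuracy.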
Setting               AUC     AUPR    F1      ACC     Precision  Recall
Balanced-noEASNN      0.9624  0.9585  0.9045  0.9028  0.8890     0.9210
Balanced-withEASNN    0.9662  0.9619  0.9102  0.9089  0.8966     0.9247
Unbalanced-noEASNN    0.9587  0.5836  0.5032  0.9779  0.7081     0.3930
Unbalanced-withEASNN  0.9678  0.6556  0.6055  0.9802  0.7053     0.5313
Tab.2  The results of experiments comparing on HMDD v2.0 balanced and unbalanced datasets with EASNN removed and retained, respectively
Fig.2  The comparison by using different features on HMDD v2.0 balanced and unbalanced datasets. For balanced dataset, the average value is calculated from AUC, AUPR, F1 and Acc. For unbalanced dataset, average value is calculated from AUPR and F1. (a) The balanced dataset; (b) the unbalanced dataset
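The "average value" defined in this caption is a plain arithmetic mean over the listed metrics. As a worked example, the sketch below plugs in DMFVAE's averaged HMDD v2.0 results from Tab.1; the values plotted in the figure for the other feature settings are not reproduced here.

```python
from statistics import mean

# DMFVAE averages on HMDD v2.0, taken from Tab.1.
tab1_balanced = {"AUC": 0.9662, "AUPR": 0.9619, "F1": 0.9102, "ACC": 0.9089}
tab1_unbalanced = {"AUPR": 0.6556, "F1": 0.6055}

# Balanced: mean of AUC, AUPR, F1 and ACC. Unbalanced: mean of AUPR and F1 only.
balanced_average = mean(tab1_balanced.values())
unbalanced_average = mean(tab1_unbalanced.values())
```

AUC and ACC are left out of the unbalanced average; both stay high when negatives dominate, so AUPR and F1 are the more discriminating metrics there.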
Fig.3  DMFVAE compared with other models on HMDD v2.0 balanced and unbalanced datasets. For balanced dataset, the average value is calculated from AUC, AUPR, F1 and Acc. For unbalanced dataset, the average value is calculated from AUPR and F1. (a) The balanced dataset; (b) the unbalanced dataset
Dataset     Statistic  AUC     AUPR    F1      ACC     Precision  Recall
Balanced    Std        0.0004  0.0010  0.0019  0.0022  0.0047     0.0041
Balanced    Aver       0.9682  0.9639  0.9140  0.9123  0.8967     0.9322
Unbalanced  Std        0.0005  0.0033  0.0038  0.0002  0.0081     0.0081
Unbalanced  Aver       0.9705  0.6853  0.6191  0.9795  0.7142     0.5514
Tab.3  The 5CV results on HMDD v3.2 balanced and unbalanced datasets, where the Std and Aver represent standard deviation and average value, respectively
Fig.4  DMFVAE compared with other models on HMDD v3.2 balanced and unbalanced datasets. For balanced dataset, the average value is calculated from AUC, AUPR, F1 and Acc. For unbalanced dataset, the average value is calculated from AUPR and F1. (a) The balanced dataset; (b) the unbalanced dataset
Rank  miRNA          Evidence       Rank  miRNA          Evidence
1     hsa-mir-211    H3, DEMC, miR  11    hsa-mir-20b    DEMC
2     hsa-mir-130a   H3, DEMC, miR  12    hsa-mir-152    H3, db
3     hsa-mir-129    H3, DEMC       13    hsa-mir-99a    H3, DEMC, miR
4     hsa-mir-151a   DEMC           14    hsa-mir-23b    DEMC
5     hsa-mir-208a   H3             15    hsa-mir-449a   H3
6     hsa-mir-196b   H3, DEMC       16    hsa-mir-16     H3, DEMC, miR
7     hsa-mir-378a   DEMC           17    hsa-mir-106b   H3, DEMC
8     hsa-mir-302c   DEMC           18    hsa-mir-10a    H3, DEMC
9     hsa-mir-370    DEMC           19    hsa-mir-195    H3, DEMC, miR
10    hsa-mir-296    DEMC           20    hsa-mir-15a    H3, DEMC
Tab.4  Top 20 candidate miRNAs associated with LN (lung neoplasms), where H3, DEMC and miR represent HMDD v3.2, dbDEMC and miR2Disease, respectively
Data           miRNAs  Diseases  #P    #N
H2 balanced    495     383       5430  5430
H2 unbalanced  495     383       5430  184155
H3 balanced    788     374       8968  8968
H3 unbalanced  788     374       8968  285744
Tab.5  Corresponding miRNA-disease association information summarized in the test data, where H2, H3, #P and #N represent HMDD v2.0, HMDD v3.2, the number of positive samples and the number of negative samples, respectively
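The #N values for the unbalanced datasets equal the number of all possible miRNA-disease pairs minus the known positives. This suggests, although the caption does not state it, that every unobserved pair is treated as a negative sample there, while the balanced datasets use as many negatives as positives. The sketch below reproduces the arithmetic; the dictionary layout and function names are illustrative only.

```python
# Counts copied from Tab.5.
datasets = {
    "H2": {"mirnas": 495, "diseases": 383, "positives": 5430},
    "H3": {"mirnas": 788, "diseases": 374, "positives": 8968},
}


def unobserved_pairs(mirnas, diseases, positives):
    """All miRNA-disease pairs that are not known associations."""
    return mirnas * diseases - positives


def positive_density(mirnas, diseases, positives):
    """Fraction of the association matrix filled by known associations."""
    return positives / (mirnas * diseases)


unbalanced_negatives = {name: unobserved_pairs(**d) for name, d in datasets.items()}
densities = {name: positive_density(**d) for name, d in datasets.items()}
```

In both databases known associations fill only about 3% of the matrix, which quantifies the sparsity mentioned in the abstract.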
Fig.5  Overview of the DMFVAE method architecture. (1) Feature extraction. The sparse and dense vectors are constructed for each miRNA and disease by matrix factorization of the original association matrix A and the enhanced association matrix A obtained with the EASNN method. (2) Projection layer. A nonlinear latent vector of each miRNA and disease is obtained by using a variational autoencoder. (3) Embedding layer. The embedded features of the miRNA and disease network structures are acquired by applying node2vec to the dense vectors. (4) Prediction layer. Convolution is performed on the sample features (the concatenation of the nonlinear latent vectors and network structure embeddings for each miRNA and disease), and channel attention is used to assign different weights to each channel for the final prediction
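The projection layer relies on a variational autoencoder. The sketch below shows the two standard ingredients of such a model, the reparameterization trick and the KL divergence to a standard normal prior, in plain Python. It is a generic illustration, not the authors' network: the encoder that would produce `mu` and `log_var` from a sparse vector is omitted, and the numeric values are hypothetical.

```python
import math
import random


def reparameterize(mu, log_var, rng):
    """Sample z = mu + sigma * eps element-wise, with eps ~ N(0, 1)."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, log_var)]


def kl_to_standard_normal(mu, log_var):
    """KL(N(mu, sigma^2) || N(0, 1)), summed over latent dimensions."""
    return -0.5 * sum(1.0 + lv - m * m - math.exp(lv) for m, lv in zip(mu, log_var))


# Hypothetical encoder outputs for one miRNA (3 latent dimensions).
rng = random.Random(0)
mu = [0.5, -1.0, 0.0]
log_var = [0.0, -2.0, 0.0]

z = reparameterize(mu, log_var, rng)      # latent vector passed downstream
kl = kl_to_standard_normal(mu, log_var)   # regularization term of the VAE loss
```

Sampling through `mu` and `log_var` rather than directly from the distribution keeps the latent vector differentiable with respect to the encoder outputs, which is what allows a VAE to be trained by gradient descent.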
  
  
  
  
  