Frontiers of Computer Science

ISSN 2095-2228

ISSN 2095-2236(Online)

CN 10-1014/TP

Postal Subscription Code 80-970

2018 Impact Factor: 1.129

Front. Comput. Sci.    2023, Vol. 17 Issue (5) : 175903    https://doi.org/10.1007/s11704-022-2163-9
RESEARCH ARTICLE
FragDPI: a novel drug-protein interaction prediction model based on fragment understanding and unified coding
Zhihui YANG, Juan LIU, Xuekai ZHU, Feng YANG, Qiang ZHANG, Hayat Ali SHAH
Institute of Artificial Intelligence, School of Computer Science, Wuhan University, Wuhan 430072, China
Abstract

Prediction of drug-protein binding is critical for virtual drug screening. Many deep learning methods have been proposed to predict drug-protein binding from protein sequences and drug representation sequences. However, most existing methods extract features from the protein and drug sequences separately; as a result, they cannot learn features that characterize the drug-protein interaction itself. In addition, existing methods usually encode a protein (drug) sequence under the assumption that every amino acid (atom) contributes equally to binding, ignoring the different impacts of individual amino acids (atoms). In practice, drug-protein binding usually occurs between conserved residue fragments in the protein sequence and atom fragments of the drug molecule. Therefore, a more comprehensive encoding strategy is required to extract information from these conserved fragments.
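The abstract contrasts per-residue (per-atom) encoding with encoding by conserved fragments. As a rough illustration only, the sketch below segments a sequence into fragments by greedy longest match against a fragment vocabulary. The vocabulary, the `fragment_sequence` helper and the greedy rule are hypothetical assumptions; the paper's own fragment-identification strategies (FCS and SSPro, compared in Tab.4) are not described in this section.

```python
from typing import List, Set


def fragment_sequence(seq: str, vocab: Set[str], max_len: int = 5) -> List[str]:
    """Split seq into fragments by greedy longest match against vocab.

    Hypothetical illustration: characters not covered by any vocabulary
    fragment fall back to single-character tokens, so every input is fully
    segmented and "".join(result) == seq.
    """
    tokens: List[str] = []
    i = 0
    while i < len(seq):
        # Try the longest candidate first and shrink until a match is found;
        # a length-1 piece always matches, so the loop always advances.
        for length in range(min(max_len, len(seq) - i), 0, -1):
            piece = seq[i:i + length]
            if piece in vocab or length == 1:
                tokens.append(piece)
                i += length
                break
    return tokens
```

With the toy vocabulary `{"GKS", "GKT", "DFG", "HRD"}`, the sequence `"AGKSTDFGL"` is split into `["A", "GKS", "T", "DFG", "L"]`, so the conserved-looking pieces become single tokens while the remaining residues stay as single characters.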

In this paper, we propose a novel model, named FragDPI, to predict drug-protein binding affinity. Unlike other methods, FragDPI encodes the sequences based on conserved fragments and encodes the protein and drug into a unified vector. Moreover, we adopt a novel two-step strategy to train FragDPI. The pre-training step learns the interactions between different fragments through unsupervised learning, and the fine-tuning step predicts binding affinities through supervised learning. The experimental results demonstrate the superiority of FragDPI.
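One plausible way to realize the unified input is to place the drug and protein fragment tokens in a single sequence so that one Transformer can attend across both. The sketch below assumes a BERT-style layout with `[CLS]` and `[SEP]` markers, drug fragments first, and truncation of the protein side when the sequence would exceed a hypothetical `max_len`; none of these choices is confirmed by the source.

```python
from typing import List

# Hypothetical special tokens, borrowed from the BERT input convention.
CLS, SEP = "[CLS]", "[SEP]"


def build_unified_input(drug_tokens: List[str], protein_tokens: List[str],
                        max_len: int = 512) -> List[str]:
    """Join drug and protein fragment tokens into one sequence.

    Layout (assumed): [CLS] drug fragments [SEP] protein fragments [SEP].
    """
    # Reserve three positions for the special tokens, keep the drug side
    # intact where possible, and truncate the protein side to fit.
    budget = max_len - 3
    drug = drug_tokens[:budget]
    protein = protein_tokens[:max(0, budget - len(drug))]
    return [CLS] + drug + [SEP] + protein + [SEP]
```

For example, `build_unified_input(["C(=O)", "c1ccccc1"], ["GKS", "DFG"])` returns `["[CLS]", "C(=O)", "c1ccccc1", "[SEP]", "GKS", "DFG", "[SEP]"]`. Truncating the protein first is only one option; since drugs are usually much shorter than proteins, it keeps the whole drug in view.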

Keywords: affinity score; drug-protein interaction; BERT; Bi-Transformer; virtual drug screening
Corresponding Author(s): Juan LIU   
Just Accepted Date: 12 July 2022   Issue Date: 12 December 2022
Cite this article: Zhihui YANG, Juan LIU, Xuekai ZHU, et al. FragDPI: a novel drug-protein interaction prediction model based on fragment understanding and unified coding[J]. Front. Comput. Sci., 2023, 17(5): 175903.
URL: https://academic.hep.com.cn/fcs/EN/10.1007/s11704-022-2163-9
https://academic.hep.com.cn/fcs/EN/Y2023/V17/I5/175903
Fig.1  Overview of the model. Left: the fragment understanding (FU) pre-training stage, which predicts masked fragments of the sequences. Right: the fine-tuning stage, which predicts the affinity scores.
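The pre-training stage in Fig.1 predicts masked fragments. A minimal sketch of how such training examples could be built, assuming BERT-style masking [15] with a hypothetical 15% masking rate and plain `[MASK]` replacement (BERT's additional random-token and keep-unchanged variants are omitted for brevity); `mask_fragments` is an illustrative helper, not the authors' code.

```python
import random
from typing import List, Optional, Tuple

MASK = "[MASK]"
SPECIAL = {"[CLS]", "[SEP]"}  # assumed special tokens that are never masked


def mask_fragments(tokens: List[str], mask_prob: float = 0.15,
                   seed: Optional[int] = None) -> Tuple[List[str], List[Optional[str]]]:
    """Replace a random subset of fragment tokens with [MASK].

    Returns the corrupted input and, per position, the original fragment the
    model must recover (None where nothing was masked). At least one fragment
    is masked whenever the sequence contains any fragment.
    """
    rng = random.Random(seed)
    candidates = [i for i, t in enumerate(tokens) if t not in SPECIAL]
    n_mask = max(1, round(mask_prob * len(candidates))) if candidates else 0
    chosen = set(rng.sample(candidates, n_mask))
    corrupted = [MASK if i in chosen else t for i, t in enumerate(tokens)]
    targets = [t if i in chosen else None for i, t in enumerate(tokens)]
    return corrupted, targets
```

During pre-training, the loss would be computed only at positions whose target is not `None`, which forces the model to infer a hidden fragment from the surrounding drug and protein context.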
  
Fig.2  Input of the model, which consists of token embeddings and position embeddings. Token embeddings represent the semantics of conserved fragments, and position embeddings provide the position information of conserved fragments.
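Fig.2 describes the input as the combination of a token embedding and a position embedding. The sketch below sums two lookup tables element-wise, following the BERT convention of adding embeddings; the table sizes, the random initialization and the `InputEmbedding` class are illustrative assumptions, and in a trained model both tables would be learned parameters.

```python
import random
from typing import Dict, List


class InputEmbedding:
    """Token embedding + position embedding, summed element-wise.

    Both tables are randomly initialised here for illustration only; in a
    real model they are trainable parameters.
    """

    def __init__(self, vocab: List[str], max_len: int, dim: int, seed: int = 0):
        rng = random.Random(seed)
        self.index: Dict[str, int] = {tok: i for i, tok in enumerate(vocab)}
        # One vector per fragment in the vocabulary.
        self.token_table = [[rng.gauss(0.0, 0.02) for _ in range(dim)] for _ in vocab]
        # One vector per position in the sequence.
        self.position_table = [[rng.gauss(0.0, 0.02) for _ in range(dim)]
                               for _ in range(max_len)]

    def __call__(self, tokens: List[str]) -> List[List[float]]:
        if len(tokens) > len(self.position_table):
            raise ValueError("sequence longer than max_len")
        out = []
        for pos, tok in enumerate(tokens):
            tok_vec = self.token_table[self.index[tok]]  # what the fragment is
            pos_vec = self.position_table[pos]           # where it occurs
            out.append([t + p for t, p in zip(tok_vec, pos_vec)])
        return out
```

Because the position vector is added, the same fragment at two different positions receives two different input vectors, which is what lets the Transformer distinguish fragment order.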
| Data class | Train | Test | ER | Ion-C | RTK | GPCR |
|---|---|---|---|---|---|---|
| Number of drug-protein pairs | 263584 | 113168 | 3374 | 14599 | 34318 | 60238 |
| Average length | 243 | 243 | 233 | 352 | 427 | 186 |

Tab.1  Dataset statistics: the number of drug-protein pairs and the average length of the tokenized sequences
Fig.3  RMSE of FragDPI with and without FU pre-training on five datasets. “with FU” denotes FragDPI with FU pre-training and “w/o FU” denotes FragDPI without FU pre-training
Fig.4  Pearson’s r of FragDPI with and without FU pre-training
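Figs.3 and 4 and Tabs.2 to 4 report RMSE (lower is better) and Pearson's r (higher is better, range [-1, 1]). The source does not show how these were computed; the following is a dependency-free sketch of their standard definitions.

```python
import math
from typing import Sequence


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root-mean-square error: sqrt(mean((y_true - y_pred)^2)). Lower is better."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation: covariance divided by the product of standard deviations."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("inputs must have equal length >= 2")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("pearson_r is undefined for constant inputs")
    return cov / (sx * sy)
```

The two metrics capture different things: RMSE measures how far predicted affinities are from the measured values, while Pearson's r only measures whether they rise and fall together. This is why a model can have a low RMSE on a dataset yet a near-zero or even negative r, as some entries of Tab.2 and Tab.3 show.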
| RMSE | Test | ER | Ion-C | GPCR | RTK |
|---|---|---|---|---|---|
| Ridge regression | 1.23 | 1.46 | **1.26** | **1.34** | 1.51 |
| Lasso regression | 1.22 | 1.48 | 1.32 | 1.37 | 1.50 |
| DeepAffinity | **0.78** | 1.53 | 1.34 | 1.40 | 1.24 |
| DeepDTA | 0.98 | 1.48 | 1.45 | 1.40 | 1.25 |
| AttentionDTA | 1.18 | 1.97 | 1.72 | 1.85 | 1.75 |
| BACPI | 0.79 | **0.67** | 1.53 | 1.64 | **0.37** |
| FragDPI (ours) | 0.84 | 1.42 | 1.47 | 1.51 | 1.30 |

Tab.2  Results of the main experiment based on RMSE (lower is better). Bold denotes the best performance
| Pearson’s r | Test | ER | Ion-C | GPCR | RTK |
|---|---|---|---|---|---|
| Ridge regression | 0.54 | 0.18 | 0.23 | 0.20 | 0.10 |
| Lasso regression | 0.55 | 0.18 | 0.17 | 0.17 | 0.11 |
| DeepAffinity | 0.84 | 0.16 | 0.17 | 0.24 | 0.39 |
| DeepDTA | 0.75 | 0.17 | 0.11 | 0.24 | 0.34 |
| AttentionDTA | 0.69 | 0.13 | 0.04 | 0.19 | 0.23 |
| BACPI | 0.84 | 0.16 | -0.01 | 0.26 | 0.43 |
| FragDPI (ours) | 0.84 | 0.26 | 0.02 | 0.18 | 0.39 |

Tab.3  Results of the experiments based on Pearson’s r (higher is better)
| RMSE (r) | Test | ER | Ion-C | GPCR | RTK |
|---|---|---|---|---|---|
| FragDPI (SSPro) | 0.98 (0.76) | 1.46 (0.26) | 1.47 (0.065) | 1.36 (0.24) | 1.29 (0.35) |
| FragDPI (FCS) | 0.84 (0.84) | 1.42 (0.26) | 1.47 (0.02) | 1.51 (0.18) | 1.30 (0.39) |

Tab.4  Ablation results in RMSE (Pearson’s r in parentheses) comparing two strategies for identifying sequence fragments in FragDPI. SSPro denotes an alternative fragment-identification strategy; the FragDPI (FCS) row matches FragDPI (ours) in Tab.2 and Tab.3
Fig.5  Fragment attention scores between protein kinase C beta (tokens 103-116) and BDBM2591
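Fig.5 visualizes fragment attention scores. In Transformer layers [19], such scores are usually the scaled dot-product weights softmax(QK^T / sqrt(d)). The sketch below computes them for given query and key vectors. Which vectors FragDPI actually uses for Fig.5 (for example, protein fragments as queries and drug fragments as keys) is an assumption. Inside self-attention over the unified sequence, the softmax would run over all tokens rather than only the keys passed in here.

```python
import math
from typing import List

Matrix = List[List[float]]


def softmax(row: List[float]) -> List[float]:
    m = max(row)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in row]
    s = sum(exps)
    return [e / s for e in exps]


def attention_scores(queries: Matrix, keys: Matrix) -> Matrix:
    """Scaled dot-product attention weights softmax(Q K^T / sqrt(d)).

    Row i gives how strongly query fragment i attends to each key fragment;
    every row sums to 1.
    """
    d = len(keys[0])
    scale = 1.0 / math.sqrt(d)
    return [softmax([scale * sum(qi * ki for qi, ki in zip(q, k)) for k in keys])
            for q in queries]
```

Plotting the rows of this matrix as a heatmap, with protein fragment tokens on one axis and drug fragments on the other, gives a figure of the kind shown in Fig.5.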
| Number | Candidate drug | Target name | Affinity score | Reference |
|---|---|---|---|---|
| 1 | BDBM198018::US9221795, 14 | PI3-kinase subunit delta | 9.5151 | Cell growth and division [28] |
| 2 | CHEMBL3317818 | Histone Deacetylase 2 (HDAC2) | 9.5085 | Prevention or treatment of COVID-19 [29] |
| 3 | CHEMBL3317818 | Histone deacetylase 8 | 9.5085 | Prevention or treatment of COVID-19 [29] |
| 4 | BDBM198096::US9221795, 91 | PI3-kinase subunit delta | 9.4446 | Cell growth and division [28] |
| 5 | US9255098, Ex. 1::US9255098, Ex. 4 | Dipeptidyl peptidase 4 (DPP4) | 9.3964 | Chronic hyperglycemia [28] |
| 6 | CHEMBL3605370 | Monoamine oxidase | 9.3290 | Depression [30] |
| 7 | CHEMBL3605370 | Monoamine oxidase | 9.3290 | Depression [30] |
| 8 | US9499523, 6 | PI3-kinase subunit beta | 9.3247 | DNA replication and repair [31] |
| 9 | US9221795, 87 | PI3-kinase subunit delta | 9.3219 | Cell growth and division [28] |
| 10 | US9169243, 41 | AKT/p21CIP1 | 9.3119 | Unknown |

Tab.5  Top 10 drugs with the highest affinity scores to the spike protein
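Tab.5 is the output of a virtual screening step: candidate drugs are scored by the trained model and the highest-scoring ones are kept. A minimal sketch of that ranking step follows. `screen_top_k` is a hypothetical helper, and `predict` is a hypothetical stand-in for a trained FragDPI model, not an API from the paper.

```python
import heapq
from typing import Callable, Iterable, List, Tuple


def screen_top_k(candidates: Iterable[str], target: str,
                 predict: Callable[[str, str], float],
                 k: int = 10) -> List[Tuple[str, float]]:
    """Score every candidate drug against one target and keep the k highest.

    Returns (drug, predicted affinity) pairs sorted from highest to lowest.
    """
    scored = ((drug, predict(drug, target)) for drug in candidates)
    # nlargest keeps only k items in memory, which matters for large libraries.
    return heapq.nlargest(k, scored, key=lambda pair: pair[1])
```

For example, with a toy predictor that looks up fixed scores `{"d1": 7.2, "d2": 9.5, "d3": 8.1, "d4": 9.3}`, calling `screen_top_k` with `k=2` returns `[("d2", 9.5), ("d4", 9.3)]`.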
1 Swinney D C, Anthony J. How were new medicines discovered? Nature Reviews Drug Discovery, 2011, 10(7): 507–519
2 Gupta S, Jadaun A, Kumar H, Raj U, Varadwaj P K, Rao A R. Exploration of new drug-like inhibitors for serine/threonine protein phosphatase 5 of Plasmodium falciparum: a docking and simulation study. Journal of Biomolecular Structure and Dynamics, 2015, 33(11): 2421–2441
3 Yuriev E, Agostino M, Ramsland P A. Challenges and advances in computational docking: 2009 in review. Journal of Molecular Recognition, 2011, 24(2): 149–164
4 Huang K, Fu T, Glass L M, Zitnik M, Xiao C, Sun J. DeepPurpose: a deep learning library for drug-target interaction prediction. Bioinformatics, 2020, 36(22–23): 5545–5547
5 Huang K, Xiao C, Glass L M, Sun J. MolTrans: molecular interaction transformer for drug-target interaction prediction. Bioinformatics, 2021, 37(6): 830–836
6 Zhao Q, Xiao F, Yang M, Li Y, Wang J. AttentionDTA: prediction of drug-target binding affinity using attention model. In: Proceedings of 2019 IEEE International Conference on Bioinformatics and Biomedicine (BIBM). 2019, 64–69
7 Liao Z, You R, Huang X, Yao X, Huang T, Zhu S. DeepDock: enhancing ligand-protein interaction prediction by a combination of ligand and structure information. In: Proceedings of 2019 IEEE International Conference on Bioinformatics and Biomedicine (BIBM). 2019, 311–317
8 Bai F, Morcos F, Cheng R R, Jiang H, Onuchic J N. Elucidating the druggable interface of protein-protein interactions using fragment docking and coevolutionary analysis. Proceedings of the National Academy of Sciences of the United States of America, 2016, 113(50): E8051–E8058
9 Yao H, Song Y, Chen Y, Wu N, Xu J, Sun C, Zhang J, Weng T, Zhang Z, Wu Z, Cheng L, Shi D, Lu X, Lei J, Crispin M, Shi Y, Li L, Li S. Molecular architecture of the SARS-CoV-2 virus. Cell, 2020, 183(3): 730–738.e13
10 Shu X, Royant A, Lin M Z, Aguilera T A, Lev-Ram V, Steinbach P A, Tsien R Y. Mammalian expression of infrared fluorescent proteins engineered from a bacterial phytochrome. Science, 2009, 324(5928): 804–807
11 Pahikkala T, Airola A, Pietila S, Shakyawar S, Szwajda A, Tang J, Aittokallio T. Toward more realistic drug-target interaction predictions. Briefings in Bioinformatics, 2015, 16(2): 325–337
12 Zheng X, Ding H, Mamitsuka H, Zhu S. Collaborative matrix factorization with multiple similarities for predicting drug-target interactions. In: Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 2013, 1025–1033
13 Öztürk H, Özgür A, Ozkirimli E. DeepDTA: deep drug-target binding affinity prediction. Bioinformatics, 2018, 34(17): i821–i829
14 Nguyen T, Le H, Venkatesh S. GraphDTA: prediction of drug-target binding affinity using graph convolutional networks. BioRxiv, 2019, 684662. https://doi.org/10.1101/684662
15 Devlin J, Chang M W, Lee K, Toutanova K. BERT: pre-training of deep bidirectional transformers for language understanding. In: Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2019, 4171–4186
16 Dong L, Yang N, Wang W, Wei F, Liu X, Wang Y, Gao J, Zhou M, Hon H W. Unified language model pre-training for natural language understanding and generation. In: Proceedings of the 33rd International Conference on Neural Information Processing Systems. 2019, 1170
17 Radford A, Wu J, Child R, Luan D, Amodei D, Sutskever I. Language models are unsupervised multitask learners. OpenAI blog, 2019, 1(8): 9
18 Raffel C, Shazeer N, Roberts A, Lee K, Narang S, Matena M, Zhou Y, Li W, Liu P J. Exploring the limits of transfer learning with a unified text-to-text transformer. Journal of Machine Learning Research, 2020, 21: 1–67
19 Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez A N, Kaiser Ł, Polosukhin I. Attention is all you need. In: Proceedings of the 31st International Conference on Neural Information Processing Systems. 2017, 6000–6010
20 Karimi M, Wu D, Wang Z, Shen Y. DeepAffinity: interpretable deep learning of compound-protein affinity through unified recurrent and convolutional neural networks. Bioinformatics, 2019, 35(18): 3329–3338
21 Liu T, Lin Y, Wen X, Jorissen R N, Gilson M K. BindingDB: a web-accessible database of experimentally determined protein-ligand binding affinities. Nucleic Acids Research, 2007, 35(S1): D198–D201
22 Kuhn M, von Mering C, Campillos M, Jensen L J, Bork P. STITCH: interaction networks of chemicals and proteins. Nucleic Acids Research, 2008, 36(S1): D684–D688
23 Suzek B E, Wang Y, Huang H, McGarvey P B, Wu C H, UniProt Consortium. UniRef clusters: a comprehensive and scalable alternative for improving sequence similarity searches. Bioinformatics, 2015, 31(6): 926–932
24 Li M, Lu Z, Wu Y, Li Y. BACPI: a bi-directional attention neural network for compound-protein interaction and binding affinity prediction. Bioinformatics, 2022, 38(7): 1995–2002
25 Leonard T A, Różycki B, Saidi L F, Hummer G, Hurley J H. Crystal structure and allosteric activation of protein kinase C βII. Cell, 2011, 144(1): 55–66
26 Sutton R B, Sprang S R. Structure of the protein kinase Cβ phospholipid-binding C2 domain complexed with Ca2+. Structure, 1998, 6(11): 1395–1405
27 Thao T T N, Labroussaa F, Ebert N, V’kovski P, Stalder H, Portmann J, Kelly J, Steiner S, Holwerda M, Kratzel A, Gultom M, Schmied K, Laloli L, Hüsser L, Wider M, Pfaender S, Hirt D, Cippà V, Crespo-Pomar S, Schröder S, Muth D, Niemeyer D, Corman V M, Müller M A, Drosten C, Dijkman R, Jores J, Thiel V. Rapid reconstruction of SARS-CoV-2 using a synthetic genomics platform. Nature, 2020, 582(7813): 561–565
28 Tzenaki N, Papakonstanti E A. p110δ PI3 kinase pathway: emerging roles in cancer. Frontiers in Oncology, 2013, 3: 40
29 Takahashi Y, Hayakawa A, Sano R, Fukuda H, Harada M, Kubo R, Okawa T, Kominato Y. Histone deacetylase inhibitors suppress ACE2 and ABO simultaneously, suggesting a preventive potential against COVID-19. Scientific Reports, 2021, 11(1): 3379
30 Volz H P, Gleiter C H. Monoamine oxidase inhibitors. Drugs & Aging, 1998, 13(5): 341–355
31 Kumar A, Redondo-Muñoz J, Perez-García V, Cortes I, Chagoyen M, Carrera A C. Nuclear but not cytosolic phosphoinositide 3-kinase beta has an essential function in cell survival. Molecular and Cellular Biology, 2011, 31(10): 2122–2133