Frontiers of Computer Science

ISSN 2095-2228

ISSN 2095-2236(Online)

CN 10-1014/TP

Postal Subscription Code 80-970

2018 Impact Factor: 1.129

Front. Comput. Sci.    2025, Vol. 19 Issue (5) : 195903    https://doi.org/10.1007/s11704-024-31060-3
Interdisciplinary
Expanding the sequence spaces of synthetic binding protein using deep learning-based framework ProteinMPNN
Yanlin LI1, Wantong JIAO1, Ruihan LIU1, Xuejin DENG1, Feng ZHU2, Weiwei XUE1
1. Chongqing Key Laboratory of Natural Product Synthesis and Drug Research, School of Pharmaceutical Sciences, Chongqing University, Chongqing 401331, China
2. College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China
Abstract

Synthetic binding proteins (SBPs) with small size, marked solubility and stability, and high affinity are important for protein-based research, treatment, and diagnostics. Over the last several decades, the great majority of SBPs have been produced by site-directed mutagenesis and directed evolution of privileged protein scaffolds. In recent years, groundbreaking advances in deep learning (DL) have revolutionized protein structure prediction and design. Here, for the first time, the cutting-edge DL framework ProteinMPNN was applied to the de novo design of 7,245 new synthetic proteins covering 55 different scaffolds, based on the original SBPs collected in our SYNBIP database. Comprehensive bioinformatics analysis indicated that, in addition to excellent sequence recovery, the designed synthetic proteins show significantly improved solubility and thermal stability compared with the currently known SBPs. Meanwhile, 8 protein scaffolds particularly well suited to ProteinMPNN were identified, and the synthetic proteins designed from them displayed good calculated binding ability to their corresponding protein targets. Therefore, the DL-based framework shows great potential for target-directed de novo generation of high-quality synthetic protein libraries, which could assist experimental biologists in rational protein engineering to discover novel functional protein binders.

Keywords: synthetic protein, deep learning, de novo protein design, solubility, stability
Corresponding Author(s): Feng ZHU, Weiwei XUE
Just Accepted Date: 19 April 2024   Issue Date: 20 June 2024
 Cite this article:   
Yanlin LI, Wantong JIAO, Ruihan LIU, et al. Expanding the sequence spaces of synthetic binding protein using deep learning-based framework ProteinMPNN[J]. Front. Comput. Sci., 2025, 19(5): 195903.
 URL:  
https://academic.hep.com.cn/fcs/EN/10.1007/s11704-024-31060-3
https://academic.hep.com.cn/fcs/EN/Y2025/V19/I5/195903
Fig.1  Sequence recovery of protein sequences designed based on monomers at different temperatures
Fig.2  Comparison of the average sequence recovery of synthetic proteins designed by ProteinMPNN based on original SBPs in monomer and complex forms
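The sequence recovery reported in Fig.1 and Fig.2 can be illustrated with a minimal sketch. It assumes the usual definition (the fraction of positions at which a designed sequence keeps the residue of the original sequence) and equal-length, already aligned sequences. The function names and the toy sequences are hypothetical and are not taken from the article.

```python
def sequence_recovery(native, designed):
    """Fraction of positions where the designed residue equals the native one.

    Assumes both sequences have the same length and are already aligned,
    as is the case for fixed-backbone design on a single structure.
    """
    if len(native) != len(designed):
        raise ValueError("sequences must have the same length")
    if not native:
        raise ValueError("sequences must be non-empty")
    matches = sum(a == b for a, b in zip(native, designed))
    return matches / len(native)


def mean_recovery(native, designs):
    """Average sequence recovery of several designs against one native sequence."""
    if not designs:
        raise ValueError("at least one designed sequence is required")
    return sum(sequence_recovery(native, d) for d in designs) / len(designs)


# Toy data (hypothetical, for illustration only).
toy_native = "ACDE"
toy_designs = ["ACDE", "ACDF"]
toy_mean = mean_recovery(toy_native, toy_designs)
```

Averaging per original SBP, as in `mean_recovery`, is one plausible way to arrive at the per-form averages compared in Fig.2; the article's exact aggregation is not stated on this page.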
Sequence         Repetitions   Source of amino acid sequence
CTLPGYENNPEC     2             SBP000286
CHPLSTHPEC       3             SBP000288
CHPTSTHPLC       2             SBP000288
CHPDSTHPLC       2             SBP000289
CHPDSTHPDC       2             SBP000289
LVXEEAS a        3             SBP000293
SIPGTTL          2             SBP003419
GCTGPNCANSPGG    2             SBP003420
GCTGPNCANSAGP    2             SBP003420
Tab.1  List of the 20 duplications among the ProteinMPNN-generated protein sequences
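A duplicate count like the one in Tab.1 could be obtained by tallying identical generated sequences. The sketch below is only an illustration: the helper name `find_duplicates` is hypothetical, and the input list is a toy example that reuses two sequences from Tab.1 plus one made-up unique sequence. It is not the authors' pipeline.

```python
from collections import Counter


def find_duplicates(sequences):
    """Return {sequence: count} for every sequence generated more than once."""
    counts = Counter(sequences)
    return {seq: n for seq, n in counts.items() if n > 1}


# Toy input: two sequences borrowed from Tab.1 and one hypothetical unique sequence.
generated = ["CHPLSTHPEC"] * 3 + ["SIPGTTL"] * 2 + ["ACDEFGHIK"]

duplicates = find_duplicates(generated)
total_repetitions = sum(duplicates.values())
```

Summing the counts, as in `total_repetitions`, mirrors how the Repetitions column of Tab.1 adds up to 20.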
Fig.3  Analysis of the solubility of synthetic proteins designed by ProteinMPNN. (a) Quantity distribution of different proportions of increased solubility (PSL); (b) the distribution of SBPs with a PSL of 1 in the corresponding SBP scaffolds; (c) average solubility of all amino acid sequences under each scaffold
Fig.4  Comparison of the solubility of synthetic proteins designed by ProteinMPNN based on original SBPs in monomer and complex forms. (a) The difference in the number of amino acid sequences with enhanced solubility between the two designs; (b) comparison of the average solubility values between the two designs
Fig.5  Analysis of the stability of synthetic proteins designed by ProteinMPNN. (a) Quantity distribution of different proportions of increased stability (PST); (b) the distribution of SBPs with a PST of 1 in the corresponding SBP scaffolds
Fig.6  Comparison of the stability of synthetic proteins designed by ProteinMPNN based on original SBPs in monomer and complex forms. (a) The difference in the number of amino acid sequences with enhanced stability between the two designs; (b) comparison of the average stability values between the two designs
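The captions of Fig.3 and Fig.5 name PSL and PST (proportion of increased solubility and stability) without giving a formula on this page. The following sketch therefore rests on an assumption: for one original SBP, the proportion is taken as the fraction of its designed sequences whose predicted score improves on the original. The function name, the `higher_is_better` flag, and all numbers are hypothetical.

```python
def proportion_increased(original_score, design_scores, higher_is_better=True):
    """Fraction of designs whose score improves on the original.

    Assumed reading of PSL/PST: an "improvement" is a higher value for a
    solubility score, and a lower value for an instability-style score
    (pass higher_is_better=False in that case).
    """
    if not design_scores:
        raise ValueError("at least one design score is required")
    if higher_is_better:
        improved = sum(s > original_score for s in design_scores)
    else:
        improved = sum(s < original_score for s in design_scores)
    return improved / len(design_scores)


# Hypothetical predicted solubility: original SBP vs. four designs.
psl = proportion_increased(0.50, [0.60, 0.45, 0.70, 0.55])

# Hypothetical instability index (lower means more stable).
pst = proportion_increased(35.0, [30.0, 40.0, 33.0, 36.0], higher_is_better=False)
```

Under this reading, a value of 1 means every design for that SBP improved on the original, which matches how panels (b) of Fig.3 and Fig.5 single out SBPs with a PSL or PST of 1.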
Scaffold name                   Thermal denaturation temperature range   Molecular weight range   Fold type
Affilin                         56–72 ℃                                  17–22 kDa                Beta-Sheets + Beta-Turns + Loops
Diabody                         25–55 ℃                                  21–60 kDa                Beta-Sheets + Loops
scFv                            59–65 ℃                                  14–40 kDa                Beta-Sheets + Loops
Neocarzinostatin-based binder   51–57 ℃                                  10–14 kDa                Beta-Sheets + Loops
Evibody                         90 ℃                                     12–16 kDa                Beta-Sheets + Loops
Fab                             60 ℃                                     50–57 kDa                Beta-Sheets + Loops
Repebody                        82 ℃                                     27–40 kDa                Alpha-Helices + Beta-Sheets + Loops
CI2-based binder                –                                        7 kDa                    One Alpha-Helix + Beta-Sheets + Loops
Tab.2  Details of 8 protein scaffolds applicable to ProteinMPNN from the SYNBIP database
Fig.7  Statistics of interaction types between SBPs based on complexes and their protein targets. The potential hydrogen bonds, salt bridges, and disulfide bonds are shown in blue, orange, and gray representations, respectively
Fig.8  Analysis of the interaction interface between SBP001262 and its target protein
Protein name      Kd (nM)        Predicted ΔiG   Predicted solubility   Instability index
Anticalins N7E    7.18 ± 0.12    −12             0.535                  33.65
Anticalins N9B    39.9 ± 1.0     −6.9            0.622                  29.71
ProteinMPNN-1     –              −6              0.593*                 35.54
ProteinMPNN-2     –              −9.5*           0.644**                31.78*
ProteinMPNN-3     –              −8*             0.678**                37.88
ProteinMPNN-4     –              −11.1*          0.681**                34.96
ProteinMPNN-5     –              −13.1**         0.527                  44.85
Tab.3  Comparison of the properties of ProteinMPNN-designed proteins with those obtained by conventional protein engineering
Fig.9  Analysis of Lipocalin 2, Anticalins N7E, N9B, and ProteinMPNN-5 in terms of both structure and sequence. (a) Structural comparison of the four proteins. The RMSD values of Anticalins N7E (green), N9B (red), and ProteinMPNN-5 (blue) relative to Lipocalin 2 (gray) were 0.467 Å, 0.561 Å, and 0.470 Å, respectively. Disulfide bonds are shown as spheres. The PDB IDs for Lipocalin 2, Anticalins N7E, and N9B are 1DFV, 5N47, and 5N48, respectively; (b) sequence comparison of the four proteins. The amino acids highlighted in red differ from Lipocalin 2, while the positions shaded in yellow indicate the locations of disulfide bonds