Expanding the sequence spaces of synthetic binding protein using deep learning-based framework ProteinMPNN
Yanlin LI1, Wantong JIAO1, Ruihan LIU1, Xuejin DENG1, Feng ZHU2, Weiwei XUE1
1. Chongqing Key Laboratory of Natural Product Synthesis and Drug Research, School of Pharmaceutical Sciences, Chongqing University, Chongqing 401331, China
2. College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China
Abstract Synthetic binding proteins (SBPs) with small size, marked solubility and stability, and high affinity are important for protein-based research, treatment, and diagnostics. Over the last several decades, the great majority of SBPs have been produced by site-directed mutagenesis and directed evolution of privileged protein scaffolds. The advancement of deep learning (DL) in recent years has transformed protein structure prediction and design. Here, for the first time, the DL framework ProteinMPNN was applied to the de novo design of 7,245 new synthetic proteins covering 55 different scaffolds, based on the original SBPs collected in our SYNBIP database. Comprehensive bioinformatics analysis indicated that, in addition to excellent sequence recovery, the designed synthetic proteins show a significant improvement in solubility and thermal stability compared to the currently known SBPs. In addition, 8 protein scaffolds particularly suitable for ProteinMPNN were identified, from which the designed synthetic proteins displayed good performance in calculated binding ability to their corresponding protein targets. Therefore, the DL-based framework showed great potential for the target-directed de novo generation of high-quality synthetic protein libraries, which could assist experimental biologists in rational protein engineering to discover novel functional protein binders.
Keywords: synthetic protein, deep learning, de novo protein design, solubility, stability
Corresponding Author(s): Feng ZHU, Weiwei XUE
Just Accepted Date: 19 April 2024
Issue Date: 20 June 2024