Protein acetylation sites with complex-valued polynomial model
Wenzheng BAO1, Bin YANG2
1. School of Information Engineering, Xuzhou University of Technology, Xuzhou 221018, China
2. School of Information Science and Engineering, Zaozhuang University, Zaozhuang 277160, China
Abstract Protein acetylation is the process of adding acetyl groups (CH3CO-) to lysine residues on protein chains. As one of the most common protein post-translational modifications, lysine acetylation plays an important role in different organisms. In this study, we developed a human-specific method that uses a cascade classifier of complex-valued polynomial models (CVPM), combined with sequence and structural feature descriptors, to address the imbalance between positive and negative samples. Complex-valued gene expression programming and differential evolution are used to search for the optimal CVPM. We also performed a systematic and comprehensive analysis of the acetylation data and the prediction results. The proposed method achieves 79.15% specificity (Sp), 78.17% sensitivity (Sn), 78.66% accuracy (ACC), 78.76% F1, and an MCC of 0.5733, outperforming other state-of-the-art methods.
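For readers unfamiliar with the five measures quoted in the abstract, the following is a minimal sketch of their standard definitions for a binary classifier. The function name `binary_metrics` and the confusion-matrix counts are hypothetical and chosen only for illustration; they are not taken from the paper's data, and the paper's own evaluation code may differ.

```python
import math


def binary_metrics(tp, fn, tn, fp):
    """Compute Sn, Sp, ACC, F1 and MCC from confusion-matrix counts.

    tp/fn: positive (acetylated) sites predicted correctly / missed.
    tn/fp: negative sites predicted correctly / wrongly flagged.
    """
    sn = tp / (tp + fn)                      # sensitivity (recall)
    sp = tn / (tn + fp)                      # specificity
    acc = (tp + tn) / (tp + fn + tn + fp)    # accuracy
    precision = tp / (tp + fp)
    f1 = 2 * precision * sn / (precision + sn)
    # Matthews correlation coefficient, ranging from -1 to 1
    mcc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )
    return {"Sn": sn, "Sp": sp, "ACC": acc, "F1": f1, "MCC": mcc}


# Hypothetical counts, NOT results from the paper
metrics = binary_metrics(tp=80, fn=20, tn=70, fp=30)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

MCC is often reported alongside accuracy in site-prediction studies because it stays informative when positive and negative samples are imbalanced, which is the setting the abstract describes.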
Keywords
protein acetylation
complex-valued polynomial model
machine learning
Corresponding Author(s):
Bin YANG
Just Accepted Date: 30 December 2022
Issue Date: 17 April 2023