Frontiers of Computer Science

ISSN 2095-2228

ISSN 2095-2236(Online)

CN 10-1014/TP

Postal Subscription Code 80-970

2018 Impact Factor: 1.129

Front. Comput. Sci.    2024, Vol. 18 Issue (3) : 183904    https://doi.org/10.1007/s11704-023-2640-9
Interdisciplinary
Protein acetylation sites with complex-valued polynomial model
Wenzheng BAO¹, Bin YANG²
1. School of Information Engineering, Xuzhou University of Technology, Xuzhou 221018, China
2. School of Information Science and Engineering, Zaozhuang University, Zaozhuang 277160, China
Abstract

Protein acetylation is the process of adding acetyl groups (CH3CO-) to lysine residues on protein chains. As one of the most common protein post-translational modifications, lysine acetylation plays an important role in different organisms. In this study, we developed a human-specific method that uses a cascade classifier of complex-valued polynomial models (CVPM), combined with sequence and structural feature descriptors, to address the imbalance between positive and negative samples. Complex-valued gene expression programming and differential evolution are used to search for the optimal CVPM. We also carried out a systematic and comprehensive analysis of the acetylation data and the prediction results. The proposed method achieves 79.15% in Sn, 78.17% in Sp, 78.66% in ACC, 78.76% in F1, and 0.5733 in MCC, which is better than other state-of-the-art methods.

Keywords: protein acetylation, complex-valued polynomial model, machine learning
Corresponding Author(s): Bin YANG   
Just Accepted Date: 30 December 2022   Issue Date: 17 April 2023
Cite this article:
Wenzheng BAO, Bin YANG. Protein acetylation sites with complex-valued polynomial model[J]. Front. Comput. Sci., 2024, 18(3): 183904.
URL:
https://academic.hep.com.cn/fcs/EN/10.1007/s11704-023-2640-9
https://academic.hep.com.cn/fcs/EN/Y2024/V18/I3/183904
Fig.1  The outline of the proposed method
Fig.2  An example of a chromosome in CVGEP (a), and the decoded expression tree (b)
Fig.3  The flowchart of CVGEP
  
Fig.4  The Sensitivity of each model
Fig.5  The Specificity of each model
Fig.6  The Accuracy of each model
Fig.7  The MCC of each model
Fig.8  The Sensitivity of each feature
Fig.9  The Specificity of each feature
Fig.10  The Accuracy of each feature
Fig.11  The MCC of each feature
Methods              Sn/%    Sp/%    ACC/%   MCC      F1
Neural network       75.28   55.38   65.33   0.3129   0.6847
SVM                  60.62   62.88   61.75   0.2351   0.6131
Random forest        71.97   68.42   70.20   0.4042   0.7071
CNN                  73.14   72.04   72.59   0.4519   0.7274
RNN                  65.60   74.29   69.94   0.4004   0.6858
Proposed algorithm   79.15   78.17   78.66   0.5733   0.7876
Tab.1  The performances of different classification models
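The five metrics reported in Tab.1 (Sn, Sp, ACC, MCC, F1) all follow from the four confusion-matrix counts. The sketch below uses their standard definitions; it is not the authors' evaluation code. The function name `binary_metrics` and the counts are hypothetical: they assume a balanced test set of 10,000 positive and 10,000 negative sites, chosen so that Sn and Sp equal the values in the proposed algorithm's row. The page does not state the real class sizes.

```python
import math


def binary_metrics(tp, fn, tn, fp):
    """Return Sn, Sp, ACC, MCC, and F1 from confusion-matrix counts."""
    sn = tp / (tp + fn)                    # sensitivity (recall on positives)
    sp = tn / (tn + fp)                    # specificity (recall on negatives)
    acc = (tp + tn) / (tp + fn + tn + fp)  # overall accuracy
    precision = tp / (tp + fp)
    f1 = 2 * precision * sn / (precision + sn)
    mcc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )
    return {"Sn": sn, "Sp": sp, "ACC": acc, "MCC": mcc, "F1": f1}


# Hypothetical balanced counts (assumption, not from the paper):
# 7915 of 10,000 positives and 7817 of 10,000 negatives classified correctly.
m = binary_metrics(tp=7915, fn=2085, tn=7817, fp=2183)
for name, value in m.items():
    print(f"{name}: {value:.4f}")
```

Under this balanced-class assumption, ACC is the mean of Sn and Sp, and ACC and F1 round to the values in the table. MCC comes out at about 0.573, close to the reported 0.5733. With unbalanced classes, ACC, F1, and MCC would all shift even if Sn and Sp stayed the same.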
Fig.12  ROC curves of six methods
  
  