Frontiers of Computer Science

ISSN 2095-2228

ISSN 2095-2236(Online)

CN 10-1014/TP

Postal Subscription Code 80-970

2018 Impact Factor: 1.129

Front. Comput. Sci. 2015, Vol. 9, Issue 4: 643-651. https://doi.org/10.1007/s11704-014-4089-3
RESEARCH ARTICLE
Identification of cytokine via an improved genetic algorithm
Xiangxiang ZENG, Sisi YUAN, Xianxian HUANG, Quan ZOU
Department of Computer Science, Xiamen University, Xiamen 361005, China
Abstract

With the explosive growth in the number of protein sequences generated in the postgenomic age, research into identifying cytokines among proteins and uncovering their biochemical mechanisms is becoming increasingly important. Unfortunately, identifying cytokines is challenging for two reasons: the structure space of proteins is poorly understood, and cytokines make up only a small fraction of the massive number of known proteins. To address the first challenge, we observe that a protein sequence is conceptually similar to a mapping of words to meaning, and we therefore explore the n-gram, a type of probabilistic language model, to extract features from proteins. To address the second challenge, we use a genetic algorithm, a search heuristic that mimics the process of natural selection, to develop a classifier that overcomes the imbalance problem in protein data and yields precise predictions of cytokines among proteins. Experiments carried out on an imbalanced protein data set show that our methods outperform traditional algorithms in terms of prediction ability.
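
As a rough illustration of how n-grams can turn a protein sequence into a fixed-length feature vector, the following minimal sketch counts overlapping amino-acid n-grams and normalizes them to relative frequencies. The function name `ngram_features`, the 20-letter alphabet constant, the default n = 2, and the choice of frequency normalization are all assumptions made for this example; the abstract does not specify the exact feature construction used in the paper.

```python
from collections import Counter
from itertools import product
from typing import List

# The 20 standard amino-acid one-letter codes (assumed alphabet for this sketch).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def ngram_features(sequence: str, n: int = 2) -> List[float]:
    """Return relative frequencies of all 20**n amino-acid n-grams.

    Hypothetical helper: n-grams containing non-standard residues are ignored,
    and a sequence with no valid n-gram yields an all-zero vector.
    """
    sequence = sequence.upper()
    # Count every overlapping window of length n.
    counts = Counter(sequence[i:i + n] for i in range(len(sequence) - n + 1))
    # Fixed vocabulary order, so every sequence maps to the same feature layout.
    vocabulary = ["".join(p) for p in product(AMINO_ACIDS, repeat=n)]
    total = sum(counts[g] for g in vocabulary)
    return [counts[g] / total if total else 0.0 for g in vocabulary]


if __name__ == "__main__":
    features = ngram_features("MKTAYIAKQR", n=2)
    print(len(features))  # one entry per possible 2-gram
```

With n = 2 each sequence becomes a 400-dimensional vector regardless of its length, which is what allows sequences of different lengths to be passed to a single classifier.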

Keywords: n-grams, genetic algorithm, cytokine identification, sampling, imbalanced data
Corresponding Author(s): Quan ZOU   
Issue Date: 07 September 2015
 Cite this article:   
Xiangxiang ZENG,Sisi YUAN,Xianxian HUANG, et al. Identification of cytokine via an improved genetic algorithm[J]. Front. Comput. Sci., 2015, 9(4): 643-651.
 URL:  
https://academic.hep.com.cn/fcs/EN/10.1007/s11704-014-4089-3
https://academic.hep.com.cn/fcs/EN/Y2015/V9/I4/643