Frontiers of Computer Science

ISSN 2095-2228

ISSN 2095-2236(Online)

CN 10-1014/TP

Postal Subscription Code 80-970

2018 Impact Factor: 1.129

Front. Comput. Sci.    2020, Vol. 14 Issue (2) : 451-460    https://doi.org/10.1007/s11704-018-8094-9
RESEARCH ARTICLE
iRNA-PseTNC: identification of RNA 5-methylcytosine sites using hybrid vector space of pseudo nucleotide composition
Shahid AKBAR, Maqsood HAYAT, Muhammad IQBAL, Muhammad TAHIR
Department of Computer Science, AbdulWali Khan University Mardan, KP 23200, Pakistan
Abstract

RNA 5-methylcytosine (m5C) sites play a major role in numerous biological processes, and m5C is commonly reported in both cellular DNA and RNA. The enzymatic mechanism and biological functions of m5C sites in DNA have been a focus of research for the last few decades. Likewise, investigators have targeted m5C sites in RNA because of their cellular functions, positioning, and formation mechanism. Several rudimentary roles of m5C in RNA have been explored, but considerable improvement is still needed. Initially, RNA methylcytosine sites were identified by experimental methods, which were laborious, error-prone, and time-consuming owing to the limited availability of recognized structures. Given the significance of m5C in RNA, researchers have shifted their attention from structure-based to sequence-based prediction. In this regard, an intelligent computational model is proposed to identify m5C sites in RNA with high precision. Three RNA sequence formulation methods, namely pseudo dinucleotide composition, pseudo trinucleotide composition, and pseudo tetranucleotide composition, are applied to extract diverse and highly informative numerical features. The resulting vector spaces are then fused into a hybrid space so that each compensates for the weaknesses of the others. Various learning hypotheses are examined to select the best operational engine, one that can reliably identify the pattern of the target class. The strength and generalization of the proposed model are measured using two different cross-validation tests. The reported outcomes reveal that the proposed model achieves 3% higher accuracy than the best existing approach in the literature.
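The feature-fusion idea described above can be illustrated with a minimal Python sketch. It is a simplification under stated assumptions, not the authors' formulation: it uses plain normalized k-mer frequencies for k = 2, 3, and 4 as stand-ins for the pseudo di-, tri-, and tetranucleotide compositions, and it fuses them by simple concatenation. The function names and the example sequence are hypothetical.

```python
from itertools import product

ALPHABET = "ACGU"  # RNA nucleotides


def kmer_composition(seq, k):
    """Return normalized frequencies of all 4**k k-mers, in a fixed order.

    Assumption: plain k-mer frequency, used here as a simplified stand-in
    for the pseudo k-tuple nucleotide composition named in the abstract.
    """
    kmers = ["".join(p) for p in product(ALPHABET, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    total = 0
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if window in counts:  # skip windows containing ambiguous bases
            counts[window] += 1
            total += 1
    return [counts[m] / total if total else 0.0 for m in kmers]


def hybrid_features(seq):
    """Concatenate the k = 2, 3, 4 compositions into one hybrid vector."""
    seq = seq.upper().replace("T", "U")  # accept DNA-style input as well
    features = []
    for k in (2, 3, 4):
        features.extend(kmer_composition(seq, k))
    return features


# Hypothetical example sequence, for illustration only.
example = "GGCUACGAUCGCAUGCCUAGCUAGGCAUCGAUCGGAUCCGA"
vec = hybrid_features(example)
print(len(vec))  # 16 + 64 + 256 dimensions
```

A vector built this way could then be passed to any classifier, such as an SVM, and evaluated with cross-validation; the choice of concatenation as the fusion step is an assumption made for this sketch.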

Keywords: methylcytosine sites, PseTNC, PseTetraNC, hybrid features, SVM, cross validation test
Corresponding Author(s): Maqsood HAYAT   
Just Accepted Date: 04 September 2018   Online First Date: 17 September 2019    Issue Date: 16 October 2019
 Cite this article:   
Shahid AKBAR, Maqsood HAYAT, Muhammad IQBAL, et al. iRNA-PseTNC: identification of RNA 5-methylcytosine sites using hybrid vector space of pseudo nucleotide composition[J]. Front. Comput. Sci., 2020, 14(2): 451-460.
 URL:  
https://academic.hep.com.cn/fcs/EN/10.1007/s11704-018-8094-9
https://academic.hep.com.cn/fcs/EN/Y2020/V14/I2/451