Frontiers of Computer Science

ISSN 2095-2228

ISSN 2095-2236(Online)

CN 10-1014/TP

Postal Subscription Code 80-970

2018 Impact Factor: 1.129

Front. Comput. Sci.    2018, Vol. 12 Issue (4) : 813-823    https://doi.org/10.1007/s11704-016-6287-7
RESEARCH ARTICLE
Identification and prioritization of differentially expressed genes for time-series gene expression data
Linlin XING, Maozu GUO, Xiaoyan LIU, Chunyu WANG
School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001, China
Abstract

Identifying differentially expressed genes (DEGs) in time-course studies is useful for understanding gene function and can help determine key genes during specific stages of plant development. A few existing methods focus on detecting DEGs within a single biological group, which enables the study of temporal changes in gene expression. To make use of the rapidly increasing amount of single-group time-series expression data, we propose a two-step method that integrates the temporal characteristics of time-series data to obtain a B-spline curve fit. First, a flat gene filter based on the Ljung–Box test removes flat genes. Then, a B-spline model is used to identify DEGs. For use in biological experiments, these DEGs should be screened to determine their biological importance. To identify high-confidence, promising DEGs for specific biological processes, we propose a novel gene prioritization approach based on the partner evaluation principle. This approach uses existing co-expression information to rank DEGs that are likely to be involved in a specific biological process or condition. The proposed method is validated on the Arabidopsis thaliana seed germination dataset and on the rice anther development expression dataset.
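
As an illustration of the first step, a flat gene filter built on the Ljung–Box test could look roughly like the sketch below. It is not the authors' implementation. The function names (`ljung_box_q`, `chi2_sf_even`, `is_flat`), the number of lags, and the significance level are all assumptions made for this example, since the abstract does not state them. The sketch treats a gene as "flat" when the test fails to reject the null hypothesis of no autocorrelation in its expression profile. To stay within the Python standard library, the chi-square p-value uses a closed form that holds only for an even number of lags.

```python
import math


def ljung_box_q(series, lags):
    """Ljung-Box statistic Q = n(n+2) * sum_{k=1..lags} r_k^2 / (n - k)."""
    n = len(series)
    mean = sum(series) / n
    centered = [x - mean for x in series]
    denom = sum(c * c for c in centered)
    if denom == 0.0:
        # A constant profile has no variation, hence nothing to detect.
        return 0.0
    q = 0.0
    for k in range(1, lags + 1):
        # Sample autocorrelation at lag k.
        r_k = sum(centered[t] * centered[t - k] for t in range(k, n)) / denom
        q += r_k * r_k / (n - k)
    return n * (n + 2) * q


def chi2_sf_even(x, df):
    """Chi-square survival function; closed form valid for even df only."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this sketch supports positive even df only")
    half = x / 2.0
    term, total = 1.0, 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total


def is_flat(series, lags=2, alpha=0.05):
    """Assumed rule: flat if the Ljung-Box test does not reject at level alpha."""
    p_value = chi2_sf_even(ljung_box_q(series, lags), lags)
    return p_value >= alpha


# Hypothetical expression profiles over 12 time points.
trending_gene = [float(t) for t in range(12)]
constant_gene = [3.0] * 12
flat_flags = {
    "trending": is_flat(trending_gene),
    "constant": is_flat(constant_gene),
}
```

Genes flagged as flat would be dropped before the B-spline step. In practice the lag count and threshold would be chosen to suit the number of time points in the dataset.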

Keywords: time-series gene expression, flat gene filter, gene prioritization, co-expression, differentially expressed genes
Corresponding Author(s): Maozu GUO   
Just Accepted Date: 25 November 2016   Online First Date: 20 December 2017    Issue Date: 14 June 2018
 Cite this article:   
Linlin XING, Maozu GUO, Xiaoyan LIU, et al. Identification and prioritization of differentially expressed genes for time-series gene expression data[J]. Front. Comput. Sci., 2018, 12(4): 813-823.
 URL:  
https://academic.hep.com.cn/fcs/EN/10.1007/s11704-016-6287-7
https://academic.hep.com.cn/fcs/EN/Y2018/V12/I4/813