Frontiers of Computer Science

ISSN 2095-2228

ISSN 2095-2236(Online)

CN 10-1014/TP

Front. Comput. Sci. 2016, Vol. 10, Issue 6: 1052-1066. https://doi.org/10.1007/s11704-016-5487-5
RESEARCH ARTICLE
Constrained query of order-preserving submatrix in gene expression data
Tao JIANG, Zhanhuai LI, Xuequn SHANG, Bolin CHEN, Weibang LI, Zhilei YIN
School of Computer Science and Technology, Northwestern Polytechnical University, Xi’an 710072, China
Abstract

Order-preserving submatrices (OPSMs) have become important in modelling biologically meaningful subspace clusters, capturing the general tendency of gene expression across a subset of conditions. With advances in microarray and analysis techniques, large volumes of gene expression datasets and OPSM mining results are being produced. OPSM query can efficiently retrieve relevant OPSMs from this huge amount of OPSM data. However, improving OPSM query relevancy remains a difficult task in real-life exploratory data analysis. First, it is hard to capture subjective interestingness aspects, e.g., the analyst’s expectations given her/his domain knowledge. Second, even when these expectations can be declaratively specified, it is still challenging to use them during the computational process of OPSM queries. To the best of our knowledge, existing methods mainly focus on batch OPSM mining, while few works address OPSM query. To solve the above problems, this paper proposes two constrained OPSM query methods, which exploit user-defined constraints to search for relevant results from the two kinds of indices introduced. Extensive experiments are conducted on real datasets, and the results demonstrate that queries based on the multi-dimension index (cIndex) and the enumerating sequence index (esIndex) perform better than brute-force search.
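
To make the notion of an OPSM and of the brute-force baseline concrete, the following is a minimal sketch in Python. It is not the paper's algorithm and it does not model cIndex or esIndex. It assumes the common definition in which a set of genes (rows) forms an OPSM over an ordered list of conditions (columns) when every gene's expression values strictly increase along that order. The function names, the `min_rows` constraint, and the toy matrix are hypothetical and chosen only for illustration.

```python
from typing import List, Sequence


def supports(row: Sequence[float], columns: Sequence[int]) -> bool:
    """Return True if the row's values strictly increase along the column order."""
    return all(row[a] < row[b] for a, b in zip(columns, columns[1:]))


def brute_force_query(
    matrix: Sequence[Sequence[float]],
    columns: Sequence[int],
    min_rows: int = 1,
) -> List[int]:
    """Scan every row and keep those that follow the queried column order.

    min_rows is a hypothetical user-defined constraint: the result is
    discarded (empty list) when fewer rows than this support the order.
    """
    rows = [i for i, row in enumerate(matrix) if supports(row, columns)]
    return rows if len(rows) >= min_rows else []


# Toy expression matrix (assumed data): 4 genes x 4 conditions.
expression = [
    [0.2, 1.5, 0.9, 2.1],
    [0.1, 3.0, 1.2, 4.4],
    [2.0, 0.5, 1.0, 0.3],
    [0.4, 2.2, 0.8, 2.9],
]

# Query: which genes rise along conditions 0 -> 2 -> 1 -> 3?
result = brute_force_query(expression, columns=[0, 2, 1, 3], min_rows=2)
print(result)  # [0, 1, 3]
```

The scan touches every row for every query, which is the cost that an index over column orders is meant to avoid.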

Keywords: gene expression data, OPSM, constrained query, brute-force search, feature sequence, cIndex
Corresponding Author(s): Zhanhuai LI   
Just Accepted Date: 23 March 2016   Online First Date: 01 June 2016    Issue Date: 11 October 2016
 Cite this article:   
Tao JIANG, Zhanhuai LI, Xuequn SHANG, et al. Constrained query of order-preserving submatrix in gene expression data[J]. Front. Comput. Sci., 2016, 10(6): 1052-1066.
 URL:  
https://academic.hep.com.cn/fcs/EN/10.1007/s11704-016-5487-5
https://academic.hep.com.cn/fcs/EN/Y2016/V10/I6/1052