Feature selection on probabilistic symbolic objects

doi:10.1007/s11704-014-3359-4

Front. Comput. Sci., 2014, Vol. 8, Issue 6: 933-947. https://doi.org/10.1007/s11704-014-3359-4

RESEARCH ARTICLE

Djamal ZIANI

Information Systems Department, College of Computer and Information Sciences, King Saud University, Riyadh 11543, Saudi Arabia


Abstract

In data analysis tasks, we are often confronted with very high-dimensional data. Depending on the purpose of the study, feature selection finds and selects the relevant subset of the original features. Many feature selection algorithms have been proposed in classical data analysis, but very few in symbolic data analysis (SDA), which extends classical data analysis by using rich objects instead of simple matrices. Compared with the data used in classical data analysis, a symbolic object can describe not only an individual but also, most of the time, a cluster of individuals. In this paper, we present an unsupervised feature selection algorithm on probabilistic symbolic objects (PSOs) for the purpose of discrimination. A PSO is a symbolic object that describes a cluster of individuals by modal variables, using a relative frequency distribution associated with each value. This paper presents new dissimilarity measures between PSOs, which are used as feature selection criteria, and explains how to reduce the complexity of the algorithm by using the discrimination matrix.
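
The ideas named in the abstract can be illustrated with a minimal sketch. It is not the paper's algorithm: the abstract does not give the new dissimilarity measures, so total variation distance stands in for them. The toy objects, the variable names, the threshold, and the greedy covering strategy are all assumptions made for illustration. Each PSO maps a modal variable to a relative frequency distribution, and the discrimination matrix records, for each pair of objects, which variables separate that pair.

```python
from itertools import combinations

# Hypothetical toy PSOs: each object maps a modal variable to a relative
# frequency distribution over its categories (frequencies sum to 1).
psos = {
    "cluster_a": {"color": {"red": 0.8, "blue": 0.2},
                  "size": {"small": 0.5, "large": 0.5},
                  "shape": {"round": 1.0}},
    "cluster_b": {"color": {"red": 0.1, "blue": 0.9},
                  "size": {"small": 0.5, "large": 0.5},
                  "shape": {"round": 1.0}},
    "cluster_c": {"color": {"red": 0.8, "blue": 0.2},
                  "size": {"small": 0.0, "large": 1.0},
                  "shape": {"round": 1.0}},
}


def total_variation(p, q):
    """Total variation distance between two discrete distributions.

    Stand-in measure (assumption); the paper defines its own dissimilarities.
    """
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)


def discrimination_matrix(objects, threshold):
    """Map each pair of objects to the set of variables that separate it."""
    matrix = {}
    for a, b in combinations(sorted(objects), 2):
        shared = objects[a].keys() & objects[b].keys()
        matrix[(a, b)] = {
            v for v in shared
            if total_variation(objects[a][v], objects[b][v]) > threshold
        }
    return matrix


def greedy_select(matrix):
    """Greedily pick variables until every separable pair is covered."""
    uncovered = {pair for pair, variables in matrix.items() if variables}
    selected = []
    while uncovered:
        candidates = sorted({v for pair in uncovered for v in matrix[pair]})
        # Pick the variable that separates the most still-uncovered pairs.
        best = max(candidates,
                   key=lambda v: sum(v in matrix[p] for p in uncovered))
        selected.append(best)
        uncovered = {p for p in uncovered if best not in matrix[p]}
    return selected


matrix = discrimination_matrix(psos, threshold=0.3)
selected = greedy_select(matrix)
print(selected)
```

Because the pairwise dissimilarities are computed once and stored in the matrix, the selection loop only performs set lookups. This is one plausible reading of how a discrimination matrix reduces the cost of the search. The `shape` variable is identical across all three toy objects, so it separates no pair and is never selected.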

Keywords: symbolic data analysis, feature selection, probabilistic symbolic object, discrimination criteria, data and knowledge visualization

Corresponding Author(s): Djamal ZIANI

Issue Date: 27 November 2014

Cite this article:

Djamal ZIANI. Feature selection on probabilistic symbolic objects[J]. Front. Comput. Sci., 2014, 8(6): 933-947.

URL:

https://academic.hep.com.cn/fcs/EN/10.1007/s11704-014-3359-4
https://academic.hep.com.cn/fcs/EN/Y2014/V8/I6/933


