Frontiers of Computer Science

ISSN 2095-2228

ISSN 2095-2236(Online)

CN 10-1014/TP

邮发代号 80-970

2019 Impact Factor: 1.275

   优先出版

合作单位

2012年, 第6卷 第2期 出版日期:2012-04-01

选择: 合并摘要 显示/隐藏图片
RESEARCH ARTICLE
Multi-stream join answering for mining significant cross-stream correlations
Robert GWADERA
Frontiers of Computer Science. 2012, 6 (2): 131-142.  
https://doi.org/10.1007/s11704-012-2862-8

摘要   HTML   PDF (684KB)

Sliding-window multi-stream join (SWMJ) is a fundamental operation for correlating information from different streams. We provide a solution to the problem of assessing significance of the SWMJ result by focusing on the relative frequency of windows satisfying a given equijoin predicate as the most important parameter of the SWMJ result. In particular, we derive a formula for computing the expected relative frequency of windows satisfying a given equijoin predicate that can be evaluated in quadratic time in the window size given a proposed probabilistic model of the multi-stream. In experiments conducted on a daily rainfall data set we demonstrate the remarkable accuracy of our method, which confirms our theoretical analysis.

参考文献 | 相关文章 | 多维度评价
Active improvement of hierarchical object features under budget constraints
Nicolas CEBRON
Frontiers of Computer Science. 2012, 6 (2): 143-153.  
https://doi.org/10.1007/s11704-012-2857-5

摘要   HTML   PDF (657KB)

When we think of an object in a supervised learning setting, we usually perceive it as a collection of fixed attribute values. Although this setting may be suited well for many classification tasks, we propose a new object representation and therewith a new challenge in data mining; an object is no longer described by one set of attributes but is represented in a hierarchy of attribute sets in different levels of quality. Obtaining a more detailed representation of an object comes with a cost. This raises the interesting question of which objects we want to enhance under a given budget and cost model. This new setting is very useful whenever resources like computing power, memory or time are limited. We propose a new active adaptive algorithm (AAA) to improve objects in an iterative fashion. We demonstrate how to create a hierarchical object representation and prove the effectiveness of our new selection algorithm on these datasets.

参考文献 | 相关文章 | 多维度评价
An effective framework for characterizing rare categories
Jingrui HE, Hanghang TONG, Jaime CARBONELL
Frontiers of Computer Science. 2012, 6 (2): 154-165.  
https://doi.org/11.1007/s11704-012-2861-9

摘要   HTML   PDF (830KB)

Rare categories become more and more abundant and their characterization has received little attention thus far. Fraudulent banking transactions, network intrusions, and rare diseases are examples of rare classes whose detection and characterization are of high value. However, accurate characterization is challenging due to high-skewness and nonseparability from majority classes, e.g., fraudulent transactions masquerade as legitimate ones. This paper proposes the RACH algorithm by exploring the compactness property of the rare categories. This algorithm is semi-supervised in nature since it uses both labeled and unlabeled data. It is based on an optimization framework which encloses the rare examples by a minimum-radius hyperball. The framework is then converted into a convex optimization problem, which is in turn effectively solved in its dual form by the projected subgradient method. RACH can be naturally kernelized. Experimental results validate the effectiveness of RACH.

参考文献 | 相关文章 | 多维度评价
Resilient k-d trees: k-means in space revisited
Fabian GIESEKE, Gabriel MORUZ, Jan VAHRENHOLD
Frontiers of Computer Science. 2012, 6 (2): 166-178.  
https://doi.org/10.1007/s11704-012-2870-8

摘要   HTML   PDF (618KB)

We propose a k-d tree variant that is resilient to a pre-described number of memory corruptions while still using only linear space. While the data structure is of independent interest, we demonstrate its use in the context of highradiation environments. Our experimental evaluation demonstrates that the resulting approach leads to a significantly higher resiliency rate compared to previous results. This is especially the case for large-scale multi-spectral satellite data, which renders the proposed approach well-suited to operate aboard today’s satellites.

参考文献 | 相关文章 | 多维度评价
Stratified sampling for data mining on the deep web
Tantan LIU, Fan WANG, Gagan AGRAWAL
Frontiers of Computer Science. 2012, 6 (2): 179-196.  
https://doi.org/10.1007/s11704-012-2859-3

摘要   HTML   PDF (650KB)

In recent years, the deep web has become extremely popular. Like any other data source, data mining on the deep web can produce important insights or summaries of results. However, data mining on the deep web is challenging because the databases cannot be accessed directly, and therefore, data mining must be performed by sampling the datasets. The samples, in turn, can only be obtained by querying deep web databases with specific inputs. In this paper, we target two related data mining problems, association mining and differential rulemining. These are proposed to extract high-level summaries of the differences in data provided by different deep web data sources in the same domain. We develop stratified sampling methods to perform these mining tasks on a deep web source. Our contributions include a novel greedy stratification approach, which recursively processes the query space of a deep web data source, and considers both the estimation error and the sampling costs. We have also developed an optimized sample allocation method that integrates estimation error and sampling costs. Our experimental results show that our algorithms effectively and consistently reduce sampling costs, compared with a stratified sampling method that only considers estimation error. In addition, compared with simple random sampling, our algorithm has higher sampling accuracy and lower sampling costs.

参考文献 | 相关文章 | 多维度评价
Two of a kind or the ratings game? Adaptive pairwise preferences and latent factor models
Suhrid BALAKRISHNAN, Sumit CHOPRA
Frontiers of Computer Science. 2012, 6 (2): 197-208.  
https://doi.org/10.1007/s11704-012-2871-7

摘要   HTML   PDF (388KB)

Latent factor models have become a workhorse for a large number of recommender systems.While these systems are built using ratings data, which is typically assumed static, the ability to incorporate different kinds of subsequent user feedback is an important asset. For instance, the user might want to provide additional information to the system in order to improve his personal recommendations. To this end, we examine a novel scheme for efficiently learning (or refining) user parameters from such feedback. We propose a scheme where users are presented with a sequence of pairwise preference questions: “Do you prefer item A over B?” User parameters are updated based on their response, and subsequent questions are chosen adaptively after incorporating the feedback. We operate in a Bayesian framework and the choice of questions is based on an information gain criterion. We validate the scheme on the Netflix movie ratings data set and a proprietary television viewership data set. A user study and automated experiments validate our findings.

参考文献 | 相关文章 | 多维度评价
Managing advertising campaigns—an approximate planning approach
Sertan GIRGIN, Jérémie MARY, Philippe PREUX, Olivier NICOL
Frontiers of Computer Science. 2012, 6 (2): 209-229.  
https://doi.org/10.1007/s11704-012-2873-5

摘要   HTML   PDF (999KB)

We consider the problem of displaying commercial advertisements on web pages, in the “cost per click” model. The advertisement server has to learn the appeal of each type of visitor for the different advertisements in order to maximize the profit. Advertisements have constraints such as a certain number of clicks to draw, as well as a lifetime. This problem is thus inherently dynamic, and intimately combines combinatorial and statistical issues. To set the stage, it is also noteworthy that we deal with very rare events of interest, since the base probability of one click is in the order of 10-4. Different approaches may be thought of, ranging from computationally demanding ones (use of Markov decision processes, or stochastic programming) to very fast ones.We introduce NOSEED, an adaptive policy learning algorithm based on a combination of linear programming and multi-arm bandits. We also propose a way to evaluate the extent to which we have to handle the constraints (which is directly related to the computation cost). We investigate the performance of our system through simulations on a realistic model designed with an important commercial web actor.

参考文献 | 相关文章 | 多维度评价
An approach for automatic sleep stage scoring and apnea-hypopnea detection
Tim SCHLüTER, Stefan CONRAD
Frontiers of Computer Science. 2012, 6 (2): 230-241.  
https://doi.org/10.1007/s11704-012-2872-6

摘要   HTML   PDF (407KB)

In this article we present an application of data mining to the medical domain sleep research, an approach for automatic sleep stage scoring and apnea-hypopnea detection. By several combined techniques (Fourier and wavelet transform, derivative dynamic time warping, and waveform recognition), our approach extracts meaningful features (frequencies and special patterns like k-complexes and sleep spindles) from physiological recordings containing EEG, ECG, EOG and EMG data. Based on these pieces of information, an ensemble of decision trees is constructed using the principle of bagging, which classifies sleep epochs in their sleep stages according to the rules by Rechtschaffen and Kales and annotates occurrences of apnea-hypopnea (total or partial cessation of respiration). After that, casebased reasoning is applied in order to improve quality. We tested and evaluated our approach on several large public databases from PhysioBank, which showed an overall accuracy of 95.2% for sleep stage scoring and 94.5% for classifying minutes as apneic or non-apneic.

参考文献 | 相关文章 | 多维度评价
8篇文章