Learning with privileged information using Bayesian networks
Shangfei WANG, Menghua HE, Yachen ZHU, Shan HE, Yue LIU, Qiang JI
Front. Comput. Sci. 2015, 9 (2): 185-199.
https://doi.org/10.1007/s11704-014-4031-8
For many supervised learning applications, additional information, besides the labels, is often available during training but not during testing. Such additional information, referred to as privileged information, can be exploited during training to construct a better classifier. In this paper, we propose a Bayesian network (BN) approach for learning with privileged information. We propose to incorporate the privileged information through a three-node BN. We further mathematically evaluate different topologies of the three-node BN and identify those structures through which the privileged information can benefit the classification. Experimental results on handwritten digit recognition, spontaneous versus posed expression recognition, and gender recognition demonstrate the effectiveness of our approach.
Understanding taxi drivers’ routing choices from spatial and social traces
Siyuan LIU, Shuhui WANG, Ce LIU, Ramayya KRISHNAN
Front. Comput. Sci. 2015, 9 (2): 200-209.
https://doi.org/10.1007/s11704-014-4177-4
Most of our learning comes from other people or from our own experience. For instance, when a taxi driver is seeking passengers on an unknown road in a large city, what should the driver do? Alternatives include cruising along the road, waiting for a period at the roadside in the hope of finding a passenger, or simply leaving for another road en route to a destination he knows (e.g., a hotel taxi rank). This is an interesting problem that arises every day in cities all over the world. There could be different answers to the question posed above, but one fundamental problem is how the driver learns about the likelihood of finding passengers on a road that is new to him (as he has not picked up or dropped off passengers there before). Our observation from large-scale taxi driver trace data is that a driver learns not only from his own experience but also through interactions with other drivers. In this paper, we first formally define this problem as socialized information learning (SIL). Second, we propose a framework including a series of models to study how a taxi driver gathers and learns information in an uncertain environment through the use of his social network. Finally, large-scale real-life data and empirical experiments confirm that our models are much more effective, efficient and scalable than prior work on this problem.
Efficient query processing framework for big data warehouse: an almost join-free approach
Huiju WANG, Xiongpai QIN, Xuan ZHOU, Furong LI, Zuoyan QIN, Qing ZHU, Shan WANG
Front. Comput. Sci. 2015, 9 (2): 224-236.
https://doi.org/10.1007/s11704-014-4025-6
The rapidly increasing scale of data warehouses is challenging today’s data analytical technologies. A conventional data analytical platform processes data warehouse queries using a star schema: it normalizes the data into a fact table and a number of dimension tables, and during query processing it selectively joins the tables according to users’ demands. This model is space economical. However, it faces two problems when applied to big data. First, join is an expensive operation, which prevents a parallel database or a MapReduce-based system from achieving efficiency and scalability simultaneously. Second, join operations have to be executed repeatedly, even though numerous join results could be reused by different queries. In this paper, we propose a new query processing framework for data warehouses. It pushes the join operations partially to the pre-processing phase and partially to the post-processing phase, so that data warehouse queries can be transformed into massively parallelized filter-aggregation operations on the fact table. In contrast to conventional query processing models, our approach is efficient, scalable and stable despite the large number of tables involved in the join. It is especially suitable for a large-scale parallel data warehouse. Our empirical evaluation on Hadoop shows that our framework exhibits linear scalability and outperforms some existing approaches by an order of magnitude.
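The idea of moving joins out of the query path can be illustrated with a small sketch. This is not the authors' framework, only one possible reading of the abstract: the function names (`preprocess`, `filter_aggregate`, `postprocess`), the dictionary-based tables and the choice of which attribute to fold into the fact table are all assumptions made for illustration.

```python
from collections import defaultdict

def preprocess(facts, dim, key, attr):
    """Pre-processing (assumed): copy one dimension attribute into each fact row."""
    return [{**row, attr: dim[row[key]][attr]} for row in facts]

def filter_aggregate(rows, predicate, group_key, measure):
    """Query time: a single scan of the widened fact table, with no join."""
    totals = defaultdict(float)
    for row in rows:
        if predicate(row):
            totals[row[group_key]] += row[measure]
    return dict(totals)

def postprocess(totals, dim, label):
    """Post-processing (assumed): attach labels to the small aggregated result."""
    return {dim[k][label]: v for k, v in totals.items()}

if __name__ == "__main__":
    # Hypothetical toy data: one fact table and one dimension table.
    stores = {1: {"name": "Store A", "region": "East"},
              2: {"name": "Store B", "region": "West"}}
    sales = [{"store_id": 1, "amount": 10.0},
             {"store_id": 2, "amount": 7.0},
             {"store_id": 1, "amount": 5.0}]
    wide = preprocess(sales, stores, "store_id", "region")
    totals = filter_aggregate(wide, lambda r: r["region"] == "East",
                              "store_id", "amount")
    print(postprocess(totals, stores, "name"))
```

In this sketch the per-row scan is independent across rows, which is what would make the filter-aggregation step easy to partition across workers.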
Efficient subtree results computation for XML keyword queries
Ziyang CHEN, Jia LIU, Xingmin ZHAO, Junfeng ZHOU
Front. Comput. Sci. 2015, 9 (2): 253-264.
https://doi.org/10.1007/s11704-014-3473-3
In this paper, we focus on efficient construction of restricted subtree (RSubtree) results for XML keyword queries on a multicore system. We first show that the performance bottlenecks for existing methods lie in 1) computing the set of relevant keyword nodes (RKNs) for each subtree root node, 2) constructing the corresponding RSubtree, and 3) parallel execution. We then propose a two-step generic top-down subtree construction algorithm, which computes SLCA/ELCA nodes in the first step, and obtains RKNs and generates RSubtree results in parallel in the second step. Here, generic means that 1) our method can be used to compute different kinds of subtree results, and 2) our method is independent of the query semantics; top-down means that our method constructs each RSubtree by visiting the nodes of the subtree built from an RKN set level by level, from left to right, so as to avoid visiting as many useless nodes as possible. The experimental results show that our method is much more efficient than existing ones according to various metrics.
Discovering admissible Web services with uncertain QoS
Xiaodong FU, Kun YUE, Li LIU, Ping ZOU, Yong FENG
Front. Comput. Sci. 2015, 9 (2): 265-279.
https://doi.org/10.1007/s11704-014-4059-9
Open and dynamic environments lead to inherent uncertainty in Web service QoS (Quality of Service), and the QoS-aware service selection problem can be viewed as a decision problem under uncertainty. We use an empirical distribution function to describe the uncertainty of scores obtained from historical transactions. We then propose an approach to discovering the admissible set of services, which consists of the alternative services that are not dominated by any other alternative according to the expected utility criterion. Stochastic dominance (SD) rules are used to compare two services with uncertain scores regardless of the distribution form of their uncertain scores. By using the properties of SD rules, an algorithm is developed to reduce the number of SD tests, by which the admissible services can be reported progressively. We prove that the proposed algorithm can be run on partitioned or incremental alternative services. Moreover, we derive some useful theoretical conclusions for correctly pruning unnecessary calculations and comparisons in each SD test, by which the efficiency of the SD tests can be improved. We conduct a comprehensive experimental study using real datasets to evaluate the effectiveness, efficiency, and scalability of the proposed algorithm.
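A minimal sketch of the dominance idea follows, under stated assumptions: it implements only first-order stochastic dominance on empirical distribution functions, treats higher scores as better, and compares every pair naively. The paper's SD rules, pruning conclusions and progressive algorithm are not reproduced here, and the names `ecdf`, `fsd_dominates` and `admissible` are hypothetical.

```python
from bisect import bisect_right

def ecdf(samples):
    """Empirical distribution function F(x) = fraction of samples <= x."""
    ordered = sorted(samples)
    n = len(ordered)
    return lambda x: bisect_right(ordered, x) / n

def fsd_dominates(a, b):
    """True if scores `a` first-order stochastically dominate scores `b`.

    Assumes higher scores are better: F_a(x) <= F_b(x) at every point,
    with strict inequality at one point or more.
    """
    f_a, f_b = ecdf(a), ecdf(b)
    points = sorted(set(a) | set(b))
    gaps = [f_b(x) - f_a(x) for x in points]
    return all(g >= 0 for g in gaps) and any(g > 0 for g in gaps)

def admissible(services):
    """Names of services that no other service dominates (naive pairwise scan)."""
    return [name for name, scores in services.items()
            if not any(fsd_dominates(other_scores, scores)
                       for other, other_scores in services.items()
                       if other != name)]

if __name__ == "__main__":
    # Hypothetical historical scores for three services.
    history = {"A": [0.8, 0.9, 0.7],
               "B": [0.5, 0.6, 0.4],
               "C": [0.3, 0.95, 0.6]}
    print(admissible(history))  # prints ['A', 'C']
```

Here B is dominated by A, while A and C are incomparable because their empirical distribution functions cross, so both remain in the admissible set.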
A survey on trust based detection and isolation of malicious nodes in ad-hoc and sensor networks
Adnan AHMED, Kamalrulnizam ABU BAKAR, Muhammad Ibrahim CHANNA, Khalid HASEEB, Abdul Waheed KHAN
Front. Comput. Sci. 2015, 9 (2): 280-296.
https://doi.org/10.1007/s11704-014-4212-5
Mobile ad-hoc networks (MANETs) and wireless sensor networks (WSNs) have gained remarkable appreciation and technological development over the last few years. Despite ease of deployment, numerous applications and significant advantages, security has always been a challenging issue due to the nature of the environments in which nodes operate. Physical capture of nodes and malicious or selfish behavior cannot be detected by traditional security schemes. Trust and reputation based approaches have gained global recognition in providing additional means of security for decision making in sensor and ad-hoc networks. This paper provides an extensive literature review of trust and reputation based models in both sensor and ad-hoc networks. Based on the mechanism of trust establishment, we categorize the state-of-the-art into two groups, namely node-centric trust models and system-centric trust models. Based on trust evidence, initialization, computation, propagation and weight assignments, we evaluate the efficacy of the existing schemes. Finally, we conclude our discussion by identifying some unresolved issues in trust and reputation management.
Cloud authorization: exploring techniques and approach towards effective access control framework
Rahat MASOOD, Muhammad Awais SHIBLI, Yumna GHAZI, Ayesha KANWAL, Arshad ALI
Front. Comput. Sci. 2015, 9 (2): 297-321.
https://doi.org/10.1007/s11704-014-3160-4
Despite the various attractive features that Cloud has to offer, the rate of Cloud migration is rather slow, primarily due to the serious security and privacy issues that exist in the paradigm. One of the main problems in this regard is that of authorization in the Cloud environment, which is the focus of our research. In this paper, we present a systematic analysis of the existing authorization solutions in Cloud and evaluate their effectiveness against well-established industrial standards that conform to the unique access control requirements in the domain. Our analysis can benefit organizations by helping them decide on the best authorization technique for deployment in Cloud; a case study along with simulation results is also presented to illustrate the procedure of using our qualitative analysis for the selection of an appropriate technique, as per Cloud consumer requirements. From the results of this evaluation, we derive the general shortcomings of the extant access control techniques that are keeping them from providing successful authorization and, therefore, from being widely adopted by the Cloud community. To that end, we enumerate the features an ideal access control mechanism for the Cloud should have, and combine them to suggest the ultimate solution to this major security challenge: access control as a service (ACaaS) for the software as a service (SaaS) layer. We conclude that meticulous research is needed to incorporate the identified authorization features into a generic ACaaS framework that should be adequate for providing a high level of extensibility and security by integrating multiple access control models.
Virtual machine selection and placement for dynamic consolidation in Cloud computing environment
Xiong FU, Chen ZHOU
Front. Comput. Sci. 2015, 9 (2): 322-330.
https://doi.org/10.1007/s11704-015-4286-8
Dynamic consolidation of virtual machines (VMs) in a data center is an effective way to reduce energy consumption and improve physical resource utilization. Determining which VMs should be migrated from an overloaded host directly influences the VM migration time and the energy consumption of the whole data center, and can cause the service level agreement (SLA) between providers and users to be violated. So when designing a VM selection policy, we not only consider CPU utilization, but also define a variable that represents the degree of resource satisfaction to select the VMs. In addition, we propose a novel VM placement policy that prefers placing a migratable VM on a host that has the minimum correlation coefficient. The larger the correlation coefficient of a host, the greater the influence on the VMs located on that host after the migration. Using CloudSim, we run simulations whose results lead us to conclude that the policies we propose in this paper perform better than existing policies in terms of energy consumption, VM migration time, and SLA violation percentage.
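The placement preference can be sketched as follows. The abstract does not say how a host's correlation coefficient is computed, so this example assumes the Pearson correlation between the migrating VM's CPU utilization history and each candidate host's utilization history; `pearson` and `pick_host` are hypothetical names, and capacity checks are omitted.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length series (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if norm_x == 0 or norm_y == 0:
        return 0.0
    return cov / (norm_x * norm_y)

def pick_host(vm_history, host_histories):
    """Choose the candidate host whose load is least correlated with the VM's load."""
    return min(host_histories,
               key=lambda host: pearson(vm_history, host_histories[host]))

if __name__ == "__main__":
    # Hypothetical CPU utilization samples over four intervals.
    vm = [0.2, 0.4, 0.6, 0.8]
    hosts = {"h1": [0.3, 0.5, 0.7, 0.9],   # rises together with the VM
             "h2": [0.9, 0.7, 0.5, 0.3]}   # falls while the VM rises
    print(pick_host(vm, hosts))  # prints h2
```

A host whose load peaks at the same time as the VM's is the one most likely to become overloaded after the migration, which is why the least correlated host is preferred in this sketch.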