Please wait a minute...
Frontiers of Computer Science

ISSN 2095-2228

ISSN 2095-2236(Online)

CN 10-1014/TP

Postal Subscription Code 80-970

2018 Impact Factor: 1.129

Front. Comput. Sci.    2016, Vol. 10 Issue (2) : 330-352    https://doi.org/10.1007/s11704-015-4534-y
RESEARCH ARTICLE
Skyline-join query processing in distributed databases
Mei BAI1,*(),Junchang XIN1,Guoren WANG1,Roger ZIMMERMANN2,Xite WANG1
1. College of Information Science and Engineering, Northeastern University, Shenyang 110819, China
2. School of Computing, National University of Singapore, Singapore 117417, Singapore
 Download: PDF(1075 KB)  
 Export: BibTeX | EndNote | Reference Manager | ProCite | RefWorks
Abstract

The skyline-join operator, as an important variant of skylines, plays an important role in multi-criteria decision making problems. However, as the data scale increases, previous methods of skyline-join queries cannot be applied to new applications. Therefore, in this paper, it is the first attempt to propose a scalable method to process skyline-join queries in distributed databases. First, a tailored distributed framework is presented to facilitate the computation of skyline-join queries. Second, the distributed skyline-join query algorithm (DSJQ) is designed to process skyline-join queries. DSJQ contains two phases. In the first phase, two filtering strategies are used to filter out unpromising tuples from the original tables. The remaining tuples are transmitted to the corresponding data nodes according a partition function, which can guarantee that the tuples with the same join value are transferred to the same node. In the second phase, we design a scheduling plan based on rotations to calculate the final skyline-join result. The scheduling plan can ensure that calculations are equally assigned to all the data nodes, and the calculations on each data node can be processed in parallel without creating a bottleneck node. Finally, the effectiveness of DSJQ is evaluated through a series of experiments.

Keywords skyline-join      distributed      filtering strategy      scheduling plan      rotation     
Corresponding Author(s): Mei BAI   
Issue Date: 16 March 2016
 Cite this article:   
Mei BAI,Junchang XIN,Guoren WANG, et al. Skyline-join query processing in distributed databases[J]. Front. Comput. Sci., 2016, 10(2): 330-352.
 URL:  
https://academic.hep.com.cn/fcs/EN/10.1007/s11704-015-4534-y
https://academic.hep.com.cn/fcs/EN/Y2016/V10/I2/330
1 Borzsony S, Kossmann D, and Stocker K. The skyline operator. In: Proceedings of the 17th IEEE International Conference on Data Engineering. 2001, 421–430
https://doi.org/10.1109/ICDE.2001.914855
2 Balke W-T, Güntzer U, Zheng J X. Efficient distributed skylining for web information systems. In: Proceedings of the 9th International Conference on Extending Database Technology. 2004, 256–273
https://doi.org/10.1007/978-3-540-24741-8_16
3 Afrati F-N, Koutris P, Suciu D, Ullman J-D. Parallel skyline queries. In: Proceedings of the 15th International Conference on Database Theory. 2012, 274–284
https://doi.org/10.1145/2274576.2274605
4 Chen L, Lian X. Dynamic skyline queries in metric spaces. In: Proceedings of the 11th International Conference on Extending Database Technology. 2008, 333–343
https://doi.org/10.1145/1353343.1353386
5 Sun D L, Wu S, Li J Z, Tung A K H. Skyline-join in distributed databases. In: Proceedings of the 24th IEEE International Conference on Data Engineering. 2008, 176–181
https://doi.org/10.1109/icdew.2008.4498313
6 Nagendra M, Candan K S. Skyline-sensitive joins with LR-pruning. In: Proceedings of the 15th ACM International Conference on Extending Database Technology. 2012, 252–263
https://doi.org/10.1145/2247596.2247627
7 Jin W, Ester M, Hu Z, Han J. The multi-relational skyline operator. In: Proceedings of the 23rd IEEE International Conference on Data Engineering. 2007, 1276–1280
https://doi.org/10.1109/icde.2007.368992
8 Vlachou A, Doulkeridis C, Polyzotis N. Skyline query processing over joins. In: Proceedings of the 2011 ACM SIGMOD International conference on Management Of Data. 2011, 73–84
https://doi.org/10.1145/1989323.1989332
9 Jin W, Morse MD, Patel JM, Ester M, Hu Z. Evaluating skylines in the presence of equijoins. In: Proceedings of the 26th IEEE International Conference on Data Engineering. 2010, 249–260
https://doi.org/10.1109/icde.2010.5447841
10 Kung H-T, Luccio F, Preparata F P. On finding the maxima of a set of vectors. Journal of the ACM, 1975, 22(4): 469–476
https://doi.org/10.1145/321906.321910
11 Chomicki J, Godfrey P, Gryz J, Liang D. Skyline with presorting. In: Proceedings of the 19th International Conference on Data Engineering. 2003, 717–719
https://doi.org/10.1109/icde.2003.1260846
12 Tan K-L, Eng P-K, Ooi B C. Efficient progressive skyline computation. In: Proceedings of the 27th International Conference on Very Large Data Bases. 2001, 301–310
13 Kossmann D, Ramsak F, Rost S. Shooting stars in the sky: An online algorithm for skyline queries. In: Proceedings of the 28th International Conference on Very Large Data Bases. 2002, 275–286
https://doi.org/10.1016/B978-155860869-6/50032-9
14 Papadias D, Tao Y, Fu G, Seeger B. An optimal and progressive algorithm for skyline queries. In: Proceedings of the 2003 ACM SIGMOD International Conference on Management Of Data. 2003, 467–478
https://doi.org/10.1145/872757.872814
15 Lee K C, Zheng B, Li H, Lee WC. Approaching the skyline in Z order. In: Proceedings of the 33rd International Conference on Very Large Data Bases. 2007, 279–290
16 Dean J, Ghemawat S. Mapreduce: Simplified data processing on large clusters. In: Proceedings of the 6th USENIX Symposium on Operating Systems Design and Implementation. 2004, 137–150
17 Wang H J, Qin X P, Zhou X, Li F R, Qin Z Y, Zhu Q,Wang S. Efficient query processing framework for big data warehouse: an almost joinfree approach. Frontiers of Computer Science, 2015, 9(2): 224–236
https://doi.org/10.1007/s11704-014-4025-6
18 Vlachou A, Doulkeridis C, Kotidis Y, Vazirgiannis M. Skypeer: Efficient subspace skyline computation over distributed data. In: Proceedings of the 23rd IEEE International Conference on Data Engineering. 2007, 416–425
https://doi.org/10.1109/icde.2007.367887
19 Chen L, Cui B, Lu H, Xu L H, Xu Q Q. iSky: Efficient and progressive skyline computing in a structured P2P network. In: Proceedings of the 28th IEEE International Conference on Distributed Computing Systems. 2008, 160–167
20 Cui B, Lu H, Xu Q Q, Chen L J, Dai Y F, Zhou Y L. Parallel distributed processing of constrained skyline queries by filtering. In: Proceedings of the 24th IEEE International Conference on Data Engineering. 2008, 546–555
https://doi.org/10.1109/icde.2008.4497463
21 Afrati F N, Koutris P, Suciu D, Ullman J D. Parallel skyline queries. In: Proceedings of the 15 th ACM International Conference on Digital Telecommunications. 2012, 274–284
https://doi.org/10.1145/2274576.2274605
22 Köhler H, Yang J, Zhou X F. Efficient parallel skyline processing using hyperplane projections. In: Proceedings of the 2011 ACM SIGMOD International Conference on Management Of Data. 2011, 85–96
https://doi.org/10.1145/1989323.1989333
23 Park Y, Min J-K, Shim K. Parallel computation of skyline and reverse skyline queries using mapreduce. In: Proceedings of International Conference on Very Large Data Bases. 2013, 2002–2013
https://doi.org/10.14778/2556549.2556580
24 Dean J, Ghemawat S. MapReduce: simplified data processing on large clusters. In: Proceedings of the 6th Symposium on Operating Systems Design and Implementation. 2004, 137–150
25 Bartolini I, Ciaccia P, Patella M. SaLSa: computing the skyline without scanning the whole sky. In: Proceedings of the 15th ACM International Conference on Information and Knowledge Management. 2006, 405–414
https://doi.org/10.1145/1183614.1183674
26 Lin X M, Zhang Y, Zhang W J, Cheema M A. Stochastic skyline operator. In: Proceedings of the 27th IEEE International Conference on Data Engineering. 2011, 721–732
https://doi.org/10.1109/icde.2011.5767896
27 Godfrey P, Shipley R, Gryz J. Maximal vector computation in large data sets. In: Proceedings of the 31st International Conference on Very Large Data Bases. 2005, 229–240
28 Khalefa M E, Mokbel M F, Levandoski J J. Skyline query processing for uncertain data. In: Proceedings of the 19th ACM International Conference on Information and Knowledge Management. 2010, 1293–1296
https://doi.org/10.1145/1871437.1871604
29 Lian X, Chen L. Efficient processing of probabilistic group subspace skyline queries in uncertain databases. Information Systems, 2013, 38(3): 265–285
https://doi.org/10.1016/j.is.2012.08.006
30 Bloom B H. Space/time trade-offs in hash coding with allowable errors. Communications of the ACM, 1970, 13(7): 422–426
https://doi.org/10.1145/362686.362692
[1] FCS-0330-14534-MB_suppl_1 Download
[1] Tingting CHEN, Haikun LIU, Xiaofei LIAO, Hai JIN. Resource abstraction and data placement for distributed hybrid memory pool[J]. Front. Comput. Sci., 2021, 15(3): 153103-.
[2] Jintao GAO, Wenjie LIU, Zhanhuai LI. An adaptive strategy for statistics collecting in distributed database[J]. Front. Comput. Sci., 2020, 14(5): 145610-.
[3] Jingwei ZHANG, Chao YANG, Qing YANG, Yuming LIN, Yanchun ZHANG. HGeoHashBase: an optimized storage model of spatial objects for location-based services[J]. Front. Comput. Sci., 2020, 14(1): 208-218.
[4] Wenjie LIU, Zhanhuai LI. An efficient parallel algorithm of N-hop neighborhoods on graphs in distributed environment[J]. Front. Comput. Sci., 2019, 13(6): 1309-1325.
[5] Tao ZHU, Huiqi HU, Weining QIAN, Huan ZHOU, Aoying ZHOU. Fault-tolerant precise data access on distributed log-structured merge-tree[J]. Front. Comput. Sci., 2019, 13(4): 760-777.
[6] Shichen ZOU, Junyu LIN, Huiqiang WANG, Hongwu LV, Guangsheng FENG. An effective method for service components selection based on micro-canonical annealing considering dependability assurance[J]. Front. Comput. Sci., 2019, 13(2): 264-279.
[7] Haiyang ZHOU, Yunzhi YU. Applying rotation-invariant star descriptor to deep-sky image registration[J]. Front. Comput. Sci., 2018, 12(5): 1013-1025.
[8] Yu ZHOU, Nvqi ZHOU, Tingting HAN, Jiayi GU, Weigang WU. Probabilistic verification of hierarchical leader election protocol in dynamic systems[J]. Front. Comput. Sci., 2018, 12(4): 763-776.
[9] Wuyang JU,Jianxin LI,Weiren YU,Richong ZHANG. iGraph: an incremental data processing system for dynamic graph[J]. Front. Comput. Sci., 2016, 10(3): 462-476.
[10] Shuai MA,Jia LI,Chunming HU,Xuelian LIN,Jinpeng HUAI. Big graph search: challenges and techniques[J]. Front. Comput. Sci., 2016, 10(3): 387-398.
[11] Hongpeng YIN,Jinxing LI,Yi CHAI,Simon X. YANG. A survey on distributed compressed sensing: theory and applications[J]. Front. Comput. Sci., 2014, 8(6): 893-904.
[12] Weidong MIN, Ke CHEN, Yongzhen KE. A matrix grammar approach for automatic distributed network resource management[J]. Front Comput Sci, 2013, 7(4): 583-594.
[13] Qinma KANG, Hong HE. Task assignment for minimizing application completion time using honeybee mating optimization[J]. Front Comput Sci, 2013, 7(3): 404-415.
[14] Jian LIN, Li ZHA, Zhiwei XU. Consolidated cluster systems for data centers in the cloud age: a survey and analysis[J]. Front. Comput. Sci., 2013, 7(1): 1-19.
[15] Ying ZHANG, Gang HUANG, Wei ZHANG, Xuanzhe LIU, Hong MEI. Towards module-based automatic partitioning of Java applications[J]. Front Comput Sci, 2012, 6(6): 725-740.
Viewed
Full text


Abstract

Cited

  Shared   
  Discussed