Frontiers of Computer Science

ISSN 2095-2228

ISSN 2095-2236(Online)

CN 10-1014/TP

Postal Subscription Code 80-970

2018 Impact Factor: 1.129

Front. Comput. Sci.    2021, Vol. 15 Issue (4) : 154611    https://doi.org/10.1007/s11704-020-0122-x
RESEARCH ARTICLE
Efficient k-dominant skyline query over incomplete data using MapReduce
Linlin DING, Shu WANG, Baoyan SONG
School of Information, Liaoning University, Shenyang 110036, China
Abstract

Skyline queries are widely used in real-life applications to filter out uninteresting data objects. However, a skyline query may return too many results, especially on high-dimensional datasets, because it offers no control over the retrieval conditions. As an extension of the skyline query, the k-dominant skyline query relaxes the dominance condition across dimensions through the parameter k, thereby reducing the number of retrieved objects. In addition, with the continuing growth of big data applications, acquired data may lack some of the intended content for practical reasons such as delivery failure, battery exhaustion, or accidental loss, so the data may be incomplete, with missing values in some attributes. The existing k-dominant skyline query algorithms over incomplete data depend to some degree on user definitions, and their results cannot be shared. Meanwhile, the existing algorithms cannot be applied directly to incomplete big data. Motivated by these observations, this paper studies the k-dominant skyline query problem over incomplete datasets and addresses it in a distributed setting, namely the MapReduce environment. First, we propose an index structure over incomplete data, named incomplete data index based on dominate hierarchical tree (ID-DHT). Using a bucket strategy, the incomplete data is divided into different buckets according to the dimensions of the missing attributes. Second, we put forward a query algorithm for incomplete data in the MapReduce environment, named MapReduce incomplete data based on dominant hierarchical tree algorithm (MR-ID-DHTA). The Map function allocates the data in each bucket to subspaces according to the dominance condition. The Reduce function processes the data according to the key value and returns the k-dominant skyline query result. Experiments demonstrate the validity and usability of our index structure and algorithm.
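
To make the notions of k-dominance and bucketing concrete, the following is a minimal single-process sketch, not the ID-DHT index or the MR-ID-DHTA algorithm themselves. It assumes one common formulation of k-dominance over incomplete data: smaller values are better, only dimensions observed in both objects are compared, and p k-dominates q when p is no worse in at least k of those dimensions and strictly better in at least one. The names `k_dominates`, `bucket_by_missing`, and `k_dominant_skyline` are hypothetical and chosen for illustration only.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Sequence, Tuple

# A data object; None marks a missing attribute value (assumption of this sketch).
Point = Tuple[Optional[float], ...]


def k_dominates(p: Point, q: Point, k: int) -> bool:
    """Assumed definition: smaller is better, and only dimensions observed
    in both p and q are compared."""
    common = [(a, b) for a, b in zip(p, q) if a is not None and b is not None]
    no_worse = sum(1 for a, b in common if a <= b)
    strictly_better = any(a < b for a, b in common)
    # Any strictly better dimension is also a "no worse" dimension, so k
    # dimensions containing it can be chosen whenever no_worse >= k.
    return no_worse >= k and strictly_better


def bucket_by_missing(points: Sequence[Point]) -> Dict[Tuple[bool, ...], List[Point]]:
    """Group objects by their pattern of missing dimensions (True = missing)."""
    buckets: Dict[Tuple[bool, ...], List[Point]] = defaultdict(list)
    for p in points:
        buckets[tuple(v is None for v in p)].append(p)
    return dict(buckets)


def k_dominant_skyline(points: Sequence[Point], k: int) -> List[Point]:
    """Naive quadratic check: keep every object that no other object k-dominates."""
    return [
        p
        for i, p in enumerate(points)
        if not any(k_dominates(q, p, k) for j, q in enumerate(points) if j != i)
    ]
```

For example, with `k = 2`, the object `(1, 2, None)` k-dominates `(3, 3, 3)` because it is strictly better in both commonly observed dimensions. The check is performed against all objects rather than per bucket, because k-dominance is not transitive, so an object pruned locally may still be needed to prune others.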

Keywords: k-dominant skyline query, incomplete data, MapReduce, index structure, big data
Corresponding Author(s): Baoyan SONG   
Just Accepted Date: 25 August 2020   Issue Date: 08 May 2021
 Cite this article:   
Linlin DING, Shu WANG, Baoyan SONG. Efficient k-dominant skyline query over incomplete data using MapReduce[J]. Front. Comput. Sci., 2021, 15(4): 154611.
 URL:  
https://academic.hep.com.cn/fcs/EN/10.1007/s11704-020-0122-x
https://academic.hep.com.cn/fcs/EN/Y2021/V15/I4/154611