Please wait a minute...
Frontiers of Computer Science

ISSN 2095-2228

ISSN 2095-2236(Online)

CN 10-1014/TP

邮发代号 80-970

2019 Impact Factor: 1.275

Frontiers of Computer Science  2018, Vol. 12 Issue (3): 582-592   https://doi.org/10.1007/s11704-016-6146-6
  本期目录
Efficient sampling methods for characterizing POIs on maps based on road networks
Ziting ZHOU1, Pengpeng ZHAO1(), Victor S. SHENG2,3, Jiajie XU1, Zhixu LI1, Jian WU1, Zhiming CUI1
1. School of Computer Science and Technology, Soochow University, Suzhou 215006, China
2. Department of Computer Science, University of Central Arkansas, Conway AR 72035, USA
3. Jiangsu Engineering Center of Network Monitoring, Nanjing University of Information Science & Technology, Nanjing 210044, China
 全文: PDF(592 KB)  
Abstract

With the rapid development of location-based services, a particularly important aspect of start-up marketing research is to explore and characterize points of interest (PoIs) such as restaurants and hotels on maps. However, due to the lack of direct access to PoI databases, it is necessary to rely on existing APIs to query PoIs within a region and calculate PoI statistics. Unfortunately, public APIs generally impose a limit on the maximum number of queries. Therefore, we propose effective and efficient sampling methods based on road networks to sample PoIs on maps and provide unbiased estimators for calculating PoI statistics. In general, the more intense the roads, the denser the distribution of PoIs is within a region. Experimental results show that compared with state-of-the-art methods, our sampling methods improve the efficiency of aggregate statistical estimations.

Key wordssampling    aggregate statistical estimation    road networks
收稿日期: 2016-03-12      出版日期: 2018-05-02
Corresponding Author(s): Pengpeng ZHAO   
 引用本文:   
. [J]. Frontiers of Computer Science, 2018, 12(3): 582-592.
Ziting ZHOU, Pengpeng ZHAO, Victor S. SHENG, Jiajie XU, Zhixu LI, Jian WU, Zhiming CUI. Efficient sampling methods for characterizing POIs on maps based on road networks. Front. Comput. Sci., 2018, 12(3): 582-592.
 链接本文:  
https://academic.hep.com.cn/fcs/CN/10.1007/s11704-016-6146-6
https://academic.hep.com.cn/fcs/CN/Y2018/V12/I3/582
1 Dalvi N, Kumar R, Machanavajjhala A, Rastogi V. Sampling hidden objects using nearest-neighbor oracles. In: Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 2011, 1325–1333
https://doi.org/10.1145/2020408.2020606
2 Li Y H, Steiner M, Wang L M, Zhang Z L, Bao J. Dissecting foursquare venue popularity via random region sampling. In: Proceedings of ACM Conference on CoNEXT Student Workshop. 2012, 21–22
https://doi.org/10.1145/2413247.2413261
3 Wang P H, He W B, Liu X. An efficient sampling method for characterizing points of interests on maps. In: Proceeding of the 30th IEEE International Conference on Data Engineering. 2014, 1012–1023
https://doi.org/10.1109/ICDE.2014.6816719
4 Bar-Yossef Z, Gurevich M. Efficient search engine measurements. In: Proceedings of the 16th International Conference onWorld Wide Web. 2007, 401–410
https://doi.org/10.1145/1242572.1242627
5 Bar-Yossef Z, Gurevich M. Mining search engine query logs via suggestion sampling. Proceedings of the VLDB Endowment, 2008, 1(1): 54–65
https://doi.org/10.14778/1453856.1453868
6 Bar-Yossef Z, Gurevich M. Random sampling from a search engine’s index. Journal of the ACM, 2008, 55(5): 24
https://doi.org/10.1145/1411509.1411514
7 Brin S, Page L. Reprint of: the anatomy of a large-scale hypertextual Web search engine. Computer Networks, 2012, 56(18): 3825–3833
https://doi.org/10.1016/j.comnet.2012.10.007
8 Gulli A, Signorini A. The indexable Web is more than 11.5 billion pages. In: Proceeding of the 14th International Conference on World Wide Web. 2005, 902–903
https://doi.org/10.1145/1062745.1062789
9 Henzinger M R, Heydon A, Mitzenmacher M, Najork M. On nearuniform URL sampling. Computer Networks, 2000, 33(1): 295–308
https://doi.org/10.1016/S1389-1286(00)00055-4
10 Rusmevichientong P, Pennock D M, Lawrence S, Giles C L. Methods for sampling pages uniformly from the World Wide Web. In: Proceeding of AAAI Fall Symposium on Using Uncertainty Within Computation. 2001, 121–128
11 Zhang M Y, Zhang N, Das G. Mining a search engine’s corpus: efficient yet unbiased sampling and aggregate estimation. In: Proceedings of ACM SIGMOD International Conference on Management of data. 2011, 793–804
https://doi.org/10.1145/1989323.1989406
12 Agichtein E, Ipeirotis P, Gravano L. Modeling query-based access to text databases. In: Proceeding of WebDB. 2003
13 Barbosa L, Freire J. Siphoning hidden-Web data through keywordbased interfaces. Journal of Information and Data Management, 2010, 1(1): 133
14 Barbosa L, Freire J. Searching for hidden-Web databases. In: Proceeding of WebDB. 2005
15 Callan J, Connell M. Query-based sampling of text databases. ACM Transactions on Information Systems, 2001, 19(2): 97–130
https://doi.org/10.1145/382979.383040
16 Ipeirotis P G, Gravano L. Distributed search over the hidden Web: hierarchical database sampling and selection. In: Proceedings of the 28th International Conference on Very Large Data Bases. 2002, 394–405
https://doi.org/10.1016/B978-155860869-6/50042-1
17 Jin X, Zhang N, Das G. Attribute domain discovery for hidden Web databases. In: Proceedings of ACM SIGMOD International Conference on Management of Data. 2011, 553–564
https://doi.org/10.1145/1989323.1989381
18 Lu J G. Ranking bias in deep Web size estimation using capture recapture method. Data & Knowledge Engineering, 2010, 69(8): 866–879
https://doi.org/10.1016/j.datak.2010.03.007
19 Zerfos P, Cho J, Ntoulas A. Downloading textual hidden Web content through keyword queries. In: Proceedings of the 5th ACM/IEEE-CS Joint Conference on Digital Libraries. 2005, 100–109
20 ĺćlvarez M, Raposo J, Pan A, Cacheda F, Bellas F, Carneiro V. Crawling the content hidden behind Web forms. In: Proceeding of International Conference on Computational Science and Its Applications. 2007, 322–333
21 Bruno N, Gravano L, Marian A. Evaluating top-k queries over Webaccessible databases. In: Proceedings of the 18th International Conference on Data Engineering. 2002, 369–380
https://doi.org/10.1109/ICDE.2002.994751
22 Chang K C C, He B, Li C K, Patel M, Zhang Z. Structured databases on the Web: observations and implications. ACM SIGMOD Record, 2004, 33(3): 61–70
https://doi.org/10.1145/1031570.1031584
23 Raghavan S, Garcia-Molina H. Crawling the hidden Web. In: Proceedings of the 27th International Conference on Very Large Data Bases. 2001
24 Sheng C, Zhang N, Tao Y F, Jin X. Optimal algorithms for crawling a hidden database in the Web. Proceedings of the VLDB Endowment, 2012, 5(11): 1112–1123
https://doi.org/10.14778/2350229.2350232
25 Thirumuruganathan S, Zhang N, Das G. Digging deeper into deep Web databases by breaking through the top-k barrier. 2012, ArXiv Preprint ArXiv: 1208.3876
26 Hedley Y L, Younas M, James A, Sanderson M. A two-phase sampling technique for information extraction from hidden Web databases. In: Proceedings of the 6th Annual ACM International Workshop on Web Information and Data Management. 2004, 1–8
https://doi.org/10.1145/1031453.1031456
27 Hedley Y L, Younas M, James A, Sanderson M. Sampling, information extraction and summarisation of hidden Web databases. Data & Knowledge Engineering, 2006, 59(2): 213–230
https://doi.org/10.1016/j.datak.2006.01.009
28 Shokouhi M, Zobel J, Scholer F, Tahaghoghi S M. Capturing collection size for distributed non-cooperative retrieval. In: Proceedings of the 29th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. 2006, 316–323
https://doi.org/10.1145/1148170.1148227
29 Dasgupta A, Zhang N, Das G. Leveraging count information in sampling hidden databases. In: Proceeding of the 25th IEEE International Conference on Data Engineering. 2009, 329–340
https://doi.org/10.1109/ICDE.2009.112
30 Dasgupta A, Jin X, Jewell B, Zhang N, Das G. Unbiased estimation of size and other aggregates over hidden Web databases. In: Proceedings of ACM SIGMOD International Conference on Management of Data. 2010, 855–866
https://doi.org/10.1145/1807167.1807259
31 Dasgupta A, Das G, Mannila H. A random walk approach to sampling hidden databases. In: Proceedings of ACM SIGMOD International Conference on Management of Data. 2007, 629–640
https://doi.org/10.1145/1247480.1247550
32 Dasgupta A, Zhang N, Das G. Turbo-charging hidden database samplers with overflowing queries and skew reduction. In: Proceedings of the 13th International Conference on Extending Database Technology. 2010, 51–62
https://doi.org/10.1145/1739041.1739051
33 Drummond A J, Rambaut A. BEAST: Bayesian evolutionary analysis by sampling trees. BMC Evolutionary Biology, 2007, 7(1): 1
https://doi.org/10.1186/1471-2148-7-214
34 Wang F, Agrawal G. Effective and efficient sampling methods for deep Web aggregation queries. In: Proceedings of the 14th International Conference on Extending Database Technology. 2011, 425–436
https://doi.org/10.1145/1951365.1951416
[1] Supplementary Material Download
Viewed
Full text


Abstract

Cited

  Shared   
  Discussed