Frontiers of Computer Science

ISSN 2095-2228

ISSN 2095-2236(Online)

CN 10-1014/TP

Postal Subscription Code 80-970

2018 Impact Factor: 1.129

Front. Comput. Sci.    2021, Vol. 15 Issue (4) : 154315    https://doi.org/10.1007/s11704-020-9364-x
RESEARCH ARTICLE
Find truth in the hands of the few: acquiring specific knowledge with crowdsourcing
Tao HAN1,2, Hailong SUN1,2, Yangqiu SONG3, Yili FANG4, Xudong LIU1,2
1. SKLSDE Lab, School of Computer Science and Engineering, Beihang University, Beijing 100191, China
2. Beijing Advanced Innovation Center for Big Data and Brain Computing, Beihang University, Beijing 100191, China
3. Department of Computer Science and Engineering, Hong Kong University of Science and Technology, Clearwater Bay, Hong Kong 999077, China
4. School of Computer and Information Engineering, Zhejiang Gongshang University, Hangzhou 310018, China
Abstract

Crowdsourcing has been a helpful mechanism for leveraging human intelligence to acquire useful knowledge. However, when crowd knowledge is aggregated with currently available voting algorithms, the result is often common knowledge, which may not be what is wanted. In this paper, we consider the problem of collecting specific knowledge via crowdsourcing. Using an external knowledge base such as WordNet, we incorporate the semantic relations between the alternative answers into a probabilistic model to determine which answer is more specific. Starting from a basic assumption, we formulate the probabilistic model to account for both worker ability and task difficulty, and solve it with the expectation-maximization (EM) algorithm. To increase the algorithm's compatibility, we also extend the method to a semi-supervised variant. Experimental results show that our approach is robust to hyper-parameter settings and improves on majority voting and other algorithms when more specific answers are expected, especially on sparse data.
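
The contrast between majority voting and a specificity-aware aggregation can be illustrated with a small sketch. It is not the probabilistic EM model described in the abstract: it is a simple heuristic that gives each answer a discounted share of the votes cast for its hypernyms. The toy taxonomy `HYPERNYM`, the `weight` parameter, and all function names are assumptions made for illustration only.

```python
from collections import Counter

# Hypothetical toy taxonomy: child -> parent (hypernym). Not taken from WordNet.
HYPERNYM = {"husky": "dog", "dog": "animal", "cat": "animal"}


def ancestors(label, hypernym=HYPERNYM):
    """Return all hypernyms of `label`, nearest first."""
    out = []
    while label in hypernym:
        label = hypernym[label]
        out.append(label)
    return out


def majority_vote(answers):
    """Plain majority voting: the most frequent answer wins."""
    return Counter(answers).most_common(1)[0][0]


def specificity_aware_vote(answers, weight=0.5, hypernym=HYPERNYM):
    """Score each submitted answer by its own votes plus a discounted
    share of the votes given to its hypernyms (illustrative heuristic,
    not the paper's model)."""
    counts = Counter(answers)
    scores = {
        label: n + weight * sum(counts[a] for a in ancestors(label, hypernym))
        for label, n in counts.items()
    }
    return max(scores, key=scores.get)


# Six hypothetical worker answers for one image-labeling task.
answers = ["dog", "dog", "dog", "husky", "husky", "cat"]
print(majority_vote(answers))           # the common answer
print(specificity_aware_vote(answers))  # the more specific answer
```

In this toy case, three votes for "dog" outnumber two for "husky", so majority voting returns the general label. The heuristic treats the "dog" votes as partial support for "husky" and returns the specific label instead.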

Keywords: crowdsourcing, knowledge acquisition, EM algorithm, label aggregation
Corresponding Author(s): Hailong SUN   
Just Accepted Date: 07 April 2020   Issue Date: 10 October 2020
 Cite this article:   
Tao HAN, Hailong SUN, Yangqiu SONG, et al. Find truth in the hands of the few: acquiring specific knowledge with crowdsourcing[J]. Front. Comput. Sci., 2021, 15(4): 154315.
 URL:  
https://academic.hep.com.cn/fcs/EN/10.1007/s11704-020-9364-x
https://academic.hep.com.cn/fcs/EN/Y2021/V15/I4/154315