Frontiers of Computer Science
ISSN 2095-2228 (Print) | ISSN 2095-2236 (Online) | CN 10-1014/TP

Front. Comput. Sci., 2019, Vol. 13, Issue 5: 943–959. https://doi.org/10.1007/s11704-018-7308-5
RESEARCH ARTICLE
Automatic test report augmentation to assist crowdsourced testing
Xin CHEN1,2, He JIANG2,3, Zhenyu CHEN4, Tieke HE4, Liming NIE5
1. School of Computer Science and Technology, Hangzhou Dianzi University, Hangzhou 310018, China
2. School of Software, Dalian University of Technology, Dalian 116621, China
3. Key Laboratory for Ubiquitous Network and Service Software of Liaoning Province, Dalian 116621, China
4. School of Software, Nanjing University, Nanjing 210093, China
5. School of Information Science and Technology, Zhejiang Sci-Tech University, Hangzhou 310018, China
Abstract

In crowdsourced mobile application testing, workers are often inexperienced in software testing and write test reports in descriptive natural language on mobile devices. As a result, these reports generally lack important details, which makes it hard for developers to understand the reported bugs. To improve the quality of inspected test reports, we formulate a new problem, test report augmentation, which leverages the additional useful information contained in duplicate test reports. In this paper, we propose a new framework, the test report augmentation framework (TRAF), to resolve this problem. First, natural language processing (NLP) techniques are adopted to preprocess the crowdsourced test reports. Then, three strategies are proposed to augment the environments, inputs, and descriptions of the inspected test reports, respectively. Finally, the augmented test reports are visualized so that developers can distinguish the added information. To evaluate TRAF, we conduct experiments on five industrial datasets with 757 crowdsourced test reports. Experimental results show that TRAF recommends relevant inputs to augment the inspected test reports with an average NDCG of 98.49% and an average precision of 88.65%, and identifies valuable sentences from the descriptions of duplicates with an average precision of 83.58%, recall of 77.76%, and F-measure of 78.72%. An empirical evaluation further demonstrates that the augmented test reports help developers understand and fix bugs better.
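The article itself ships no code, but the pipeline summarized above (NLP preprocessing, TF-IDF representation, and similarity-based selection of content from duplicate reports) can be made concrete with a short sketch. The sketch below is illustrative only, not TRAF's implementation: the function names, the choice of cosine similarity, and the 0.6 duplicate threshold are assumptions layered on the techniques the abstract and keywords name.

```python
# Illustrative sketch, not TRAF's code: a TF-IDF-based reconstruction of
# duplicate retrieval and sentence ranking as implied by the abstract.
# find_duplicates, rank_sentences, and threshold=0.6 are hypothetical.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def find_duplicates(inspected, candidates, threshold=0.6):
    """Return indices of candidate reports whose TF-IDF cosine
    similarity to the inspected report exceeds the threshold."""
    vec = TfidfVectorizer(lowercase=True, stop_words="english")
    matrix = vec.fit_transform([inspected] + candidates)
    sims = cosine_similarity(matrix[0:1], matrix[1:]).ravel()
    return [i for i, s in enumerate(sims) if s >= threshold]

def rank_sentences(inspected, duplicate_sentences):
    """Rank sentences taken from duplicate reports by TF-IDF cosine
    similarity to the inspected report's description, so the most
    relevant ones can be recommended as augmentations."""
    vec = TfidfVectorizer(lowercase=True, stop_words="english")
    matrix = vec.fit_transform([inspected] + duplicate_sentences)
    sims = cosine_similarity(matrix[0:1], matrix[1:]).ravel()
    order = sorted(range(len(duplicate_sentences)), key=lambda i: -sims[i])
    return [(duplicate_sentences[i], float(sims[i])) for i in order]
```

Since NDCG is the headline metric for input recommendation, a worked definition may also help. This follows the standard cumulated-gain formulation of Järvelin and Kekäläinen (2002); how TRAF grades relevance is not specified here, so the relevance labels below are assumed inputs.

```python
import math

def ndcg_at_k(relevances, k):
    """NDCG@k for one ranked list of graded relevance labels,
    using the standard log2 discount (Jarvelin & Kekalainen, 2002)."""
    def dcg(rels):
        return sum(r / math.log2(i + 2) for i, r in enumerate(rels[:k]))
    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal if ideal > 0 else 0.0

# Example: a ranking whose 1st and 3rd recommended inputs are relevant.
print(ndcg_at_k([1, 0, 1, 0, 0], k=5))  # ~0.92
```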

Keywords: crowdsourced testing; test report; TF-IDF; natural language processing; test report augmentation
Corresponding Author(s): He JIANG   
Online First Date: 17 December 2018    Issue Date: 25 June 2019
 Cite this article:   
Xin CHEN, He JIANG, Zhenyu CHEN, et al. Automatic test report augmentation to assist crowdsourced testing[J]. Front. Comput. Sci., 2019, 13(5): 943–959.
 URL:  
https://academic.hep.com.cn/fcs/EN/10.1007/s11704-018-7308-5
https://academic.hep.com.cn/fcs/EN/Y2019/V13/I5/943