Frontiers of Computer Science

ISSN 2095-2228

ISSN 2095-2236(Online)

CN 10-1014/TP

Postal Subscription Code 80-970

2018 Impact Factor: 1.129

Front. Comput. Sci. 2016, Vol. 10, Issue 3: 504-517. https://doi.org/10.1007/s11704-015-4409-2
RESEARCH ARTICLE
Source code fragment summarization with small-scale crowdsourcing based features
Najam NAZAR1, He JIANG1,2,*, Guojun GAO1, Tao ZHANG3, Xiaochen LI1, Zhilei REN1
1. Key Laboratory for Ubiquitous Network and Service Software of Liaoning Province, School of Software, Dalian University of Technology, Dalian 116621, China
2. State Key Laboratory of Software Engineering, Wuhan University, Wuhan 430072, China
3. Department of Computing, The Hong Kong Polytechnic University, Hong Kong, China
Abstract

Recent studies have applied different approaches to summarizing software artifacts, yet very few efforts have been made to summarize the source code fragments available on the web. This paper investigates the feasibility of generating code fragment summaries by using supervised learning algorithms. We hire a crowd of ten individuals from the same workplace to extract source code features on a corpus of 127 code fragments retrieved from the official Eclipse and NetBeans frequently asked questions (FAQs). Human annotators suggest summary lines. Our machine learning algorithms achieve a precision of 82% and perform statistically better than existing code fragment classifiers. Evaluation of the algorithms on several statistical measures supports our result. This result is promising: employing mechanisms such as data-driven crowd enlistment can improve the efficacy of existing code fragment classifiers.
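As an illustration of the precision measure reported in the abstract, the following minimal sketch computes precision for a line-level summary classifier. It is not the authors' implementation: the function name `precision` and the toy labels are hypothetical and stand in for annotator-suggested summary lines and classifier output.

```python
def precision(predicted, actual):
    """Fraction of lines predicted as summary lines that annotators also selected."""
    true_pos = sum(1 for p, a in zip(predicted, actual) if p and a)
    pred_pos = sum(1 for p in predicted if p)
    # Guard against a classifier that selects no lines at all.
    return true_pos / pred_pos if pred_pos else 0.0


# Hypothetical toy labels for one six-line code fragment (1 = summary line).
annotator_labels = [1, 0, 1, 0, 0, 1]
classifier_output = [1, 0, 1, 1, 0, 0]

score = precision(classifier_output, annotator_labels)
print(f"precision = {score:.2f}")
```

Here the classifier selects three lines, two of which the annotators also chose, so precision is two thirds for this toy fragment.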

Keywords: summarizing code fragments, supervised learning, crowdsourcing
Corresponding Author(s): He JIANG   
Just Accepted Date: 28 October 2015   Online First Date: 06 April 2016    Issue Date: 16 May 2016
 Cite this article:   
Najam NAZAR,He JIANG,Guojun GAO, et al. Source code fragment summarization with small-scale crowdsourcing based features[J]. Front. Comput. Sci., 2016, 10(3): 504-517.
 URL:  
https://academic.hep.com.cn/fcs/EN/10.1007/s11704-015-4409-2
https://academic.hep.com.cn/fcs/EN/Y2016/V10/I3/504