Frontiers of Computer Science

ISSN 2095-2228

ISSN 2095-2236(Online)

CN 10-1014/TP


Front. Comput. Sci. 2019, Vol. 13, Issue 3: 565-578    https://doi.org/10.1007/s11704-018-7457-6
RESEARCH ARTICLE
CodeAttention: translating source code to comments by exploiting the code constructs
Wenhao ZHENG, Hongyu ZHOU, Ming LI, Jianxin WU
National Key Laboratory for Novel Software Technology, Nanjing University, Nanjing 210023, China
Abstract

Appropriate comments on code snippets provide insight into code functionality and are helpful for program comprehension. However, because authoring comments is costly, many code projects do not contain adequate comments. Automatic comment generation techniques have been proposed to generate comments from pieces of code and thereby reduce the human effort of annotating code. Most existing approaches attempt to exploit certain correlations (usually manually specified) between code and generated comments; these correlations are easily violated when coding patterns change, and the quality of generated comments then declines. In addition, recent approaches ignore code constructs and treat code snippets as plain text. Furthermore, previous datasets are too small to validate the methods and demonstrate their advantages. In this paper, we propose a new attention mechanism called CodeAttention to translate code to comments; it exploits code constructs such as critical statements, symbols and keywords. By focusing on these specific points, CodeAttention understands the semantics of code better than previous methods. To verify our approach across a wider range of coding patterns, we build a large dataset from open projects on GitHub. Experimental results on this dataset demonstrate that the proposed method outperforms existing approaches in both objective and subjective evaluation. We also perform ablation studies to determine the effects of different components of CodeAttention.
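The abstract describes an attention mechanism that weights code constructs (critical statements, symbols, keywords) when generating comments from code. The snippet below is a minimal illustrative sketch of one way such a construct-aware attention score could be computed on top of a standard encoder-decoder model; the names (construct_aware_attention, construct_bias) are hypothetical and this is not the paper's exact CodeAttention formulation.

```python
# Illustrative sketch only: construct-aware attention over code tokens.
# Assumes we already have encoder hidden states for each code token and a
# binary mask marking code constructs (keywords, symbols, critical statements).
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def construct_aware_attention(decoder_state, encoder_states, construct_mask,
                              construct_bias=1.0):
    """Attend over code tokens, boosting tokens flagged as code constructs.

    decoder_state:  (d,)   current decoder hidden state
    encoder_states: (T, d) one hidden state per code token
    construct_mask: (T,)   1.0 where the token is a keyword/symbol, else 0.0
    """
    scores = encoder_states @ decoder_state            # dot-product alignment
    scores = scores + construct_bias * construct_mask  # favor code constructs
    weights = softmax(scores)                          # attention distribution
    context = weights @ encoder_states                 # weighted summary vector
    return weights, context

# Toy usage: 4 code tokens with 3-dim states; tokens 0 and 2 are constructs.
enc = np.random.randn(4, 3)
dec = np.random.randn(3)
mask = np.array([1.0, 0.0, 1.0, 0.0])
w, ctx = construct_aware_attention(dec, enc, mask)
print(w.round(3), ctx.round(3))
```

The additive bias term is just one plausible way to steer attention toward constructs; the decoder would consume the resulting context vector when predicting the next comment word.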

Keywords: software mining; machine learning; code comment generation; recurrent neural network; attention mechanism
Corresponding Author(s): Ming LI   
Just Accepted Date: 21 June 2018   Online First Date: 22 October 2018    Issue Date: 24 April 2019
 Cite this article:   
Wenhao ZHENG, Hongyu ZHOU, Ming LI, et al. CodeAttention: translating source code to comments by exploiting the code constructs[J]. Front. Comput. Sci., 2019, 13(3): 565-578.
 URL:  
https://academic.hep.com.cn/fcs/EN/10.1007/s11704-018-7457-6
https://academic.hep.com.cn/fcs/EN/Y2019/V13/I3/565