Frontiers of Computer Science

ISSN 2095-2228

ISSN 2095-2236(Online)

CN 10-1014/TP

Postal Subscription Code 80-970

2018 Impact Factor: 1.129

Front. Comput. Sci.    2018, Vol. 12 Issue (3) : 528-544    https://doi.org/10.1007/s11704-016-6023-3
RESEARCH ARTICLE
Effectiveness of exploring historical commits for developer recommendation: an empirical study
Xiaobing SUN1,2,5, Hui YANG1, Hareton LEUNG3, Bin LI1, Hanchao (Jerry) LI4, Lingzhi LIAO6
1. School of Information Engineering, Yangzhou University, Yangzhou 225127, China
2. Information Technology Research Base of Civil Aviation Administration of China, Civil Aviation University of China, Tianjin 300300, China
3. Department of Computing, The Hong Kong Polytechnic University, Hong Kong, China
4. Coventry University, Coventry CV1 5FB, UK
5. State Key Laboratory for Novel Software Technology, Nanjing University, Nanjing 210093, China
6. Nanjing University of Information Science and Technology, Nanjing 210044, China
Abstract

Developer recommendation is an essential task for resolving incoming issues during software evolution. Many developer recommendation techniques have been proposed in the literature; most of them combine historical commits, as supplementary information, with bug repositories and/or source-code repositories to recommend developers. However, the question of whether the messages in historical commits are always useful has not yet been answered. This article addresses that question through an empirical study of four open-source projects. The results show that: (1) the number of meaningful words in a commit description affects the quality of the commit, and a description with more meaningful words generally reflects the developer's expertise better; (2) recommending developers from commit descriptions is more effective than recommending them from the relevant files recorded in historical commits; (3) developers tend to change relevant files that they have already changed many times; and (4) developers generally tend to change files that they have changed recently.
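
As a rough illustration of what recommending developers from commit descriptions can look like, the following minimal sketch pools each developer's commit descriptions into a term-frequency profile and ranks developers by cosine similarity to the text of an incoming issue. It is not the procedure used in the study: the stop-word list, the function names (`tokens`, `cosine`, `rank_developers`), and the sample commits are all assumptions made for this example.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical stop-word list, chosen for this example only.
STOP_WORDS = {"a", "an", "the", "in", "on", "of", "to", "for", "and", "is"}


def tokens(text):
    """Lowercase the text, split on non-letters, and drop stop words."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS]


def cosine(a, b):
    """Cosine similarity between two term-frequency Counters."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def rank_developers(issue_text, commits):
    """Rank developers by how similar their pooled commit descriptions are to the issue.

    commits: iterable of (developer, commit_description) pairs.
    """
    profiles = defaultdict(Counter)
    for dev, message in commits:
        profiles[dev].update(tokens(message))
    issue = Counter(tokens(issue_text))
    scores = {dev: cosine(issue, profile) for dev, profile in profiles.items()}
    return sorted(scores, key=scores.get, reverse=True)


# Made-up sample data for illustration.
commits = [
    ("alice", "Fix null pointer in login form validation"),
    ("bob", "Update build script and CI configuration"),
    ("alice", "Refactor login session handling"),
]
ranking = rank_developers("Crash on login when password field is empty", commits)
print(ranking)
```

In this sketch a description with few meaningful words contributes little to a developer's profile, which loosely mirrors finding (1). Change frequency and recency, as in findings (3) and (4), are not modeled here.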

Keywords: developer recommendation, historical commits, empirical study
Corresponding Author(s): Xiaobing SUN   
Just Accepted Date: 30 September 2016   Online First Date: 06 March 2018    Issue Date: 02 May 2018
 Cite this article:   
Xiaobing SUN,Hui YANG,Hareton LEUNG, et al. Effectiveness of exploring historical commits for developer recommendation: an empirical study[J]. Front. Comput. Sci., 2018, 12(3): 528-544.
 URL:  
https://academic.hep.com.cn/fcs/EN/10.1007/s11704-016-6023-3
https://academic.hep.com.cn/fcs/EN/Y2018/V12/I3/528