Frontiers of Computer Science

ISSN 2095-2228

ISSN 2095-2236(Online)

CN 10-1014/TP

Postal distribution code 80-970

2019 Impact Factor: 1.275

Frontiers of Computer Science  2022, Vol. 16 Issue (2): 162202   https://doi.org/10.1007/S11704-020-0047-4
ForkXplorer: an approach of fork summary generation
Zhang ZHANG, Xinjun MAO, Chao ZHANG, Yao LU
Key Laboratory of Software Engineering for Complex Systems, College of Computer, National University of Defense Technology, Changsha 410073, China
Abstract

Pull-based development has become an important paradigm for distributed software development. In this model, each developer works independently on a copy of the central repository (i.e., a fork). To collaborate efficiently, developers need to stay aware of the state of other forks. In this paper, we propose a method to automatically generate a summary of a fork. We first use the random forest method to generate the label of a fork, i.e., a feature implementation or a bug fix. Based on the information in the fork-related commits, we then use the TextRank algorithm to generate detailed activity information for the fork. Finally, we apply a set of rules to integrate all related information into a complete fork summary. To validate the effectiveness of our method, we conduct 30 groups of manual experiments and 77 groups of case studies on GitHub. We propose Fea_avg to evaluate the generated fork summary in terms of content accuracy, content integrity, sentence fluency, and label extraction accuracy. The results show that the average Fea_avg of the fork summaries generated by this method is 0.672. More than 63% of project maintainers and contributors believe that the fork summary can improve development efficiency.
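
The TextRank step mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it ranks short texts (here, commit messages) by running weighted PageRank over a word-overlap similarity graph, following the general TextRank formulation. The function names, the damping factor of 0.85, the iteration count, and the sample messages are all assumptions made for illustration.

```python
import math
import re


def tokenize(sentence):
    """Lowercase a sentence and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", sentence.lower())


def similarity(tokens_a, tokens_b):
    """Shared-word count normalized by the log lengths of both sentences."""
    if not tokens_a or not tokens_b:
        return 0.0
    denom = math.log(len(tokens_a)) + math.log(len(tokens_b))
    if denom == 0:  # both sentences contain exactly one token
        return 0.0
    return len(set(tokens_a) & set(tokens_b)) / denom


def textrank(sentences, damping=0.85, iterations=50):
    """Return one score per sentence from weighted PageRank on the graph."""
    tokens = [tokenize(s) for s in sentences]
    n = len(sentences)
    weights = [[similarity(tokens[i], tokens[j]) if i != j else 0.0
                for j in range(n)] for i in range(n)]
    out_sum = [sum(row) for row in weights]
    scores = [1.0] * n
    for _ in range(iterations):
        # Each sentence receives score from its neighbours, proportional
        # to the edge weight relative to the neighbour's total weight.
        scores = [
            (1 - damping) + damping * sum(
                weights[j][i] / out_sum[j] * scores[j]
                for j in range(n) if out_sum[j] > 0)
            for i in range(n)
        ]
    return scores


def top_sentences(sentences, k=2):
    """Pick the k highest-scoring sentences, kept in their original order."""
    scores = textrank(sentences)
    best = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(best)]


# Hypothetical commit messages, not taken from the paper's data set.
messages = [
    "fix main path in installer",
    "fix main path in install script",
    "update readme badge",
]
print(top_sentences(messages, k=2))
```

Messages that share vocabulary reinforce each other, so the two path fixes outrank the unrelated README change. A real pipeline would likely add stop-word removal and stemming, which this sketch omits.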

Key words: open source software, pull-based development, fork summary, distributed cooperative development
Received: 2020-02-02      Published: 2021-10-18
Corresponding Author(s): Xinjun MAO   
Cite this article:
Zhang ZHANG, Xinjun MAO, Chao ZHANG, Yao LU. ForkXplorer: an approach of fork summary generation. Front. Comput. Sci., 2022, 16(2): 162202.
Link to this article:
https://academic.hep.com.cn/fcs/CN/10.1007/S11704-020-0047-4
https://academic.hep.com.cn/fcs/CN/Y2022/V16/I2/162202
Fig.1 to Fig.10
Project              Issues  PRs   Issues/PRs with feature label  Issues/PRs with bug label
safe-ios             326     605   52                             42
opencast             0       998   111                            195
ezplatform-admin-ui  0       1027  0                              302
Baystation12         1800    3021  781                            0
Yogstation-TG        711     6101  920                            288
Tab.1
Group  Label types   Projects  Issues and PRs
A      feature, bug  18        12187
B      feature       5         3190
C      bug           7         4623
Tab.2
Model                Precision  Recall  F1-score
MultinomialNB        0.718      0.659   0.687
Logistic regression  0.724      0.703   0.713
Random forest        0.774      0.720   0.746
Tab.3
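
As a quick consistency check on Tab.3, the F1-score is the harmonic mean of precision and recall, so each reported F1 value can be recomputed from the other two columns. The sketch below does that with the numbers copied from the table; the helper name `f1_score` and the dictionary layout are illustrative choices, not part of the paper.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1-score) as listed in Tab.3
tab3 = {
    "MultinomialNB": (0.718, 0.659, 0.687),
    "Logistic regression": (0.724, 0.703, 0.713),
    "Random forest": (0.774, 0.720, 0.746),
}

for model, (precision, recall, reported) in tab3.items():
    computed = f1_score(precision, recall)
    print(f"{model}: computed {computed:.3f}, reported {reported:.3f}")
```

For all three models the recomputed value rounds to the reported one, with random forest scoring highest on every metric.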
Group  Label         Precision  Recall  F1-score  Support cases
A      contribution  0.59       0.79    0.67      896
A      feature       0.59       0.79    0.67      896
A      bug           0.64       0.67    0.72      400
B+C    contribution  0.35       0.37    0.36      724
B+C    feature       0.73       0.79    0.63      441
B+C    bug           0.71       0.69    0.75      352
Tab.4
Fig.11  
ID  Fea_avg  feature  bug  contribution
26  0.808    4        0    0
27  0.833    0        6    0
28  0.858    0        2    0
29  0.883    11       0    0
30  0.908    6        0    0
Tab.5
ID  Fea_avg  feature  bug  contribution
1   0.383    7        5    8
2   0.383    5        1    16
3   0.433    2        5    4
4   0.433    1        6    7
5   0.533    2        5    10
Tab.6
Fig.12  
Project                 Role        Commits          Result a
devilutionX             maintainer  #bf4a8c-274abd   1
Xamarin.Forms           maintainer  #e095e2-3c12f9   1
EntityFrameworkCore     developer   #a6fe1c-525a15   3
lazydocker              developer   #f0650a-dba014   3
scrcpy                  maintainer  #10069c-6901e9   1
scrcpy                  developer   #772a49-6dc0fd   1
DeepFaceLab             developer   #bf78d3-29daad   1
DeepFaceLab             developer   #338e13-73f78b   3
semaphore               maintainer  #a65eb2-8f8fad   1
semaphore               developer   #ecde61-b5ef1e   3
semaphore               developer   #769d57-7b15ce   2
SyliusWishlistPlugin    maintainer  #9085ea-8a5d4f   3
SyliusWishlistPlugin    developer   #f90ff0-9284d2   2
a10-ansible             developer   #ecd486-be997b   1
sherlock                maintainer  #c4c4a5-054a24   1
ansibullbot             maintainer  #16009c-ef4982   2
silverstripe-installer  maintainer  #9085eaf-8a5d4   3
Tab.7
Commit    Message
#769d57   fix dumb error
#80ad34   fix
#1aa834   use middleware creator for all project api endpoints
#44e1f37  more
#7b15ce   fix casting issue
Summary: commit #769d57c to #7b15ce of fork semaphore-installer/Tom Whiston contain 3 bugs of [fix dumb error, fix, fix casting issue] and 2 contributions of [use middleware creator for all project api endpoints, more]

Commit    Message
#f90ff0   Fixed coding standards
#0e3a07   Added coding standards
#662d3b   Fixed coding standards path
#6856ea   Fixed coding standards path
#9284d2   Yaml is stupid
Summary: commit #f90ff0 to #9284d2 of fork SyliusWishlistPlugin/mamazu contain 1 feature of [Added coding standards], 3 bugs of [Fixed coding standards, Fixed coding standards path, Fixed coding standards path] and 1 contribution of [Yaml is stupid]

Commit    Message
#16009c   Fix main.php path
#74f717   Fix main.php path
#f73996   Removed stylesheet from framework missing file
#8a507e   Fix main.php path in install.php
#ef4982   Adjust phpunit path to framework
Summary: commit #16009c to #ef4982 of fork silverstripe-installer/Ingo Schommer contain 2 features of [Removed stylesheet from framework missing file, Adjust phpunit path to framework] and 3 bugs of [Fix main.php path, Fix main.php path, Fix main.php path in install.php]
Tab.8
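
The summaries in Tab.8 follow a visible pattern: the first and last commit hash, the fork name, and the commit messages grouped by label. The sketch below reproduces that pattern from already-labelled commits. It is a hypothetical reconstruction based only on the table's output. The paper's actual rule set is not shown here, and the function names and the `(hash, message, label)` input format are assumptions.

```python
def plural(count, noun):
    """Return e.g. '1 bug' or '3 bugs'."""
    return f"{count} {noun}" if count == 1 else f"{count} {noun}s"


def join_parts(parts):
    """Join clauses as 'a', 'a and b', or 'a, b and c'."""
    if len(parts) <= 1:
        return "".join(parts)
    return ", ".join(parts[:-1]) + " and " + parts[-1]


def build_summary(fork, commits):
    """Build a Tab.8-style summary from ordered (hash, message, label) tuples.

    Hypothetical template: labels are assumed to be one of
    'feature', 'bug', or 'contribution'.
    """
    if not commits:
        raise ValueError("at least one commit is required")
    groups = {"feature": [], "bug": [], "contribution": []}
    for _commit_hash, message, label in commits:
        groups[label].append(message)
    # Only labels that actually occur are mentioned, in a fixed order.
    parts = [
        f"{plural(len(messages), label)} of [{', '.join(messages)}]"
        for label, messages in groups.items() if messages
    ]
    first, last = commits[0][0], commits[-1][0]
    return f"commit #{first} to #{last} of fork {fork} contain {join_parts(parts)}"


# Commits and labels as implied by the second example in Tab.8.
example = [
    ("f90ff0", "Fixed coding standards", "bug"),
    ("0e3a07", "Added coding standards", "feature"),
    ("662d3b", "Fixed coding standards path", "bug"),
    ("6856ea", "Fixed coding standards path", "bug"),
    ("9284d2", "Yaml is stupid", "contribution"),
]
print(build_summary("SyliusWishlistPlugin/mamazu", example))
```

Keeping the label order fixed (feature, bug, contribution) matches the ordering seen in all three summaries of the table.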
Fig.13  
  
  
  
  