Frontiers of Computer Science

ISSN 2095-2228

ISSN 2095-2236(Online)

CN 10-1014/TP

Postal Subscription Code 80-970

2018 Impact Factor: 1.129

Front. Comput. Sci.    2023, Vol. 17 Issue (6) : 176214    https://doi.org/10.1007/s11704-022-2313-0
Software
Measuring code maintainability with deep neural networks
Yamin HU1, Hao JIANG2, Zongyao HU1
1. School of Computer Science and Technology, Beijing Institute of Technology, Beijing 100081, China
2. School of Artificial Intelligence, Anhui University, Hefei 230601, China
Abstract

The maintainability of source code is a key characteristic of software quality. Many approaches have been proposed to measure code maintainability quantitatively. Such approaches rely heavily on code metrics, e.g., the number of lines of code and McCabe’s cyclomatic complexity. The employed code metrics are essentially statistics over code elements, e.g., the numbers of tokens, lines, references, and branch statements. However, the natural language in source code, especially in identifiers, is rarely exploited by such approaches. As a result, replacing meaningful identifiers with nonsense tokens would not significantly influence their outputs, even though such a replacement would substantially reduce code maintainability. To this end, we propose a novel approach (called DeepM) that measures code maintainability by exploiting the lexical semantics of the text in source code. DeepM leverages deep learning techniques (e.g., LSTM and the attention mechanism) to exploit these lexical semantics. Another key rationale of DeepM is that measuring code maintainability is complex and often far beyond the capabilities of statistics or simple heuristics. Consequently, DeepM leverages deep learning to automatically select useful features from complex and lengthy inputs and to construct a complex mapping (rather than simple heuristics) from the input to the output (a code maintainability index). DeepM is evaluated on a manually assessed dataset. The evaluation results suggest that DeepM is accurate: it produces the same rankings of code maintainability as experienced programmers on 87.5% of manually ranked pairs of Java classes.
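As the abstract notes, DeepM pairs LSTM encoders with an attention mechanism to turn variable-length token sequences into a fixed-size code representation. A minimal pure-Python sketch of such attention pooling (the LSTM itself is omitted: `hidden_states` stands in for its per-token outputs, and `query` is a hypothetical learned vector; this is an illustration of the general technique, not the paper's exact architecture):

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of scores.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def attention_pool(hidden_states, query):
    # Score each per-token hidden state against the query vector (dot product),
    # normalize the scores into attention weights, and return the weighted sum:
    # a single fixed-size vector summarizing the whole token sequence.
    scores = [sum(h_i * q_i for h_i, q_i in zip(h, query)) for h in hidden_states]
    weights = softmax(scores)
    dim = len(hidden_states[0])
    return [sum(w * h[d] for w, h in zip(weights, hidden_states)) for d in range(dim)]
```

Tokens whose hidden states align with the query dominate the pooled vector, which is how attention lets the model focus on informative identifiers rather than averaging everything uniformly.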

Keywords: code maintainability, lexical semantics, deep learning, neural networks
Corresponding Author(s): Yamin HU   
Just Accepted Date: 31 October 2022   Issue Date: 17 January 2023
 Cite this article:   
Yamin HU, Hao JIANG, Zongyao HU. Measuring code maintainability with deep neural networks[J]. Front. Comput. Sci., 2023, 17(6): 176214.
 URL:  
https://academic.hep.com.cn/fcs/EN/10.1007/s11704-022-2313-0
https://academic.hep.com.cn/fcs/EN/Y2023/V17/I6/176214
Fig.1  
Fig.2  
Fig.3  Overview of DeepM
Fig.4  
Fig.5  
Fig.6  Representation of the field declarations of running example
Category        Code metrics
Coupling        Coupling between objects (CBO); response for a class (RFC)
Cohesion        Lack of cohesion of methods (LCOM)
Complexity      Weighted methods per class (WMC)
Size            Lines of code (LOC)
Basic counting  Numbers of {anonymous, inner} classes; numbers of {total, abstract, default, final, private, protected, public, static, synchronized, visible} methods; numbers of {total, default, final, private, protected, public, static, synchronized} fields; numbers of assignments, log statements, loops, returns, try/catches, comparisons, lambdas, math operations, numbers, parenthesized expressions, string literals, unique words, and variables; maximum number of nested blocks
Tab.1  Employed code metrics
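Several of the "basic counting" metrics in Tab.1 can be approximated with simple text scans. A rough, regex-based sketch (illustrative only: the function name and metric selection are ours, and real metric tools such as the CK calculator compute these from the parsed AST rather than from raw text):

```python
import re

def basic_counts(java_source):
    # Crude textual approximations of a few Tab.1 "basic counting" metrics.
    lines = [l for l in java_source.splitlines() if l.strip()]
    words = re.findall(r"[A-Za-z_]\w*", java_source)
    return {
        "loc": len(lines),                                            # non-blank lines
        "loops": len(re.findall(r"\b(?:for|while)\b", java_source)),  # loop keywords
        "returns": len(re.findall(r"\breturn\b", java_source)),
        "string_literals": len(re.findall(r'"(?:[^"\\]|\\.)*"', java_source)),
        "unique_words": len(set(words)),
    }
```

These keyword counts over-count occurrences inside comments or string literals, which is one reason AST-based tooling is preferred in practice.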
Fig.7  Deep learning-based model for code maintainability measurement
Fig.8  Attention-based dense network
Dataset # Classes Construction approach
Testing data 400 Manual Construction
Training data 1,394,514 Automated Construction
Tab.2  Datasets for evaluation
Approach Tomcat Log4j 2 Apollo Druid Sentinel Average
DeepM 90.0% 80.0% 90.0% 85.0% 92.5% 87.5%
LRR 80.0% 70.0% 77.5% 75.0% 72.5% 75.0%
CHC 65.0% 55.0% 70.0% 45.0% 52.5% 57.5%
Tab.3  Ranking accuracy on the testing data
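The ranking accuracy in Tab.3 is the fraction of manually ranked class pairs on which a model's ordering agrees with the programmers'. A small sketch of that computation (the tie-handling convention is our assumption; `score` is any maintainability index where higher means more maintainable):

```python
def ranking_accuracy(pairs, score):
    """pairs: list of (better, worse) class identifiers, as ranked by programmers.
    score: maps a class identifier to the model's maintainability index.
    Ties count as disagreements under this (assumed) convention."""
    agree = sum(1 for better, worse in pairs if score(better) > score(worse))
    return agree / len(pairs)
```

For example, agreeing on 35 of 40 Tomcat pairs yields the 87.5% reported for DeepM on that project.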
Setting Tomcat Log4j 2 Apollo Druid Sentinel Average Drop rate
Default 90.0% 80.0% 90.0% 85.0% 92.5% 87.5%
Disabling identifiers 75.0% 60.0% 80.0% 72.5% 82.5% 74.0% 15.4%
Disabling field declarations 80.0% 82.5% 90.0% 80.0% 92.5% 85.0% 2.9%
Disabling method headers 80.0% 75.0% 90.0% 85.0% 90.0% 84.0% 4.0%
Disabling simplified ASTs 85.0% 70.0% 87.5% 82.5% 92.5% 83.5% 4.6%
Disabling structural format 82.5% 67.5% 90.0% 80.0% 92.5% 82.5% 5.7%
Disabling code metrics 80.0% 67.5% 87.5% 80.0% 92.5% 81.5% 6.9%
Tab.4  Effects of the exploited features on the testing data (ranking accuracy)
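The rate column in Tab.4 is consistent with the relative decrease in average accuracy when a feature group is disabled, e.g., (87.5 − 74.0) / 87.5 ≈ 15.4% for identifiers. A one-line sketch under that assumed interpretation:

```python
def relative_drop(default_acc, ablated_acc):
    # Relative decrease in average ranking accuracy when a feature group is
    # disabled, matching Tab.4's rate column (assumed interpretation).
    return (default_acc - ablated_acc) / default_acc
```

By this measure identifiers are the most influential feature group, supporting the paper's thesis that lexical semantics matter for maintainability measurement.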
Metric Value
Size of training data 85,560,826 LOC
Training time (3 epochs) ~62 hours
Size of testing data 29,434 LOC
Testing time ~30 seconds
Testing time per class ~0.076 seconds
Tab.5  Performance of DeepM
Fig.9  Training time versus size of training dataset