Frontiers of Computer Science

ISSN 2095-2228

ISSN 2095-2236(Online)

CN 10-1014/TP

Postal Subscription Code 80-970

2018 Impact Factor: 1.129

Front. Comput. Sci.    2019, Vol. 13 Issue (4) : 735-759    https://doi.org/10.1007/s11704-017-6512-z
RESEARCH ARTICLE
Inforence: effective fault localization based on information-theoretic analysis and statistical causal inference
Farid FEYZI, Saeed PARSA
Department of Computer Engineering, Iran University of Science and Technology, Tehran, Iran
Abstract

In this paper, a novel approach, Inforence, is proposed to isolate the suspicious code that likely contains faults. Inforence employs a feature selection method based on mutual information to identify the bug-related statements that may cause the program to fail. Because most program faults may manifest as an undesired joint effect of the program statements on each other and on the program termination state, Inforence, unlike state-of-the-art methods, tries to identify and select groups of interdependent statements that together may affect program failure. The interdependence among statements is measured by their mutual effect on each other and on the program termination state. To provide the context of failure, the selected bug-related statements are chained to each other according to the program's static structure. Finally, the resulting cause-effect chains are ranked according to their combined causal effect on program failure. To validate Inforence, we present the results of experiments with seven sets of programs: the Siemens suite, gzip, grep, sed, space, make, and bash. The experimental results are compared with those of different fault localization techniques for both single-fault and multi-fault programs, and they show that the proposed method outperforms the state-of-the-art techniques.
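The abstract describes selecting groups of interdependent statements by their mutual information with the program termination state. The sketch below illustrates that general idea under simplifying assumptions: each statement's coverage is a binary vector over the test runs, the outcome is 1 for a failing run, and selection is plain greedy forward selection that maximizes the joint mutual information between the selected group and the outcome. This is a swapped-in textbook criterion, not Inforence's actual selection rule, which the abstract does not spell out; the function names (`joint_mi`, `interaction_gain`, `select_statements`) and the toy data are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, Hashable, List, Sequence


def entropy(values: Sequence[Hashable]) -> float:
    """Empirical Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    counts = Counter(values)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def joint_mi(columns: List[Sequence[int]], outcome: Sequence[int]) -> float:
    """I(X1,...,Xk; F) = H(X) + H(F) - H(X, F), with X the tuple of coverage columns."""
    joint = list(zip(*columns)) if columns else [()] * len(outcome)
    with_f = [row + (f,) for row, f in zip(joint, outcome)]
    return entropy(joint) + entropy(outcome) - entropy(with_f)


def interaction_gain(a: Sequence[int], b: Sequence[int], outcome: Sequence[int]) -> float:
    """I(A,B;F) - I(A;F) - I(B;F): positive when two statements are jointly
    more informative about failure than they are separately (synergy)."""
    return joint_mi([a, b], outcome) - joint_mi([a], outcome) - joint_mi([b], outcome)


def select_statements(coverage: Dict[str, Sequence[int]],
                      outcome: Sequence[int], k: int) -> List[str]:
    """Greedy forward selection (hypothetical simplification): at each step add the
    statement that maximizes the joint MI of the selected group with the outcome."""
    selected: List[str] = []
    remaining = sorted(coverage)  # deterministic order; ties go to the first name
    while remaining and len(selected) < k:
        base = [coverage[s] for s in selected]
        best = max(remaining, key=lambda s: joint_mi(base + [coverage[s]], outcome))
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy data (hypothetical): 8 test runs; a run fails only when s2 AND s3 both execute.
coverage = {
    "s1": [1, 1, 1, 1, 1, 1, 1, 1],  # always executed, carries no information
    "s2": [0, 0, 1, 1, 0, 0, 1, 1],
    "s3": [0, 1, 0, 1, 0, 1, 0, 1],
    "s4": [1, 1, 1, 1, 0, 0, 0, 1],  # weakly correlated with failure
}
failed = [0, 0, 0, 1, 0, 0, 0, 1]

if __name__ == "__main__":
    # expected: s2 and s3, the pair whose joint execution coincides with failure
    print(select_statements(coverage, failed, k=2))
    print(round(interaction_gain(coverage["s2"], coverage["s3"], failed), 3))
```

In this toy example each of s2 and s3 alone explains only part of the failures, while the pair determines the outcome exactly, so their interaction gain is positive. Greedy forward selection is used here only because it is simple; it can miss purely synergistic groups (for example, an XOR-style dependence where each statement alone has zero mutual information with failure).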

Keywords: fault localization, debugging, backward dynamic slice, mutual information, feature selection
Corresponding Author(s): Saeed PARSA   
Just Accepted Date: 07 July 2017   Online First Date: 04 September 2018    Issue Date: 29 May 2019
 Cite this article:   
Farid FEYZI, Saeed PARSA. Inforence: effective fault localization based on information-theoretic analysis and statistical causal inference[J]. Front. Comput. Sci., 2019, 13(4): 735-759.
 URL:  
https://academic.hep.com.cn/fcs/EN/10.1007/s11704-017-6512-z
https://academic.hep.com.cn/fcs/EN/Y2019/V13/I4/735