Inforence: effective fault localization based on information-theoretic analysis and statistical causal inference

doi:10.1007/s11704-017-6512-z

Front. Comput. Sci., 2019, Vol. 13, Issue 4: 735-759

RESEARCH ARTICLE


Farid FEYZI, Saeed PARSA

Department of Computer Engineering, Iran University of Science and Technology, Tehran, Iran


Abstract

In this paper, a novel approach, Inforence, is proposed to isolate suspicious code that is likely to contain faults. Inforence employs a feature selection method, based on mutual information, to identify the bug-related statements that may cause the program to fail. Because the majority of a program's faults may be revealed as an undesired joint effect of the program statements on each other and on the program termination state, Inforence, unlike the state-of-the-art methods, tries to identify and select groups of interdependent statements that together may affect the program failure. The interdependence among the statements is measured according to their mutual effect on each other and on the program termination state. To provide the context of failure, the selected bug-related statements are chained to each other, considering the program's static structure. Finally, the resultant cause-effect chains are ranked according to their combined causal effect on program failure. To validate Inforence, the results of our experiments with seven sets of programs, namely the Siemens suite, gzip, grep, sed, space, make, and bash, are presented. The experimental results are then compared with those provided by different fault localization techniques for both single-fault and multi-fault programs. The experimental results show that the proposed method outperforms the state-of-the-art techniques.
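To make the role of mutual information concrete, the following minimal sketch scores each statement by the mutual information between its coverage (executed or not in a test run) and the test outcome. It is an illustration only, not the Inforence algorithm: the function names `mutual_information` and `rank_statements`, the coverage matrix, and the outcome vector are all hypothetical, and the sketch scores statements one at a time rather than in interdependent groups as the abstract describes.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Return I(X;Y) in bits for two equal-length discrete sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))  # counts of each (x, y) pair
    px = Counter(xs)              # marginal counts of x
    py = Counter(ys)              # marginal counts of y
    mi = 0.0
    for (x, y), c in joint.items():
        p_xy = c / n
        # p(x,y) / (p(x) p(y)) written with raw counts
        mi += p_xy * math.log2(p_xy * n * n / (px[x] * py[y]))
    return mi


def rank_statements(coverage, failed):
    """Rank statement indices by mutual information with the outcome."""
    scores = []
    for j in range(len(coverage[0])):
        column = [row[j] for row in coverage]
        scores.append((j, mutual_information(column, failed)))
    return sorted(scores, key=lambda t: t[1], reverse=True)


# Hypothetical data: rows are test runs, columns are statements s0..s2
# (1 = executed). failed[i] is 1 if run i failed.
coverage = [
    [1, 1, 0],
    [1, 0, 1],
    [1, 1, 0],
    [1, 0, 1],
]
failed = [1, 0, 1, 0]

for index, score in rank_statements(coverage, failed):
    print(f"s{index}: {score:.3f} bits")
```

In this toy data, s0 is executed in every run and carries no information about the outcome, while s1 and s2 are equally informative. Mutual information is symmetric in sign: s2 is executed only in passing runs yet scores as high as s1, so a practical ranking needs an additional step to establish the direction of the effect.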

Keywords: fault localization, debugging, backward dynamic slice, mutual information, feature selection

Corresponding Author(s): Saeed PARSA

Just Accepted Date: 07 July 2017; Online First Date: 04 September 2018; Issue Date: 29 May 2019

Cite this article:

Farid FEYZI, Saeed PARSA. Inforence: effective fault localization based on information-theoretic analysis and statistical causal inference[J]. Front. Comput. Sci., 2019, 13(4): 735-759.

URL:

https://academic.hep.com.cn/fcs/EN/10.1007/s11704-017-6512-z
https://academic.hep.com.cn/fcs/EN/Y2019/V13/I4/735



