Frontiers of Earth Science


Front. Earth Sci. 2016, Vol. 10, Issue 3: 389–408    https://doi.org/10.1007/s11707-016-0595-y
RESEARCH ARTICLE
The quest for conditional independence in prospectivity modeling: weights-of-evidence, boost weights-of-evidence, and logistic regression
Helmut SCHAEBEN, Georg SEMMLER
Department of Geophysics and Geoinformatics, TU Bergakademie Freiberg, Freiberg 09596, Germany
Abstract

The objective of prospectivity modeling is to predict the conditional probability of the presence (T = 1) or absence (T = 0) of a target T given favorable or prohibitive predictors B, or to construct a two-class {0, 1} classification of T. A special case of logistic regression called weights-of-evidence (WofE) is geologists' favorite method of prospectivity modeling because of its apparent simplicity. However, the numerical simplicity is deceptive, as it rests on the severe mathematical modeling assumption of joint conditional independence of all predictors given the target. General weights of evidence are explicitly introduced; they are as simple to estimate as conventional weights, i.e., by counting, but do not require conditional independence. Complementary to the regression view is the classification view of prospectivity modeling. Boosting constructs a strong classifier from a set of weak classifiers; from the regression point of view it is closely related to logistic regression. Boost weights-of-evidence (BoostWofE) was introduced into prospectivity modeling to counterbalance violations of the assumption of conditional independence, even though relaxing modeling assumptions with respect to weak classifiers was not the (initial) purpose of boosting. In the original publication of BoostWofE a fabricated dataset was used to "validate" the approach. Using the same fabricated dataset, it is shown here that BoostWofE cannot in general compensate for lacking conditional independence, whatever the order in which the predictors are consecutively processed. Thus the alleged features of BoostWofE are disproved by way of counterexamples, while theoretical findings are confirmed that logistic regression including interaction terms can exactly compensate violations of joint conditional independence if the predictors are indicators.

Keywords: general weights of evidence; joint conditional independence; naïve Bayes model; Hammersley–Clifford theorem; interaction terms; statistical significance
Corresponding Author(s): Helmut SCHAEBEN   
Just Accepted Date: 25 April 2016   Online First Date: 17 May 2016    Issue Date: 20 June 2016
 Cite this article:   
Helmut SCHAEBEN, Georg SEMMLER. The quest for conditional independence in prospectivity modeling: weights-of-evidence, boost weights-of-evidence, and logistic regression[J]. Front. Earth Sci., 2016, 10(3): 389–408.
 URL:  
https://academic.hep.com.cn/fesci/EN/10.1007/s11707-016-0595-y
https://academic.hep.com.cn/fesci/EN/Y2016/V10/I3/389
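
To make the counting estimates referred to throughout concrete, the following is a minimal R sketch (not the authors' implementation) of conventional weights-of-evidence; it assumes a hypothetical data frame Q with 0/1 indicator columns B1, B2, B3, and T as in the training dataset shown next.

    # Minimal sketch, not the authors' code: conventional weights-of-evidence
    # estimated by counting from a hypothetical data frame Q with 0/1
    # columns B1, B2, B3, T.
    wofe_weights <- function(b, t) {
      W1 <- log(mean(b[t == 1] == 1) / mean(b[t == 0] == 1))  # W(1): predictor present
      W0 <- log(mean(b[t == 1] == 0) / mean(b[t == 0] == 0))  # W(0): predictor absent
      c(W1 = W1, W0 = W0, C = W1 - W0)                        # contrast C = W(1) - W(0)
    }
    # t(sapply(Q[c("B1", "B2", "B3")], wofe_weights, t = Q$T))  # cf. Tab. 8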
Fig.1  Spatial distribution of the three indicator predictor variables B1, B2, and B3, and the indicator target variable T of training dataset Q. Blanks are used instead of displaying 0.
Fig.2  Spatial distribution of the ground truth according to elementary estimation by counting in the training dataset Q (left), and spatial distribution of conditional probabilities estimated with general weights of evidence without assuming conditional independence (right). The match is perfect.
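
The perfect match in Fig. 2 is Bayes' rule at work: a general weight per full predictor pattern b, W(b) = log[P(B = b | T = 1) / P(B = b | T = 0)], requires no conditional independence and, added to the prior logit, reproduces the counting estimate exactly. A sketch of this reading (the data frame Q is again hypothetical):

    # General weights as read from the abstract: one weight per full pattern
    # of (B1, B2, B3); no conditional independence is assumed.
    general_wofe <- function(Q) {
      pat <- interaction(Q$B1, Q$B2, Q$B3, drop = FALSE)
      p1  <- prop.table(table(pat[Q$T == 1]))   # P(B = b | T = 1)
      p0  <- prop.table(table(pat[Q$T == 0]))   # P(B = b | T = 0)
      prior_logit <- log(mean(Q$T == 1) / mean(Q$T == 0))
      plogis(prior_logit + log(p1 / p0))        # P(T = 1 | B = b), per pattern
    }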
        B1      B2      B3      T
B1    1.000  −0.019   0.000  0.291
B2   −0.019   1.000   0.468  0.312
B3    0.000   0.468   1.000  0.167
T     0.291   0.312   0.167  1.000
Tab.1  Pearson correlation matrix of the training dataset Q
Random variables (Bi, T)    p-value
(B1, T)                      0.003
(B2, T)                      0.001
(B3, T)                      0.097
Tab.2  Significance tests of Kendall correlations (Bi, T), i = 1, 2, 3, for training dataset Q
Random variables (Bi, Bj)   p-value
(B1, B2)                     0.846
(B1, B3)                     1.000
(B2, B3)                     0.000
Tab.3  Significance tests of Kendall correlations (Bi, Bj), i, j = 1, 2, 3, i < j, for training dataset Q
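
The diagnostics of Tabs. 1–3 can be reproduced with base R; a sketch, again assuming the hypothetical data frame Q:

    round(cor(Q), 3)                           # Pearson correlation matrix, cf. Tab. 1
    cor.test(Q$B1, Q$T,  method = "kendall")   # Kendall correlation with T, cf. Tab. 2
    cor.test(Q$B2, Q$B3, method = "kendall")   # Kendall correlation among predictors, cf. Tab. 3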
Statistic          χ²        df   P(>χ²)
Likelihood ratio   20.7374    8    0.007
Pearson            23.6850    8    0.002
Tab.4  Significance test of joint conditional independence referring to a log-linear model for training dataset Q: loglm(formula = ~ B1*T + B2*T + B3*T, data = xtabs(~ ., Q))
Statistic          χ²         df   P(>χ²)
Likelihood ratio   2.022720    2    0.363
Pearson            1.851326    2    0.396
Tab.5  Significance test of conditional independence of B1 and B2 given T: loglm(formula = ~ B1*T + B2*T, data = xtabs(~ ., Q[, -3]))
Statistic           χ²         df   P(>χ²)
Likelihood ratio   0.5800827    2    0.748
Pearson            0.5531054    2    0.758
Tab.6  Significance test of conditional independence of B1 and B3 given T: loglm(formula = ~ B1*T + B3*T, data = xtabs(~ ., Q[, -2]))
Statistic          χ²        df   P(>χ²)
Likelihood ratio   18.30563    2    0.000
Pearson            19.54423    2    0.000
Tab.7  Significance test of conditional independence of B2 and B3 given T: loglm(formula = ~ B2*T + B3*T, data = xtabs(~ ., Q[, -1]))
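
A runnable version of the loglm calls quoted in the captions of Tabs. 4–7 (MASS package; the layout of the hypothetical data frame Q is an assumption):

    library(MASS)
    # Joint conditional independence of B1, B2, B3 given T, cf. Tab. 4:
    loglm(~ B1*T + B2*T + B3*T, data = xtabs(~ B1 + B2 + B3 + T, data = Q))
    # Pairwise tests marginalize out the unused predictor, e.g., cf. Tab. 7:
    loglm(~ B2*T + B3*T, data = xtabs(~ B2 + B3 + T, data = Q))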
        B1      B2      B3       Σ
W(1)   1.008   1.099   0.811
W(0)  −0.909  −0.938  −0.315  −2.162
C      1.917   2.037   1.126
Tab.8  Numerical results of Wof3E applied to training dataset Q despite violation of joint conditional independence (Σ: sum of the W(0) row)
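
Turning the Tab. 8 weights into per-cell posterior probabilities under the (here violated) independence assumption follows the usual WofE update, posterior logit = prior logit + sum_i [B_i W(1)_i + (1 − B_i) W(0)_i]; a sketch continuing the hypothetical wofe_weights above:

    W <- t(sapply(Q[c("B1", "B2", "B3")], wofe_weights, t = Q$T))
    prior_logit <- log(mean(Q$T == 1) / mean(Q$T == 0))
    logit <- prior_logit +
      as.matrix(Q[c("B1", "B2", "B3")])     %*% W[, "W1"] +   # weights where B_i = 1
      as.matrix(1 - Q[c("B1", "B2", "B3")]) %*% W[, "W0"]     # weights where B_i = 0
    p_hat <- plogis(logit)   # cf. the Wof3E column of Tab. 15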
Fig.3  Ground truth (left), numerical results of weights-of-evidence with three predictors Bi, i = 1, 2, 3, despite violation of joint conditional independence (center), and with the two predictors Bi, i = 1, 2 (right), applied to training dataset Q.
                 B1      B2      B3       Σ
Boost123 W(1)   1.008   1.241   0.018
Boost123 W(0)  −0.909  −0.951  −0.016  −1.876
Boost123 C      1.917   2.192   0.034
Tab.9  Numerical results of Boost123WofE(B1, B2, B3) formally applied to training dataset Q, taken from Cheng (2015, Eqs. (41), (45), (48), pp. 609–611). Boosted weights and contrast with respect to B1 agree with the conventional ones
Fig.4  Ground truth (left) and numerical results of Boost123WofE (right) formally applied to training dataset Q.
                 B2      B1      B3       Σ
Boost213 W(1)   1.098   1.160   0.104
Boost213 W(0)  −0.938  −0.928  −0.036  −1.903
Boost213 C      2.036   2.088   0.140
Tab.10  Numerical results of Boost213WofE applied to training dataset Q. Boosted weights and contrast with respect to B2 agree with the conventional ones
                 B3      B1      B2       Σ
Boost312 W(1)   0.810   1.129   0.838
Boost312 W(0)  −0.315  −0.972  −0.960  −2.247
Boost312 C      1.125   2.101   1.798
Tab.11  Numerical results of Boost312WofE applied to training dataset Q. Boosted weights and contrast with respect to B3 agree with the conventional ones
Fig.5  Numerical results of BoostWofE as suggested by Cheng (2015), applied to training dataset Q comprising three predictors Bi, i = 1, 2, 3 (top), for all permutations of {1, 2, 3}. B2 preceding B3: Boost123WofE, Boost213WofE, and Boost231WofE (center); B3 preceding B2: Boost132WofE, Boost312WofE, and Boost321WofE (bottom).
             Estimate   Std. Error   z value   Pr(>|z|)
(Intercept)   −4.402      0.882      −4.990     0.000
B1             2.299      0.821       2.800     0.005
B2             2.407      0.820       2.930     0.003
Tab.12  Numerical results of the two-term logistic regression model applied to Q (AIC = 53.25)
             Estimate   Std. Error   z value   Pr(>|z|)
(Intercept)   −4.421      0.889      −4.970     0.000
B1             2.301      0.822       2.800     0.005
B2             2.330      0.896       2.600     0.009
B3             0.187      0.890       0.210     0.833
Tab.13  Numerical results of the three-term logistic regression model applied to Q (AIC = 55.206)
              Estimate    Std. Error   z value   Pr(>|z|)
(Intercept)    −3.806       1.011      −3.770     0.000
B1              1.609       1.256       1.280     0.200
B2              1.609       1.460       1.100     0.270
B3            −14.759    3261.319      −0.000     0.996
B1:B2           0.587       1.920       0.310     0.759
B1:B3          −1.609    5648.770      −0.000     0.999
B2:B3          14.759    3261.319       0.000     0.996
B1:B2:B3        2.708    5648.771       0.000     0.999
Tab.14  Numerical results of the full logistic regression model applied to Q (AIC = 61.686)
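
The fits in Tabs. 12–14 correspond to standard binomial glm calls; a sketch under the same assumption of a hypothetical data frame Q:

    m2 <- glm(T ~ B1 + B2,      family = binomial, data = Q)  # two-term model, cf. Tab. 12
    m3 <- glm(T ~ B1 + B2 + B3, family = binomial, data = Q)  # three-term model, cf. Tab. 13
    mf <- glm(T ~ B1 * B2 * B3, family = binomial, data = Q)  # all interaction terms, cf. Tab. 14
    summary(m2); AIC(m2, m3, mf)                              # compare by AIC as in the captions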
Fig.6  Comparison of the ground truth (top left) and numerical results of logistic regression models with the two significant terms (top right), with three terms (bottom left), and with all interaction terms (bottom right), applied to training dataset Q.
Fig.7  Estimated conditional probabilities (top row) and estimation errors (bottom row) of the significant two-term logistic regression model (left), the three-term logistic regression model (center), and the full logistic regression model with all interaction terms (right), applied to training dataset Q.
Predictors                                    P^(T = 1 | B)
B1 B2 B3   Counting   Wof3E   Wof2E   2-term lrM   WofE (Cheng, 2015)   BoostWofE (Cheng, 2015)   Boost123WofE   Boost213WofE   Boost312WofE
1  1  1     0.750     0.672   0.477     0.575            0.69                  0.52                  0.519          0.541          0.641
1  1  0     0.500     0.399   0.477     0.575            0.35                  0.51                  0.510          0.506          0.367
0  1  1     0.100     0.232   0.118     0.119            0.20                  0.14                  0.137          0.127          0.179
1  0  0     0.100     0.079   0.106     0.108            0.12                  0.10                  0.104          0.118          0.087
0  1  0     0.100     0.089   0.118     0.119            0.07                  0.13                  0.132          0.112          0.066
0  0  0     0.021     0.012   0.017     0.012            0.02                  0.02                  0.016          0.016          0.011
1  0  1     0.000     0.211   0.106     0.108            0.30                  0.11                  0.107          0.133          0.228
0  0  1     0.000     0.037   0.017     0.012            0.06                  0.02                  0.017          0.018          0.034
Tab.15  Comparison of predicted conditional probabilities P^(T = 1 | B) for various methods: the ground truth given in terms of conditional frequencies by counting (first column); weights-of-evidence using all three predictors (second column) or only the two predictors B1 and B2 (third column); the best significant logistic regression model using the same two predictors B1 and B2 (fourth column); original figures of weights-of-evidence (fifth column) and of BoostWofE (sixth column) from Cheng (2015); and numerical results of Boost123WofE (seventh column), Boost213WofE (eighth column), and Boost312WofE (ninth column)
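
The "Counting" column of Tab. 15 is the per-pattern relative frequency of T = 1, and each model column is an ordinary prediction on the eight patterns; a sketch reusing the hypothetical objects above:

    gt <- aggregate(T ~ B1 + B2 + B3, data = Q, FUN = mean)   # ground truth by counting
    gt$lrm2 <- predict(m2, newdata = gt[c("B1", "B2", "B3")],
                       type = "response")                     # 2-term lrM column
    gt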
Fig.8  Spatial distribution of the two indicator predictor variables B1, B2 and the indicator target variable T of the training dataset rankit.
Fig.9  Spatial distribution of predicted conditional probabilities P^(T = 1 | B1, B2) for the training dataset rankit according to: elementary estimation by counting, referred to as ground truth (top left); logistic regression with interaction term (top right); weights-of-evidence (middle left); logistic regression without interaction (middle right); Boost12 weights-of-evidence (bottom left); Boost21 weights-of-evidence (bottom right).
Fig.A1  The logarithm function is mistaken for a linear function (Polyakova and Journel, 2007, Mathematical Geology 39, p. 723).
1 Agterberg F P (2014). Geomathematics: Theoretical Foundations, Applications and Future Developments. Cham, Heidelberg, New York, Dordrecht, London: Springer
2 Agterberg F P, Bonham-Carter G F, Wright D F (1990). Statistical pattern integration for mineral exploration. In: Gaál G, Merriam D F, eds. Computer Applications in Resource Estimation Prediction and Assessment for Metals and Petroleum. Oxford, New York: Pergamon Press, 1–21
3 Agterberg F P, Cheng Q (2002). Conditional independence test for weights-of-evidence modeling. Nat Resour Res, 11(4): 249–255
https://doi.org/10.1023/A:1021193827501
4 Berkson J (1944). Application of the logistic function to bio-assay. J Am Stat Assoc, 39(227): 357–365
5 Bonham-Carter G (1994). Geographic Information Systems for Geoscientists: Modeling with GIS. New York: Pergamon, Elsevier Science
6 Butz C J, Sanscartier M J (2002). Properties of weak conditional independence. In: Alpigini J J, Peters J F, Skowron A, Zhong N, eds. Rough Sets and Current Trends in Computing, Lecture Notes in Computer Science (Volume 2475). Berlin, Heidelberg: Springer, 349–356
7 Chalak K, White H (2012). Causality, conditional independence, and graphical separation in settable systems. Neural Comput, 24(7): 1611–1668
https://doi.org/10.1162/NECO_a_00295
8 Cheng Q (2012). Application of a newly developed boost weights of evidence model (BoostWofE) for mineral resources quantitative assessments. Journal of Jilin University, Earth Sci Ed, 42(6): 1976–1985
9 Cheng Q (2015). BoostWofE: a new sequential weights of evidence model reducing the effect of conditional dependency. Math Geosci, 47(5): 591–621
https://doi.org/10.1007/s11004-014-9578-2
10 Chilès J P, Delfiner P (2012). Geostatistics: Modeling Spatial Uncertainty (2nd ed). New York, Chichester, Weinheim, Brisbane, Singapore, Toronto: John Wiley & Sons
12 Dawid A P (1979). Conditional independence in statistical theory. J R Stat Soc, B, 41(1): 1–31
13 Dawid A P (2004). Probability, causality and the empirical world: a Bayes-de Finetti-Popper-Borel synthesis. Stat Sci, 19(1): 44–57
https://doi.org/10.1214/088342304000000125
14 Dawid A P (2007). Fundamentals of Statistical Causality. Research Report 279, Department of Statistical Science, University College London
ESRI, ArcGIS.
15 Ford A, Miller J M, Mol A G (2016). A comparative analysis of weights of evidence, evidential belief functions, and fuzzy logic for mineral potential mapping using incomplete data at the scale of investigation. Nat Resour Res, 25(1): 19–33
https://doi.org/10.1007/s11053-015-9263-2
16 Freund Y, Schapire R E (1997). A decision theoretic generalization of on-line learning and an application to boosting. J Comput Syst Sci, 55(1): 119–139
https://doi.org/10.1006/jcss.1997.1504
17 Freund Y, Schapire R E (1999). A short introduction to boosting. Journal of Japanese Society for Artificial Intelligence, 14(5): 771–780
18 Friedman J, Hastie T, Tibshirani R (2000). Additive logistic regression: a statistical view of boosting. Ann Stat, 28(2): 337–407
https://doi.org/10.1214/aos/1016218223
19 Good I J (1950). Probability and the Weighing of Evidence. London: Griffin
20 Good I J (1960). Weight of evidence, corroboration, explanatory power, information and the utility of experiments. J R Stat Soc, B, 22(2): 319–331
21 Good I J (1968). The Estimation of Probabilities: An Essay on Modern Bayesian Methods. MIT Research Monograph No. 30, The MIT Press, Cambridge, MA, 109
22 Harris D P, Pan G C (1999). Mineral favorability mapping: a comparison of artificial neural networks, logistic regression and discriminant analysis. Nat Resour Res, 8(2): 93–109
https://doi.org/10.1023/A:1021886501912
23 Harris D P, Zurcher L, Stanley M, Marlow J, Pan G (2003). A comparative analysis of favorability mappings by weights of evidence, probabilistic neural networks, discriminant analysis, and logistic regression. Nat Resour Res, 12(4): 241–255
https://doi.org/10.1023/B:NARR.0000007804.27450.e8
24 Hastie T, Tibshirani R, Friedman J (2009). The Elements of Statistical Learning (2nd ed). New York: Springer
25 Hosmer D W, Lemeshow S, Sturdivant R X (2013). Applied Logistic Regression (3rd ed). Hoboken, NJ: Wiley & Sons
26 Journel A G (2002). Combining knowledge from diverse sources: an alternative to traditional data independence hypotheses. Math Geol, 34(5): 573–596
https://doi.org/10.1023/A:1016047012594
27 Kreuzer O, Porwal A, eds. (2010). Special Issue “Mineral Prospectivity Analysis and Quantitative Resource Estimation”. Ore Geol Rev, 38(3): 121–304
https://doi.org/10.1016/j.oregeorev.2010.06.002
28 Krishnan S (2008). The τ-model for data redundancy and information combination in Earth sciences: theory and application. Math Geol, 40(6): 705–727
29 Minsky M, Selfridge O G (1961). Learning in random nets. In: Cherry C, ed. 4th London Symposium on Information Theory. London: Butterworths, 335–347
30 Pearl J (2009). Causality: Models, Reasoning, and Inference. 2nd ed.New York: Cambridge University Press
31 Polyakova E I, Journel A G (2007). The Nu expression for probabilistic data integration. Math Geol, 39(8): 715–733
https://doi.org/10.1007/s11004-007-9117-5
32 Porwal A, Carranza E J M (2015). Introduction to the Special Issue: GIS-based mineral potential modelling and geological data analyses for mineral exploration. Ore Geol Rev, 71: 477–483
https://doi.org/10.1016/j.oregeorev.2015.04.017
33 Porwal A, González-Álvarez I, Markwitz V, McCuaig T C, Mamuse A (2010). Weights of evidence and logistic regression modeling of magmatic nickel sulfide prospectivity in the Yilgarn Craton, Western Australia. Ore Geol Rev, 38(3): 184–196
https://doi.org/10.1016/j.oregeorev.2010.04.002
34 Reed L J, Berkson J (1929). The application of the logistic function to experimental data. J Phys Chem, 33(5): 760–779
https://doi.org/10.1021/j150299a014
35 Rodriguez-Galiano V, Sanchez-Castillo M, Chica-Olmo M, Chica-Rivas M (2015). Machine learning predictive models for mineral prospectivity: an evaluation of neural networks, random forest, regression trees and support vector machines. Ore Geol Rev, 71: 804–818
https://doi.org/10.1016/j.oregeorev.2015.01.001
36 Schaeben H (2014a). Targeting: logistic regression, special cases and extensions. ISPRS Int J Geoinf, 3(4): 1387–1411.
https://doi.org/10.3390/ijgi3041387
37 Schaeben H (2014b). Potential modeling: conditional independence matters. GEM-International Journal on Geomathematics, 5(1): 99–116
https://doi.org/10.1007/s13137-014-0059-z
38 Schaeben H (2014c). A mathematical view of weights-of-evidence, conditional independence, and logistic regression in terms of Markov random fields. Math Geosci, 46(6): 691–709
https://doi.org/10.1007/s11004-013-9513-y
39 Šochman J, Matas J (2004). Adaboost with totally corrective updates for fast face detection. In: Proc. 6th IEEE International Conference on Automatic Face and Gesture Recognition, Seoul, South Korea, 445–450
40 Suppes P (1970). A Probabilistic Theory of Causality. Amsterdam: North-Holland
41 Tolosana-Delgado R, van den Boogaart K G, Schaeben H (2014). Potential mapping from geochemical surveys using a Cox process. 10th Conference on Geostatistics for Environmental Applications, Paris, July 9–11, 2014
42 van den Boogaart K G, Schaeben H (2012). Mineral potential mapping using Cox–type regression for marked point processes. 34th IGC Brisbane, Australia
43 Wong S K M, Butz C J (1999). Contextual weak independence in Bayesian networks. In: Proc. 15th Conference on Uncertainty in Artificial Intelligence, Stockholm, Sweden, 670–679