Spatial prediction of soil contamination based on machine learning: a review

doi:10.1007/s11783-023-1693-1

Front. Environ. Sci. Eng., 2023, Vol. 17, Issue 8: 93

REVIEW ARTICLE

Yang Zhang¹,², Mei Lei¹,², Kai Li¹, Tienan Ju¹

¹. Institute of Geographic Sciences & Natural Resources Research, Chinese Academy of Sciences, Beijing 100101, China
². Sino-Danish College, University of Chinese Academy of Sciences, Beijing 100190, China


Abstract

● A review of machine learning (ML) for spatial prediction of soil contamination.

● ML has achieved significant breakthroughs in soil contamination prediction.

● A structured guideline for using ML in soil contamination is proposed.

● The guideline includes variable selection, model evaluation, and interpretation.

Soil pollution levels can be quantified via sampling and experimental analysis; however, because funding and human resources are limited, sampling is performed at widely spaced discrete points and is insufficient to characterize the entire study area. Spatial prediction is therefore required to comprehensively investigate potentially contaminated areas. Machine learning models, which can simulate complex nonlinear relationships between a variety of environmental conditions and soil contamination, have recently become popular tools for predicting soil pollution. This study reviews the characteristics, advantages, and applications of machine learning models used to predict soil pollution. Satisfactory model performance generally requires the following: 1) selection of the most appropriate model with the required structure; 2) selection of appropriate independent variables related to pollutant sources and pathways to improve model interpretability; 3) improvement of model reliability through comprehensive model evaluation; and 4) integration of geostatistics with the machine learning model. With the enrichment of environmental data and the development of algorithms, machine learning will become a powerful tool for predicting the spatial distribution and identifying the sources of soil contamination.

Keywords: Soil contamination; Machine learning; Prediction; Spatial distribution

Corresponding Author(s): Mei Lei

Issue Date: 16 February 2023

Cite this article:

Yang Zhang, Mei Lei, Kai Li, et al. Spatial prediction of soil contamination based on machine learning: a review[J]. Front. Environ. Sci. Eng., 2023, 17(8): 93.

URL:

https://academic.hep.com.cn/fese/EN/10.1007/s11783-023-1693-1
https://academic.hep.com.cn/fese/EN/Y2023/V17/I8/93

Fig.1 Schematic diagram of the methodology used for selecting the papers reviewed.

Fig.2 Keywords clustering analysis of papers related to soil pollution prediction.

Fig.3 Frequency of pollutants investigated (histogram) and related papers published per year in Web of Science and Scopus (line). OP denotes organic pollutants; PCBs, PCDD/Fs, TPAH, NAP, FLE, PHE, and BaP are targeted organic pollutants in ML models with no distinct categories. They denote polychlorinated biphenyls, polychlorinated dibenzo-p-dioxins and dibenzofurans, total polyaromatic hydrocarbons, naphthalene, fluorene, phenanthrene, and benzo[a]pyrene, respectively.

Fig.4 Five main steps needed in the spatial prediction of soil contamination using ML.

Fig.5 (a) Diagram and (b) frequencies of independent variables used in ML models.

ML model	Advantages	Disadvantages
ANN: MLP (Kanevski et al., 2003; Li et al., 2011; Zhou et al., 2015; Bonelli et al., 2017; Sakizadeh et al., 2017; Tarasov et al., 2018a; Baglaeva et al., 2018; Jia et al., 2017; Shichkin et al., 2018; Tarasov et al., 2018b; Tarasov et al., 2018c; Bazoobandi et al., 2022; Jia et al., 2019; Sergeev et al., 2019; Tao et al., 2019; Zhang et al., 2020; Baglaeva et al., 2021; Bhagat et al., 2021a; Bhagat et al., 2021b; Droz et al., 2021; Duong et al., 2021; Jia et al., 2021; Shao et al., 2021; Wang et al., 2021a); RBFNN (Cao and Zhang, 2020, 2021); GRNN (Kanevski et al., 2003; Shichkin et al., 2018; Tarasov et al., 2018b; Tarasov et al., 2018c; Sergeev et al., 2019); WNN (Cao and Zhang, 2020); DNN (Ballabio et al., 2021); SeOM-NN (Kebonye et al., 2021)	1. Given data of adequate quality, ANNs can approximate the complex nonlinear relationships between source- and transport-related variables and soil contamination, and they are comparatively insensitive to missing data. 2. Robust to outliers: neural networks are less disturbed by anomalously high or low data points than distance-based models. 3. Capable of handling both continuous and categorical contamination levels. 4. RBFNN can use local weight adjustment to reduce the impact of irrelevant variables on the prediction (Bishop, 1991).	1. ANNs require more data than other models, and a larger sample size is recommended to avoid overfitting. 2. Network structure, number of iterations, and other training hyperparameters strongly affect model performance. 3. The outputs are poorly interpretable: the black-box nature of neural networks makes it difficult to interpret the factors influencing soil pollution. 4. Self-organizing map neural networks (SeOM-NN) cannot solve regression problems, but have potential for source identification.
DT: CART (Zhang et al., 2008; Schwarz et al., 2013; Bou Kheir et al., 2014; Qiu et al., 2016; Ru et al., 2016; Mikkonen et al., 2018a; Mikkonen et al., 2018b; Wang et al., 2021b; Yang et al., 2021a)	1. DT is more interpretable than neural networks, and the tree structure can be visualized to make intuitive judgments about the factors related to soil pollution. 2. CART can be applied to both soil contamination level classification and contaminant content prediction. 3. Capable of giving logical expressions for the causes of soil pollution based on node impurity, or of revealing correlations between pollutants and other substances in the soil.	1. CART overfits easily, and its generalization ability and robustness are weak; prediction accuracy is difficult to guarantee when predicting for new areas or new sampling points. 2. Sensitive to missing data: model performance suffers when some environmental variables are missing for some soil samples.
SVM (Kanevski et al., 2003; Wu et al., 2016; Sakizadeh et al., 2017; Jia et al., 2019; Akinpelu et al., 2020; Cao and Zhang, 2020; Zhang et al., 2020; Bhagat et al., 2021a; Bhagat et al., 2021b; Jia et al., 2021; Wang et al., 2021b; Yang et al., 2021a; Paes et al., 2022)	1. When the sample size is small, SVM is a more suitable predictive model than ANN and EL. 2. The complexity of the learning process depends on the number of support vectors rather than on the dimensionality of the input data, enabling accurate prediction of soil contamination when multiple influencing factors act at the same time. 3. Structural risk minimization, in contrast to the empirical risk minimization of other models, gives SVM better robustness and generalization ability for predicting soil samples.	1. SVM partitions the data space, making it difficult to perform classification or regression on soil samples with missing features. 2. Sensitive to hyperparameters and kernel functions; the optimal parameter set of SVM is not known in advance for different study areas, pollutants, and input variables. 3. SVM does not perform well with high-dimensional data, for example, samples with many environmental factors.
EL: RF (Schwarz et al., 2013; Wang et al., 2015; Qiu et al., 2016; Fathizad et al., 2020; Huang et al., 2020; Jia et al., 2020; Liu et al., 2020a; Wang et al., 2020; Xiao et al., 2020b; Zhang et al., 2020; Bhagat et al., 2021b; Droz et al., 2021; Huang et al., 2021a; Jia et al., 2021; Li et al., 2021; Shi et al., 2021; Taghizadeh-Mehrjardi et al., 2021; Wang et al., 2021a; Yang et al., 2021a; Yang et al., 2021b; Zhang et al., 2021b; Azizi et al., 2022; Gao et al., 2022; Paes et al., 2022; Yu et al., 2022); SGBT (Wang et al., 2015; Yang et al., 2021a); XGBoost (Bhagat et al., 2021a; Bhagat et al., 2021b; Yang et al., 2021a); ERF (Jia et al., 2021; Yang et al., 2021a)	1. The randomness in RF effectively reduces overfitting, and generalization performance and robustness are greatly improved compared with CART. 2. Capable of handling both continuous and categorical data. 3. More interpretable than ANN; feature importance can be ranked by node impurity. 4. Capable of modeling nonlinear relationships. 5. An RF model can be treated as an ensemble of decision tree models to estimate the probability distribution of model outputs.	1. Boosting algorithms may keep increasing the weight of outliers during training, which decreases prediction accuracy on the test data. 2. The results will be severely biased if contaminant contents in the test set exceed the range of the training data. 3. Because multiple learners are connected in parallel or in series to obtain the final output, the explanation of the factors affecting the accumulation of soil contaminants is weaker than in the DT model.
NB (Jia et al., 2019)	1. Classification is efficient and stable, and less sensitive to missing features. 2. Suitable for classifying data with a small sample size and capable of handling multi-class tasks. 3. Combination with ANN models makes it possible to quantify the uncertainty of the results.	1. The model assumes that the input variables are independent of each other, whereas the factors influencing soil pollution may be correlated; NB is therefore not suitable for predicting soil pollution directly. 2. Specialized in text processing; not a good choice for spatial prediction.
ANFIS (Bazoobandi et al., 2022)	1. Combines the interpretability of fuzzy systems with the learning ability of neural networks (Jang, 1993).	1. The simplification of information may degrade model performance.
KNN (Yang et al., 2021a; Paes et al., 2022)	1. The simple model structure can handle both classification and regression problems.	1. KNN needs samples with a balanced distribution; performance is strongly affected by biased training data. 2. Unsuitable for data with too many features, which may cause the curse of dimensionality. 3. Sensitive to outliers: when some data points are much higher than others, the predictions may be biased.
K-means (Jia et al., 2020; Tepanosyan et al., 2020; Cao and Zhang, 2021; Kebonye et al., 2021; Xu et al., 2021)	1. Spatial classification iterations based on features of soil samples are simple and easy to implement.	1. The algorithm is based on spatial distance optimization and is susceptible to interference from abnormal pollutant concentrations or abnormal variables. 2. If soil samples lack certain influencing factors, the clustering results will change. 3. A priori knowledge is needed to determine the number of clusters, but the types of soil pollution sources are hard to determine.

Tab.1 Comparison of machine learning algorithms in soil pollution prediction
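
The effect of anomalous concentrations on a distance-based model such as KNN, noted in Tab.1, can be illustrated with a minimal sketch. The sampling coordinates, concentrations, and the helper `knn_predict` are hypothetical and chosen only for illustration; they are not taken from any of the reviewed studies.

```python
import math

def knn_predict(train, query, k=3):
    """Predict the concentration at `query` as the mean of the k nearest samples.

    train: list of ((x, y), concentration) tuples; query: (x, y) tuple.
    """
    by_distance = sorted(train, key=lambda s: math.dist(s[0], query))
    neighbours = by_distance[:k]
    return sum(value for _, value in neighbours) / len(neighbours)

# Hypothetical sampling points: coordinates in km, concentrations in mg/kg.
samples = [((0, 0), 10.0), ((1, 0), 12.0), ((0, 1), 11.0), ((5, 5), 30.0)]
clean = knn_predict(samples, (0.5, 0.5), k=3)

# Same layout, but one neighbour carries an anomalously high value.
with_outlier = [((0, 0), 10.0), ((1, 0), 120.0), ((0, 1), 11.0), ((5, 5), 30.0)]
biased = knn_predict(with_outlier, (0.5, 0.5), k=3)

print(clean, biased)  # the single extreme neighbour shifts the local mean
```

Because the prediction is an unweighted mean of the nearest samples, a single extreme neighbour moves the estimate substantially, and no prediction can fall outside the range of the training concentrations.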

Fig.6 Frequency of different ML models used in soil pollution prediction.

Fig.7 Frequency of use for model evaluation metrics in selected papers.

Evaluation metric	Formula	Evaluation metric	Formula
RMSE	$\sqrt{\frac{\sum_{i=1}^{n}(y_i - x_i)^2}{n}}$	AIC	$2k - 2\ln L(\hat{\theta}, x)$
R²	$1 - \frac{\sum_{i=1}^{n}(x_i - y_i)^2}{\sum_{i=1}^{n}(x_i - \bar{x})^2}$	RPIQ	$\frac{\mathrm{IQ}}{\mathrm{RMSE}}$
MAE	$\frac{\sum_{i=1}^{n}|y_i - x_i|}{n}$	Log-cosh loss	$\sum_{i=1}^{n}\log(\cosh(y_i - x_i))$
R_s	$1 - \frac{6\sum_{i=1}^{n} d_i^2}{n(n^2 - 1)}$	Accuracy	$\frac{TP + TN}{TP + FP + FN + TN}$
MSE	$\frac{\sum_{i=1}^{n}(y_i - x_i)^2}{n}$	Precision	$\frac{TP}{TP + FP}$
NSE	$1 - \frac{\sum_{i=1}^{n}(y_i - x_i)^2}{\sum_{i=1}^{n}(x_i - \bar{x})^2}$	Recall	$\frac{TP}{TP + FN}$
F1-score	$\frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}$	Kappa	$\frac{p_o - p_e}{1 - p_e}$
r	$\frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i - \bar{x})^2 \sum_{i=1}^{n}(y_i - \bar{y})^2}}$	AUC-ROC	Area under the ROC curve
md	$1 - \frac{\sum_{i=1}^{n}|x_i - y_i|}{\sum_{i=1}^{n}(|y_i - \bar{x}| + |x_i - \bar{x}|)}$	Geometric mean	$\sqrt{\mathrm{Recall} \times \frac{TN}{TN + FP}}$
SMAPE	$\frac{1}{n}\sum_{i=1}^{n}\frac{|y_i - x_i|}{(|y_i| + |x_i|)/2} \times 100\%$	MAPE	$\frac{1}{n}\sum_{i=1}^{n}\left|\frac{y_i - x_i}{x_i}\right| \times 100\%$
Huber loss	$\begin{cases}\frac{1}{2}(y_i - x_i)^2, & |y_i - x_i| \le \delta \\ \delta\left(|y_i - x_i| - \frac{1}{2}\delta\right), & \text{otherwise}\end{cases}$

Tab.2 Formulas of model evaluation indices
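
Several of the indices in Tab.2 can be computed in a few lines. The sketch below assumes that $x_i$ denotes observed and $y_i$ predicted values (an interpretation inferred from the use of $\bar{x}$ in the denominators of R² and NSE); the function names and the example numbers are hypothetical.

```python
import math

def regression_metrics(observed, predicted):
    """MAE, MSE, RMSE, R2, and MAPE for paired observed/predicted values."""
    n = len(observed)
    mean_obs = sum(observed) / n
    errors = [p - o for o, p in zip(observed, predicted)]
    sse = sum(e * e for e in errors)                      # sum of squared errors
    sst = sum((o - mean_obs) ** 2 for o in observed)      # total sum of squares
    return {
        "MAE": sum(abs(e) for e in errors) / n,
        "MSE": sse / n,
        "RMSE": math.sqrt(sse / n),
        "R2": 1 - sse / sst,
        "MAPE": 100 * sum(abs(e / o) for o, e in zip(observed, errors)) / n,
    }

def classification_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall, and F1-score from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        "precision": precision,
        "recall": recall,
        "F1": 2 * precision * recall / (precision + recall),
    }

# Hypothetical concentrations (mg/kg) and confusion-matrix counts.
observed = [10.0, 20.0, 30.0, 40.0]
predicted = [12.0, 18.0, 33.0, 37.0]
reg = regression_metrics(observed, predicted)
cls = classification_metrics(tp=8, fp=2, fn=2, tn=8)
print(reg)
print(cls)
```

MAPE divides by the observed value, so it is undefined when any observed concentration is zero; SMAPE in Tab.2 avoids this by normalizing with the mean of the absolute observed and predicted values.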

Fig.8 Number of papers using different interpretation methods. The pie chart shows how many studies interpret ML models using TFI, PFI, PDP, and SHAP.

Method	Advantages	Disadvantages
TFI	1. TFI is inherent in and suitable for all tree-based models, for instance, RF, DT, and XGBoost. 2. It reflects the effects of covariates on soil pollution as simulated by tree-based models.	1. TFI shows only how strongly a pollutant is influenced by a covariate and cannot reveal whether the influence is positive or negative. 2. Feature importance may be biased when suboptimal predictor variables are selected (Strobl et al., 2007).
PFI	1. The importance of a feature is determined by measuring the increase in model error when the feature is perturbed; the method is applicable to most ML models. 2. Importance values for different features share a common scale and can be ranked for comparison.	1. Both labels and features are indispensable for performing PFI. 2. PFI results may be biased by collinearity among features.
PDP	1. Applicable to all ML models. 2. By changing one or two variables, PDP can display the relationships between soil pollution and environmental conditions in a straightforward way.	1. Because plots are limited to one- or two-dimensional space, at most two factors affecting soil contamination can be interpreted at a time. 2. Feature independence is a precondition; features that do not have synergistic effects on soil pollution are preferred.
SHAP	1. The contribution of each influencing factor is calculated based on cooperative game theory, which provides a solid theoretical foundation. 2. Can explain ML models both locally and globally.	1. Because the overall SHAP feature importance ranking depends on every soil sampling unit and influencing factor, computing Shapley values is computationally intensive and time-consuming. 2. Correlations between influencing factors are ignored in the ranking.
LIME	1. The use of local surrogate models improves the local interpretability of ML models. 2. Applicable to tabular, textual, and image data; in soil contamination prediction, data from different sources are transformed into tabular data for modeling.	1. It is difficult to define the fitted neighborhood correctly because soil contamination prediction data are tabular. 2. Adjacent points may respond differently to environmental variables, which may render the local surrogate model ineffective.

Tab.3 Comparison of model interpretation methods in machine learning
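
The PFI entry in Tab.3 describes importance as the increase in model error when a feature is perturbed. A minimal sketch of that idea follows, assuming mean squared error as the error measure and random shuffling as the perturbation; the stand-in `model`, the data, and the helper names are hypothetical.

```python
import random

def mse(model, X, y):
    """Mean squared error of `model` on rows X with targets y."""
    return sum((model(row) - t) ** 2 for row, t in zip(X, y)) / len(y)

def permutation_importance(model, X, y, n_repeats=20, seed=0):
    """Average increase in MSE after shuffling each feature column in turn."""
    rng = random.Random(seed)
    baseline = mse(model, X, y)
    importances = []
    for j in range(len(X[0])):
        increases = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)  # break the link between feature j and the target
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            increases.append(mse(model, X_perm, y) - baseline)
        importances.append(sum(increases) / n_repeats)
    return importances

# Hypothetical fitted model that depends on feature 0 only.
def model(row):
    return 3.0 * row[0] + 5.0

X = [[float(i), float((7 * i) % 10)] for i in range(10)]
y = [model(row) for row in X]
imp = permutation_importance(model, X, y)
print(imp)  # feature 0 has positive importance; feature 1 contributes nothing
```

The sketch needs both features and labels, which matches the first disadvantage listed for PFI. When two features are strongly correlated, shuffling one of them creates unrealistic combinations, which is one way the collinearity bias in Tab.3 can arise.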

Tab.4 Comparison of geostatistical models, ML, and hybrid models
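
One way a geostatistical step and an ML model can be combined is to let the ML model predict a trend from covariates and then interpolate its residuals in space. The sketch below is an illustration under stated assumptions, not the procedure of any reviewed study: inverse distance weighting is swapped in for kriging to keep the example short, and the `trend` function, coordinates, and concentrations are hypothetical.

```python
import math

def idw(points, values, query, power=2.0):
    """Inverse-distance-weighted interpolation of `values` at `query`."""
    num = den = 0.0
    for p, v in zip(points, values):
        d = math.dist(p, query)
        if d == 0.0:
            return v  # query coincides with a sampling point
        w = 1.0 / d ** power
        num += w * v
        den += w
    return num / den

def hybrid_predict(trend, points, covariates, observed, query_point, query_covariate):
    """Trend prediction plus spatially interpolated residuals of the trend model."""
    residuals = [obs - trend(c) for obs, c in zip(observed, covariates)]
    return trend(query_covariate) + idw(points, residuals, query_point)

# Hypothetical stand-in for a fitted ML model: concentration vs. distance to source (km).
def trend(distance_to_source_km):
    return 50.0 - 4.0 * distance_to_source_km

points = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0), (2.0, 2.0)]   # sampling locations (km)
covariates = [1.0, 2.0, 3.0, 4.0]                           # distance to source (km)
observed = [48.0, 44.0, 40.0, 36.0]                         # measured values (mg/kg)

pred = hybrid_predict(trend, points, covariates, observed, (1.0, 1.0), 2.5)
at_sample = hybrid_predict(trend, points, covariates, observed, (0.0, 0.0), 1.0)
print(pred, at_sample)
```

In this toy case the trend underestimates every sample by the same amount, so the residual step adds that offset back, and the hybrid prediction reproduces the measured value at a sampling point.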

1	M Abdar, F Pourpanah, S Hussain, D Rezazadegan, L Liu, M Ghavamzadeh, P Fieguth, X Cao, A Khosravi, U R Acharya, et al. (2021). A review of uncertainty quantification in deep learning: techniques, applications and challenges. Information Fusion, 76: 243–297 https://doi.org/10.1016/j.inffus.2021.05.008
2	N Adimalla, H Qian, M J Nandan, A S Hursthouse. (2020). Potentially toxic elements (PTEs) pollution in surface soils in a typical urban region of south India: an application of health risk assessment and distribution pattern. Ecotoxicology and Environmental Safety, 203: 111055 https://doi.org/10.1016/j.ecoenv.2020.111055
3	K Adnan, R Akbar. (2019). An analytical study of information extraction from unstructured and multidimensional big data. Journal of Big Data, 6(1): 91 https://doi.org/10.1186/s40537-019-0254-8
4	H Akaike. (1974). A new look at the statistical model identification. IEEE Transactions on Automatic Control, 19(6): 716–723 https://doi.org/10.1109/TAC.1974.1100705
5	A A Akinpelu, M E Ali, T O Owolabi, M R Johan, R Saidur, S O Olatunji, Z Chowdbury. (2020). A support vector regression model for the prediction of total polyaromatic hydrocarbons in soil: an artificial intelligent system for mapping environmental pollution. Neural Computing & Applications, 32(18): 14899–14908 https://doi.org/10.1007/s00521-020-04845-3
6	K Azizi, S Ayoubi, K Nabiollahi, Y Garosi, R Gislum. (2022). Predicting heavy metal contents by applying machine learning approaches and environmental covariates in west of Iran. Journal of Geochemical Exploration, 233: 106921 https://doi.org/10.1016/j.gexplo.2021.106921
7	E Baglaeva, A Buevich, A Sergeev, A Shichkin, I Subbotina. (2018). Recognition of chromium distribution features in different urban soils by multilayer perceptron. In: International Conference of Computational Methods in Sciences and Engineering (ICCMSE), Thessaloniki. Maryland: American Institute of Physics, 2040: 050008 https://doi.org/10.1063/1.5079106
8	E M Baglaeva, A P Sergeev, A V Shichkin, A G Buevich. (2021). The extraction of the training subset for the spatial distribution modelling of the heavy metals in topsoil. Catena, 207: 105699 https://doi.org/10.1016/j.catena.2021.105699
9	C Ballabio, M Jiskra, S Osterwalder, P Borrelli, L Montanarella, P Panagos. (2021). A spatial assessment of mercury content in the European Union topsoil. Science of the Total Environment, 769: 144755 https://doi.org/10.1016/j.scitotenv.2020.144755
10	A Bazoobandi, S Emamgholizadeh, H Ghorbani. (2022). Estimating the amount of cadmium and lead in the polluted soil using artificial intelligence models. European Journal of Environmental and Civil Engineering, 26(3): 933–951 https://doi.org/10.1080/19648189.2019.1686429
11	V Bellon-Maurel, E Fernandez-Ahumada, B Palagos, J M Roger, A McBratney. (2010). Critical review of chemometric indicators commonly used for assessing the quality of the prediction of soil attributes by NIR spectroscopy. Trends in Analytical Chemistry, 29(9): 1073–1081 https://doi.org/10.1016/j.trac.2010.05.006
12	S K Bhagat, T Tiyasha, S M Awadh, T M Tung, A H Jawad, Z M Yaseen. (2021a). Prediction of sediment heavy metal at the Australian Bays using newly developed hybrid artificial intelligence models. Environmental Pollution, 268: 115663 https://doi.org/10.1016/j.envpol.2020.115663
13	S K Bhagat, T M Tung, Z M Yaseen. (2021b). Heavy metal contamination prediction using ensemble model: case study of bay sedimentation, Australia. Journal of Hazardous Materials, 403: 123492 https://doi.org/10.1016/j.jhazmat.2020.123492
14	C Bishop. (1991). Improving the generalization properties of radial basis function neural networks. Neural Computation, 3(4): 579–588 https://doi.org/10.1162/neco.1991.3.4.579
15	M G Bonelli, M Ferrini, A Manni. (2017). Artificial neural networks to evaluate organic and inorganic contamination in agricultural soils. Chemosphere, 186: 124–131 https://doi.org/10.1016/j.chemosphere.2017.07.116
16	A D Gordon, L Breiman, J H Friedman, R A Olshen, C J Stone. (1984). Classification and Regression Trees. Biometrics, 40(3): 874 https://doi.org/10.2307/2530946
17	D Broomhead, D Lowe. (1988). Multivariable functional interpolation and adaptive networks. Complex Systems, 2: 321–355
18	C Cai, J Li, D Wu, X Wang, D C W Tsang, X Li, J Sun, L Zhu, H Shen, S Tao, W Liu. (2017). Spatial distribution, emission source and health risk of parent PAHs and derivatives in surface soils from the Yangtze River Delta, eastern China. Chemosphere, 178: 301–308 https://doi.org/10.1016/j.chemosphere.2017.03.057
19	W Cao, C Zhang. (2020). A collaborative compound neural network model for soil heavy metal content prediction. IEEE Access: Practical Innovations, Open Solutions, 8: 129497–129509 https://doi.org/10.1109/ACCESS.2020.3009248
20	W Cao, C Zhang. (2021). Data prediction of soil heavy metal content by deep composite model. Journal of Soils and Sediments, 21(1): 487–498 https://doi.org/10.1007/s11368-020-02793-y
21	F Chen, Q Zhang, J Ma, Q Zhu, Y Wang, H Liang. (2021). Effective remediation of organic-metal co-contaminated soil by enhanced electrokinetic-bioremediation process. Frontiers of Environmental Science & Engineering, 15(6): 113 https://doi.org/10.1007/s11783-021-1401-y
22	T Chen, C Guestrin. (2016). XGBoost: A scalable tree boosting system. In: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco. New York: Association for Computing Machinery, 785–794 https://doi.org/10.1145/2939672.2939785
23	T M Cover, P E Hart. (1967). Nearest neighbor pattern classification. IEEE Transactions on Information Theory, 13(1): 21–27 https://doi.org/10.1109/TIT.1967.1053964
24	M D’Emilio, M Macchiato, M Ragosta, T Simoniello. (2012). A method for the integration of satellite vegetation activities observations and magnetic susceptibility measurements for monitoring heavy metals in soil. Journal of Hazardous Materials, 241–242: 118–126 https://doi.org/10.1016/j.jhazmat.2012.09.021
25	B Droz, S Payraudeau, J A Rodríguez Martín, G Tóth, P Panagos, L Montanarella, P Borrelli, G Imfeld. (2021). Copper content and export in European vineyard soils influenced by climate and soil properties. Environmental Science & Technology, 55(11): 7327–7334 https://doi.org/10.1021/acs.est.0c02093
26	V H Duong, H B Ly, D H Trinh, T S Nguyen, B T Pham. (2021). Development of Artificial Neural Network for prediction of radon dispersion released from Sinquyen Mine, Vietnam. Environmental Pollution, 282: 116973 https://doi.org/10.1016/j.envpol.2021.116973
27	H Fathizad, M A H Ardakani, B Heung, H Sodaiezadeh, A Rahmani, A Fathabadi, T Scholten, R Taghizadeh-Mehrjardi. (2020). Spatio-temporal dynamic of soil quality in the central Iranian desert modeled with machine learning and digital soil assessment techniques. Ecological Indicators, 118: 106736 https://doi.org/10.1016/j.ecolind.2020.106736
28	X Fei, G Christakos, R Xiao, Z Ren, Y Liu, X Lv. (2019a). Improved heavy metal mapping and pollution source apportionment in Shanghai City soils using auxiliary information. Science of the Total Environment, 661: 168–177 https://doi.org/10.1016/j.scitotenv.2019.01.149
29	X Fei, R Xiao, G Christakos, A Langousis, Z Ren, Y Tian, X Lv. (2019b). Comprehensive assessment and source apportionment of heavy metals in Shanghai agricultural soils with different fertility levels. Ecological Indicators, 106: 105508 https://doi.org/10.1016/j.ecolind.2019.105508
30	J H Friedman. (2002). Stochastic gradient boosting. Computational Statistics & Data Analysis, 38(4): 367–378 https://doi.org/10.1016/S0167-9473(01)00065-2
31	B Gao, A Stein, J Wang. (2022). A two-point machine learning method for the spatial prediction of soil pollution. International Journal of Applied Earth Observation and Geoinformation, 108: 102742 https://doi.org/10.1016/j.jag.2022.102742
32	H Huang, Y Zhou, Y Liu, K Li, L Xiao, M Li, Y Tian, F Wu. (2020). Assessment of anthropogenic sources of potentially toxic elements in soil from arable land using multivariate statistical analysis and random forest analysis. Sustainability (Basel), 12(20): 8538 https://doi.org/10.3390/su12208538
33	H Huang, Y Zhou, Y J Liu, L Xiao, K Li, M Y Li, Y Tian, F Wu. (2021a). Source apportionment and ecological risk assessment of potentially toxic elements in cultivated soils of Xiangzhou, China: a combined approach of geographic information system and random forest. Sustainability (Basel), 13(3): 1214 https://doi.org/10.3390/su13031214
34	S Huang, L Xiao, Y Zhang, L Wang, L Tang. (2021b). Interactive effects of natural and anthropogenic factors on heterogenetic accumulations of heavy metals in surface soils through geodetector analysis. Science of the Total Environment, 789: 147937 https://doi.org/10.1016/j.scitotenv.2021.147937
35	E Hüllermeier, W Waegeman. (2021). Aleatoric and epistemic uncertainty in machine learning: an introduction to concepts and methods. Machine Learning, 110(3): 457–506 https://doi.org/10.1007/s10994-021-05946-3
36	J S R Jang. (1993). ANFIS - adaptive-network-based fuzzy inference system. IEEE Transactions on Systems, Man, and Cybernetics, 23(3): 665–685 https://doi.org/10.1109/21.256541
37	X Jia, Y Cao, D O’Connor, J Zhu, D C W Tsang, B Zou, D Hou. (2021). Mapping soil pollution by using drone image recognition and machine learning at an arsenic-contaminated agricultural field. Environmental Pollution, 270: 116281 https://doi.org/10.1016/j.envpol.2020.116281
38	X Jia, T Fu, B Hu, Z Shi, L Zhou, Y Zhu. (2020). Identification of the potential risk areas for soil heavy metal pollution based on the source-sink theory. Journal of Hazardous Materials, 393: 122424 https://doi.org/10.1016/j.jhazmat.2020.122424
39	X Jia, B Hu, B P Marchant, L Zhou, Z Shi, Y Zhu. (2019). A methodological framework for identifying potential sources of soil heavy metal pollution based on machine learning: a case study in the Yangtze Delta, China. Environmental Pollution, 250: 601–609 https://doi.org/10.1016/j.envpol.2019.04.047
40	Z Jia, S Zhou, Q Su, H Yi, J Wang. (2017). Comparison study on the estimation of the spatial distribution of regional soil metal(loid)s pollution based on kriging interpolation and BP neural network. International Journal of Environmental Research and Public Health, 15(1): 34 https://doi.org/10.3390/ijerph15010034
41	M I Jordan, T M Mitchell. (2015). Machine learning: trends, perspectives, and prospects. Science, 349(6245): 255–260 https://doi.org/10.1126/science.aaa8415
42	M Kanevski, V Demyanov, A Pozdnukhov, R Parkin, M Maignan. (2003). Advanced geostatistical and machine-learning models for spatial data analysis of radioactively contaminated regions. Environmental Science and Pollution Research International, (Special Issue): 137–149
43	N M Kebonye, P N Eze, K John, A Gholizadeh, J Dajčl, O Drábek, K Němeček, L Borůvka. (2021). Self-organizing map artificial neural networks and sequential Gaussian simulation technique for mapping potentially toxic element hotspots in polluted mining soils. Journal of Geochemical Exploration, 222: 106680 https://doi.org/10.1016/j.gexplo.2020.106680
44	R Bou Kheir, B Shomar, M B Greve, M H Greve. (2014). On the quantitative relationships between environmental parameters and heavy metals pollution in Mediterranean soils using GIS regression-trees: the case study of Lebanon. Journal of Geochemical Exploration, 147: 250–259 https://doi.org/10.1016/j.gexplo.2014.05.015
45	S B Kim, K S Han, H C Rim, S H Myaeng. (2006). Some effective techniques for naive Bayes text classification. IEEE Transactions on Knowledge and Data Engineering, 18(11): 1457–1466 https://doi.org/10.1109/TKDE.2006.180
46	J Li, A D Heap. (2014). Spatial interpolation methods applied in the environmental sciences: a review. Environmental Modelling & Software, 53: 173–189 https://doi.org/10.1016/j.envsoft.2013.12.008
47	X Li, T Geng, W Shen, J Zhang, Y Zhou. (2021). Quantifying the influencing factors and multi-factor interactions affecting cadmium accumulation in limestone-derived agricultural soil using random forest (RF) approach. Ecotoxicology and Environmental Safety, 209: 111773 https://doi.org/10.1016/j.ecoenv.2020.111773
48	Y Li, C Li, J Tao, L Wang. (2011). Study on spatial distribution of soil heavy metals in Huizhou City based on BP-ANN modeling and GIS. Procedia Environmental Sciences, 10: 1953–1960 https://doi.org/10.1016/j.proenv.2011.09.306
49	G Liu, X Zhou, Q Li, Y Shi, G Guo, L Zhao, J Wang, Y Su, C Zhang. (2020a). Spatial distribution prediction of soil As in a large-scale arsenic slag contaminated site based on an integrated model and multi-source environmental data. Environmental Pollution, 267: 115631 https://doi.org/10.1016/j.envpol.2020.115631
50	H Liu, S Yin, C Chen, Z Duan. (2020b). Data multi-scale decomposition strategies for air pollution forecasting: a comprehensive review. Journal of Cleaner Production, 277: 124023 https://doi.org/10.1016/j.jclepro.2020.124023
51	S M Lundberg, S I Lee. (2017). A unified approach to interpreting model predictions. In: Proceedings of the 31st International Conference on Neural Information Processing Systems, Long Beach. New York: Curran Associates Inc, 4768–4777 https://doi.org/10.5555/3295222.3295230
52	R H McCuen, Z Knight, A G Cutter. (2006). Evaluation of the Nash-Sutcliffe efficiency index. Journal of Hydrologic Engineering, 11(6): 597–602 https://doi.org/10.1061/(ASCE)1084-0699(2006)11:6(597)
53	H G Mikkonen, R Van De Graaff, B O Clarke, R Dasika, C J Wallis, S M Reichman. (2018a). Geochemical indices and regression tree models for estimation of ambient background concentrations of copper, chromium, nickel and zinc in soil. Chemosphere, 210: 193–203 https://doi.org/10.1016/j.chemosphere.2018.06.138
54	H G Mikkonen, R Van De Graaff, A T Mikkonen, B O Clarke, R Dasika, C J Wallis, S M Reichman. (2018b). Environmental and anthropogenic influences on ambient background concentrations of fluoride in soil. Environmental Pollution, 242: 1838–1849 https://doi.org/10.1016/j.envpol.2018.07.083
55	J E Nash, J V Sutcliffe. (1970). River flow forecasting through conceptual models part I — A discussion of principles. Journal of Hydrology (Amsterdam), 10(3): 282–290 https://doi.org/10.1016/0022-1694(70)90255-6
56	J Padarian, B Minasny, A B McBratney. (2020). Machine learning and soil sciences: a review aided by machine learning tools. Soil (Göttingen), 6(1): 35–52 https://doi.org/10.5194/soil-6-35-2020
57	É D C Paes, G V Veloso, A A Fonseca, E I Fernandes-Filho, M P F Fontes, E M B Soares. (2022). Predictive modeling of contents of potentially toxic elements using morphometric data, proximal sensing, and chemical and physical properties of soils under mining influence. Science of the Total Environment, 817: 152972 https://doi.org/10.1016/j.scitotenv.2022.152972
58	G Qin, Z Niu, J Yu, Z Li, J Ma, P Xiang. (2021). Soil heavy metal pollution and food safety in China: effects, sources and removing technology. Chemosphere, 267: 129205 https://doi.org/10.1016/j.chemosphere.2020.129205
59	L Qiu, K Wang, W Long, K Wang, W Hu, G S Amable. (2016). A comparative assessment of the influences of human impacts on soil Cd concentrations based on stepwise linear regression, classification and regression tree, and random forest models. PLoS One, 11(3): e0151131 https://doi.org/10.1371/journal.pone.0151131
60	X Ren, G Zeng, L Tang, J Wang, J Wan, Y Liu, J Yu, H Yi, S Ye, R Deng. (2018). Sorption, transport and biodegradation: an insight into bioavailability of persistent organic pollutants in soil. Science of the Total Environment, 610-611: 1154–1163 https://doi.org/10.1016/j.scitotenv.2017.08.089
61	M Riedmiller. (1994). Advanced supervised learning in multilayer perceptrons: from backpropagation to adaptive learning algorithms. Computer Standards & Interfaces, 16(3): 265–278 https://doi.org/10.1016/0920-5489(94)90017-5
62	D G Rossiter. (2018). Past, present & future of information technology in pedometrics. Geoderma, 324: 131–137 https://doi.org/10.1016/j.geoderma.2018.03.009
63	F Ru, A Yin, J Jin, X Zhang, X Yang, M Zhang, C Gao. (2016). Prediction of cadmium enrichment in reclaimed coastal soils by classification and regression tree. Estuarine, Coastal and Shelf Science, 177: 1–7 https://doi.org/10.1016/j.ecss.2016.04.018
64	M Sakizadeh, R Mirzaei, H Ghorbani. (2017). Support vector machine and artificial neural network to model soil pollution: a case study in Semnan Province, Iran. Neural Computing & Applications, 28(11): 3229–3238 https://doi.org/10.1007/s00521-016-2231-x
65	K Schwarz, K C Weathers, S T A Pickett, R G Jr Lathrop, R V Pouyat, M L Cadenasso. (2013). A comparison of three empirically based, spatially explicit predictive models of residential soil Pb concentrations in Baltimore, Maryland, USA: Understanding the variability within cities. Environmental Geochemistry and Health, 35(4): 495–510 https://doi.org/10.1007/s10653-013-9510-6
66	A P Sergeev, A G Buevich, E M Baglaeva, A V Shichkin. (2019). Combining spatial autocorrelation with machine learning increases prediction accuracy of soil heavy metals. Catena, 174: 425–435 https://doi.org/10.1016/j.catena.2018.11.037
67	W Shao, Q Guan, Z Tan, H Luo, H Li, Y Sun, Y Ma. (2021). Application of BP-ANN model in evaluation of soil quality in the arid area, northwest China. Soil & Tillage Research, 208: 104907 https://doi.org/10.1016/j.still.2020.104907
68	T Shi, X Hu, L Guo, F Su, W Tu, Z Hu, H Liu, C Yang, J Wang, J Zhang, G Wu. (2021). Digital mapping of zinc in urban topsoil using multisource geospatial data and random forest. Science of the Total Environment, 792: 148455 https://doi.org/10.1016/j.scitotenv.2021.148455
69	A Shichkin, A Buevich, A Sergeev, E Baglaeva, I Subbotina. (2018). Forecasting of spatial variable by the models based on artificial neural networks on an example of heavy metal content in topsoil. Thessaloniki. Maryland: American Institute of Physics Inc, 2040: 050007 https://doi.org/10.1063/1.5079105
70	S Singha, S Pasupuleti, S S Singha, R Singh, S Kumar. (2021). Prediction of groundwater quality using efficient machine learning technique. Chemosphere, 276: 130265 https://doi.org/10.1016/j.chemosphere.2021.130265
71	D F Specht. (1991). A general regression neural network. IEEE Transactions on Neural Networks, 2(6): 568–576 https://doi.org/10.1109/72.97934
72	C Strobl, A L Boulesteix, A Zeileis, T Hothorn. (2007). Bias in random forest variable importance measures: illustrations, sources and a solution. BMC Bioinformatics, 8(1): 1–21 https://doi.org/10.1186/1471-2105-8-25
73	D Svozil, V Kvasnicka, J Pospichal. (1997). Introduction to multi-layer feed-forward neural networks. Chemometrics and Intelligent Laboratory Systems, 39(1): 43–62 https://doi.org/10.1016/S0169-7439(97)00061-0
74	J A Swets. (1988). Measuring the accuracy of diagnostic systems. Science, 240(4857): 1285–1293 https://doi.org/10.1126/science.3287615
75	R Taghizadeh-Mehrjardi, H Fathizad, M Ali Hakimzadeh Ardakani, H Sodaiezadeh, R Kerry, B Heung, T Scholten. (2021). Spatio-temporal analysis of heavy metals in arid soils at the catchment scale using digital soil assessment and a random forest model. Remote Sensing (Basel), 13(9): 1698 https://doi.org/10.3390/rs13091698
76	H Tao, X Liao, D Zhao, X Gong, D P Cassidy. (2019). Delineation of soil contaminant plumes at a co-contaminated site using BP neural networks and geostatistics. Geoderma, 354: 113878 https://doi.org/10.1016/j.geoderma.2019.07.036
77	D Tarasov, A Buevich, A Shichkin, I Subbotina, A Tyagunov, E Baglaeva. (2018a). Chromium distribution forecasting using multilayer perceptron neural network and multilayer perceptron residual kriging. Maryland: American Institute of Physics Inc, 1978: 440019 https://doi.org/10.1063/1.5044048
78	D Tarasov, A Buevich, A Shichkin, J Vasilev. (2018b). Forecasting of chromium distribution in subarctic Noyabrsk using generalized regression neural networks and multilayer perceptron. Maryland: American Institute of Physics Inc, 1978: 440024 https://doi.org/10.1063/1.5044053
79	D A Tarasov, A G Buevich, A P Sergeev, A V Shichkin. (2018c). High variation topsoil pollution forecasting in the Russian subarctic: using artificial neural networks combined with residual kriging. Applied Geochemistry, 88: 188–197 https://doi.org/10.1016/j.apgeochem.2017.07.007
80	G Tepanosyan, N Maghakyan, L Sahakyan, A Saghatelyan. (2017). Heavy metals pollution levels and children health risk assessment of Yerevan kindergartens soils. Ecotoxicology and Environmental Safety, 142: 257–265 https://doi.org/10.1016/j.ecoenv.2017.04.013
81	G Tepanosyan, L Sahakyan, N Maghakyan, A Saghatelyan. (2020). Combination of compositional data analysis and machine learning approaches to identify sources and geochemical associations of potentially toxic elements in soil and assess the associated human health risk in a mining city. Environmental Pollution, 261: 114210 https://doi.org/10.1016/j.envpol.2020.114210
82	H Wang, Q Yilihamu, M Yuan, H Bai, H Xu, J Wu. (2020). Prediction models of soil heavy metal(loid)s concentration for agricultural land in Dongli: a comparison of regression and random forest. Ecological Indicators, 119: 106801 https://doi.org/10.1016/j.ecolind.2020.106801
83	L Wang, Y Zhou, Q Li, T Xu, Z Wu, J Liu. (2021a). Application of three deep machine-learning algorithms in a construction assessment model of farmland quality at the county scale: case study of Xiangzhou, Hubei Province, China. Agriculture, 11(1): 72 https://doi.org/10.3390/agriculture11010072
84	Q Wang, Z Xie, F Li. (2015). Using ensemble models to identify and apportion heavy metal pollution sources in agricultural soils on a local scale. Environmental Pollution, 206: 227–235 https://doi.org/10.1016/j.envpol.2015.06.040
85	Y Wang, X Wu, S He, R Niu. (2021b). Eco-environmental assessment model of the mining area in Gongyi, China. Scientific Reports, 11(1): 17549 https://doi.org/10.1038/s41598-021-96625-9
86	J Wu, Y Teng, H Chen, J Li. (2016). Machine-learning models for on-site estimation of background concentrations of arsenic in soils using soil formation factors. Journal of Soils and Sediments, 16(6): 1787–1797 https://doi.org/10.1007/s11368-016-1374-9
87	L Xiao, Y Zhou, H Huang, Y J Liu, K Li, M Y Li, Y Tian, F Wu. (2020a). Application of geostatistical analysis and random forest for source analysis and human health risk assessment of potentially toxic elements (PTEs) in arable land soil. International Journal of Environmental Research and Public Health, 17(24): 9296 https://doi.org/10.3390/ijerph17249296
88	L Xiao, Y Zhou, H Huang, Y J Liu, K Li, M Y Li, Y Tian, F Wu. (2020b). Application of geostatistical analysis and random forest for source analysis and human health risk assessment of potentially toxic elements (PTEs) in arable land soil. International Journal of Environmental Research and Public Health, 17(24): 9296
89	H Xu, P Croot, C Zhang. (2021). Discovering hidden spatial patterns and their associations with controlling factors for potentially toxic elements in topsoil using hot spot analysis and K-means clustering analysis. Environment International, 151: 106456 https://doi.org/10.1016/j.envint.2021.106456
90	H Yang, K Huang, K Zhang, Q Weng, H Zhang, F Wang. (2021a). Predicting heavy metal adsorption on soil with machine learning and mapping global distribution of soil adsorption capacities. Environmental Science & Technology, 55(20): 14316–14328 https://doi.org/10.1021/acs.est.1c02479
91	S Yang, D Taylor, D Yang, M He, X Liu, J Xu. (2021b). A synthesis framework using machine learning and spatial bivariate analysis to identify drivers and hotspots of heavy metal pollution of agricultural soils. Environmental Pollution, 287: 117611 https://doi.org/10.1016/j.envpol.2021.117611
92	Z M Yaseen. (2021). An insight into machine learning models era in simulating soil, water bodies and adsorption heavy metals: review, challenges and solutions. Chemosphere, 277: 130126 https://doi.org/10.1016/j.chemosphere.2021.130126
93	Z Yu, C Zhang, N Xiong, F Chen. (2022). A new random forest applied to heavy metal risk assessment. Computer Systems Science and Engineering, 40(1): 207–221 https://doi.org/10.32604/csse.2022.018301
94	M R Zafar, N Khan. (2021). Deterministic local interpretable model-agnostic explanations for stable explainability. Machine Learning and Knowledge Extraction, 3(3): 525–541 https://doi.org/10.3390/make3030027
95	C Zhang, W Kuang, J Wu, J Liu, H Tian. (2021a). Industrial land expansion in rural China threatens environmental securities. Frontiers of Environmental Science & Engineering, 15(2): 29 https://doi.org/10.1007/s11783-020-1321-2
96	H Zhang, A Yin, X Yang, M Fan, S Shao, J Wu, P Wu, M Zhang, C Gao. (2021b). Use of machine-learning and receptor models for prediction and source apportionment of heavy metals in coastal reclaimed soils. Ecological Indicators, 122: 107233 https://doi.org/10.1016/j.ecolind.2020.107233
97	H Zhang, S H Yin, Y H Chen, S S Shao, J T Wu, M M Fan, F R Chen, C Gao. (2020). Machine learning-based source identification and spatial prediction of heavy metals in soil in a rapid urbanization area, eastern China. Journal of Cleaner Production, 273: 122858 https://doi.org/10.1016/j.jclepro.2020.122858
98	X Zhang, F Lin, Y Jiang, K Wang, M T F Wong. (2008). Assessing soil Cu content and anthropogenic influences using decision tree analysis. Environmental Pollution, 156(3): 1260–1267 https://doi.org/10.1016/j.envpol.2008.03.009
99	S Zhong, K Zhang, M Bagheri, J G Burken, A Gu, B Li, X Ma, B L Marrone, Z J Ren, J Schrier, et al. (2021). Machine learning: new ideas and tools in environmental science and engineering. Environmental Science & Technology, 55(19): 12741–12754 https://doi.org/10.1021/acs.est.1c01339
100	P Zhou, Y Zhao, Z Zhao, T Chai. (2015). Source mapping and determining of soil contamination by heavy metals using statistical analysis, artificial neural network, and adaptive genetic algorithm. Journal of Environmental Chemical Engineering, 3(4, Part A): 2569–2579 https://doi.org/10.1016/j.jece.2015.08.003

