Source identification and prediction of nitrogen and phosphorus pollution of Lake Taihu by an ensemble machine learning technique

Yirong Hu1,3, Wenjie Du2,3, Cheng Yang1, Yang Wang2, Tianyin Huang4, Xiaoyi Xu4, Wenwei Li1,3

1. CAS Key Laboratory of Urban Pollutant Conversion, Department of Environmental Science and Engineering, University of Science and Technology of China, Hefei 230026, China
2. School of Software Engineering, University of Science and Technology of China, Hefei 230026, China
3. USTC-CityU Joint Advanced Research Center, Suzhou Institute for Advanced Research of USTC, Suzhou 215123, China
4. National and Local Joint Engineering Laboratory for Municipal Sewage Resource Utilization Technology, Suzhou University of Science and Technology, Suzhou 215009, China

Abstract

● A machine learning model was used to identify lake nutrient pollution sources.
● XGBoost model showed the best performance for lake water quality prediction.
● Model feature size was reduced by screening the key features with the MIC method.
● TN and TP concentrations of Lake Taihu are mainly affected by endogenous sources.
● Next-month lake TN and TP concentrations were predicted accurately.

Effective control of lake eutrophication necessitates a full understanding of the complicated nitrogen and phosphorus pollution sources, for which mathematical modeling is commonly adopted. In contrast to the conventional knowledge-based models that usually perform poorly due to insufficient knowledge of pollutant geochemical cycling, we employed an ensemble machine learning (ML) model to identify the key nitrogen and phosphorus sources of lakes. Six ML models were developed based on 13 years of historical data of Lake Taihu’s water quality, environmental input, and meteorological conditions, among which the XGBoost model stood out as the best model for total nitrogen (TN) and total phosphorus (TP) prediction. The results suggest that the lake TN is mainly affected by the endogenous load and inflow river water quality, while the lake TP is predominantly from endogenous sources. The prediction of the lake TN and TP concentration changes in response to these key feature variations suggests that endogenous source control is a highly desirable option for lake eutrophication control. Finally, one-month-ahead prediction of lake TN and TP concentrations (R² of 0.85 and 0.95, respectively) was achieved based on this model with sliding time window lengths of 9 and 6 months, respectively. Our work demonstrates the great potential of using ensemble ML models for lake pollution source tracking and prediction, which may provide valuable references for early warning and rational control of lake eutrophication.
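
The sliding-time-window idea mentioned in the abstract can be illustrated with a minimal sketch. It shows one plausible way to turn a monthly concentration series into one-month-ahead training pairs with a 9-month window, plus a plain R² calculation. The function names `make_windows` and `r_squared` and the numbers in `monthly_tn` are hypothetical placeholders, not the authors' code or data, and the sketch uses a single variable, whereas the study also draws on environmental input and meteorological features.

```python
from typing import List, Sequence, Tuple


def make_windows(series: Sequence[float], window: int) -> Tuple[List[List[float]], List[float]]:
    """Turn a monthly series into (features, target) pairs.

    Each feature row holds `window` consecutive months; the target is the
    value of the month immediately after that window (one month ahead).
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    features, targets = [], []
    for start in range(len(series) - window):
        features.append(list(series[start:start + window]))
        targets.append(series[start + window])
    return features, targets


def r_squared(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Synthetic placeholder values (assumption), NOT Lake Taihu measurements.
monthly_tn = [2.0, 2.2, 2.5, 2.4, 2.1, 1.9, 1.8, 1.7, 1.9, 2.1, 2.3, 2.6]

# Window length of 9 months, as reported for TN in the abstract.
X, y = make_windows(monthly_tn, window=9)
```

In a fuller pipeline, `X` and `y` would be passed to a regressor (the study reports XGBoost as the best of six models), and `r_squared` would be applied to held-out predictions; both steps are left out here to keep the sketch free of external dependencies.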

Keywords
Eutrophication
Machine learning
Water quality
Nutrients
Prediction

Corresponding Author(s): Yang Wang, Wenwei Li

Issue Date: 30 November 2022