Frontiers of Environmental Science & Engineering

ISSN 2095-2201

ISSN 2095-221X(Online)

CN 10-1013/X

Postal Subscription Code 80-973

2018 Impact Factor: 3.883

Front. Environ. Sci. Eng.    2024, Vol. 18 Issue (2) : 17    https://doi.org/10.1007/s11783-024-1777-6
RESEARCH ARTICLE
Development of gradient boosting-assisted machine learning data-driven model for free chlorine residual prediction
Wiley Helm1, Shifa Zhong1,2, Elliot Reid1, Thomas Igou1, Yongsheng Chen1
1. School of Civil and Environmental Engineering, Georgia Institute of Technology, Atlanta, GA 30332, USA
2. School of Ecological and Environmental Sciences, East China Normal University, Shanghai 200241, China
Abstract

● A machine learning approach was applied to predict free chlorine residuals.

● One year of data was obtained from the chlorination unit at a 98 MGD water treatment plant.

● The best-performing model iteration achieved high prediction performance (R² = 0.937).

● Non-intuitive parameters were found to be highly significant to predictions.

Chlorine-based disinfection is ubiquitous in conventional drinking water treatment (DWT) and serves to mitigate threats of acute microbial disease caused by pathogens that may be present in source water. An important index of disinfection efficiency is the free chlorine residual (FCR), a regulated disinfection parameter in the US that indirectly measures disinfectant power for prevention of microbial recontamination during DWT and distribution. This work demonstrates how machine learning (ML) can be implemented to improve FCR forecasting when supplied with water quality data from a real, full-scale chlorine disinfection system in Georgia, USA. More precisely, a gradient-boosting ML model (CatBoost) was developed from a full year of DWT plant-generated chlorine disinfection data, including water quality parameters (e.g., temperature, turbidity, pH) and operational process data (e.g., flow rates), to predict FCR. Four gradient-boosting models were implemented, with the best-performing model achieving a coefficient of determination (R²) of 0.937. Shapley additive explanations (SHAP) values were used to interpret the model's results, revealing that standard DWT operating parameters, although non-intuitive and theoretically non-causal, substantially improved prediction performance. These results provide a base case for data-driven DWT disinfection supervision and suggest process monitoring methods that give plant operators better information for safe chlorine dosing to maintain optimum FCR.
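The coefficient of determination quoted above can be illustrated with a short sketch. This is the standard textbook definition (1 minus the ratio of residual to total sum of squares), not the authors' evaluation code; the function name `r_squared` and the sample FCR readings are hypothetical and chosen only for illustration.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_true = sum(y_true) / len(y_true)
    # Residual sum of squares: error left after the model's predictions.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Total sum of squares: error of always predicting the mean.
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when y_true is constant")
    return 1.0 - ss_res / ss_tot


# Hypothetical FCR readings in mg/L (not taken from the plant data set).
measured = [1.85, 1.90, 1.95, 2.00, 1.88]
predicted = [1.86, 1.89, 1.93, 1.98, 1.90]
score = r_squared(measured, predicted)
print(f"R^2 = {score:.3f}")
```

A value of 1 means the predictions match the measurements exactly, and a value of 0 means the model does no better than predicting the mean of the measurements.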

Keywords: Machine learning; Data-driven modeling; Drinking water treatment; Disinfection; Chlorination
Corresponding Author(s): Yongsheng Chen   
Issue Date: 12 October 2023
 Cite this article:   
Wiley Helm, Shifa Zhong, Elliot Reid, et al. Development of gradient boosting-assisted machine learning data-driven model for free chlorine residual prediction[J]. Front. Environ. Sci. Eng., 2024, 18(2): 17.
 URL:  
https://academic.hep.com.cn/fese/EN/10.1007/s11783-024-1777-6
https://academic.hep.com.cn/fese/EN/Y2024/V18/I2/17
Fig.1  DWT utility process flow diagram. Schematic depicts process and sample flows, chemical injections, and analytical equipment.
| Input parameter | Mean | Standard deviation | Min. | 25th percentile | 50th percentile | 75th percentile | Max. |
|---|---|---|---|---|---|---|---|
| Raw water pH | 6.63 | 0.23 | 0.60 | 6.47 | 6.69 | 6.78 | 7.08 |
| Raw water temperature (°C) | 19.42 | 2.86 | 14.00 | 17.17 | 19.00 | 22.00 | 34.00 |
| Raw water turbidity (NTU) | 0.73 | 6.08 | 0.00 | 0.44 | 0.56 | 0.65 | 371.00 |
| Filter water pH | 9.30 | 0.14 | 6.90 | 9.24 | 9.31 | 9.38 | 9.79 |
| Filter water turbidity (NTU) | 0.03 | 0.01 | 0.00 | 0.03 | 0.03 | 0.03 | 0.30 |
| Finished water pH | 7.25 | 0.08 | 7.00 | 7.19 | 7.25 | 7.30 | 7.89 |
| Filter water Fl (mg/L) | 0.83 | 0.06 | 0.00 | 0.80 | 0.83 | 0.87 | 1.15 |
| Filter water PO4 (mg/L) | 1.17 | 3.26 | 0.10 | 0.99 | 1.02 | 1.06 | 107.00 |
| Flocculated turbidity (NTU) | 0.40 | 0.95 | 0.10 | 0.25 | 0.33 | 0.44 | 29.90 |
| Finished water Fe (mg/L) | 0.02 | 0.01 | 0.00 | 0.01 | 0.02 | 0.02 | 0.07 |
| Raw water Mn (mg/L) | 0.01 | 0.01 | 0.00 | 0.01 | 0.01 | 0.01 | 0.20 |
| Finished water Mn (mg/L) | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.06 |
| Raw water alkalinity (mg/L as CaCO3) | 14.86 | 0.98 | 10.50 | 14.00 | 15.00 | 15.50 | 21.50 |
| Finished water alkalinity (mg/L as CaCO3) | 21.42 | 2.52 | 2.00 | 19.42 | 21.00 | 23.00 | 29.50 |
| Channel pH | 8.15 | 0.73 | 3.20 | 7.72 | 8.24 | 8.70 | 9.73 |
| Channel Cl2 (mg/L) | 2.01 | 0.20 | -1.30 | 1.94 | 1.99 | 2.05 | 4.99 |
| Lime pump 1 stroke (%) | 25.19 | 8.12 | 0.00 | 18.56 | 24.12 | 32.07 | 52.19 |
| Lime pump 2 stroke (%) | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| Post chlorinator 1 feed rate (kg/d) | 166.44 | 80.27 | 0.00 | 161.90 | 180.95 | 200.91 | 518.37 |
| Post chlorinator 2 feed rate (kg/d) | 137.87 | 97.51 | 0.00 | 1.36 | 173.24 | 199.09 | 476.19 |
| Post chlorinator 3 feed rate (kg/d) | 82.54 | 118.82 | 0.00 | 0.00 | 1.36 | 180.50 | 519.73 |
| Phosphate feed rate (mL/min) | 399.00 | 91.00 | 0.00 | 365.00 | 384.00 | 432.00 | 889.00 |
| Phosphate dosage (mg/L) | 1.97 | 0.04 | 1.90 | 1.95 | 1.95 | 2.01 | 2.01 |
| Fluoride feed rate (mL/min) | 267.00 | 65.00 | 0.00 | 230.00 | 262.00 | 290.00 | 685.00 |
| Fluoride dosage (mg/L) | 0.82 | 0.08 | 0.70 | 0.76 | 0.80 | 0.90 | 0.97 |
| Raw tank 1 flow (MLD) | 77.22 | 40.13 | 0.00 | 63.22 | 68.14 | 105.23 | 233.94 |
| Raw tank 2 flow (MLD) | 68.14 | 32.55 | 0.00 | 51.48 | 54.89 | 88.96 | 233.94 |
| Central header flow (MLD) | 73.06 | 73.82 | 0.38 | 0.38 | 79.87 | 137.03 | 323.65 |
| Transfer pump flow (MLD) | 71.17 | 71.54 | 0.38 | 0.38 | 57.16 | 140.82 | 227.12 |
| Plant water flow (L/min) | 1665.58 | 435.32 | 15.14 | 1514.16 | 1563.37 | 1665.58 | 4163.95 |
| Plant water flow (MLD) | 2.38 | 0.64 | 0.00 | 2.20 | 2.23 | 2.38 | 5.98 |
| Process flow (MLD) | 146.87 | 28.01 | 0.76 | 132.87 | 142.71 | 154.44 | 326.30 |
| Backwash water turbidity (NTU) | 0.03 | 0.01 | 0.00 | 0.03 | 0.03 | 0.03 | 0.27 |
| Cl2A (mg/L) | 1.91 | 0.16 | -1.30 | 1.85 | 1.89 | 1.94 | 2.63 |
Tab.1  Model input parameters
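Summary statistics of the kind listed in Tab.1 (mean, standard deviation, minimum, quartiles, maximum) could be produced along the following lines. This is a minimal sketch using only the Python standard library; the helper name `summarize` and the sample readings are hypothetical, and the choice of sample standard deviation and linearly interpolated ("inclusive") quartiles is an assumption, since the page does not state which conventions the authors used.

```python
import statistics


def summarize(values):
    """Return the summary statistics reported per input parameter."""
    # Quartiles by linear interpolation over the observed range (assumed convention).
    q1, q2, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "mean": statistics.fmean(values),
        "std": statistics.stdev(values),  # sample standard deviation (assumed)
        "min": min(values),
        "25%": q1,
        "50%": q2,
        "75%": q3,
        "max": max(values),
    }


# Hypothetical raw water temperature readings in degrees Celsius.
temperature = [14.0, 17.0, 19.0, 22.0, 34.0]
for name, value in summarize(temperature).items():
    print(f"{name:>4}: {value:.2f}")
```

Large gaps between the 75th percentile and the maximum, as seen for raw water turbidity in Tab.1, are easy to spot in such a summary and usually point to outliers or sensor excursions.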
Fig.2  Model performance evaluation (iteration 1). Ground truth chlorine analyzer records (Cl2A) are plotted against predicted values. Test set results (orange) are superimposed over the training set (blue). The model predicted Cl2A well (test-set R² = 0.854).
Fig.3  SHAP force plot (iteration 1). The x-axis is the prediction number, ordered from greatest to least predicted FCR value. The y-axis is the predicted FCR in mg/L. The predicted value occurs where the red and blue arrows intersect. Red arrows have a positive impact on the prediction, while blue arrows have a negative impact.
Fig.4  SHAP force plot for a single prediction (iteration 1). A SHAP force plot of a single predicted data point with an FCR value of 1.89 mg/L. Red arrows have a positive impact on the prediction, while blue arrows have a negative impact.
Fig.5  SHAP bee swarm plot (iteration 1).
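The additive idea behind the SHAP force plots (a base value plus positive and negative feature contributions that sum to the prediction) can be sketched with an exact Shapley computation on a toy model. Everything here is an assumption made for illustration: `toy_fcr_model` and its coefficients are invented, "absent" features are replaced by a baseline value, and the baseline and sample simply reuse the means and 75th percentiles of three Tab.1 parameters. SHAP tooling for tree ensembles uses more efficient algorithms than this brute-force enumeration.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, x, baseline):
    """Exact Shapley values; features outside a coalition take baseline values."""
    n = len(x)

    def value(subset):
        mixed = [x[i] if i in subset else baseline[i] for i in range(n)]
        return model(mixed)

    phis = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        phi = 0.0
        for size in range(n):
            # Weight of coalitions of this size in the Shapley average.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi += weight * (value(s | {i}) - value(s))
        phis.append(phi)
    return phis


def toy_fcr_model(features):
    """Hypothetical FCR model: NOT the CatBoost model from the article."""
    channel_cl2, temperature, flow = features
    return 0.9 * channel_cl2 - 0.004 * temperature + 0.0002 * channel_cl2 * flow


# Baseline: Tab.1 means (channel Cl2, raw water temperature, process flow).
baseline = [2.01, 19.42, 146.87]
# Sample: the corresponding 75th percentiles from Tab.1.
sample = [2.05, 22.00, 154.44]

phis = shapley_values(toy_fcr_model, sample, baseline)
base_value = toy_fcr_model(baseline)
prediction = toy_fcr_model(sample)
for name, phi in zip(["channel Cl2", "temperature", "process flow"], phis):
    print(f"{name:>12}: {phi:+.5f}")
print(f"base value + contributions = {base_value + sum(phis):.5f}")
print(f"model prediction           = {prediction:.5f}")
```

In force-plot terms, positive contributions correspond to the red arrows and negative contributions to the blue arrows; the two printed totals agree because Shapley values always sum to the difference between the prediction and the base value.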
Fig.6  Model prediction performance evaluation (iteration 2). Test set results (orange) are superimposed over training data (blue). The model performance improved over the previous iteration, with test-set R² = 0.937.
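| Parameter | Sampling location | Feature rank | Expected FCR correlation | Modeled FCR correlation |
|---|---|---|---|---|
| Temperature (°C) | Raw Water | 4 | Negative | Mixed |
| pH | Raw Water | 17 | Positive | Negative |
| pH | Filter | 32 | Positive | Positive |
| pH | Channel | 27 | Positive | Mixed |
| pH | Finished Water | 9 | Positive | Negative |
| Turbidity (NTU) | Raw Water | 15 | Negative | Positive |
| Turbidity (NTU) | Flocculated | 11 | Negative | Positive |
| Turbidity (NTU) | Filtered Water | 25 | Negative | Positive |
| Turbidity (NTU) | Backwash Wet Well | 29 | Negative | Mixed |
| Iron (mg/L) | Finished Water | 31 | Negative | Mixed |
| Manganese (mg/L) | Raw Water | 16 | Negative | Negative |
| Manganese (mg/L) | Finished Water | 24 | Negative | Positive |
| FCR (mg/L) | Channel | 1 | Positive | Positive |
| Chlorine Dose (mg/L) | Post-Chlorinator 1 | 23 | Positive | Mixed |
| Chlorine Dose (mg/L) | Post-Chlorinator 2 | 26 | Positive | Mixed |
| Chlorine Dose (mg/L) | Post-Chlorinator 3 | 7 | Positive | Positive |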
Tab.2  Comparison of expected and modeled feature impacts on FCR predictions
Fig.7  Model prediction performance evaluation (iteration 3). Test set results (orange) are superimposed over training data (blue). The model performance decreased compared to the previous iteration, with test-set R² = 0.858.
Fig.8  Model prediction performance evaluation (iteration 4). Test set results (orange) are superimposed over training data (blue). The model performance decreased compared to the previous iteration, with test-set R² = 0.845.
| Model iteration | Input parameters | Test-set R² | Relative performance vs. base case | Description |
|---|---|---|---|---|
| 1 | 33 | 0.854 | | Base case; pretreated data without modification |
| 2 | 33 | 0.937 | +9.72% | Rolling average; time interval increased from 1 to 6 h and normalized over interval |
| 3 | 27 | 0.858 | +0.47% | Rolling average from iteration 2 was consolidated by removing/combining selected parameters, and adding 3 secondary manipulations |
| 4 | 12 | 0.845 | -1.05% | Applied a rolling average and removed parameters considered to be non-causal, the FCR measured upstream of the clearwells, and pre-filtration water quality parameters |
Tab.3  Comparison of model iterations and their prediction performance
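Two calculations referred to in Tab.3 can be sketched briefly: a rolling average over a 6 h window, and the relative change in test-set R² against the base case. The function names are hypothetical, and the window convention is an assumption (hourly records, trailing window, no output until the window is full); the page does not specify how the authors implemented the averaging.

```python
def rolling_mean(values, window):
    """Trailing rolling mean; positions before the window fills yield None."""
    if window < 1:
        raise ValueError("window must be at least 1")
    out = []
    for i in range(len(values)):
        if i < window - 1:
            out.append(None)
        else:
            chunk = values[i - window + 1 : i + 1]
            out.append(sum(chunk) / window)
    return out


def relative_change_pct(score, base):
    """Percentage change of a score relative to the base-case score."""
    return (score - base) / base * 100.0


# Hypothetical hourly channel Cl2 readings in mg/L, smoothed over 6 h.
hourly_cl2 = [1.94, 1.99, 2.05, 2.01, 1.97, 2.00, 2.03, 1.98]
print(rolling_mean(hourly_cl2, window=6))

# Test-set R² values as listed in Tab.3, keyed by model iteration.
r2_test = {1: 0.854, 2: 0.937, 3: 0.858, 4: 0.845}
changes = {
    iteration: round(relative_change_pct(score, r2_test[1]), 2)
    for iteration, score in r2_test.items()
    if iteration != 1
}
print(changes)
```

The rounded percentages can be compared directly with the relative performance column of Tab.3.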