Frontiers of Environmental Science & Engineering

ISSN 2095-2201

ISSN 2095-221X(Online)

CN 10-1013/X

Postal Subscription Code 80-973

2018 Impact Factor: 3.883

Front. Environ. Sci. Eng. 2023, Vol. 17, Issue 12: 152. https://doi.org/10.1007/s11783-023-1752-7
RESEARCH ARTICLE
Online machine learning for stream wastewater influent flow rate prediction under unprecedented emergencies
Pengxiao Zhou1, Zhong Li1, Yimei Zhang2,3, Spencer Snowling4, Jacob Barclay4
1. Department of Civil Engineering, McMaster University, Hamilton, Ontario L8S 4L8, Canada
2. MOE Key Laboratory of Resources and Environmental System Optimization, College of Environmental Science and Engineering, North China Electric Power University, Beijing 102206, China
3. Laboratory of Environmental Remediation and Functional Material, Suzhou Research Academy of North China Electric Power University, Suzhou 215213, China
4. Hatch Ltd., Sheridan Science & Technology Park, Mississauga, Ontario L5K 2R7, Canada
Abstract

● Online learning models accurately predict influent flow rate at wastewater plants.

● Models adapt to changing input-output relationships and handle large data sets efficiently.

● Online learning models outperform conventional batch learning models.

● An optimal prediction strategy is identified through uncertainty analysis.

● The proposed models provide support for coping with emergencies like COVID-19.

Accurate influent flow rate prediction is important for operators and managers at wastewater treatment plants (WWTPs), as it is closely related to wastewater characteristics such as biochemical oxygen demand (BOD), total suspended solids (TSS), and pH. Previous studies have shown that data-driven models are effective tools for predicting influent flow rate. However, most of these studies focused on batch learning, which is inadequate for wastewater prediction in the era of COVID-19 because influent patterns have changed significantly. Online learning, which has distinct advantages in handling streaming data, large data sets, and changing data patterns, has the potential to address this issue. In this study, the conventional batch learning models Random Forest (RF), K-Nearest Neighbors (KNN), and Multi-Layer Perceptron (MLP) were compared with their respective online learning counterparts, Adaptive Random Forest (aRF), Adaptive K-Nearest Neighbors (aKNN), and Adaptive Multi-Layer Perceptron (aMLP), for predicting influent flow rate at two Canadian WWTPs. In every scenario, each online learning model achieved a higher coefficient of determination (R²), a lower mean absolute percentage error (MAPE), and a lower root mean square error (RMSE) than its batch learning counterpart. The R² values on the testing data set for 24-h ahead prediction with aRF, aKNN, and aMLP were 0.90, 0.73, and 0.87, respectively, at Plant A, and 0.75, 0.78, and 0.56, respectively, at Plant B. The proposed online learning models make reliable predictions under changing data patterns and deal efficiently with continuous, large influent data streams. They can provide robust decision support for wastewater treatment and management in the changing era of COVID-19 and under other unprecedented emergencies that could change influent patterns.
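
The contrast between batch and online learning described above can be illustrated with a minimal sketch. It is not the authors' implementation: it uses a synthetic hourly flow series with an abrupt drop in base level (a stand-in for an emergency such as a lockdown), a plain K-nearest-neighbors regressor on hour of day, and a fixed sliding window in place of the adaptive mechanisms of aRF, aKNN, and aMLP. All names, sizes, and values (`make_stream`, `knn_predict`, `run`, the window of 200 samples, the shift at step 500) are hypothetical. The batch model is fitted once and never updated; the online model predicts each new sample first and then learns from it (test-then-train).

```python
import math
import random
from collections import deque


def make_stream(n=1000, shift_at=500, seed=0):
    """Synthetic hourly flow: daily cycle plus a sudden drop in base level."""
    rng = random.Random(seed)
    stream = []
    for t in range(n):
        hour = t % 24
        base = 100.0 if t < shift_at else 60.0  # hypothetical pattern change
        flow = base + 10.0 * math.sin(2 * math.pi * hour / 24) + rng.gauss(0, 1)
        stream.append((hour, flow))
    return stream


def knn_predict(memory, hour, k=3):
    """Average the k stored flows closest in hour of day; newest wins ties."""
    def key(item):
        idx, (h, _) = item
        d = abs(h - hour)
        return (min(d, 24 - d), -idx)  # circular hour distance, then recency

    nearest = sorted(enumerate(memory), key=key)[:k]
    return sum(flow for _, (_, flow) in nearest) / len(nearest)


def run(stream, n_train=300, window=200):
    """Return mean absolute error of the batch and the online model."""
    batch_memory = stream[:n_train]  # fitted once, never updated
    online_memory = deque(stream[:n_train][-window:], maxlen=window)
    batch_err = online_err = 0.0
    test = stream[n_train:]
    for hour, flow in test:
        batch_err += abs(knn_predict(batch_memory, hour) - flow)
        online_err += abs(knn_predict(online_memory, hour) - flow)  # test first
        online_memory.append((hour, flow))  # then train on the new sample
    return batch_err / len(test), online_err / len(test)


batch_mae, online_mae = run(make_stream())
print(f"batch MAE:  {batch_mae:.2f}")
print(f"online MAE: {online_mae:.2f}")
```

In this toy setting the batch model keeps predicting the pre-shift level, while the sliding-window model recovers within a few simulated days. A fixed window is the simplest way to forget outdated data; drift-aware methods instead adjust what they forget based on detected change.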

Keywords: Wastewater prediction; Data stream; Online learning; Batch learning; Influent flow rates
Corresponding Author(s): Zhong Li, Yimei Zhang
Issue Date: 24 July 2023
Cite this article:
Pengxiao Zhou, Zhong Li, Yimei Zhang, et al. Online machine learning for stream wastewater influent flow rate prediction under unprecedented emergencies[J]. Front. Environ. Sci. Eng., 2023, 17(12): 152.
URL:
https://academic.hep.com.cn/fese/EN/10.1007/s11783-023-1752-7
https://academic.hep.com.cn/fese/EN/Y2023/V17/I12/152
Fig.1  Schema of the experiments (left: batch learning models; right: online learning models).
        Plant A                                                   Plant B
        24-h ahead                   No lead time                 24-h ahead                   No lead time
Model   R²    MAPE (%)  RMSE (m³/h)  R²    MAPE (%)  RMSE (m³/h)  R²    MAPE (%)  RMSE (MLD)   R²    MAPE (%)  RMSE (MLD)
RF      0.79  7.59      5663.73      0.78  7.79      5764.24      0.17  32.35     522.34       0.17  30.32     498.00
KNN     0.51  11.84     8901.19      0.53  11.62     8721.91      0.07  27.78     383.11       0.09  26.45     372.18
MLP     0.75  9.68      6714.23      0.77  8.95      6296.41      0.24  29.84     483.29       0.25  29.96     466.62
aRF     0.90  7.35      4895.25      0.90  7.40      4905.84      0.75  14.82     206.32       0.77  14.59     201.58
aKNN    0.73  8.67      6342.32      0.73  8.68      6348.84      0.78  10.99     181.02       0.77  11.48     184.46
aMLP    0.87  5.56      4424.91      0.88  5.17      4193.20      0.56  22.83     252.12       0.68  17.96     221.03
Tab.1  Performance metrics for each model by plant and scenario (best performance in bold)
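
For readers who want to reproduce metrics of the kind reported in Tab.1, the sketch below computes R², MAPE, and RMSE from paired observations and predictions. It assumes the common definitions (R² as one minus the ratio of residual to total sum of squares, MAPE as the mean absolute relative error in percent, RMSE as the square root of the mean squared error); the page does not state which R² variant the study used, so that choice is an assumption. The flow values are hypothetical and are not taken from either plant.

```python
import math


def r2(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def mape(observed, predicted):
    """Mean absolute percentage error in percent (observations must be nonzero)."""
    ratios = [abs((o - p) / o) for o, p in zip(observed, predicted)]
    return 100.0 * sum(ratios) / len(ratios)


def rmse(observed, predicted):
    """Root mean square error, in the same units as the flow rate."""
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical flow values for illustration only.
observed = [100.0, 200.0, 300.0, 400.0]
predicted = [110.0, 190.0, 310.0, 390.0]

print(f"R2   = {r2(observed, predicted):.3f}")
print(f"MAPE = {mape(observed, predicted):.2f} %")
print(f"RMSE = {rmse(observed, predicted):.2f}")
```

RMSE carries the units of the target, which is why the Plant A values (m³/h) and the Plant B values (MLD) in Tab.1 are not directly comparable, whereas R² and MAPE are unitless.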
Fig.2  Scatter plots for 24-h ahead predictions from each model on testing data set: (a) online learning models for Plant A, (b) batch learning models for Plant A, (c) online learning models for Plant B, and (d) batch learning models for Plant B.
Fig.3  Performance comparison of online learning methods and batch learning methods: (a) histograms of prediction errors at Plant A, (b) boxplots of prediction errors at Plant A, (c) histograms of prediction errors at Plant B, and (d) boxplots of prediction errors at Plant B.
Fig.4  Performance comparison of online learning methods: (a) histograms of predictions at Plant A, (b) boxplots of predictions at Plant A, (c) histograms of predictions at Plant B, and (d) boxplots of predictions at Plant B.
Fig.5  Scatterplots of averaged online learning predictions versus observations at: (a) Plant A and (b) Plant B.
Fig.6  Comparison of cumulative density functions at (a) Plant A and (b) Plant B.