Frontiers of Environmental Science & Engineering

Front. Environ. Sci. Eng.    2024, Vol. 18 Issue (5) : 54    https://doi.org/10.1007/s11783-024-1814-5
A benchmark-based method for evaluating hyperparameter optimization techniques of neural networks for surface water quality prediction
Xuan Wang1,2,3, Yan Dong1,2, Jing Yang1,2,3, Zhipeng Liu1,2, Jinsuo Lu1,2,3()
1. School of Environmental and Municipal Engineering, Xi’an University of Architecture and Technology, Xi’an 710055, China
2. Shaanxi Key Laboratory of Environmental Engineering, Xi’an University of Architecture and Technology, Xi’an 710055, China
3. State Key Laboratory of Green Building in West China, Xi’an University of Architecture and Technology, Xi’an 710055, China
Abstract

● Manual adjustment of hyperparameters is highly random and computationally expensive.

● Five HPO techniques were implemented in surface water quality prediction NN models.

● The proposed benchmark-based method for HPO evaluation is feasible and robust.

● TPE-based BO was the recommended HPO method for its satisfactory performance.

Neural networks (NNs) have been used extensively in surface water prediction tasks owing to improvements in computing algorithms and the accumulation of data. An essential step in developing an NN is hyperparameter selection. In practice, hyperparameters in NN studies for water resources tasks are commonly determined manually, which introduces considerable randomness and requires significant computation time; hyperparameter optimization (HPO) is therefore essential. This study applied five representative HPO techniques to surface water quality prediction tasks: grid sampling (GS), random search (RS), the genetic algorithm (GA), and Bayesian optimization (BO) based on either the Gaussian process (GP) or the tree-structured Parzen estimator (TPE). To evaluate these techniques, this study proposed a benchmark-based method: first, the optimal hyperparameter value sets achieved by GS were regarded as the benchmark; then, the other HPO techniques were evaluated against the benchmark in terms of convergence, optimization orientation, and consistency of the optimized values. The results indicated that the TPE-based BO algorithm is recommended because it yielded stable convergence, reasonable optimization orientation, and the highest consistency rates with the benchmark values. The consistency rates achieved by TPE for the hyperparameters hidden layers, hidden dimension, learning rate, and batch size were 86.7%, 73.3%, 73.3%, and 80.0%, respectively. Unlike evaluating HPO techniques directly from the prediction performance of the optimized NN in a single HPO test, the proposed benchmark-based HPO evaluation approach is feasible and robust.
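For illustration, the recommended TPE-based BO could be set up along the following lines. This is a minimal sketch only: it assumes the Hyperopt Python library (whether the study used this exact tool is not confirmed here), and train_and_validate() is a hypothetical stub standing in for the actual NN training that returns the validation MSE.

```python
import numpy as np
from hyperopt import fmin, tpe, hp, space_eval, STATUS_OK

# Search space mirroring the Tab.1 HPO value ranges.
space = {
    "hidden_layers": hp.choice("hidden_layers", [1, 2, 3, 4, 5, 6, 7, 8]),
    "hidden_dimension": hp.quniform("hidden_dimension", 10, 420, 1),
    "learning_rate": hp.loguniform("learning_rate", np.log(1e-4), np.log(3e-3)),
    "batch_size": hp.choice("batch_size", [2, 4, 8, 16, 32, 64]),
}

def train_and_validate(hidden_layers, hidden_dimension, learning_rate, batch_size):
    # Hypothetical stub: replace with real NN training returning the
    # validation MSE; a random value keeps the sketch runnable.
    return float(np.random.rand())

def objective(params):
    params["hidden_dimension"] = int(params["hidden_dimension"])
    return {"loss": train_and_validate(**params), "status": STATUS_OK}

# The iteration budget (max_evals) is illustrative, not from the paper.
best = fmin(fn=objective, space=space, algo=tpe.suggest, max_evals=100)
print(space_eval(space, best))  # map choice indices back to actual values
```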

Keywords Neural networks      Hyperparameter optimization      Surface water quality prediction      Bayesian optimization      Genetic algorithm
Corresponding Author(s): Jinsuo Lu   
Issue Date: 18 January 2024
 Cite this article:   
Xuan Wang, Yan Dong, Jing Yang, et al. A benchmark-based method for evaluating hyperparameter optimization techniques of neural networks for surface water quality prediction[J]. Front. Environ. Sci. Eng., 2024, 18(5): 54.
 URL:  
https://academic.hep.com.cn/fese/EN/10.1007/s11783-024-1814-5
https://academic.hep.com.cn/fese/EN/Y2024/V18/I5/54
Fig.1  Inputs and outputs of the surface water quality prediction NN; the number of lagged time steps equals one.
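The lagged inputs sketched in Fig.1 can be built with a standard sliding window. A minimal sketch, assuming a 2-D array `series` of shape (time steps, WQ variables); the helper name is illustrative:

```python
import numpy as np

def make_lagged_samples(series, input_steps):
    """Build (X, y) pairs: X holds `input_steps` lagged observations of all
    WQ variables, y is the next time step (input_steps = 1 in Fig. 1)."""
    X = np.stack([series[i:i + input_steps]
                  for i in range(len(series) - input_steps)])
    y = series[input_steps:]
    return X, y
```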
Hyperparameter | HPO value range | GS value set
Input steps | {1, 6, 12} | {1, 6, 12}
Hidden layers | {1, 2, 3, 4, 5, 6, 7, 8} | {1, 2, 3, 4, 5, 6, 7, 8}
Hidden dimension | {10, 11, 12, …, 419, 420} | {20, 60, 100, …, 380, 420}
Learning rate | [0.0001, 0.003] | {0.0001, 0.0003, 0.001, 0.003, 0.01}
Batch size | {2, 4, 8, 16, 32, 64} | {2, 4, 8, 16, 32, 64}
Tab.1  Hyperparameters and the HPO value ranges
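The GS benchmark amounts to an exhaustive sweep over the Tab.1 GS value sets; the best-scoring combination becomes the benchmark. A minimal sketch, reusing the hypothetical train_and_validate() stub from the earlier sketch:

```python
from itertools import product

# GS value sets from Tab.1; "input steps" is fixed per test (1, 6, or 12).
gs_sets = {
    "hidden_layers": [1, 2, 3, 4, 5, 6, 7, 8],
    "hidden_dimension": list(range(20, 421, 40)),   # {20, 60, ..., 420}
    "learning_rate": [1e-4, 3e-4, 1e-3, 3e-3, 1e-2],
    "batch_size": [2, 4, 8, 16, 32, 64],
}

best_params, best_mse = None, float("inf")
for combo in product(*gs_sets.values()):
    params = dict(zip(gs_sets.keys(), combo))
    mse = train_and_validate(**params)  # hypothetical stub, see above
    if mse < best_mse:
        best_params, best_mse = params, mse
```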
Fig.2  Boxplots of the NNs’ validation MSE with different hyperparameter values, site Luzhou (whiskers span the 5%–95% range of the data). (a) Hidden layers, (b) hidden dimension, (c) learning rate, (d) batch size.
Fig.3  WQ variable series at site Luzhou predicted by the optimal NN models.
Site | WQ variable | RMSE | MAE | MAPE (%) | R² (values ordered RS / GA / GP / TPE)
Anqing | pH | 0.14 / 0.17 / 0.08 / 0.20 | 0.12 / 0.12 / 0.06 / 0.17 | 1.6 / 1.5 / 0.8 / 2.2 | 0.997 / 0.996 / 0.999 / 0.995
Anqing | DO | 1.29 / 0.83 / 0.43 / 1.47 | 1.13 / 0.70 / 0.21 / 1.25 | 11.9 / 7.5 / 2.4 / 12.9 | 0.905 / 0.961 / 0.989 / 0.876
Anqing | TOC | 1.17 / 1.18 / 1.37 / 1.18 | 0.52 / 0.57 / 0.69 / 0.55 | 17.9 / 21.7 / 26.7 / 20.2 | 0.754 / 0.748 / 0.662 / 0.747
Yichang | pH | 0.17 / 0.08 / 0.09 / 0.11 | 0.15 / 0.05 / 0.07 / 0.07 | 1.9 / 0.7 / 0.9 / 1.0 | 0.996 / 0.999 / 0.999 / 0.999
Yichang | DO | 0.20 / 0.16 / 0.25 / 0.24 | 0.16 / 0.11 / 0.21 / 0.19 | 1.5 / 1.1 / 2.0 / 1.8 | 0.999 / 0.999 / 0.998 / 0.998
Yichang | TOC | 0.38 / 0.81 / 0.60 / 0.82 | 0.28 / 0.30 / 0.30 / 0.35 | 21.6 / 24.4 / 24.6 / 28.2 | 0.989 / 0.951 / 0.973 / 0.949
Nanning | pH | 0.14 / 0.09 / 0.14 / 0.13 | 0.12 / 0.07 / 0.11 / 0.10 | 1.7 / 0.9 / 1.5 / 1.4 | 0.998 / 0.999 / 0.998 / 0.999
Nanning | DO | 0.45 / 0.23 / 0.30 / 0.30 | 0.35 / 0.18 / 0.25 / 0.24 | 5.4 / 3.0 / 3.9 / 3.8 | 0.965 / 0.991 / 0.984 / 0.984
Nanning | TOC | 0.26 / 0.28 / 0.40 / 0.36 | 0.21 / 0.22 / 0.31 / 0.27 | 11.5 / 12.0 / 17.4 / 15.0 | 0.985 / 0.983 / 0.964 / 0.970
Jiujiang | pH | 0.20 / 0.12 / 0.23 / 0.14 | 0.15 / 0.09 / 0.20 / 0.11 | 2.1 / 1.3 / 2.8 / 1.5 | 0.990 / 0.996 / 0.986 / 0.995
Jiujiang | DO | 0.48 / 0.36 / 0.36 / 0.33 | 0.41 / 0.32 / 0.23 / 0.25 | 4.2 / 3.2 / 2.4 / 2.6 | 0.988 / 0.994 / 0.994 / 0.995
Jiujiang | TOC | 0.80 / 0.73 / 1.01 / 0.72 | 0.49 / 0.49 / 0.72 / 0.54 | 11.9 / 13.0 / 19.9 / 15.4 | 0.857 / 0.880 / 0.769 / 0.884
Luzhou | pH | 0.32 / 0.26 / 0.27 / 0.26 | 0.20 / 0.12 / 0.14 / 0.12 | 2.7 / 1.7 / 2.0 / 1.6 | 0.987 / 0.992 / 0.991 / 0.991
Luzhou | DO | 0.84 / 0.82 / 0.86 / 0.92 | 0.36 / 0.39 / 0.41 / 0.58 | 3.1 / 3.4 / 3.7 / 5.3 | 0.976 / 0.977 / 0.975 / 0.971
Luzhou | TOC | 0.36 / 0.38 / 0.45 / 0.38 | 0.24 / 0.25 / 0.34 / 0.25 | 14.0 / 15.0 / 19.9 / 14.7 | 0.988 / 0.986 / 0.981 / 0.986
Tab.2  Validation performance metrics of the NN models with the optimized hyperparameters (“input steps” = 1); minimum errors were shown in bold in the source table.
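The four metrics in Tab.2 follow their standard definitions; a NumPy sketch:

```python
import numpy as np

def metrics(y_true, y_pred):
    """Standard definitions of the four Tab.2 metrics."""
    err = y_true - y_pred
    rmse = np.sqrt(np.mean(err ** 2))
    mae = np.mean(np.abs(err))
    mape = 100.0 * np.mean(np.abs(err / y_true))  # in percent
    r2 = 1.0 - np.sum(err ** 2) / np.sum((y_true - np.mean(y_true)) ** 2)
    return rmse, mae, mape, r2
```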
Fig.4  Convergence curves of the three HPO processes (GA, GP, TPE), site Luzhou (in the GA panel, solid lines show the population average and dotted lines the population optimum; in the GP and TPE panels, lines show the 10-step moving average to mitigate volatility).
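The 10-step moving average used to smooth the GP and TPE curves can be computed, for example, as a trailing window mean (a sketch; the figure specifies only the window length):

```python
import numpy as np

def moving_average(losses, window=10):
    """Trailing moving average used to smooth a convergence curve."""
    return np.convolve(losses, np.ones(window) / window, mode="valid")
```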
Fig.5  Distribution of the sampling values and map of the optimization orientation for hidden layers by the three HPO methods, site Luzhou (point colors changing from red to blue represent iteration numbers increasing from one to the maximum; the same applies below).
Fig.6  Distribution of the sampling values and map of the optimization orientation for learning rate by the three HPO methods, site Luzhou.
Site | Input steps | Hidden layers | Hidden dimension | Learning rate (10⁻³) | Batch size (values ordered RS / GA / GP / TPE)
Anqing | 1 | 2 / 2 / 1 / 4 | 155 / 316 / 107 / 352 | 1.7 / 1.4 / 3.0 / 0.3 | 16 / 64 / 64 / 16
Anqing | 6 | 2 / 2 / 4 / 2 | 385 / 50 / 30 / 45 | 1.4 / 2.0 / 0.6 / 1.0 | 16 / 32 / 4 / 16
Anqing | 12 | 2 / 4 / 3 / 2 | 142 / 420 / 31 / 76 | 1.6 / 0.4 / 1.6 / 2.3 | 4 / 32 / 8 / 64
Yichang | 1 | 1 / 1 / 1 / 1 | 86 / 57 / 39 / 37 | 1.0 / 1.0 / 3.0 / 2.5 | 4 / 16 / 64 / 32
Yichang | 6 | 4 / 7 / 5 / 4 | 246 / 218 / 386 / 314 | 0.5 / 1.8 / 1.5 / 0.6 | 16 / 64 / 64 / 64
Yichang | 12 | 2 / 7 / 8 / 4 | 33 / 58 / 85 / 398 | 1.6 / 1.0 / 1.9 / 0.3 | 64 / 64 / 64 / 32
Nanning | 1 | 2 / 1 / 1 / 1 | 112 / 65 / 32 / 78 | 0.4 / 0.4 / 3.0 / 1.4 | 4 / 8 / 64 / 32
Nanning | 6 | 1 / 2 / 1 / 2 | 47 / 78 / 67 / 41 | 1.1 / 2.5 / 3.0 / 0.6 | 16 / 64 / 64 / 4
Nanning | 12 | 2 / 2 / 8 / 1 | 123 / 94 / 409 / 47 | 0.3 / 2.1 / 0.1 / 0.4 | 2 / 64 / 64 / 2
Jiujiang | 1 | 1 / 1 / 1 / 1 | 53 / 50 / 115 / 41 | 1.2 / 0.7 / 3.0 / 0.8 | 4 / 8 / 64 / 2
Jiujiang | 6 | 2 / 2 / 1 / 2 | 91 / 188 / 70 / 93 | 0.3 / 3.0 / 0.5 / 0.6 | 4 / 64 / 2 / 32
Jiujiang | 12 | 1 / 1 / 1 / 1 | 80 / 65 / 30 / 31 | 2.3 / 2.7 / 1.2 / 1.4 | 32 / 16 / 16 / 4
Luzhou | 1 | 1 / 1 / 1 / 1 | 48 / 56 / 63 / 110 | 1.6 / 1.7 / 3.0 / 1.1 | 8 / 32 / 64 / 8
Luzhou | 6 | 1 / 1 / 1 / 1 | 103 / 235 / 45 / 92 | 0.8 / 1.9 / 2.9 / 1.7 | 4 / 16 / 64 / 16
Luzhou | 12 | 1 / 2 / 1 / 1 | 14 / 178 / 19 / 62 | 1.1 / 0.7 / 1.2 / 1.7 | 2 / 16 / 16 / 16
Consistency rate (%) | – | 86.7 / 86.7 / 60.0 / 86.7 | 46.7 / 66.7 / 53.3 / 73.3 | 73.3 / 46.7 / 33.3 / 73.3 | 66.7 / 66.7 / 33.3 / 80.0
Tab.3  Optimized hyperparameters from the four HPO methods and their consistency rates with the benchmark value ranges (consistent values were shown in bold in the source table).
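The consistency rates in Tab.3 count how often a method’s optimized value falls within the GS benchmark range over the 15 site/input-step tests (5 sites × 3 input steps; e.g., 86.7% = 13/15). A minimal sketch, where `optimized` and `benchmark_ranges` are hypothetical containers for one hyperparameter across the 15 tests:

```python
def consistency_rate(optimized, benchmark_ranges):
    """Percentage of tests whose optimized value lies inside the GS
    benchmark range; one (lo, hi) interval per site/input-step test."""
    hits = sum(lo <= v <= hi for v, (lo, hi) in zip(optimized, benchmark_ranges))
    return 100.0 * hits / len(optimized)
```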
Fig.7  (a) Optimized hyperparameters of the NN models and (b) the corresponding performance (estimated by validation MSE), obtained with GA, GP, and TPE, each repeated three times.