A neural network-based production process modeling and variable importance analysis approach in corn to sugar factory
Yi Tong1, Mou Shu2, Mingxin Li3, Yingwei Liu2, Ran Tao3, Congcong Zhou3, You Zhao1, Guoxing Zhao1, Yi Li1, Yachao Dong3, Lei Zhang3, Linlin Liu3, Jian Du3
1. COFCO Biotechnology Co., Ltd., Beijing 100005, China
2. COFCO Nutrition and Health Research Institute Co., Ltd., Beijing 102209, China
3. Institute of Process Systems Engineering, School of Chemical Engineering, Dalian University of Technology, Dalian 116024, China
The corn-to-sugar process has long faced high energy consumption and thin profit margins. However, the high complexity of the related processes makes it difficult to upgrade or optimize them with mechanistic unit-operation models. Big data technology offers a promising alternative because it can turn large amounts of plant data into insights for operational decisions. In this paper, a neural network-based production process modeling and variable importance analysis approach is proposed for corn-to-sugar processes. It comprises data preprocessing, dimensionality reduction, modeling with a multilayer perceptron, a convolutional neural network or a recurrent neural network, and an extended weight connection method. In the established model, the dextrose equivalent (DE) value is taken as the output and 654 sites from the distributed control system (DCS) are taken as the inputs. LASSO analysis first reduces the input dimension to 155, and genetic algorithm optimization then reduces it further to 50. Finally, variable importance analysis is carried out with the extended weight connection method, and the 20 most important sites are selected for each neural network. The results indicate that the multilayer perceptron and recurrent neural network models achieve relative errors below 0.1%, giving better predictions than the other models, and that the 20 selected sites offer better interpretability. These contributions can support both high-accuracy process simulation and process optimization based on the selected most important sites, helping to maintain high-quality, stable production in corn-to-sugar processes.
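The screening and ranking steps described above can be illustrated with a short, dependency-free Python sketch. It is not the authors' implementation, and several assumptions are made: LASSO is solved by cyclic coordinate descent on standardized inputs with a centered DE value; the extended weight connection method is replaced by the classic connection weight approach of Olden and Jackson for a single-hidden-layer perceptron (score of input i = sum over hidden neurons j of w_ij * v_j); and "relative error" is taken to mean mean absolute relative error. The genetic-algorithm step (155 to 50 inputs) and network training are omitted. All function names and the toy data are hypothetical.

```python
"""Hypothetical sketch: LASSO screening, connection-weight importance, top-k sites.

Toy, dependency-free illustration only; not the authors' code.
"""


def soft_threshold(rho, lam):
    """Soft-thresholding operator used in the LASSO coordinate update."""
    if rho > lam:
        return rho - lam
    if rho < -lam:
        return rho + lam
    return 0.0


def lasso_coordinate_descent(X, y, alpha, n_iter=200):
    """Minimize (1/2n)||y - Xb||^2 + alpha*||b||_1 by cyclic coordinate descent.

    Assumes standardized columns in X and a centered y (no intercept term).
    Inputs whose coefficient stays exactly 0.0 are candidates for removal.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)  # residual y - X*beta; beta starts at zero
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # an all-zero column carries no information
            # Correlation of input j with the partial residual that excludes j
            rho = sum(X[i][j] * (resid[i] + X[i][j] * beta[j]) for i in range(n)) / n
            new_b = soft_threshold(rho, alpha) / col_sq[j]
            delta = new_b - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta  # keep the residual in sync
                beta[j] = new_b
    return beta


def connection_weight_importance(w_in_hidden, w_hidden_out):
    """Classic connection weight importance for a one-hidden-layer MLP.

    w_in_hidden[i][j]: weight from input i to hidden neuron j.
    w_hidden_out[j]:   weight from hidden neuron j to the single output.
    Returns one signed score per input.
    """
    return [sum(w * v for w, v in zip(row, w_hidden_out)) for row in w_in_hidden]


def top_k_sites(scores, names, k):
    """Return the k site names with the largest absolute importance."""
    ranked = sorted(zip(names, scores), key=lambda pair: abs(pair[1]), reverse=True)
    return [name for name, _ in ranked[:k]]


def mean_relative_error(y_true, y_pred):
    """Mean absolute relative error (assumed definition of 'relative error')."""
    return sum(abs(p - t) / abs(t) for t, p in zip(y_true, y_pred)) / len(y_true)


if __name__ == "__main__":
    # Toy orthogonal design: y = 3*x0 - 2*x1, input x2 is irrelevant.
    X = [[1, 1, 1], [1, -1, -1], [-1, 1, -1], [-1, -1, 1]]
    y = [1, 5, -5, -1]
    beta = lasso_coordinate_descent(X, y, alpha=0.1)
    print("kept inputs:", [j for j, b in enumerate(beta) if b != 0.0])

    # Toy MLP weights (3 inputs, 2 hidden neurons, 1 output).
    scores = connection_weight_importance([[0.5, -1.0], [2.0, 0.5], [0.1, 0.1]], [1.0, 2.0])
    print("top sites:", top_k_sites(scores, ["X108", "X34", "X49"], 2))
```

On the toy data, only inputs 0 and 1 keep nonzero LASSO coefficients, which mirrors how uninformative DCS sites would be dropped before modeling. Ranking by absolute connection-weight score is one simple way to shortlist sites; signed scores are kept so that the direction of each input's effect on the DE value remains visible.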
Yi Tong, Mou Shu, Mingxin Li, Yingwei Liu, Ran Tao, Congcong Zhou, You Zhao, Guoxing Zhao, Yi Li, Yachao Dong, Lei Zhang, Linlin Liu, Jian Du. A neural network-based production process modeling and variable importance analysis approach in corn to sugar factory. Front. Chem. Sci. Eng., 2023, 17(3): 358-371.
Liquid level of inlet preconcentration separator (mm)
X108 | LIA2103_7 | Liquid level of 7# saccharification tank (mm)
X34 | PID1\PICA_3-PV | The pressure of steam at two kilograms (kPa)
X49 | PID1\LIC_502A-PV | Liquid level of five-effect evaporator condensate water in set 1 (mm)
X32 | PRESSURE1\PIA_2110_4 | Pressure of 4# starch induced draft fan (kPa)
X24 | SO2 content of corn after soaking | SO2 content of corn after soaking (ppm)
X36 | TEMPR1\TIA_3 | The temperature of steam at two kilograms (°C)
X105 | TI2103_5_2 | Temperature of 5# mashing tank (°C)
X112 | V1102 (5.5-6.2) | pH value of starch emulsion
X17 | The moisture in 18.01–1 (%) | Water content of dried germ (%)
X50 | PID1\LIC_502B-PV | Liquid level of five-effect evaporator condensate water in set 2 (mm)
X11 | The moisture in 3# (%) | Water content after starch drying (%)
X109 | LIA2103_8 | Liquid level of 8# saccharification tank (mm)
X1 | Content of moldy grains | Content of moldy grains (%)
X58 | PID1\TICA_1403_7-PV | Temperature of 7# steeping tank (°C)
X70 | LEVEL2\LIA_1694_1 | Emergency tank level
Tab.1
Number | Name | Meaning
X19 | The moisture in 20.01-1/2 (%) | Water content of fiber after drying (%)
X114 | Flow rates of glucoamylase (g·min⁻¹) | Flow rate of glucoamylase (g·min⁻¹)
X58 | PID1\TICA_1403_7-PV | Temperature of 7# soaking liquid circulating heater (°C)
X105 | TI2103_5_2 | Temperature of 5# saccharification tank (°C)
X92 | AI1102 | pH value of starch emulsion tank
X54 | LEVEL2\LIA_1401_3_2 | Liquid level of 2# soaking tank (mm)
X56 | PRESSURE1\PIA_201A | Pressure of two-effect evaporator material liquid in set 1 (kPa)
X25 | Top flow dry matter in 15.95–1 (%) | Dry matter content in preconcentration centrifuge top stream (%)
X53 | LEVEL1\LT_1401_3_9 | Liquid level of 9# soaking tank (mm)
X107 | TI2103_6_2 | Temperature of 6# saccharification tank (°C)
X73 | LEVEL2\LIA_1658 | Liquid level of waste liquid tank (mm)
X16 | 16.75 (Be) | Concentration of refined starch emulsion
X24 | SO2 content of corn after soaking (ppm) | SO2 content of corn after soaking (ppm)
X10 | The moisture in 2# (%) | Water content of 2# rotary valve discharge (%)
X23 | The moisture of corn after soaking (%) | Water content of corn after soaking (%)
X104 | TI2103_4_1 | Temperature of 4# saccharification tank (°C)
X7 | Bonded starch D.S in 15.71 (%) | Content of bound starch in fiber screw extruder (%)
X94 | TC1107_1.MV | Temperature of primary flash uncondensable valve (°C)
X80 | PID2\FIC_1541_2-PV | Flow rate of germ washing water in set 1 (t·h⁻¹)
X59 | TEMPR1\TI_301A | Temperature of three-effect evaporator material liquid in set 1 (K)
Tab.2
Number | Name | Meaning
X63 | CURRENT1\CIA_15207 | Current of 7# fine grinding facility (A)
X110 | FIC2104_1 | Outlet flow rate of clean saccharification fluid (kg·h⁻¹)
X45 | PID1\LIC1401_2_5-PV | Liquid level of 5# soaking tank (mm)
X13 | 15.05 (Be) | Degerming feed concentration in the first grind
X114 | Flow rates of glucoamylase (g·min⁻¹) | Flow rate of glucoamylase (g·min⁻¹)
X21 | Acidity of old acid (%) | Acidity of old acid (%)
X32 | PRESSURE1\PIA_2110_4 | Pressure of 4# starch induced draft fan (kPa)
X113 | The dry matter | Dry matter content of liquefied liquid (%)
X78 | CURRENT1\CIA_1569_5 | Current of 5# fiber dehydration rotating sieve (A)
X33 | PRESSURE1\PIA_2112_3 | Pressure of 3# starch scraper conveying air (kPa)
X51 | PID1\LIC_302A-PV | Liquid level of three-effect evaporator condensate water in set 1 (mm)
X84 | PRESSURE1\PIA_2001_1 | Pressure of 1# fiber dryer (kPa)
X112 | V1102 (5.5–6.2) | pH value of starch emulsion
X2 | Fragment of grain (%) | Fragment of grain (%)
X87 | TEMPR1\TE_2001_6 | Temperature of exhaust gas in drying section (K)
X111 | FIC2104_2 | Outlet flow rate of turbid saccharification fluid (kg·h⁻¹)
X79 | PID1\FIC_2001_3-PV | Flow rate of 3# fiber dryer (g·s⁻¹)
X67 | FLOW2\FIA_1639 | Flow rate of level-12 washing step (g·s⁻¹)
X64 | CURRENT1\CIA_1671_2 | Current of vice feed separator (A)
X81 | FLOW1\FI_6 | Flow rate of corn pulp (g·s⁻¹)
Tab.3
Fig.11
Fig.12