Frontiers of Chemical Science and Engineering

ISSN 2095-0179

ISSN 2095-0187(Online)

CN 11-5981/TQ

Postal Subscription Code 80-969

2018 Impact Factor: 2.809

Front. Chem. Sci. Eng.    2022, Vol. 16 Issue (2) : 221-236    https://doi.org/10.1007/s11705-021-2061-y
RESEARCH ARTICLE
Dynamic response surface methodology using Lasso regression for organic pharmaceutical synthesis
Yachao Dong1,2, Christos Georgakis2, Jacob Santos-Marques2, Jian Du1
1. Institute of Chemical Process Systems Engineering, School of Chemical Engineering, Dalian University of Technology, Dalian 116024, China
2. Department of Chemical and Biological Engineering and Systems Research Institute, Tufts University, Medford, MA 02155, USA
Abstract

To study the dynamic behavior of a process, time-resolved data are collected at different time instants during each of a series of experiments, which are usually designed with the design of experiments or the design of dynamic experiments methodologies. To model the dynamic behavior from such time-resolved data, dynamic response surface methodology (DRSM), a data-driven modeling method, has been proposed. Two approaches can be adopted to estimate the model parameters: stepwise regression, used in several previous publications, and Lasso regression, which is newly incorporated in this paper for the estimation of DRSM models. Here, we show that both approaches yield similarly accurate models, while the computational time of Lasso is on average two orders of magnitude smaller. Two case studies are performed to show the advantages of the proposed method. In the first case study, where the concentrations of different species are modeled directly, the DRSM method provides more accurate models than those in the literature. The second case study, where the reaction extents are modeled instead of the species concentrations, illustrates the versatility of the DRSM methodology. Therefore, DRSM with Lasso regression can provide faster and more accurate data-driven models for a variety of organic synthesis datasets.

Keywords: data-driven modeling; pharmaceutical organic synthesis; Lasso regression; dynamic response surface methodology
Corresponding Author(s): Christos Georgakis   
Online First Date: 13 July 2021    Issue Date: 10 January 2022
 Cite this article:   
Yachao Dong, Christos Georgakis, Jacob Santos-Marques, et al. Dynamic response surface methodology using Lasso regression for organic pharmaceutical synthesis[J]. Front. Chem. Sci. Eng., 2022, 16(2): 221-236.
 URL:  
https://academic.hep.com.cn/fcse/EN/10.1007/s11705-021-2061-y
https://academic.hep.com.cn/fcse/EN/Y2022/V16/I2/221
Algorithm 1. Procedure for calculating DRSM models
1. For each polynomial order R = 1, 2, ..., Rmax, carry out steps 2 to 6.
2. For a set of tc values, perform linear regression for γqr and select the tc that results in the smallest sum of squared errors.
3. Fine-tune tc via nonlinear regression, then fix it.
4. Eliminate the insignificant local parameters γqr by stepwise regression (SWR) or Lasso.
5. Re-estimate the significant γqr parameters by imposing the regression constraints.
6. Tabulate the BIC values.
7. Select the R, and the corresponding model, with the smallest BIC value.
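Steps 4 to 7 of Algorithm 1 can be sketched in code. The following is a minimal illustration under stated assumptions, not the authors' implementation: it uses a generic regressor matrix `X` in place of the actual DRSM basis, a plain coordinate-descent Lasso, ordinary least squares as a stand-in for the constrained re-estimation of step 5, and BIC in the common form n·ln(SSE/n) + k·ln(n). The function names (`lasso_cd`, `bic`, `select_by_bic`) and the synthetic data are hypothetical.

```python
import numpy as np


def lasso_cd(X, y, lam, n_iter=200):
    """Coordinate descent for 0.5*||y - X b||^2 + lam*||b||_1 (no intercept)."""
    X = np.asarray(X, dtype=float)
    r = np.asarray(y, dtype=float).copy()   # residual y - X b, with b = 0 initially
    b = np.zeros(X.shape[1])
    col_sq = (X ** 2).sum(axis=0)
    for _ in range(n_iter):
        for j in range(X.shape[1]):
            r += X[:, j] * b[j]             # remove column j's current contribution
            rho = X[:, j] @ r
            # Soft-thresholding update for coefficient j.
            b[j] = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[j]
            r -= X[:, j] * b[j]             # add the updated contribution back
    return b


def bic(y, y_hat, k):
    """Assumed BIC form: n*ln(SSE/n) + k*ln(n), with k retained parameters."""
    n = len(y)
    sse = float(np.sum((np.asarray(y) - np.asarray(y_hat)) ** 2))
    return n * np.log(sse / n) + k * np.log(n)


def select_by_bic(X, y, lambdas):
    """Sweep the Lasso weight, refit the retained terms, keep the lowest BIC."""
    best = None
    for lam in lambdas:
        active = np.flatnonzero(lasso_cd(X, y, lam))    # step 4: drop zeroed terms
        if active.size == 0:
            continue
        # Step 5 stand-in: unconstrained least squares on the retained columns.
        coef, *_ = np.linalg.lstsq(X[:, active], y, rcond=None)
        score = bic(y, X[:, active] @ coef, active.size)  # step 6
        if best is None or score < best[0]:
            best = (score, lam, active, coef)             # step 7
    return best


# Synthetic demonstration data (hypothetical, not from the paper).
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(60, 8))
b_true = np.array([1.5, 0.0, -2.0, 0.0, 0.0, 0.0, 0.8, 0.0])
y_demo = X_demo @ b_true + 0.05 * rng.normal(size=60)

score, lam, active, coef = select_by_bic(X_demo, y_demo, np.logspace(-2, 2, 20))
print("selected columns:", active, "at lambda =", lam)
```

The sweep here is over the Lasso weight λ only; in the full procedure the same BIC comparison would be repeated in an outer loop over the polynomial order R.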
  
Fig.1  Dependence of three criteria, (a) generalized cross-validation (GCV), (b) Bayesian information criterion (BIC) and (c) Akaike information criterion (AIC), on the Lasso weight λ, and (d) the number of local parameters. Only the BIC plot shows a clear minimum; the GCV and AIC plots do not. For the BIC case, the number of local parameters γ vs. λ is given in the lower right subplot.
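For readers who want to reproduce this kind of comparison, the three criteria can be computed from the sum of squared errors, the number of data points and the number of retained parameters. The sketch below uses common textbook forms, which are an assumption here and may differ from the exact expressions used in the paper; the function name `selection_criteria` is hypothetical.

```python
import math


def selection_criteria(sse, n, k):
    """Return GCV, AIC and BIC for a fit with k parameters on n data points.

    Assumed forms (not taken from the paper):
      GCV = (SSE/n) / (1 - k/n)^2
      AIC = n*ln(SSE/n) + 2k
      BIC = n*ln(SSE/n) + k*ln(n)
    """
    mse = sse / n
    return {
        "GCV": mse / (1.0 - k / n) ** 2,
        "AIC": n * math.log(mse) + 2 * k,
        "BIC": n * math.log(mse) + k * math.log(n),
    }


# Hypothetical numbers, only to show the call.
criteria = selection_criteria(sse=1.0, n=100, k=10)
for name, value in criteria.items():
    print(name, value)
```

Under these forms, BIC charges ln(n) per parameter instead of 2, so for n ≥ 8 it penalizes additional parameters more heavily than AIC. This is one plausible reason for a more pronounced minimum in the BIC curve.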
Fig.2  Comparison of the fits obtained with Lasso and SWR for the product species. Time (h) is on the x-axis and concentration (equiv) is on the y-axis.
Item Order β0(θ) Linear terms (x1 x2 x3 x4 x5) Quadratic terms (x1² x2² x3² x4² x5²)
Lasso 0 0.950 0.100 –0.084 0.222 0.028 –0.031 –0.044 –0.045 –0.220 –0.034 –0.059
1 0.162 –0.137 0.047 –0.118 –0.075 0.026
2 –0.087 –0.018 0.070
Stepwise regression 0 0.945 0.105 –0.091 0.216 0.041 –0.035 –0.036 –0.047 –0.222 –0.035 –0.052
1 0.180 –0.141 0.045 –0.109 –0.072 0.033
2 –0.067 0.014 0.073
Tab.1  Comparison of γ values for intercept, linear, and quadratic terms estimated through SWR and Lasso for the product species a)
Item Order x1x2 x1x3 x2x3 x1x4 x2x4 x3x4 x1x5 x2x5 x3x5 x4x5
Lasso 0 0.020 –0.071 0.011 –0.017 0.041 0.042 –0.021 –0.028 0.045
1 –0.131 0.037 –0.021 –0.039 0.020 0.043
2 0.041
Stepwise regression 0 0.018 –0.069 0.059 0.064 –0.021 –0.019 –0.043 0.054
1 –0.125 0.027 –0.032 –0.023 –0.027 0.039 0.060
2 0.049
Tab.2  Comparison of γ values for two-factor interaction (2FI) terms estimated through SWR and Lasso for the product species a)
Species | R (SWR & Lasso) | tc (SWR & Lasso) | Number of variables γ (SWR) | Number of variables γ (Lasso) | RMSE/equiv (SWR, ref.) | RMSE/equiv (Lasso) | ΔLasso | RMSE/max (Lasso)
S1 | 2 | 1.77 | 37 | 37 | 0.0059 | 0.0062 | 3.8% | 3.8%
S2 | 2 | 2.18 | 28 | 31 | 0.0639 | 0.0702 | 9.9% | 6.5%
S3 | 2 | 1.77 | 38 | 42 | 0.0061 | 0.0065 | 7.2% | 6.1%
S4 | 2 | 2.18 | 28 | 30 | 0.0530 | 0.0570 | 7.4% | 4.4%
S5 | 2 | 2.18 | 36 | 36 | 0.0463 | 0.0493 | 6.4% | 4.5%
S6 | 1 | 4.24 | 28 | 27 | 0.0035 | 0.0035 | –0.1% | 5.5%
S7 | 1 | 2.59 | 22 | 24 | 0.0088 | 0.0089 | 1.4% | 1.7%
S8 | 2 | 2.59 | 22 | 20 | 0.0084 | 0.0085 | 1.7% | 6.1%
S9 | 2 | 4.24 | 28 | 24 | 0.0005 | 0.0005 | –0.7% | 6.2%
S10 | 2 | 4.24 | 26 | 28 | 0.0002 | 0.0003 | 3.4% | 3.2%
Tab.3  DRSM model summary of SWR and Lasso for different species
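The ΔLasso column is not defined on this page. The tabulated values are consistent with the relative increase in RMSE of the Lasso model over the SWR reference, and the short sketch below checks that reading for species S2. The interpretation and the function name `relative_increase` are assumptions.

```python
def relative_increase(rmse_ref, rmse_new):
    """Relative change of rmse_new with respect to rmse_ref (assumed meaning of ΔLasso)."""
    return (rmse_new - rmse_ref) / rmse_ref


# S2 row of Tab.3: RMSE of SWR (reference) and of Lasso, in equiv.
delta_s2 = relative_increase(0.0639, 0.0702)
print(f"{100 * delta_s2:.1f}%")
```

For S2 this agrees with the tabulated 9.9%. Other rows can differ in the last digit because the tabulated RMSE values are rounded.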
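Species | Number of γ (Full) | Number of γ (Rd1) | Number of γ (Rds12) | RMSEa/(10⁻² equiv) (Full) | RMSEa/(10⁻² equiv) (Rd1) | RMSEa/(10⁻² equiv) (Rds12) | RMSEb/(10⁻² equiv) (Rd1) | ΔRd1 | RMSEb/(10⁻² equiv) (Rd12) | ΔRd12
S1 | 27 | 26 | 31 | 0.62 | 0.54 | 0.41 | 1.25 | 102% | 1.24 | 101%
S2 | 28 | 12 | 29 | 7.02 | 9.17 | 6.46 | 17.96 | 156% | 14.38 | 105%
S3 | 28 | 24 | 24 | 0.65 | 0.50 | 0.49 | 1.30 | 99% | 1.22 | 87%
S4 | 28 | 19 | 31 | 5.70 | 7.42 | 4.53 | 15.48 | 172% | 14.71 | 158%
S5 | 36 | 19 | 36 | 4.93 | 7.35 | 4.35 | 22.48 | 356% | 16.23 | 229%
S6 | 28 | 24 | 27 | 0.35 | 0.13 | 0.43 | 0.77 | 123% | 0.40 | 14%
S7 | 22 | 20 | 24 | 0.89 | 0.87 | 0.99 | 1.97 | 122% | 0.97 | 9%
S8 | 22 | 17 | 19 | 0.85 | 0.84 | 0.93 | 1.89 | 121% | 1.03 | 20%
S9 | 28 | 28 | 24 | 0.05 | 0.03 | 0.06 | 0.08 | 55% | 0.05 | 10%
S10 | 26 | 27 | 26 | 0.03 | 0.02 | 0.03 | 0.06 | 152% | 0.03 | 17%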
Tab.4  Comparison of models estimated from different datasets in case study A
Item | | Factor 1: Temperature/°C | Factor 2: Solvent volume/(L·mol–1) | Factor 3: Catalyst amount/equiv | Factor 4: Starting material 2/equiv | Factor 5: Water amount/wt-%
Range | Lower | 30 | 0.7 | 0.01 | 1.04 | 0.05
Range | Upper | 65 | 1.2 | 0.15 | 1.20 | 0.50
Solutions in coded values | M1 | 0.76 | –1.00 | –0.93 | –1.00 | –0.55
Solutions in coded values | M2 | 0.73 | –0.96 | –0.81 | –1.00 | –0.48
Solutions in coded values | M3 | 0.40 | –0.84 | –0.37 | –1.00 | –0.60
Solutions in original units | M1 | 60.84 | 0.70 | 0.015 | 1.04 | 0.152
Solutions in original units | M2 | 60.31 | 0.71 | 0.024 | 1.04 | 0.167
Solutions in original units | M3 | 54.52 | 0.74 | 0.054 | 1.04 | 0.139
Tab.5  Factors in the design space and optimal conditions from different formulations
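The coded and original-unit solutions in Tab.5 are related by the usual linear coding of design-of-experiments factors, in which –1 maps to the lower bound and +1 to the upper bound. That coding is an assumption here, although it reproduces the tabulated M1 values to within rounding. The sketch below decodes the M1 solution; the names `RANGES`, `decode` and `M1_CODED` are hypothetical.

```python
# Factor ranges (name, lower, upper) as listed in Tab.5.
RANGES = [
    ("Temperature/°C", 30.0, 65.0),
    ("Solvent volume/(L·mol–1)", 0.7, 1.2),
    ("Catalyst amount/equiv", 0.01, 0.15),
    ("Starting material 2/equiv", 1.04, 1.20),
    ("Water amount/wt-%", 0.05, 0.50),
]


def decode(coded, lower, upper):
    """Map a coded value in [-1, 1] to original units (assumed linear coding)."""
    center = (upper + lower) / 2.0
    half_range = (upper - lower) / 2.0
    return center + coded * half_range


# Coded M1 solution from Tab.5 (rounded to two decimals in the table).
M1_CODED = [0.76, -1.00, -0.93, -1.00, -0.55]

m1_original = [decode(c, lo, hi) for c, (_, lo, hi) in zip(M1_CODED, RANGES)]
for (name, _, _), value in zip(RANGES, m1_original):
    print(f"{name}: {value:.3f}")
```

Small differences from the tabulated original-unit values are expected because the coded values are reported to only two decimals.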
Fig.3  Contour plot of the product concentration (equiv) as factors 1 and 3 vary. Factor 2 is fixed at 0.7 L·mol–1, factor 4 at 1.04 equiv, factor 5 at 0.152 wt-%, and time at 6 h. The blue square denotes the optimal condition of M1.
Fig.4  Contour plots of the product concentration (equiv) as factors 1, 2, 3 and time vary, in units of °C, L·mol–1, equiv and hour, respectively. Factor 4 is fixed at 1.04 equiv and factor 5 at 0.152 wt-%.
Reaction | R (SWR & Lasso) | tc (SWR & Lasso) | Number of variables γ (SWR) | Number of variables γ (Lasso) | RMSE/equiv (SWR, ref.) | RMSE/equiv (Lasso) | ΔLasso | RMSE/max (Lasso)
R1 | 4 | 2.81 | 31 | 21 | 0.0032 | 0.0041 | 27% | 0.4%
R2 | 4 | 2.81 | 30 | 20 | 0.0109 | 0.0150 | 38% | 1.2%
R3 | 3 | 1.96 | 33 | 32 | 0.0135 | 0.0151 | 11% | 1.5%
R4 | 2 | 2.82 | 26 | 26 | 0.0059 | 0.0059 | 0% | 1.1%
R5 | 2 | 8.80 | 25 | 25 | 0.0045 | 0.0045 | 0% | 2.9%
R6 | 2 | 8.80 | 19 | 19 | 0.0040 | 0.0040 | 0% | 5.5%
R7 | 2 | 6.24 | 14 | 12 | 0.0027 | 0.0030 | 10% | 6.6%
Tab.6  DRSM model summary of SWR and Lasso for different reactions
Item Order β0(θ) Linear terms (x1 x2 x3) 2FI terms (x1x2 x1x3 x2x3) Quadratic terms (x1² x2² x3²)
Lasso 0 0.3555 0.0584 0.0527 –0.0303 0.0064 –0.0035 –0.0017
1 0.3698 0.0146 0.0456 –0.0423 –0.0017
2 –0.0437 –0.0071 –0.0120 –0.0064 0.0059
3 –0.0329 0.0023
4 –0.0186
Stepwise regression 0 0.3592 0.0572 0.0511 –0.0298 0.0065 –0.0036 –0.0019 –0.0047 0.0028
1 0.3880 0.0174 0.0500 –0.0436 –0.0047 0.0028
2 –0.0316 –0.0489 –0.0088 –0.0121 –0.0043 0.0059
3 –0.0460 –0.0048 0.0018 0.0021 0.0023
4 0.0144 0.0091 0.0029 0.0019
Tab.7  Comparison of γ values estimated through SWR and Lasso for reaction R2 a)
Fig.5  Fits obtained with SWR and Lasso for reaction R2. Time (h) is on the x-axis and reaction extent (equiv) is on the y-axis. Experiment 17 is enlarged.
Fig.6  Fits obtained with SWR and Lasso for reaction R6. Time (h) is on the x-axis and reaction extent (equiv) is on the y-axis. Experiment 17 is enlarged.
Item Order β0(θ) Linear terms (x1 x2 x3) 2FI terms (x1x2 x1x3 x2x3) Quadratic terms (x1² x2² x3²)
Lasso/SWR 0 0.0311 0.0085 –0.0029 0.0027 –0.0021 0.0018 0.0022
1 0.0446 0.0126 –0.0051 0.0050 –0.0021 0.0018 0.0051
2 0.0136 0.0041 –0.0022 0.0023 0.0029
Tab.8  Comparison of γ values estimated through SWR and Lasso for reaction R6 a)