Frontiers of Chemical Science and Engineering

ISSN 2095-0179

ISSN 2095-0187(Online)

CN 11-5981/TQ

Postal Subscription Code 80-969

2018 Impact Factor: 2.809

Front. Chem. Sci. Eng.    2022, Vol. 16 Issue (2) : 152-167    https://doi.org/10.1007/s11705-021-2060-z
RESEARCH ARTICLE
A computational toolbox for molecular property prediction based on quantum mechanics and quantitative structure-property relationship
Qilei Liu, Yinke Jiang, Lei Zhang, Jian Du
Institute of Chemical Process Systems Engineering, School of Chemical Engineering, Dalian University of Technology, Dalian 116024, China
Abstract

The chemical industry continually seeks ways to convert raw materials into commodity chemicals and higher value-added chemical products efficiently and economically. The life cycle of a chemical product spans conceptual product design, experimental investigation, sustainable manufacture through appropriate chemical processes, and waste disposal. Throughout this life cycle, models that predict product properties from molecular structure play a key role. In this paper, a framework combining quantum mechanics (QM) and quantitative structure-property relationship (QSPR) methods is established for fast prediction of molecular properties such as activity coefficients. The workflow of the framework consists of three steps. In the first step, a database of basic molecular information is created. In the second step, QM-based calculations are performed to predict QM-based or QM-derived molecular properties (pseudo-experimental data), which are stored in a database. In the third step, these data are used to develop QSPR methods for fast property prediction. The whole framework has been implemented in a molecular property prediction toolbox. Two case studies highlighting different aspects of the toolbox are presented: the prediction of heats of reaction and the prediction of solid-liquid phase equilibria.

Keywords: molecular property, quantum mechanics, quantitative structure-property relationship, heat of reaction, solid-liquid phase equilibrium
Corresponding Author(s): Lei Zhang   
Online First Date: 13 July 2021    Issue Date: 10 January 2022
 Cite this article:   
Qilei Liu, Yinke Jiang, Lei Zhang, et al. A computational toolbox for molecular property prediction based on quantum mechanics and quantitative structure-property relationship[J]. Front. Chem. Sci. Eng., 2022, 16(2): 152-167.
 URL:  
https://academic.hep.com.cn/fcse/EN/10.1007/s11705-021-2060-z
https://academic.hep.com.cn/fcse/EN/Y2022/V16/I2/152
| Name | Open-source | Toolbox provides (pseudo) experimental data by itself | Applicability domain |
| --- | --- | --- | --- |
| PyQSAR [5] | Yes | No | Biochemistry and chemical engineering |
| BioPPSy [6] | Yes | No | Biochemistry |
| AZOrange [8] | Yes | No | Biochemistry, chemical engineering and pharmacy |
| Bioalerts [9] | Yes | No | Biochemistry and pharmacy |
| Camb [10] | Yes | No | Biochemistry |
| eTOXlab [11] | Yes | No | Biochemistry |
| Open3DQSAR [12] | Yes | No | Pharmacy |
| OECD QSAR Toolbox [13] | No | Yes (read-across method) | Biochemistry, chemical engineering and pharmacy |
Tab.1  Toolboxes developed for QSPR/QSAR methods
Fig.1  Schematic of the QM-QSPR framework.
Fig.2  The software architecture of QM-QSPR with its dataflow and workflow.
Fig.3  The workflow of case study 1.
Fig.4  Comparison between QM-calculated values and experimental data for (a) ΔfHgθ, (b) ΔfGgθ and (c) Sgθ at 298.15 K for 22 representative compounds (blank means experimental data are unavailable in the database).
| No. | Compound | CAS number | Calculated ΔHvapθ/(kJ·mol–1) | Experimental ΔHvapθ/(kJ·mol–1) | Calculated ΔGvapθ/(kJ·mol–1) | Experimental ΔGvapθ/(kJ·mol–1) | Calculated ΔSvapθ/(J·mol–1·K–1) | Experimental ΔSvapθ/(J·mol–1·K–1) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | Methane | 74-82-8 | 4.4 | | –21.8 | | 88.0 | |
| 2 | 1-Octene | 111-66-0 | 46.9 | 40.4 | 20.7 | | 88.0 | |
| 3 | Ethylene | 74-85-1 | 18.7 | | –7.5 | | 88.0 | |
| 4 | 1-Butyne | 107-00-6 | 34.9 | | 8.6 | | 88.0 | |
| 5 | 1-Bromohexane | 111-25-1 | 50.3 | | 24.1 | | 88.0 | |
| 6 | Bromoethane | 74-96-4 | 31.3 | | 5.0 | | 88.0 | |
| 7 | Propanol | 71-23-8 | 48.5 | 47.5 | 22.2 | 8.8 | 88.0 | 129.1 |
| 8 | Ether | 60-29-7 | 34.6 | 27.4 | 8.4 | –5.6 | 88.0 | 170.3 |
| 9 | Benzaldehyde | 100-52-7 | 63.2 | | 37.0 | | 88.0 | |
| 10 | Acetaldehyde | 75-07-0 | 27.6 | 26.1 | 1.3 | –5.4 | 88.0 | 103.4 |
| 11 | Acetone | 67-64-1 | 35.1 | 31.3 | 8.8 | | 88.0 | 96.5 |
| 12 | Methyl ethyl ketone | 78-93-3 | 41.3 | 19.6 | 15.0 | | 88.0 | 100.8 |
| 13 | Hexanoic acid | 142-62-1 | 72.8 | | 46.5 | | 88.0 | |
| 14 | Acetic acid | 64-19-7 | 53.3 | 52.2 | 27.1 | 16.0 | 88.0 | 123.6 |
| 15 | Ethyl acetate | 141-78-6 | 38.5 | 35.7 | 12.3 | 5.3 | 88.0 | 106.1 |
| 16 | Methyl formate | 107-31-3 | 38.0 | 28.7 | 11.8 | | 88.0 | |
| 17 | Propionitrile | 107-12-0 | 38.3 | | 12.0 | | 88.0 | |
| 18 | Acetonitrile | 75-05-8 | 35.0 | 39.6 | 8.8 | 5.4 | 88.0 | 93.7 |
| 19 | Propylamine | 107-10-8 | 44.4 | 31.3 | 18.2 | | 88.0 | |
| 20 | Methanethiol | 74-93-1 | 27.2 | | 1.0 | | 88.0 | |
| 21 | Water | 7732-18-5 | 35.1 | 44.0 | 8.9 | 8.5 | 88.0 | 118.9 |
| 22 | Carbon dioxide | 124-38-9 | 18.1 | 19.8 | –8.1 | –8.4 | 88.0 | 94.4 |
Tab.2  The calculated values of ΔHvapθ, ΔGvapθ, ΔSvapθ at 298.15 K of 22 examples and their corresponding experimental data a)
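The calculated values in Tab.2 appear to be linked by the usual relation ΔGvapθ = ΔHvapθ – T·ΔSvapθ, with the calculated ΔSvapθ fixed at 88.0 J·mol–1·K–1 for every compound (the constant associated with Trouton's rule, cf. [51]). The following Python sketch checks that reading against a few rows of the table. That the authors computed ΔGvapθ this way is an assumption inferred from the numbers, not something stated on this page; the function name is hypothetical.

```python
# Consistency check on Tab.2, ASSUMING the calculated Gibbs energy of
# vaporization follows dG = dH - T*dS with a constant calculated dS.
T = 298.15      # K, temperature given in the table caption
DS_VAP = 88.0   # J/(mol*K), calculated entropy of vaporization in every row


def gibbs_of_vaporization(dh_vap_kj, t=T, ds_vap=DS_VAP):
    """Return dG_vap in kJ/mol from dH_vap (kJ/mol) and dS_vap (J/(mol*K))."""
    return dh_vap_kj - t * ds_vap / 1000.0  # convert J to kJ


# (compound, calculated dH_vap, calculated dG_vap), copied from Tab.2
rows = [
    ("Methane", 4.4, -21.8),
    ("1-Octene", 46.9, 20.7),
    ("Propanol", 48.5, 22.2),
    ("Water", 35.1, 8.9),
    ("Carbon dioxide", 18.1, -8.1),
]

for name, dh, dg_table in rows:
    dg = gibbs_of_vaporization(dh)
    print(f"{name:15s} dG_vap = {dg:7.2f} kJ/mol (table: {dg_table})")
```

For the rows shown, the recomputed values agree with the tabulated ones to within rounding (about 0.1 kJ·mol–1).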
| No. | Compound | CAS number | Calculated ΔfHlθ/(kJ·mol–1) | Experimental ΔfHlθ/(kJ·mol–1) | Calculated ΔfGlθ/(kJ·mol–1) | Experimental ΔfGlθ/(kJ·mol–1) | Calculated Slθ/(J·mol–1·K–1) | Experimental Slθ/(J·mol–1·K–1) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | Methane | 74-82-8 | –78.1 | | –36.9 | | 118.7 | |
| 2 | 1-Octene | 111-66-0 | –146.2 | –121.8 | 56.3 | | 335.0 | |
| 3 | Ethylene | 74-85-1 | 32.8 | | 69.4 | | 130.9 | |
| 4 | 1-Butyne | 107-00-6 | 122.4 | | 174.7 | | 201.2 | |
| 5 | 1-Bromohexane | 111-25-1 | –215.0 | | –43.3 | | 325.9 | |
| 6 | Bromoethane | 74-96-4 | –101.2 | | –42.7 | | 198.4 | |
| 7 | Propanol | 71-23-8 | –309.5 | –303.5 | –190.1 | –170.6 | 212.8 | 193.6 |
| 8 | Ether | 60-29-7 | –303.2 | –279.4 | –155.8 | –116.7 | 245.4 | 172.4 |
| 9 | Benzaldehyde | 100-52-7 | –115.7 | | –48.8 | | 244.5 | |
| 10 | Acetaldehyde | 75-07-0 | –192.0 | –192.2 | –134.6 | –127.6 | 163.5 | 160.4 |
| 11 | Acetone | 67-64-1 | –255.6 | –248.3 | –170.0 | | 195.8 | 198.8 |
| 12 | Methyl ethyl ketone | 78-93-3 | –288.5 | –258.2 | –174.0 | | 225.7 | 239.1 |
| 13 | Hexanoic acid | 142-62-1 | –581.9 | | –390.5 | | 323.4 | |
| 14 | Acetic acid | 64-19-7 | –465.1 | –484.4 | –387.3 | –390.2 | 197.6 | 159.9 |
| 15 | Ethyl acetate | 141-78-6 | –496.0 | –481.1 | –362.9 | | 265.7 | 256.7 |
| 16 | Methyl formate | 107-31-3 | –406.1 | –386.1 | –327.9 | | 196.4 | |
| 17 | Propionitrile | 107-12-0 | 8.1 | | 72.1 | | 196.4 | |
| 18 | Acetonitrile | 75-05-8 | 32.2 | 34.4 | 68.2 | 86.5 | 163.4 | 149.7 |
| 19 | Propylamine | 107-10-8 | –125.7 | –101.3 | 10.4 | | 215.1 | |
| 20 | Methanethiol | 74-93-1 | –54.8 | | –19.9 | | 165.2 | |
| 21 | Water | 7732-18-5 | –262.1 | –285.8 | –224.5 | –237.1 | 106.6 | 69.9 |
| 22 | Carbon dioxide | 124-38-9 | –408.7 | –413.3 | –386.2 | –386.0 | 126.1 | 119.4 |
Tab.3  The calculated values of ΔfHlθ, ΔfGlθ, Slθ at 298.15 K of 22 examples and their corresponding experimental data a)
| Property | R2 | MAE |
| --- | --- | --- |
| ΔfHgθ | 0.990 | 28.1 kJ·mol–1 |
| ΔfGgθ | 0.988 | 27.7 kJ·mol–1 |
| Sgθ | 0.993 | 6.7 J·mol–1·K–1 |
| ΔHvapθ | 0.925 | 4.0 kJ·mol–1 |
| ΔGvapθ | 0.925 | 4.0 kJ·mol–1 |
| ΔfHlθ | 0.990 | 28.0 kJ·mol–1 |
| ΔfGlθ | 0.988 | 27.7 kJ·mol–1 |
| Slθ | 0.993 | 6.7 J·mol–1·K–1 |
Tab.4  Goodness of fit (R2) and mean absolute error (MAE) of ΔfHgθ, ΔfGgθ, Sgθ, ΔHvapθ, ΔGvapθ, ΔfHlθ, ΔfGlθ and Slθ between GC predictions and QM predictions
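For readers unfamiliar with the two criteria in Tab.4, the sketch below shows one common way to compute them in Python. It assumes R2 is the coefficient of determination, 1 – SS_res/SS_tot, which this page does not confirm (a squared correlation coefficient is another frequent choice). The sample data are not the GC and QM predictions behind Tab.4, which are not listed here; they are the calculated and experimental ΔHvapθ values of the eight Tab.2 compounds for which all entries are available.

```python
def mae(pred, ref):
    """Mean absolute error between predictions and reference values."""
    return sum(abs(p - r) for p, r in zip(pred, ref)) / len(ref)


def r_squared(pred, ref):
    """Coefficient of determination, 1 - SS_res / SS_tot (ASSUMED definition)."""
    mean_ref = sum(ref) / len(ref)
    ss_res = sum((r - p) ** 2 for p, r in zip(pred, ref))
    ss_tot = sum((r - mean_ref) ** 2 for r in ref)
    return 1.0 - ss_res / ss_tot


# dH_vap in kJ/mol from Tab.2: propanol, ether, acetaldehyde, acetic acid,
# ethyl acetate, acetonitrile, water, carbon dioxide
calculated = [48.5, 34.6, 27.6, 53.3, 38.5, 35.0, 35.1, 18.1]
experimental = [47.5, 27.4, 26.1, 52.2, 35.7, 39.6, 44.0, 19.8]

print(f"MAE = {mae(calculated, experimental):.1f} kJ/mol")
print(f"R2  = {r_squared(calculated, experimental):.3f}")
```

On this small subset the MAE comes out at 3.6 kJ·mol–1. This number compares QM values with experiment, so it is not directly comparable with the GC-versus-QM entries of Tab.4.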
Fig.5  The number of group sets with ΔfHgθ, ΔfGgθ and Sgθ values in the MG1 based GC method and MG1-RDKit based GC method.
Fig.6  The diagrammatic sketch of synthesis pathway for aspirin.
Molecule | CAS number | Experiment ΔfHgθ/(kJ·mol–1) | Unpackaged hybrid QM ΔfHgθ/(kJ·mol–1) | MG1-RDKit ΔfHgθ/(kJ·mol–1) | MG1 ΔfHgθ/(kJ·mol–1)
Salicylic acid 69-72-7 –494.8 –473.5 –458.4 –469.4
Acetic anhydride 108-24-7 –572.5 –588.3 –587.0 –575.2
Aspirin 50-78-2 –677.1 –667.1
Acetic acid 64-19-7 –432.8 –434.3 –415.9 –432.6
Tab.5  The property ΔfHgθ (298.15 K) for each compound in the aspirin synthesis pathway obtained from different methods (the experimental data are taken from ICAS software [42]; blank means data are unavailable)
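Heats of formation such as those in Tab.5 give a heat of reaction through Hess's law, ΔrH = Σ ν·ΔfH over products minus reactants. The Python sketch below applies this to the four species of Tab.5, assuming the standard acetylation stoichiometry (salicylic acid + acetic anhydride → aspirin + acetic acid) and using the QM column. Two caveats apply. The aspirin row lists only two values and the flattened table does not show which columns they belong to, so taking –677.1 kJ·mol–1 as the QM value is an assumption. The function and dictionary names are hypothetical and do not reflect the toolbox's interface.

```python
# Gas-phase heats of formation at 298.15 K in kJ/mol, QM column of Tab.5.
DFH_QM = {
    "salicylic acid": -473.5,
    "acetic anhydride": -588.3,
    "acetic acid": -434.3,
    "aspirin": -677.1,  # ASSUMPTION: first of the two values in the aspirin row
}

# Stoichiometric coefficients: negative for reactants, positive for products.
# ASSUMED reaction: salicylic acid + acetic anhydride -> aspirin + acetic acid
ASPIRIN_SYNTHESIS = {
    "salicylic acid": -1,
    "acetic anhydride": -1,
    "aspirin": 1,
    "acetic acid": 1,
}


def heat_of_reaction(stoich, dfh):
    """Hess's law: sum of nu_i * dfH_i over all species, in kJ/mol."""
    return sum(nu * dfh[species] for species, nu in stoich.items())


dr_h = heat_of_reaction(ASPIRIN_SYNTHESIS, DFH_QM)
print(f"Heat of reaction (gas phase, 298.15 K): {dr_h:.1f} kJ/mol")
```

Under these assumptions the reaction is mildly exothermic, at about –49.6 kJ·mol–1. Swapping in another column of Tab.5 only requires a different dictionary of ΔfHgθ values.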
Fig.7  The workflow of case study 2.
Fig.8  Comparison of solid-liquid phase equilibrium curves for the solute ibuprofen in four solvents ((a) 1-propanol, (b) 2-methyl-1-propanol, (c) 3-methyl-1-butanol and (d) ethyl acetate) obtained from experiment, QM calculation and the MLAC method.
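Solid-liquid equilibrium curves of the kind shown in Fig.8 are commonly computed from the simplified relation ln(x·γ) = –(ΔHfus/R)(1/T – 1/Tm), where x is the solute mole fraction, γ its activity coefficient, ΔHfus the enthalpy of fusion and Tm the melting temperature. The Python sketch below implements this textbook form with the heat-capacity correction neglected. Whether the paper uses exactly this form is not stated on this page, and the numerical inputs are placeholders chosen for illustration, not ibuprofen data.

```python
import math

R = 8.314462618  # J/(mol*K), molar gas constant


def solubility(t, t_melt, dh_fus, gamma=1.0):
    """Mole-fraction solubility from ln(x*gamma) = -(dh_fus/R)*(1/t - 1/t_melt).

    Heat-capacity correction terms are neglected. gamma is treated as a fixed
    number here; in practice it depends on composition and temperature, so the
    equation is usually solved iteratively with an activity coefficient model.
    """
    ln_x_gamma = -(dh_fus / R) * (1.0 / t - 1.0 / t_melt)
    return math.exp(ln_x_gamma) / gamma


# PLACEHOLDER inputs for illustration only; not taken from the paper.
T_MELT = 350.0    # K, hypothetical melting temperature of the solute
DH_FUS = 25000.0  # J/mol, hypothetical enthalpy of fusion

for t in (290.0, 310.0, 330.0):
    x_ideal = solubility(t, T_MELT, DH_FUS)            # ideal solution
    x_real = solubility(t, T_MELT, DH_FUS, gamma=2.0)  # hypothetical gamma
    print(f"T = {t:5.1f} K  x_ideal = {x_ideal:.4f}  x(gamma=2) = {x_real:.4f}")
```

With γ = 1 the function returns the ideal solubility, which reaches x = 1 at the melting temperature. An activity coefficient above 1 lowers the predicted solubility in proportion.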
1 P Kirkpatrick, C Ellis. Chemical space. Nature, 2004, 432(7019): 823
https://doi.org/10.1038/432823a
2 A R Katritzky, V S Lobanov, M Karelson. QSPR: the correlation and quantitative prediction of chemical and physical properties from structure. Chemical Society Reviews, 1995, 24(4): 279–287
https://doi.org/10.1039/cs9952400279
3 E J Mills. On melting point and boiling point as related to composition. Philosophical Magazine, 1884, 17(5): 173–187
4 J C Dearden, M T D Cronin, K L E Kaiser. How not to develop a quantitative structure-activity or structure-property relationship (QSAR/QSPR). SAR and QSAR in Environmental Research, 2009, 20(3-4): 241–266
https://doi.org/10.1080/10629360902949567
5 S Kim, K H Cho. PyQSAR: a fast QSAR modeling platform using machine learning and jupyter notebook. Bulletin of the Korean Chemical Society, 2019, 40(1): 39–44
6 M Enciso, N Meftahi, M L Walker, B J Smith. BioPPSy: an open-source platform for QSAR/QSPR analysis. PLoS One, 2016, 11(11): e0166298
https://doi.org/10.1371/journal.pone.0166298
7 S Pirhadi, J Sunseri, D R Koes. Open source molecular modeling. Journal of Molecular Graphics & Modelling, 2016, 69: 127–143
https://doi.org/10.1016/j.jmgm.2016.07.008
8 J C Stålring, L A Carlsson, P Almeida, S Boyer. AZOrange—high performance open source machine learning for QSAR modeling in a graphical programming environment. Journal of Cheminformatics, 2011, 3(1): 28
https://doi.org/10.1186/1758-2946-3-28
9 I Cortes-Ciriano. Bioalerts: a python library for the derivation of structural alerts from bioactivity and toxicity data sets. Journal of Cheminformatics, 2016, 8(1): 13
https://doi.org/10.1186/s13321-016-0125-7
10 D S Murrell, I Cortes-Ciriano, G J P van Westen, I P Stott, A Bender, T E Malliavin, R C Glen. Chemically aware model builder (camb): an R package for property and bioactivity modelling of small molecules. Journal of Cheminformatics, 2015, 7(1): 45
https://doi.org/10.1186/s13321-015-0086-2
11 P Carrió, O López, F Sanz, M Pastor. eTOXlab, an open source modeling framework for implementing predictive models in production environments. Journal of Cheminformatics, 2015, 7(1): 8
https://doi.org/10.1186/s13321-015-0058-6
12 P Tosco, T Balle. Open3DQSAR: a new open-source software aimed at high-throughput chemometric analysis of molecular interaction fields. Journal of Molecular Modeling, 2011, 17(1): 201–208
https://doi.org/10.1007/s00894-010-0684-x
13 S D Dimitrov, R Diderich, T Sobanski, T S Pavlov, G V Chankov, A S Chapkanov, Y H Karakolev, S G Temelkov, R A Vasilev, K D Gerova, et al.. QSAR Toolbox—workflow and major functionalities. SAR and QSAR in Environmental Research, 2016, 27(3): 203–219
https://doi.org/10.1080/1062936X.2015.1136680
14 J Kostal. Advances in Molecular Toxicology. 1st ed. Cambridge: Elsevier, 2016, 139–186
15 A Krokhotin, N V Dokholyan. Methods in Enzymology. 1st ed. Waltham: Elsevier, 2015, 65–89
16 J Polanski. Comprehensive Chemometrics. 1st ed. Oxford: Elsevier, 2009, 459–506
17 R Salomon-Ferrer, D A Case, R C Walker. An overview of the Amber biomolecular simulation package. WIREs Computational Molecular Science, 2013, 3(2): 198–210
https://doi.org/10.1002/wcms.1121
18 S Jo, T Kim, V G Iyer, W Im. CHARMM-GUI: a web-based graphical user interface for CHARMM. Journal of Computational Chemistry, 2008, 29(11): 1859–1865
https://doi.org/10.1002/jcc.20945
19 H J C Berendsen, D van der Spoel, R van Drunen. GROMACS: a message-passing parallel molecular dynamics implementation. Computer Physics Communications, 1995, 91(1): 43–56
https://doi.org/10.1016/0010-4655(95)00042-E
20 S Plimpton. Fast parallel algorithms for short-range molecular dynamics. Journal of Computational Physics, 1995, 117(1): 1–19
https://doi.org/10.1006/jcph.1995.1039
21 J C Phillips, R Braun, W Wang, J Gumbart, E Tajkhorshid, E Villa, C Chipot, R D Skeel, L Kalé, K Schulten. Scalable molecular dynamics with NAMD. Journal of Computational Chemistry, 2005, 26(16): 1781–1802
https://doi.org/10.1002/jcc.20289
22 W Li, C Chen, D Zhao, S Li. LSQC: low scaling quantum chemistry program. International Journal of Quantum Chemistry, 2015, 115(10): 641–646
https://doi.org/10.1002/qua.24831
23 Gaussian 16. Revision A.03. Wallingford, CT: Gaussian, Inc., 2016.
24 F Neese. The ORCA program system. WIREs Computational Molecular Science, 2012, 2(1): 73–78
https://doi.org/10.1002/wcms.81
25 M W Schmidt, K K Baldridge, J A Boatz, S T Elbert, M S Gordon, J H Jensen, S Koseki, N Matsunaga, K A Nguyen, S Su, et al. General atomic and molecular electronic structure system. Journal of Computational Chemistry, 1993, 14(11): 1347–1363
https://doi.org/10.1002/jcc.540141112
26 J J P Stewart. MOPAC: a semiempirical molecular orbital program. Journal of Computer-Aided Molecular Design, 1990, 4(1): 1–103
https://doi.org/10.1007/BF00128336
27 F Neese, F Wennmohs, A Hansen, U Becker. Efficient, approximate and parallel Hartree-Fock and hybrid DFT calculations. A ‘chain-of-spheres’ algorithm for the Hartree-Fock exchange. Chemical Physics, 2009, 356(1): 98–109
https://doi.org/10.1016/j.chemphys.2008.10.036
28 N M O’Boyle, M Banck, C A James, C Morley, T Vandermeersch, G R Hutchison. Open Babel: an open chemical toolbox. Journal of Cheminformatics, 2011, 3(1): 33
https://doi.org/10.1186/1758-2946-3-33
29 R A Mata, M A Suhm. Benchmarking quantum chemical methods: are we heading in the right direction? Angewandte Chemie International Edition, 2017, 56(37): 11011–11018
https://doi.org/10.1002/anie.201611308
30 L Vereecken, D R Glowacki, M J Pilling. Theoretical chemical kinetics in tropospheric chemistry: methodologies and applications. Chemical Reviews, 2015, 115(10): 4063–4114
https://doi.org/10.1021/cr500488p
31 J Zheng, Y Zhao, D G Truhlar. The DBH24/08 database and its use to assess electronic structure model chemistries for chemical reaction barrier heights. Journal of Chemical Theory and Computation, 2009, 5(4): 808–821
https://doi.org/10.1021/ct800568m
32 J Řezáč, P Hobza. Describing noncovalent interactions beyond the common approximations: how accurate is the “gold standard,” CCSD(T) at the complete basis set limit? Journal of Chemical Theory and Computation, 2013, 9(5): 2151–2155
https://doi.org/10.1021/ct400057w
33 J Sun, J W Furness, Y Zhang. Mathematical Physics in Theoretical Chemistry. 1st ed. Amsterdam: Elsevier, 2019, 119–159
34 L Goerigk, A Hansen, C Bauer, S Ehrlich, A Najibi, S Grimme. A look at the density functional theory zoo with the advanced GMTKN55 database for general main group thermochemistry, kinetics and noncovalent interactions. Physical Chemistry Chemical Physics, 2017, 19(48): 32184–32215
https://doi.org/10.1039/C7CP04913G
35 P Politzer, Y Ma, P Lane, M C Concha. Computational prediction of standard gas, liquid, and solid-phase heats of formation and heats of vaporization and sublimation. International Journal of Quantum Chemistry, 2005, 105(4): 341–347
https://doi.org/10.1002/qua.20709
36 J G Speight. Lange’s Handbook of Chemistry. 16th ed. New York: McGraw-Hill, 2005, 515–560
37 Q Liu, L Zhang, L Liu, J Du, Q Meng, R Gani. Computer-aided reaction solvent design based on transition state theory and COSMO-SAC. Chemical Engineering Science, 2019, 202: 300–317
https://doi.org/10.1016/j.ces.2019.03.023
38 C M Hsieh, S I Sandler, S T Lin. Improvements of COSMO-SAC for vapor-liquid and liquid-liquid equilibrium predictions. Fluid Phase Equilibria, 2010, 297(1): 90–97
https://doi.org/10.1016/j.fluid.2010.06.011
39 W L Chen, C M Hsieh, L Yang, C C Hsu, S T Lin. A critical evaluation on the performance of COSMO-SAC models for vapor-liquid and liquid-liquid equilibrium predictions based on different quantum chemical calculations. Industrial & Engineering Chemistry Research, 2016, 55(34): 9312–9322
https://doi.org/10.1021/acs.iecr.6b02345
40 R Gani. Group contribution-based property estimation methods: advances and perspectives. Current Opinion in Chemical Engineering, 2019, 23: 184–196
https://doi.org/10.1016/j.coche.2019.04.007
41 M Mattei, G M Kontogeorgis, R Gani. Modeling of the critical micelle concentration (CMC) of nonionic surfactants with an extended group-contribution method. Industrial & Engineering Chemistry Research, 2013, 52(34): 12236–12246
https://doi.org/10.1021/ie4016232
42 A S Hukkerikar, B Sarup, A Ten Kate, J Abildskov, G Sin, R Gani. Group-contribution+ (GC+) based estimation of properties of pure components: improved property estimation and uncertainty analysis. Fluid Phase Equilibria, 2012, 321: 25–43
https://doi.org/10.1016/j.fluid.2012.02.010
43 A T C Goh. Back-propagation neural networks for modeling complex systems. Artificial Intelligence in Engineering, 1995, 9(3): 143–151
https://doi.org/10.1016/0954-1810(94)00011-S
44 Q Liu, L Zhang, L Liu, J Du, A K Tula, M Eden, R Gani. OptCAMD: an optimization-based framework and tool for molecular and mixture product design. Computers & Chemical Engineering, 2019, 124: 285–301
https://doi.org/10.1016/j.compchemeng.2019.01.006
45 T Lu, F Chen. Multiwfn: a multifunctional wavefunction analyzer. Journal of Computational Chemistry, 2012, 33(5): 580–592
https://doi.org/10.1002/jcc.22885
46 T Lu, F Chen. Quantitative analysis of molecular surface based on improved marching tetrahedra algorithm. Journal of Molecular Graphics & Modelling, 2012, 38: 314–323
https://doi.org/10.1016/j.jmgm.2012.07.004
47 T E Oliphant. Python for scientific computing. Computing in Science & Engineering, 2007, 9(3): 10–20
https://doi.org/10.1109/MCSE.2007.58
48 Q Liu, L Zhang, K Tang, Y Feng, J Zhang, Y Zhuang, L Liu, J Du. Computer-aided reaction solvent design considering inertness using group contribution-based reaction thermodynamic model. Chemical Engineering Research & Design, 2019, 152: 123–133
https://doi.org/10.1016/j.cherd.2019.09.018
49 D W Oxtoby, H P Gillis, A Campion, H H Helal, K P Gaither. Principles of Modern Chemistry. 7th ed. Belmont: CENGAGE Learning, 2011, 596
50 E Mullins, R Oldland, Y A Liu, S Wang, S I Sandler, C C Chen, M Zwolak, K C Seavey. Sigma-profile database for using COSMO-based thermodynamic methods. Industrial & Engineering Chemistry Research, 2006, 45(12): 4389–4415
https://doi.org/10.1021/ie060370h
51 J J Rooney. Trouton’s rule. Nature, 1990, 348(6300): 398–398
https://doi.org/10.1038/348398b0
52 Q Liu, L Zhang, K Tang, L Liu, J Du, Q Meng, R Gani. Machine learning-based atom contribution method for the prediction of surface charge density profiles and solvent design. AIChE Journal, 2021, 67(2): e17110
https://doi.org/10.1002/aic.17110
53 M Gastegger, L Schwiedrzik, M Bittermann, F Berzsenyi, P Marquetand. WACSF—weighted atom-centered symmetry functions as descriptors in machine learning potentials. Journal of Chemical Physics, 2018, 148(24): 241709
https://doi.org/10.1063/1.5019667
54 S Wang, Z Song, J Wang, Y Dong, M Wu. Solubilities of ibuprofen in different pure solvents. Journal of Chemical & Engineering Data, 2010, 55(11): 5283–5285
https://doi.org/10.1021/je100255z
55 J Hong, D Hua, X Wang, H Wang, J Li. Solid-liquid-gas equilibrium of the ternaries ibuprofen + myristic acid + CO2 and ibuprofen + tripalmitin + CO2. Journal of Chemical & Engineering Data, 2010, 55(1): 297–302
https://doi.org/10.1021/je900342a