QSPR Modeling of Gibbs Free Energy of Organic Compounds by Weighting of Nearest Neighboring Codes

We examine the encoding of the chemical structure of organic compounds by Labeled Hydrogen-Filled Graphs (LHFGs). Quantitative Structure-Property Relationships (QSPR) for a representative set of 150 organic molecules have been derived by optimizing the correlation weights of local invariants of the LHFGs. As local invariants we have tested the Morgan extended connectivity of zero and first order, the numbers of paths of length 2 (P2) and the valence shells of distance 2 (S2) associated with each atom in the molecular structure, and the Nearest Neighboring Codes (NNC). The best statistical characteristics for the Gibbs free energy have been obtained with the NNC weighting. The statistical parameters corresponding to this model are the following: n = 100, r² = 0.9974, s = 5.136 kJ/mol, F = 38319 (training set); n = 50, r² = 0.9990, s = 3.405 kJ/mol, F = 48717 (test set). Some possible further developments are pointed out.


INTRODUCTION
Thermodynamics, the study of the laws governing the interconvertibility of different forms of energy into heat, enables us to discuss physicochemical properties quantitatively and to make useful predictions. In particular, chemical phenomena can be treated by this powerful method quite independently of atomic and molecular theory. The energy changes associated with chemical reactions are themselves of considerable importance. Even greater chemical interest, however, stems from the fact that the equilibrium position of a reacting system can be related to these energy changes.
The supposition of the early thermochemists that equilibrium was always approached with a decrease in the internal energy of the system, a principle which applies to macroscale mechanical systems, has been shown to be inadequate for chemical systems. A more careful statement of the role of the internal energy or enthalpy is that in systems of constant entropy the equilibrium position is that of lowest energy. In systems of constant energy, the equilibrium position is that of highest entropy. However, in most chemical processes neither the energy nor the entropy is held constant. The Gibbs free energy is the suitable thermodynamic function when both energy and entropy change.

1 INIFTA, Chemistry Department, Faculty of Exact Sciences, La Plata University, Diag. 113 and 64, C.C. 16, 1900 La Plata, Argentina.
2 Institute of Geology and Geophysics, Republic of Uzbekistan Academy of Sciences, Tashkent, Uzbekistan.
3 To whom correspondence should be addressed; e-mail: castro@dalton.quimica.unlp.edu.ar, jubert@arnet.com.ar

The free energy is often considered to be the most important quantity in thermodynamics. It is usually expressed as the Helmholtz function, A, or the Gibbs function, G. The Helmholtz free energy is appropriate to a system with constant number of particles, temperature and volume (constant NVT), whereas the Gibbs free energy is appropriate to constant number of particles, temperature and pressure (constant NPT). Most experiments are conducted under conditions of constant temperature and pressure, where the Gibbs function is the suitable free energy quantity.
Unfortunately, the free energy is a difficult quantity to obtain experimentally for systems such as liquids or flexible macromolecules that have many minimum energy configurations separated by low-energy barriers. Associated quantities such as the entropy and the chemical potential are also difficult to calculate. The free energy cannot be accurately determined from a standard molecular dynamics or Monte Carlo simulation because such simulations do not adequately sample those regions of phase space that make important contributions to the free energy. However, thermodynamic properties can also be modeled within the realm of Quantitative Structure-Property Relationships (QSPR) theory. In fact, several previous theoretical studies for predicting these properties have proven to be rather valuable tools [1][2][3][4][5][6][7][8][9][10]. In particular, we have modeled the free energy of 60 hydrocarbons employing indices based on the maximum topological distances as molecular descriptors for QSPR [2]. Since there is a large set of options for choosing molecular descriptors, we have deemed it interesting and potentially valuable to complement this analysis for a larger set of organic compounds using different molecular descriptors.
The present study aims to model the Gibbs free energy (ΔG°) by means of QSPR for a set of 150 organic molecules whose experimental data were taken from Ref. [11].

METHOD
The QSPR analysis is performed by means of the Optimization of Correlation Weights of Local Graph Invariants (OCWLGI) in Labeled Hydrogen-Filled Graphs (LHFG), described elsewhere [12][13][14][15][16]. As local invariants we have examined the following indices:
• standard vertex degree in the LHFG, ⁰EC;
• Morgan extended connectivity index of first order in the LHFG, ¹EC;
• number of paths of length two associated with each vertex in the LHFG, as described in a recent article [17] and denoted as P2;
• valence shell values of distance two, as described before [17] and denoted as S2.
For the sake of brevity we do not describe the method here, since it has been presented before; the pertinent details can be consulted in the preceding article [17]. Standard ORIGIN software was employed for numerical calculations.
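The four local invariants listed above can be illustrated with a short Python sketch. It is not the authors' implementation: the definitions used here are assumptions based on common usage (⁰EC as the vertex degree, ¹EC as the sum of the degrees of the neighbors, P2 as the number of paths of length two starting at the vertex, S2 as the sum of the degrees of the vertices at distance two), and the precise definitions should be checked against Ref. [17]. The function names and the ethane example are hypothetical.

```python
from collections import deque


def build_adjacency(n_vertices, edges):
    """Adjacency lists of an undirected graph given as (u, v) edge pairs."""
    adj = {v: [] for v in range(n_vertices)}
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    return adj


def ec0(adj):
    """Zero-order extended connectivity: the vertex degree."""
    return {v: len(nbrs) for v, nbrs in adj.items()}


def ec1(adj):
    """First-order Morgan extended connectivity (assumed definition):
    sum of the degrees of the neighbors of each vertex."""
    deg = ec0(adj)
    return {v: sum(deg[u] for u in nbrs) for v, nbrs in adj.items()}


def p2(adj):
    """Number of paths of length two starting at each vertex (assumed):
    every neighbor u contributes deg(u) - 1 continuations."""
    deg = ec0(adj)
    return {v: sum(deg[u] - 1 for u in nbrs) for v, nbrs in adj.items()}


def s2(adj):
    """Valence shell of distance two (assumed definition): sum of the
    degrees of all vertices at topological distance exactly two."""
    deg = ec0(adj)
    shells = {}
    for start in adj:
        dist = {start: 0}
        queue = deque([start])
        while queue:  # breadth-first search, not expanded beyond distance 2
            v = queue.popleft()
            if dist[v] == 2:
                continue
            for u in adj[v]:
                if u not in dist:
                    dist[u] = dist[v] + 1
                    queue.append(u)
        shells[start] = sum(deg[v] for v, d in dist.items() if d == 2)
    return shells


# Hydrogen-filled graph of ethane: vertices 0-1 are C, vertices 2-7 are H.
ETHANE_ELEMENTS = ["C", "C", "H", "H", "H", "H", "H", "H"]
ETHANE_EDGES = [(0, 1), (0, 2), (0, 3), (0, 4), (1, 5), (1, 6), (1, 7)]
ethane = build_adjacency(len(ETHANE_ELEMENTS), ETHANE_EDGES)
```

Under these assumed definitions, a carbon atom of ethane has ⁰EC = 4, ¹EC = 7, P2 = 3 and S2 = 3, while each hydrogen atom has ⁰EC = 1, ¹EC = 4, P2 = 3 and S2 = 6.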
The molecular descriptors employed in this study are calculated with the formula

$$D(a_k, LI_k) = \sum_{k} \left[ CW(a_k) + CW(LI_k) \right] \tag{1}$$

where a_k is the chemical element that is the image of the k-th vertex in the LHFG and LI_k is the numerical value of the local invariant in the LHFG. CW(a_k) and CW(LI_k) are the correlation weights associated with the chemical element and with the local invariant value in the LHFG, respectively. The sum in Eq. (1) is extended over all vertices in the LHFG. The simple additive formula (1) is arbitrary and there is complete freedom to apply any other alternative algebraic relationship.
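As a minimal sketch of how the additive descriptor described above could be evaluated, assuming that every vertex contributes the sum of the correlation weight of its element and the correlation weight of its local invariant value, consider the following Python function. The function name, the methane example and the weight values are hypothetical and are not taken from Tables II-VI.

```python
def descriptor(elements, invariants, cw_element, cw_invariant):
    """Additive descriptor: sum over all LHFG vertices of
    CW(a_k) + CW(LI_k) (assumed form of the additive formula)."""
    return sum(
        cw_element[a] + cw_invariant[li]
        for a, li in zip(elements, invariants)
    )


# Methane LHFG with the vertex degree as local invariant.
methane_elements = ["C", "H", "H", "H", "H"]
methane_degrees = [4, 1, 1, 1, 1]

# All weights equal to 1.0: every vertex then contributes 2.0.
unit_cw_element = {"C": 1.0, "H": 1.0}
unit_cw_invariant = {4: 1.0, 1: 1.0}
d_unit = descriptor(
    methane_elements, methane_degrees, unit_cw_element, unit_cw_invariant
)

# Purely illustrative weights (not values from the article).
toy_cw_element = {"C": 2.0, "H": 0.5}
toy_cw_invariant = {4: 1.5, 1: 0.25}
d_toy = descriptor(
    methane_elements, methane_degrees, toy_cw_element, toy_cw_invariant
)
```

With unit weights the five vertices of methane give a descriptor of 10.0; with the illustrative weights the carbon contributes 3.5 and each hydrogen 0.75, giving 6.5.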

RESULTS AND DISCUSSION
Statistical characteristics for the Gibbs free energy values (in kJ/mol) from a previous article [11] are presented in Table I. These results allow us to verify that the best linear approximation takes place when using the optimal descriptor D(a_k, NNC_k).
The NNC of the k-th vertex is calculated from N_t, the total number of neighbors of the k-th vertex in the LHFG; N_c, the number of carbon atoms connected with the k-th vertex; and N_h, the number of hydrogen atoms bonded with the k-th vertex in the LHFG.
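The following Python sketch shows one way such a code could be computed from N_t, N_c and N_h. The combining formula is an assumption: the text here does not give it, and the positional form 100·N_t + 10·N_c + N_h used below should be checked against the original OCWLGI references. The function name and the methane example are hypothetical.

```python
def nearest_neighboring_code(adj, elements, k, weights=(100, 10, 1)):
    """Code of vertex k built from its nearest neighbors.

    ASSUMPTION: the code is w_t*N_t + w_c*N_c + w_h*N_h with default
    weights (100, 10, 1); this combination is not stated in the text.
    """
    neighbors = adj[k]
    n_t = len(neighbors)                                   # all neighbors
    n_c = sum(1 for u in neighbors if elements[u] == "C")  # carbon neighbors
    n_h = sum(1 for u in neighbors if elements[u] == "H")  # hydrogen neighbors
    w_t, w_c, w_h = weights
    return w_t * n_t + w_c * n_c + w_h * n_h


# Methane LHFG: vertex 0 is C, vertices 1-4 are H.
METHANE_ELEMENTS = ["C", "H", "H", "H", "H"]
METHANE_ADJ = {0: [1, 2, 3, 4], 1: [0], 2: [0], 3: [0], 4: [0]}

codes = [
    nearest_neighboring_code(METHANE_ADJ, METHANE_ELEMENTS, k)
    for k in range(len(METHANE_ELEMENTS))
]
```

Under this assumed form the carbon atom of methane (four neighbors, no carbon, four hydrogens) receives the code 404 and each hydrogen atom (one neighbor, which is a carbon) receives the code 110.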
We have tried several partitions of the entire molecular set into a training set and a test set, but the final results do not depend significantly on the particular way of assigning molecules to each set, so we present results for a representative case. It must be pointed out that the results derived for the test set are true predictions, since these molecules are not included in the training set. The QSPRs under consideration have been obtained with software described in detail in a recent article [18]. This algorithm is based on the Monte Carlo technique. The starting values of all correlation weights are set to 1.0. The optimum values of the correlation weights are those yielding the largest possible correlation coefficient between the property under consideration and the chosen descriptor calculated by means of Eq. (1) (Tables II-VI).
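The optimization described above can be sketched as a simple random search. This is not the software of Ref. [18], whose details are not given here; it is a generic scheme under stated assumptions: all weights start at 1.0, one randomly chosen weight is perturbed at a time, and the change is kept only if the correlation coefficient between descriptor and property increases. The function names, the step size, the toy molecules and the target values are all hypothetical, and the targets are arbitrary numbers, not experimental data.

```python
import math
import random


def descriptor(molecule, cw):
    """Additive descriptor over (element, invariant) vertex pairs."""
    return sum(cw[("a", a)] + cw[("LI", li)] for a, li in molecule)


def pearson_r(x, y):
    """Pearson correlation coefficient; 0.0 if either variance is zero."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    denom = math.sqrt(sxx * syy)
    return sxy / denom if denom > 0 else 0.0


def optimize_weights(molecules, targets, n_steps=2000, step=0.1, seed=0):
    """Random search: perturb one weight, keep the change only if r grows."""
    rng = random.Random(seed)
    keys = sorted(
        {("a", a) for m in molecules for a, _ in m}
        | {("LI", li) for m in molecules for _, li in m}
    )
    cw = {key: 1.0 for key in keys}  # starting values of all weights
    initial_r = pearson_r([descriptor(m, cw) for m in molecules], targets)
    best_r = initial_r
    for _ in range(n_steps):
        key = rng.choice(keys)
        old = cw[key]
        cw[key] = old + rng.uniform(-step, step)
        r = pearson_r([descriptor(m, cw) for m in molecules], targets)
        if r > best_r:
            best_r = r      # accept the improvement
        else:
            cw[key] = old   # reject and restore the previous weight
    return cw, initial_r, best_r


def alkane(n_carbon):
    """LHFG vertices of a linear alkane as (element, degree) pairs."""
    return [("C", 4)] * n_carbon + [("H", 1)] * (2 * n_carbon + 2)


toy_molecules = [alkane(n) for n in (1, 2, 3, 4)]
toy_targets = [1.0, 2.5, 3.0, 4.5]  # arbitrary illustrative values
weights, r_start, r_final = optimize_weights(toy_molecules, toy_targets)
```

Because a perturbation is accepted only when it raises the correlation coefficient, the final value can never be lower than the starting one. A practical implementation would also need a stopping criterion and a check on the test set to avoid overfitting the training molecules.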
The analysis of the final results allows us to note that all the remaining local invariants provide reasonably satisfactory models to predict the Gibbs free energies. First probes from the Table I

Results obtained from application of Eqs. (2)-(6) are shown in Tables VII-XI. In order to judge the relative merits of the present approach one must take into account that the experimental values span a wide range (i.e. 370 kJ/mol), whereas the average absolute deviation in, for example, the test set calculated on the basis of Eq. (2) is only 8 kJ/mol. Since there are no available theoretical data of Gibbs free energies corresponding to the same molecular set, a direct comparison with other possible theoretical approaches is not possible. There are other theoretical approaches to the calculation of this property within the realm of QSPR theory and semiempirical molecular orbital methods [19,20], but their molecular sets are rather restricted with respect to the one chosen in this study.
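The statistical quantities used in this discussion can be computed with a few lines of Python. The sketch assumes a one-variable least-squares model (property = intercept + slope × descriptor), a standard error s = sqrt(SS_res / (n - 2)) and the F-ratio of a one-predictor regression; these conventions are assumptions and may differ from those of the software actually used. The function name and the sample numbers are hypothetical.

```python
import math


def linear_fit_statistics(d, y):
    """Least-squares line y = intercept + slope * d and its statistics."""
    n = len(d)
    md, my = sum(d) / n, sum(y) / n
    sdd = sum((di - md) ** 2 for di in d)
    sdy = sum((di - md) * (yi - my) for di, yi in zip(d, y))
    syy = sum((yi - my) ** 2 for yi in y)
    slope = sdy / sdd
    intercept = my - slope * md
    predicted = [intercept + slope * di for di in d]
    ss_res = sum((yi - pi) ** 2 for yi, pi in zip(y, predicted))
    r2 = 1.0 - ss_res / syy
    s = math.sqrt(ss_res / (n - 2))  # standard error of estimation
    # F-ratio for one predictor; infinite for a perfect fit.
    f = (syy - ss_res) / (ss_res / (n - 2)) if ss_res > 0 else math.inf
    aad = sum(abs(yi - pi) for yi, pi in zip(y, predicted)) / n
    return {"slope": slope, "intercept": intercept, "r2": r2,
            "s": s, "F": f, "aad": aad}


# Arbitrary illustrative numbers, not data from the article.
stats = linear_fit_statistics([1.0, 2.0, 3.0, 4.0], [2.1, 3.9, 6.2, 7.8])
```

For a one-predictor model the F-ratio equals r²(n - 2)/(1 - r²), so it grows very quickly as r² approaches 1; this is why correlation coefficients close to unity are accompanied by F values of the order of tens of thousands.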
A relatively serious disadvantage of the present approach is its limited capability to discern among isomers, since in the majority of cases their descriptors are the same. Evidently, in order to complement this sort of flexible topological descriptors it should be necessary to incorpo-

CONCLUSIONS
The optimization of correlation weights of local graph invariants gives reasonably good models of Gibbs free energies. The best linear approximation for this molecular set is the one based on the correlation weights of the nearest neighboring codes. However, the remaining molecular descriptors also behave sensibly as predictors of this thermodynamic quantity.
A possible way to improve the present results would be to employ linear fitting polynomials depending upon several variables, to try higher-order relationships instead of linear ones, or to apply functional relationships more complex than the simplest polynomial structure. Besides, it is necessary to study how to discern among isomers. Research along these lines is under way and results will be published elsewhere in the near future.

Table I. Statistical Characteristics of the Optimal Descriptors of Eq. (1), with r the Correlation Coefficient, s the Standard Error of Estimation, and F the Fisher F-ratio

Table II. Correlation Weights for the Molecular Descriptor D(a, ⁰EC)

Table III. Correlation Weights Corresponding to the Molecular Descriptor D(a, ¹EC)

Table IV. Correlation

Table VI. Correlation Weights for the Molecular Descriptor D(a, NNC)