| [201199] | 1 | /* | 
|---|
|  | 2 | * Project: MoleCuilder | 
|---|
|  | 3 | * Description: creates and alters molecular systems | 
|---|
|  | 4 | * Copyright (C)  2010 University of Bonn. All rights reserved. | 
|---|
|  | 5 | * Please see the LICENSE file or "Copyright notice" in builder.cpp for details. | 
|---|
|  | 6 | */ | 
|---|
|  | 7 |  | 
|---|
|  | 8 | /** | 
|---|
|  | 9 | * \file potentials.dox | 
|---|
|  | 10 | * | 
|---|
|  | 11 | * Created on: Nov 28, 2012 | 
|---|
|  | 12 | *    Author: heber | 
|---|
|  | 13 | */ | 
|---|
|  | 14 |  | 
|---|
|  | 15 | /** \page potentials Empirical Potentials and FunctionModels | 
|---|
|  | 16 | * | 
|---|
|  | 17 | * On this page we explain what is meant with the Potentials sub folder. | 
|---|
|  | 18 | * | 
|---|
|  | 19 | * First, we are based on fragmenting a molecular system, i.e. dissecting its | 
|---|
|  | 20 | * bond structure into connected subgraphs, calculating the energies of the | 
|---|
|  | 21 | * fragments (ab-initio) and summing up to a good approximation of the total | 
|---|
| [caece4] | 22 | * energy of the whole system, \sa fragmentation. | 
|---|
| [201199] | 23 | * Second, having calculated these energies, there quickly comes up the thought | 
|---|
|  | 24 | * that one actually calculates quite similar systems all time and if one could | 
|---|
|  | 25 | * not cache results in an intelligent (i.e. interpolating) fashion ... | 
|---|
|  | 26 | * | 
|---|
|  | 27 | * That's where so-called empirical potentials come into play. They are | 
|---|
|  | 28 | * functions depending on a number of "fitted" parameters and the variable | 
|---|
|  | 29 | * distances within a molecular fragment (i.e. the bond lengths) in order to | 
|---|
|  | 30 | * give a value for the total energy without the need to solve a complex | 
|---|
| [caece4] | 31 | * ab-initio model (essentially, not solving the electronic Schrödinger equation | 
|---|
|  | 32 | * anymore). | 
|---|
| [201199] | 33 | * | 
|---|
|  | 34 | * Empirical potentials have been thought of by fellows such as Lennard-Jones, | 
|---|
|  | 35 | * Morse, Tersoff, Stillinger and Weber, etc. And in their honor, the | 
|---|
|  | 36 | * potential form is named after its inventor. Hence, we speak e.g. of a | 
|---|
|  | 37 | * Lennard-Jones potential. | 
|---|
|  | 38 | * | 
|---|
|  | 39 | * So, what we have to do in order to cache results is the following procedure: | 
|---|
|  | 40 | * -# gather similar fragments | 
|---|
|  | 41 | * -# perform a fit procedure to obtain the parameters for the empirical | 
|---|
|  | 42 | *    potential | 
|---|
|  | 43 | * -# evaluate the potential instead of an ab-initio calculation | 
|---|
|  | 44 | * | 
|---|
|  | 45 | * The terms we use, model the classes that are implemented: | 
|---|
|  | 46 | * -# EmpiricalPotential: Contains the interface to a function that can be | 
|---|
|  | 47 | *    evaluated given a number of arguments_t, i.e. distances. Also, one might | 
|---|
|  | 48 | *    want to evaluate derivatives. | 
|---|
|  | 49 | * -# FunctionModel: Is a function that can be fitted, i.e. that has internal | 
|---|
|  | 50 | *    parameters to be set and got. | 
|---|
|  | 51 | * -# argument_t: The Argument stores not only the distance but also the index | 
|---|
|  | 52 | *    pair of the associated atoms and also their charges, to let the potential | 
|---|
|  | 53 | *    check on validity. | 
|---|
|  | 54 | * -# SerializablePotential: Eventually, one wants to store to or parse from | 
|---|
|  | 55 | *    a file all potential parameters. This functionality is encoded in this | 
|---|
|  | 56 | *    class. | 
|---|
|  | 57 | * -# HomologyGraph: "Similar" fragments in our case have to have the same bond | 
|---|
|  | 58 | *    graph. It is stored in the HomologyGraph that acts as representative | 
|---|
|  | 59 | * -# HomologyContainer: This container combines, in multimap fashion, all | 
|---|
|  | 60 | *    similar fragments with their energies together, with the HomologyGraph | 
|---|
|  | 61 | *    as their "key". | 
|---|
|  | 62 | * -# TrainingData: Here, we combine InputVector and OutputVector that contain | 
|---|
|  | 63 | *    the set of distances required for the FunctionModel (e.g. only a single | 
|---|
|  | 64 | *    distance/argument for a pair potential, three for an angle potential, | 
|---|
|  | 65 | *    etc.) and also the expected OutputVector. This in combination with the | 
|---|
|  | 66 | *    FunctionModel is the basis for the non-linear regression used for the | 
|---|
|  | 67 | *    fitting procedure. | 
|---|
|  | 68 | * -# Extractors: These set of functions yield the set of distances from a | 
|---|
|  | 69 | *    given fragment that is stored in the HomologyContainer. | 
|---|
|  | 70 | * -# FunctionApproximation: Contains the interface to the levmar package where | 
|---|
|  | 71 | *    the Levenberg-Marquardt (Newton + Trust region) algorithm is used to | 
|---|
|  | 72 | *    perform the fit. | 
|---|
|  | 73 | * | 
|---|
| [1ba8a1] | 74 | * \section potentials-fit-potential-action What happens in FitPotentialAction. | 
|---|
|  | 75 | * | 
|---|
|  | 76 | *  First, either a potential file is parsed via PotentialDeserializer or charges | 
|---|
|  | 77 | *  and a potential type from the given options. This is used to instantiate | 
|---|
|  | 78 | *  EmpiricalPotentials via the PotentialFactory, stored within the | 
|---|
|  | 79 | *  PotentialRegistry. This is the available set of potentials (without requiring | 
|---|
|  | 80 | *  any knowledge as to the nature of the fragment employed in fitting). | 
|---|
|  | 81 | * | 
|---|
|  | 82 | *  Second, the given fragment is used to get a suitable HomologyGraph from | 
|---|
|  | 83 | *  the World's HomologyContainer. This is given to a CompoundPotential, that in | 
|---|
|  | 84 | *  turn browses through the PotentialRegistry, picking out those | 
|---|
|  | 85 | *  EmpiricalPotential's that match with a subset of the FragmentNode's of the | 
|---|
|  | 86 | *  given graph. These are stored as a list of FunctionModel's within the | 
|---|
|  | 87 | *  CompoundPotential instance. Here comes the specific fragment into play, | 
|---|
|  | 88 | *  picking a subset from the available potentials. | 
|---|
|  | 89 | * | 
|---|
|  | 90 | *  Third, we need to setup the training data. For this we need vectors of input | 
|---|
|  | 91 | *  and output data that are obtained from the HomologyContainer with the | 
|---|
|  | 92 | *  HomologyGraph as key. The output vector in our case is simply a number | 
|---|
|  | 93 | *  (although the interface allows for more). The input vector is the set of | 
|---|
|  | 94 | *  distances. In order to pre-process the input data for the specific model | 
|---|
|  | 95 | *  a filter is required in the TrainingData's constructor. The purpose of the | 
|---|
|  | 96 | *  filter is to pick out the subset of distance arguments for each model one | 
|---|
|  | 97 | *  after the other and concatenate them to a list. On evaluation of the model | 
|---|
|  | 98 | *  this concatenated list of distances is given to the model and it may easily | 
|---|
|  | 99 | *  dissect the list and hand over each contained potential its subset of | 
|---|
|  | 100 | *  arguments. See Extractors for more information. | 
|---|
|  | 101 | * | 
|---|
|  | 102 | *  Afterwards, training may commence: The goal is to find a set of parameters | 
|---|
|  | 103 | *  for the model such that it as good as possible reproduces the output vector | 
|---|
|  | 104 | *  for a given input vector. This non-linear regression is contained in the | 
|---|
|  | 105 | *  levmar package and its functionality is wrapped in the FunctionApproximation | 
|---|
|  | 106 | *  class. An instance is initialized with both the gathered training data and | 
|---|
|  | 107 | *  the model containing a list of potentials. See | 
|---|
|  | 108 | *  [FunctionApproximation-details] for more details. | 
|---|
|  | 109 | * | 
|---|
|  | 110 | *  During the fitting procedure, first the derivatives of the model is checked | 
|---|
|  | 111 | *  for consistency, then the model is initialized with a sensible guess of | 
|---|
|  | 112 | *  starting parameters, and afterwards the Levenberg-Marquardt algorithm | 
|---|
|  | 113 | *  commences that makes numerous calls to evaluate the model and its derivative | 
|---|
|  | 114 | *  to find the minimum in the L2-norm. | 
|---|
|  | 115 | * | 
|---|
|  | 116 | *  This is done more than once as high-dimensional regression is sensititive the | 
|---|
|  | 117 | *  the starting values as there are possible numerous local minima. The lowest | 
|---|
|  | 118 | *  of the found minima is taken, either via a given threshold or the best of a | 
|---|
|  | 119 | *  given number of attempts. | 
|---|
|  | 120 | * | 
|---|
|  | 121 | *  Eventually, these parameters of the best model are streamed via | 
|---|
|  | 122 | *  PotentialSerializer back into a potential file. Each EmpiricalPotential in | 
|---|
|  | 123 | *  the CompoundPotential making up the whole model is also a | 
|---|
|  | 124 | *  SerializablePotential. Hence, each in turn writes a single line with its | 
|---|
|  | 125 | *  respective subset of parameters and particle types, describing this | 
|---|
|  | 126 | *  particular fit function. | 
|---|
|  | 127 | * | 
|---|
|  | 128 | * \section potentials-function-evaluation How does the model evaluation work | 
|---|
|  | 129 | * | 
|---|
|  | 130 | *  We now come to the question of how the model and its derivative are actually | 
|---|
|  | 131 | *  evaluated. We have an input vector from the training data and we have the | 
|---|
|  | 132 | *  model with a specific set of parameters. | 
|---|
|  | 133 | * | 
|---|
|  | 134 | *  FunctionModel is just an abstract interface that is implemented by the | 
|---|
|  | 135 | *  potential functions, such as CompoundPotential, that combines multiple | 
|---|
|  | 136 | *  potentials into a single function for fitting, or PairPotential_Harmonic, | 
|---|
|  | 137 | *  that is a specific fit function for harmonic bonds. | 
|---|
|  | 138 | * | 
|---|
|  | 139 | *  The main issue with the evaluation is picking the right set of distances from | 
|---|
|  | 140 | *  ones given in the input vector and feed it to each potential contained in | 
|---|
|  | 141 | *  CompoundPotential. Note that the distances have already been prepared by | 
|---|
|  | 142 | *  the TrainingData instantiation. | 
|---|
|  | 143 | * | 
|---|
|  | 144 | *  Initially, the HomologyGraph only contains a list of configurations of a | 
|---|
|  | 145 | *  specific fragments (i.e. the position of each atom in the fragment) and an | 
|---|
|  | 146 | *  energy value. These first have to be converted into distances. | 
|---|
|  | 147 | * | 
|---|
|  | 148 | * | 
|---|
| [48d20d] | 149 | * \section potentials-howto-use Howto use the potentials | 
|---|
| [201199] | 150 | * | 
|---|
|  | 151 | *  We just give a brief run-down in terms of code on how to use the potentials. | 
|---|
| [48d20d] | 152 | *  Here, we just describe what to do in order to perform the fitting. This is | 
|---|
|  | 153 | *  basically what is implemented in FragmentationFitPotentialAction. | 
|---|
| [201199] | 154 | * | 
|---|
|  | 155 | *  \code | 
|---|
|  | 156 | *  // we need the homology container and the representative graph we want to | 
|---|
|  | 157 | *  // fit to. | 
|---|
|  | 158 | *  HomologyContainer homologies; | 
|---|
|  | 159 | *  const HomologyGraph graph = getSomeGraph(homologies); | 
|---|
|  | 160 | *  Fragment::charges_t h2o; | 
|---|
|  | 161 | *  h2o += 8,1,1; | 
|---|
|  | 162 | *  // TrainingData needs so called Extractors to get the required distances | 
|---|
| [caece4] | 163 | *  // from the stored fragment. These functions are bound via boost::bind. | 
|---|
| [201199] | 164 | *  TrainingData AngleData( | 
|---|
|  | 165 | *      boost::bind(&Extractors::gatherDistancesFromFragment, | 
|---|
|  | 166 | *          boost::bind(&Fragment::getPositions, _1), | 
|---|
|  | 167 | *          boost::bind(&Fragment::getCharges, _1), | 
|---|
|  | 168 | *          boost::cref(h2o), | 
|---|
|  | 169 | *          _2) | 
|---|
|  | 170 | *      ); | 
|---|
|  | 171 | *  // now we extract the distances and energies and store them | 
|---|
|  | 172 | *  AngleData(homologies.getHomologousGraphs(graph)); | 
|---|
|  | 173 | *  // give ParticleTypes of this potential to make it unique | 
|---|
|  | 174 | *  PairPotential_Angle::ParticleTypes_t types = | 
|---|
|  | 175 | *        boost::assign::list_of<PairPotential_Angle::ParticleType_t> | 
|---|
|  | 176 | *        (8)(1)(1) | 
|---|
|  | 177 | *      ; | 
|---|
|  | 178 | *  PairPotential_Angle angle(types); | 
|---|
|  | 179 | *  // give initial parameter | 
|---|
|  | 180 | *  FunctionModel::parameters_t params(PairPotential_Angle::MAXPARAMS, 0.); | 
|---|
| [48d20d] | 181 | *  // ... set some potential-specific initial parameters in params struct | 
|---|
| [201199] | 182 | *  angle.setParameters(params); | 
|---|
|  | 183 | * | 
|---|
|  | 184 | *  // use the potential as a FunctionModel along with prepared TrainingData | 
|---|
|  | 185 | *  FunctionModel &model = angle; | 
|---|
|  | 186 | *  FunctionApproximation approximator(AngleData, model); | 
|---|
|  | 187 | *  approximator(FunctionApproximation::ParameterDerivative); | 
|---|
|  | 188 | * | 
|---|
|  | 189 | *  // obtain resulting parameters and check remaining L_2 and L_max error | 
|---|
|  | 190 | *  angleparams = model.getParameters(); | 
|---|
|  | 191 | *  LOG(1, "INFO: L2sum = " << AngleData(model) | 
|---|
|  | 192 | *      << ", LMax = " << AngleData(model) << "."); | 
|---|
|  | 193 | *  \endcode | 
|---|
|  | 194 | * | 
|---|
|  | 195 | *  The evaluation of the fitted potential is then trivial, e.g. | 
|---|
|  | 196 | *  \code | 
|---|
|  | 197 | *  // constructed someplace | 
|---|
|  | 198 | *  PairPotential_Angle angle(...); | 
|---|
|  | 199 | * | 
|---|
|  | 200 | *  // evaluate | 
|---|
|  | 201 | *  FunctionModel::arguments_t args; | 
|---|
| [48d20d] | 202 | *  // .. initialise args to the desired distances | 
|---|
| [201199] | 203 | *  const double value = angle(args)[0]; // output is a vector! | 
|---|
|  | 204 | *  \endcode | 
|---|
|  | 205 | * | 
|---|
| [48d20d] | 206 | * \section potentials-stability-of-fit note in stability of fit | 
|---|
|  | 207 | * | 
|---|
|  | 208 | *  As we always start from random initial parameters (within a certain sensible | 
|---|
| [0932c2] | 209 | *  range at least), the non-linear fit does not always converge. Note that the | 
|---|
|  | 210 | *  random values are drawn from the defined distribution and the uniform distributionm | 
|---|
|  | 211 | *  engine is obtained from the currently set, see \ref randomnumbers. Hence, you | 
|---|
|  | 212 | *  can manipulate both in order to get different results or to set the seed such that | 
|---|
|  | 213 | *  some "randomly" drawn value always work well (e.g. for testing). | 
|---|
|  | 214 | * | 
|---|
|  | 215 | *  In any case, For this case the FragmentationFitPotentialAction has the option | 
|---|
|  | 216 | *  "take-best-of" to allow for multiple fits where the best (in terms of l2 error) | 
|---|
|  | 217 | *  is taken eventually. Furthermore, you can use the "set-threshold" option to | 
|---|
|  | 218 | *  stop restarting the fit procedure first when the L2 error has dropped below the | 
|---|
|  | 219 | *  given threshold. | 
|---|
| [48d20d] | 220 | * | 
|---|
|  | 221 | * \section potentials-howto-add Howto add new potentials | 
|---|
|  | 222 | * | 
|---|
|  | 223 | *  Adding a new potential requires the following: | 
|---|
|  | 224 | *  -# Add the new modules to Potentials/Specifics | 
|---|
|  | 225 | *  -# Add a unit test on the potential in Potentials/Specifics/unittests | 
|---|
|  | 226 | *  -# Give the potential a type name and add it to PotentialTypes.def. Note | 
|---|
|  | 227 | *     that the name must not contain white space. | 
|---|
|  | 228 | *  -# Add the potential name as case to PotentialFactory such that it knows | 
|---|
|  | 229 | *     how to instantiate your new potential when requested. | 
|---|
| [0932c2] | 230 | *  -# Remember to use the the RandomNumberGenerator for getting random starting | 
|---|
|  | 231 | *     values! | 
|---|
| [48d20d] | 232 | * | 
|---|
|  | 233 | * PotentialTypes.def contains a boost::preprocessor sequence of all | 
|---|
|  | 234 | * potential names. PotentialFactory uses this sequence to build its enum to | 
|---|
|  | 235 | * type map and inverse which the user sees when specifying the potential to | 
|---|
|  | 236 | * fit via PotentialTypeValidator. | 
|---|
|  | 237 | * | 
|---|
|  | 238 | * | 
|---|
|  | 239 | * \date 2013-04-09 | 
|---|
| [201199] | 240 | */ | 
|---|