| [201199] | 1 | /*
 | 
|---|
 | 2 |  * Project: MoleCuilder
 | 
|---|
 | 3 |  * Description: creates and alters molecular systems
 | 
|---|
 | 4 |  * Copyright (C)  2010 University of Bonn. All rights reserved.
 | 
|---|
 | 5 |  * Please see the LICENSE file or "Copyright notice" in builder.cpp for details.
 | 
|---|
 | 6 |  */
 | 
|---|
 | 7 | 
 | 
|---|
 | 8 | /**
 | 
|---|
 | 9 |  * \file potentials.dox
 | 
|---|
 | 10 |  *
 | 
|---|
 | 11 |  * Created on: Nov 28, 2012
 | 
|---|
 | 12 |  *    Author: heber
 | 
|---|
 | 13 |  */
 | 
|---|
 | 14 | 
 | 
|---|
 | 15 | /** \page potentials Empirical Potentials and FunctionModels
 | 
|---|
 | 16 |  *
 | 
|---|
 | 17 |  * On this page we explain what is meant with the Potentials sub folder.
 | 
|---|
 | 18 |  *
 | 
|---|
 | 19 |  * First, we are based on fragmenting a molecular system, i.e. dissecting its
 | 
|---|
 | 20 |  * bond structure into connected subgraphs, calculating the energies of the
 | 
|---|
 | 21 |  * fragments (ab-initio) and summing up to a good approximation of the total
 | 
|---|
| [caece4] | 22 |  * energy of the whole system, \sa fragmentation.
 | 
|---|
| [201199] | 23 |  * Second, having calculated these energies, there quickly comes up the thought
 | 
|---|
 | 24 |  * that one actually calculates quite similar systems all time and if one could
 | 
|---|
 | 25 |  * not cache results in an intelligent (i.e. interpolating) fashion ...
 | 
|---|
 | 26 |  *
 | 
|---|
 | 27 |  * That's where so-called empirical potentials come into play. They are
 | 
|---|
 | 28 |  * functions depending on a number of "fitted" parameters and the variable
 | 
|---|
 | 29 |  * distances within a molecular fragment (i.e. the bond lengths) in order to
 | 
|---|
 | 30 |  * give a value for the total energy without the need to solve a complex
 | 
|---|
| [caece4] | 31 |  * ab-initio model (essentially, not solving the electronic Schrödinger equation
 | 
|---|
 | 32 |  * anymore).
 | 
|---|
| [201199] | 33 |  *
 | 
|---|
 | 34 |  * Empirical potentials have been thought of by fellows such as Lennard-Jones,
 | 
|---|
 | 35 |  * Morse, Tersoff, Stillinger and Weber, etc. And in their honor, the
 | 
|---|
 | 36 |  * potential form is named after its inventor. Hence, we speak e.g. of a
 | 
|---|
 | 37 |  * Lennard-Jones potential.
 | 
|---|
 | 38 |  *
 | 
|---|
 | 39 |  * So, what we have to do in order to cache results is the following procedure:
 | 
|---|
 | 40 |  * -# gather similar fragments
 | 
|---|
 | 41 |  * -# perform a fit procedure to obtain the parameters for the empirical
 | 
|---|
 | 42 |  *    potential
 | 
|---|
 | 43 |  * -# evaluate the potential instead of an ab-initio calculation
 | 
|---|
 | 44 |  *
 | 
|---|
 | 45 |  * The terms we use, model the classes that are implemented:
 | 
|---|
 | 46 |  * -# EmpiricalPotential: Contains the interface to a function that can be
 | 
|---|
 | 47 |  *    evaluated given a number of arguments_t, i.e. distances. Also, one might
 | 
|---|
 | 48 |  *    want to evaluate derivatives.
 | 
|---|
 | 49 |  * -# FunctionModel: Is a function that can be fitted, i.e. that has internal
 | 
|---|
 | 50 |  *    parameters to be set and got.
 | 
|---|
 | 51 |  * -# argument_t: The Argument stores not only the distance but also the index
 | 
|---|
 | 52 |  *    pair of the associated atoms and also their charges, to let the potential
 | 
|---|
 | 53 |  *    check on validity.
 | 
|---|
 | 54 |  * -# SerializablePotential: Eventually, one wants to store to or parse from
 | 
|---|
 | 55 |  *    a file all potential parameters. This functionality is encoded in this
 | 
|---|
 | 56 |  *    class.
 | 
|---|
 | 57 |  * -# HomologyGraph: "Similar" fragments in our case have to have the same bond
 | 
|---|
 | 58 |  *    graph. It is stored in the HomologyGraph that acts as representative
 | 
|---|
 | 59 |  * -# HomologyContainer: This container combines, in multimap fashion, all
 | 
|---|
 | 60 |  *    similar fragments with their energies together, with the HomologyGraph
 | 
|---|
 | 61 |  *    as their "key".
 | 
|---|
 | 62 |  * -# TrainingData: Here, we combine InputVector and OutputVector that contain
 | 
|---|
 | 63 |  *    the set of distances required for the FunctionModel (e.g. only a single
 | 
|---|
 | 64 |  *    distance/argument for a pair potential, three for an angle potential,
 | 
|---|
 | 65 |  *    etc.) and also the expected OutputVector. This in combination with the
 | 
|---|
 | 66 |  *    FunctionModel is the basis for the non-linear regression used for the
 | 
|---|
 | 67 |  *    fitting procedure.
 | 
|---|
 | 68 |  * -# Extractors: These set of functions yield the set of distances from a
 | 
|---|
 | 69 |  *    given fragment that is stored in the HomologyContainer.
 | 
|---|
 | 70 |  * -# FunctionApproximation: Contains the interface to the levmar package where
 | 
|---|
 | 71 |  *    the Levenberg-Marquardt (Newton + Trust region) algorithm is used to
 | 
|---|
 | 72 |  *    perform the fit.
 | 
|---|
 | 73 |  *
 | 
|---|
| [1ba8a1] | 74 |  * \section potentials-fit-potential-action What happens in FitPotentialAction.
 | 
|---|
 | 75 |  *
 | 
|---|
 | 76 |  *  First, either a potential file is parsed via PotentialDeserializer or charges
 | 
|---|
 | 77 |  *  and a potential type from the given options. This is used to instantiate
 | 
|---|
 | 78 |  *  EmpiricalPotentials via the PotentialFactory, stored within the
 | 
|---|
 | 79 |  *  PotentialRegistry. This is the available set of potentials (without requiring
 | 
|---|
 | 80 |  *  any knowledge as to the nature of the fragment employed in fitting).
 | 
|---|
 | 81 |  *
 | 
|---|
 | 82 |  *  Second, the given fragment is used to get a suitable HomologyGraph from
 | 
|---|
 | 83 |  *  the World's HomologyContainer. This is given to a CompoundPotential, that in
 | 
|---|
 | 84 |  *  turn browses through the PotentialRegistry, picking out those
 | 
|---|
 | 85 |  *  EmpiricalPotential's that match with a subset of the FragmentNode's of the
 | 
|---|
 | 86 |  *  given graph. These are stored as a list of FunctionModel's within the
 | 
|---|
 | 87 |  *  CompoundPotential instance. Here comes the specific fragment into play,
 | 
|---|
 | 88 |  *  picking a subset from the available potentials.
 | 
|---|
 | 89 |  *
 | 
|---|
 | 90 |  *  Third, we need to setup the training data. For this we need vectors of input
 | 
|---|
 | 91 |  *  and output data that are obtained from the HomologyContainer with the
 | 
|---|
 | 92 |  *  HomologyGraph as key. The output vector in our case is simply a number
 | 
|---|
 | 93 |  *  (although the interface allows for more). The input vector is the set of
 | 
|---|
 | 94 |  *  distances. In order to pre-process the input data for the specific model
 | 
|---|
 | 95 |  *  a filter is required in the TrainingData's constructor. The purpose of the
 | 
|---|
 | 96 |  *  filter is to pick out the subset of distance arguments for each model one
 | 
|---|
 | 97 |  *  after the other and concatenate them to a list. On evaluation of the model
 | 
|---|
 | 98 |  *  this concatenated list of distances is given to the model and it may easily
 | 
|---|
 | 99 |  *  dissect the list and hand over each contained potential its subset of
 | 
|---|
 | 100 |  *  arguments. See Extractors for more information.
 | 
|---|
 | 101 |  *
 | 
|---|
 | 102 |  *  Afterwards, training may commence: The goal is to find a set of parameters
 | 
|---|
 | 103 |  *  for the model such that it as good as possible reproduces the output vector
 | 
|---|
 | 104 |  *  for a given input vector. This non-linear regression is contained in the
 | 
|---|
 | 105 |  *  levmar package and its functionality is wrapped in the FunctionApproximation
 | 
|---|
 | 106 |  *  class. An instance is initialized with both the gathered training data and
 | 
|---|
 | 107 |  *  the model containing a list of potentials. See
 | 
|---|
 | 108 |  *  [FunctionApproximation-details] for more details.
 | 
|---|
 | 109 |  *
 | 
|---|
 | 110 |  *  During the fitting procedure, first the derivatives of the model is checked
 | 
|---|
 | 111 |  *  for consistency, then the model is initialized with a sensible guess of
 | 
|---|
 | 112 |  *  starting parameters, and afterwards the Levenberg-Marquardt algorithm
 | 
|---|
 | 113 |  *  commences that makes numerous calls to evaluate the model and its derivative
 | 
|---|
 | 114 |  *  to find the minimum in the L2-norm.
 | 
|---|
 | 115 |  *
 | 
|---|
 | 116 |  *  This is done more than once as high-dimensional regression is sensititive the
 | 
|---|
 | 117 |  *  the starting values as there are possible numerous local minima. The lowest
 | 
|---|
 | 118 |  *  of the found minima is taken, either via a given threshold or the best of a
 | 
|---|
 | 119 |  *  given number of attempts.
 | 
|---|
 | 120 |  *
 | 
|---|
 | 121 |  *  Eventually, these parameters of the best model are streamed via
 | 
|---|
 | 122 |  *  PotentialSerializer back into a potential file. Each EmpiricalPotential in
 | 
|---|
 | 123 |  *  the CompoundPotential making up the whole model is also a
 | 
|---|
 | 124 |  *  SerializablePotential. Hence, each in turn writes a single line with its
 | 
|---|
 | 125 |  *  respective subset of parameters and particle types, describing this
 | 
|---|
 | 126 |  *  particular fit function.
 | 
|---|
 | 127 |  *
 | 
|---|
 | 128 |  * \section potentials-function-evaluation How does the model evaluation work
 | 
|---|
 | 129 |  *
 | 
|---|
 | 130 |  *  We now come to the question of how the model and its derivative are actually
 | 
|---|
 | 131 |  *  evaluated. We have an input vector from the training data and we have the
 | 
|---|
 | 132 |  *  model with a specific set of parameters.
 | 
|---|
 | 133 |  *
 | 
|---|
 | 134 |  *  FunctionModel is just an abstract interface that is implemented by the
 | 
|---|
 | 135 |  *  potential functions, such as CompoundPotential, that combines multiple
 | 
|---|
 | 136 |  *  potentials into a single function for fitting, or PairPotential_Harmonic,
 | 
|---|
 | 137 |  *  that is a specific fit function for harmonic bonds.
 | 
|---|
 | 138 |  *
 | 
|---|
 | 139 |  *  The main issue with the evaluation is picking the right set of distances from
 | 
|---|
 | 140 |  *  ones given in the input vector and feed it to each potential contained in
 | 
|---|
 | 141 |  *  CompoundPotential. Note that the distances have already been prepared by
 | 
|---|
 | 142 |  *  the TrainingData instantiation.
 | 
|---|
 | 143 |  *
 | 
|---|
 | 144 |  *  Initially, the HomologyGraph only contains a list of configurations of a
 | 
|---|
 | 145 |  *  specific fragments (i.e. the position of each atom in the fragment) and an
 | 
|---|
 | 146 |  *  energy value. These first have to be converted into distances.
 | 
|---|
 | 147 |  *
 | 
|---|
 | 148 |  *
 | 
|---|
| [48d20d] | 149 |  * \section potentials-howto-use Howto use the potentials
 | 
|---|
| [201199] | 150 |  *
 | 
|---|
 | 151 |  *  We just give a brief run-down in terms of code on how to use the potentials.
 | 
|---|
| [48d20d] | 152 |  *  Here, we just describe what to do in order to perform the fitting. This is
 | 
|---|
 | 153 |  *  basically what is implemented in FragmentationFitPotentialAction.
 | 
|---|
| [201199] | 154 |  *
 | 
|---|
 | 155 |  *  \code
 | 
|---|
 | 156 |  *  // we need the homology container and the representative graph we want to
 | 
|---|
 | 157 |  *  // fit to.
 | 
|---|
 | 158 |  *  HomologyContainer homologies;
 | 
|---|
 | 159 |  *  const HomologyGraph graph = getSomeGraph(homologies);
 | 
|---|
| [c5e75f3] | 160 |  *  Fragment::atomicnumbers_t h2o;
 | 
|---|
| [201199] | 161 |  *  h2o += 8,1,1;
 | 
|---|
 | 162 |  *  // TrainingData needs so called Extractors to get the required distances
 | 
|---|
| [caece4] | 163 |  *  // from the stored fragment. These functions are bound via boost::bind.
 | 
|---|
| [201199] | 164 |  *  TrainingData AngleData(
 | 
|---|
 | 165 |  *      boost::bind(&Extractors::gatherDistancesFromFragment,
 | 
|---|
 | 166 |  *          boost::bind(&Fragment::getPositions, _1),
 | 
|---|
| [c5e75f3] | 167 |  *          boost::bind(&Fragment::getAtomicNumbers, _1),
 | 
|---|
| [201199] | 168 |  *          boost::cref(h2o),
 | 
|---|
 | 169 |  *          _2)
 | 
|---|
 | 170 |  *      );
 | 
|---|
 | 171 |  *  // now we extract the distances and energies and store them
 | 
|---|
 | 172 |  *  AngleData(homologies.getHomologousGraphs(graph));
 | 
|---|
 | 173 |  *  // give ParticleTypes of this potential to make it unique
 | 
|---|
 | 174 |  *  PairPotential_Angle::ParticleTypes_t types =
 | 
|---|
 | 175 |  *        boost::assign::list_of<PairPotential_Angle::ParticleType_t>
 | 
|---|
 | 176 |  *        (8)(1)(1)
 | 
|---|
 | 177 |  *      ;
 | 
|---|
 | 178 |  *  PairPotential_Angle angle(types);
 | 
|---|
 | 179 |  *  // give initial parameter
 | 
|---|
 | 180 |  *  FunctionModel::parameters_t params(PairPotential_Angle::MAXPARAMS, 0.);
 | 
|---|
| [48d20d] | 181 |  *  // ... set some potential-specific initial parameters in params struct
 | 
|---|
| [201199] | 182 |  *  angle.setParameters(params);
 | 
|---|
 | 183 |  *
 | 
|---|
 | 184 |  *  // use the potential as a FunctionModel along with prepared TrainingData
 | 
|---|
 | 185 |  *  FunctionModel &model = angle;
 | 
|---|
 | 186 |  *  FunctionApproximation approximator(AngleData, model);
 | 
|---|
 | 187 |  *  approximator(FunctionApproximation::ParameterDerivative);
 | 
|---|
 | 188 |  *
 | 
|---|
 | 189 |  *  // obtain resulting parameters and check remaining L_2 and L_max error
 | 
|---|
 | 190 |  *  angleparams = model.getParameters();
 | 
|---|
 | 191 |  *  LOG(1, "INFO: L2sum = " << AngleData(model)
 | 
|---|
 | 192 |  *      << ", LMax = " << AngleData(model) << ".");
 | 
|---|
 | 193 |  *  \endcode
 | 
|---|
 | 194 |  *
 | 
|---|
 | 195 |  *  The evaluation of the fitted potential is then trivial, e.g.
 | 
|---|
 | 196 |  *  \code
 | 
|---|
 | 197 |  *  // constructed someplace
 | 
|---|
 | 198 |  *  PairPotential_Angle angle(...);
 | 
|---|
 | 199 |  *
 | 
|---|
 | 200 |  *  // evaluate
 | 
|---|
 | 201 |  *  FunctionModel::arguments_t args;
 | 
|---|
| [48d20d] | 202 |  *  // .. initialise args to the desired distances
 | 
|---|
| [201199] | 203 |  *  const double value = angle(args)[0]; // output is a vector!
 | 
|---|
 | 204 |  *  \endcode
 | 
|---|
 | 205 |  *
 | 
|---|
| [48d20d] | 206 |  * \section potentials-stability-of-fit note in stability of fit
 | 
|---|
 | 207 |  *
 | 
|---|
 | 208 |  *  As we always start from random initial parameters (within a certain sensible
 | 
|---|
| [0932c2] | 209 |  *  range at least), the non-linear fit does not always converge. Note that the
 | 
|---|
 | 210 |  *  random values are drawn from the defined distribution and the uniform distributionm
 | 
|---|
 | 211 |  *  engine is obtained from the currently set, see \ref randomnumbers. Hence, you
 | 
|---|
 | 212 |  *  can manipulate both in order to get different results or to set the seed such that
 | 
|---|
 | 213 |  *  some "randomly" drawn value always work well (e.g. for testing). 
 | 
|---|
 | 214 |  *  
 | 
|---|
 | 215 |  *  In any case, For this case the FragmentationFitPotentialAction has the option 
 | 
|---|
 | 216 |  *  "take-best-of" to allow for multiple fits where the best (in terms of l2 error) 
 | 
|---|
 | 217 |  *  is taken eventually. Furthermore, you can use the "set-threshold" option to
 | 
|---|
 | 218 |  *  stop restarting the fit procedure first when the L2 error has dropped below the
 | 
|---|
 | 219 |  *  given threshold.
 | 
|---|
| [48d20d] | 220 |  *
 | 
|---|
 | 221 |  * \section potentials-howto-add Howto add new potentials
 | 
|---|
 | 222 |  *
 | 
|---|
 | 223 |  *  Adding a new potential requires the following:
 | 
|---|
 | 224 |  *  -# Add the new modules to Potentials/Specifics
 | 
|---|
 | 225 |  *  -# Add a unit test on the potential in Potentials/Specifics/unittests
 | 
|---|
 | 226 |  *  -# Give the potential a type name and add it to PotentialTypes.def. Note
 | 
|---|
 | 227 |  *     that the name must not contain white space.
 | 
|---|
 | 228 |  *  -# Add the potential name as case to PotentialFactory such that it knows
 | 
|---|
 | 229 |  *     how to instantiate your new potential when requested.
 | 
|---|
| [0932c2] | 230 |  *  -# Remember to use the the RandomNumberGenerator for getting random starting
 | 
|---|
 | 231 |  *     values!
 | 
|---|
| [48d20d] | 232 |  *
 | 
|---|
 | 233 |  * PotentialTypes.def contains a boost::preprocessor sequence of all
 | 
|---|
 | 234 |  * potential names. PotentialFactory uses this sequence to build its enum to
 | 
|---|
 | 235 |  * type map and inverse which the user sees when specifying the potential to
 | 
|---|
 | 236 |  * fit via PotentialTypeValidator.
 | 
|---|
 | 237 |  *
 | 
|---|
 | 238 |  *
 | 
|---|
 | 239 |  * \date 2013-04-09
 | 
|---|
| [201199] | 240 |  */
 | 
|---|