Context Navigation

potentials.dox@ 2315f3

Visit:

ForceAnnealing_goodresults ForceAnnealing_tocheck

Last change on this file since 2315f3 was c5e75f3, checked in by Frederik Heber <heber@…>, 9 years ago

ParticleType_t is now an unsigned int.

this allows direct use of atomicNumber_t and also signed integers dont' make much sense for a particle type, i.e. an id.

Property mode set to 100644

File size: 11.9 KB

Rev	Line
[201199]	1	/*
	2	* Project: MoleCuilder
	3	* Description: creates and alters molecular systems
	4	* Copyright (C) 2010 University of Bonn. All rights reserved.
	5	* Please see the LICENSE file or "Copyright notice" in builder.cpp for details.
	6	*/
	7
	8	/**
	9	* \file potentials.dox
	10	*
	11	* Created on: Nov 28, 2012
	12	* Author: heber
	13	*/
	14
	15	/** \page potentials Empirical Potentials and FunctionModels
	16	*
	17	* On this page we explain what is meant with the Potentials sub folder.
	18	*
	19	* First, we are based on fragmenting a molecular system, i.e. dissecting its
	20	* bond structure into connected subgraphs, calculating the energies of the
	21	* fragments (ab-initio) and summing up to a good approximation of the total
[caece4]	22	* energy of the whole system, \sa fragmentation.
[201199]	23	* Second, having calculated these energies, there quickly comes up the thought
	24	* that one actually calculates quite similar systems all time and if one could
	25	* not cache results in an intelligent (i.e. interpolating) fashion ...
	26	*
	27	* That's where so-called empirical potentials come into play. They are
	28	* functions depending on a number of "fitted" parameters and the variable
	29	* distances within a molecular fragment (i.e. the bond lengths) in order to
	30	* give a value for the total energy without the need to solve a complex
[caece4]	31	* ab-initio model (essentially, not solving the electronic SchrÃ¶dinger equation
	32	* anymore).
[201199]	33	*
	34	* Empirical potentials have been thought of by fellows such as Lennard-Jones,
	35	* Morse, Tersoff, Stillinger and Weber, etc. And in their honor, the
	36	* potential form is named after its inventor. Hence, we speak e.g. of a
	37	* Lennard-Jones potential.
	38	*
	39	* So, what we have to do in order to cache results is the following procedure:
	40	* -# gather similar fragments
	41	* -# perform a fit procedure to obtain the parameters for the empirical
	42	* potential
	43	* -# evaluate the potential instead of an ab-initio calculation
	44	*
	45	* The terms we use, model the classes that are implemented:
	46	* -# EmpiricalPotential: Contains the interface to a function that can be
	47	* evaluated given a number of arguments_t, i.e. distances. Also, one might
	48	* want to evaluate derivatives.
	49	* -# FunctionModel: Is a function that can be fitted, i.e. that has internal
	50	* parameters to be set and got.
	51	* -# argument_t: The Argument stores not only the distance but also the index
	52	* pair of the associated atoms and also their charges, to let the potential
	53	* check on validity.
	54	* -# SerializablePotential: Eventually, one wants to store to or parse from
	55	* a file all potential parameters. This functionality is encoded in this
	56	* class.
	57	* -# HomologyGraph: "Similar" fragments in our case have to have the same bond
	58	* graph. It is stored in the HomologyGraph that acts as representative
	59	* -# HomologyContainer: This container combines, in multimap fashion, all
	60	* similar fragments with their energies together, with the HomologyGraph
	61	* as their "key".
	62	* -# TrainingData: Here, we combine InputVector and OutputVector that contain
	63	* the set of distances required for the FunctionModel (e.g. only a single
	64	* distance/argument for a pair potential, three for an angle potential,
	65	* etc.) and also the expected OutputVector. This in combination with the
	66	* FunctionModel is the basis for the non-linear regression used for the
	67	* fitting procedure.
	68	* -# Extractors: These set of functions yield the set of distances from a
	69	* given fragment that is stored in the HomologyContainer.
	70	* -# FunctionApproximation: Contains the interface to the levmar package where
	71	* the Levenberg-Marquardt (Newton + Trust region) algorithm is used to
	72	* perform the fit.
	73	*
[1ba8a1]	74	* \section potentials-fit-potential-action What happens in FitPotentialAction.
	75	*
	76	* First, either a potential file is parsed via PotentialDeserializer or charges
	77	* and a potential type from the given options. This is used to instantiate
	78	* EmpiricalPotentials via the PotentialFactory, stored within the
	79	* PotentialRegistry. This is the available set of potentials (without requiring
	80	* any knowledge as to the nature of the fragment employed in fitting).
	81	*
	82	* Second, the given fragment is used to get a suitable HomologyGraph from
	83	* the World's HomologyContainer. This is given to a CompoundPotential, that in
	84	* turn browses through the PotentialRegistry, picking out those
	85	* EmpiricalPotential's that match with a subset of the FragmentNode's of the
	86	* given graph. These are stored as a list of FunctionModel's within the
	87	* CompoundPotential instance. Here comes the specific fragment into play,
	88	* picking a subset from the available potentials.
	89	*
	90	* Third, we need to setup the training data. For this we need vectors of input
	91	* and output data that are obtained from the HomologyContainer with the
	92	* HomologyGraph as key. The output vector in our case is simply a number
	93	* (although the interface allows for more). The input vector is the set of
	94	* distances. In order to pre-process the input data for the specific model
	95	* a filter is required in the TrainingData's constructor. The purpose of the
	96	* filter is to pick out the subset of distance arguments for each model one
	97	* after the other and concatenate them to a list. On evaluation of the model
	98	* this concatenated list of distances is given to the model and it may easily
	99	* dissect the list and hand over each contained potential its subset of
	100	* arguments. See Extractors for more information.
	101	*
	102	* Afterwards, training may commence: The goal is to find a set of parameters
	103	* for the model such that it as good as possible reproduces the output vector
	104	* for a given input vector. This non-linear regression is contained in the
	105	* levmar package and its functionality is wrapped in the FunctionApproximation
	106	* class. An instance is initialized with both the gathered training data and
	107	* the model containing a list of potentials. See
	108	* [FunctionApproximation-details] for more details.
	109	*
	110	* During the fitting procedure, first the derivatives of the model is checked
	111	* for consistency, then the model is initialized with a sensible guess of
	112	* starting parameters, and afterwards the Levenberg-Marquardt algorithm
	113	* commences that makes numerous calls to evaluate the model and its derivative
	114	* to find the minimum in the L2-norm.
	115	*
	116	* This is done more than once as high-dimensional regression is sensititive the
	117	* the starting values as there are possible numerous local minima. The lowest
	118	* of the found minima is taken, either via a given threshold or the best of a
	119	* given number of attempts.
	120	*
	121	* Eventually, these parameters of the best model are streamed via
	122	* PotentialSerializer back into a potential file. Each EmpiricalPotential in
	123	* the CompoundPotential making up the whole model is also a
	124	* SerializablePotential. Hence, each in turn writes a single line with its
	125	* respective subset of parameters and particle types, describing this
	126	* particular fit function.
	127	*
	128	* \section potentials-function-evaluation How does the model evaluation work
	129	*
	130	* We now come to the question of how the model and its derivative are actually
	131	* evaluated. We have an input vector from the training data and we have the
	132	* model with a specific set of parameters.
	133	*
	134	* FunctionModel is just an abstract interface that is implemented by the
	135	* potential functions, such as CompoundPotential, that combines multiple
	136	* potentials into a single function for fitting, or PairPotential_Harmonic,
	137	* that is a specific fit function for harmonic bonds.
	138	*
	139	* The main issue with the evaluation is picking the right set of distances from
	140	* ones given in the input vector and feed it to each potential contained in
	141	* CompoundPotential. Note that the distances have already been prepared by
	142	* the TrainingData instantiation.
	143	*
	144	* Initially, the HomologyGraph only contains a list of configurations of a
	145	* specific fragments (i.e. the position of each atom in the fragment) and an
	146	* energy value. These first have to be converted into distances.
	147	*
	148	*
[48d20d]	149	* \section potentials-howto-use Howto use the potentials
[201199]	150	*
	151	* We just give a brief run-down in terms of code on how to use the potentials.
[48d20d]	152	* Here, we just describe what to do in order to perform the fitting. This is
	153	* basically what is implemented in FragmentationFitPotentialAction.
[201199]	154	*
	155	* \code
	156	* // we need the homology container and the representative graph we want to
	157	* // fit to.
	158	* HomologyContainer homologies;
	159	* const HomologyGraph graph = getSomeGraph(homologies);
[c5e75f3]	160	* Fragment::atomicnumbers_t h2o;
[201199]	161	* h2o += 8,1,1;
	162	* // TrainingData needs so called Extractors to get the required distances
[caece4]	163	* // from the stored fragment. These functions are bound via boost::bind.
[201199]	164	* TrainingData AngleData(
	165	* boost::bind(&Extractors::gatherDistancesFromFragment,
	166	* boost::bind(&Fragment::getPositions, _1),
[c5e75f3]	167	* boost::bind(&Fragment::getAtomicNumbers, _1),
[201199]	168	* boost::cref(h2o),
	169	* _2)
	170	* );
	171	* // now we extract the distances and energies and store them
	172	* AngleData(homologies.getHomologousGraphs(graph));
	173	* // give ParticleTypes of this potential to make it unique
	174	* PairPotential_Angle::ParticleTypes_t types =
	175	* boost::assign::list_of<PairPotential_Angle::ParticleType_t>
	176	* (8)(1)(1)
	177	* ;
	178	* PairPotential_Angle angle(types);
	179	* // give initial parameter
	180	* FunctionModel::parameters_t params(PairPotential_Angle::MAXPARAMS, 0.);
[48d20d]	181	* // ... set some potential-specific initial parameters in params struct
[201199]	182	* angle.setParameters(params);
	183	*
	184	* // use the potential as a FunctionModel along with prepared TrainingData
	185	* FunctionModel &model = angle;
	186	* FunctionApproximation approximator(AngleData, model);
	187	* approximator(FunctionApproximation::ParameterDerivative);
	188	*
	189	* // obtain resulting parameters and check remaining L_2 and L_max error
	190	* angleparams = model.getParameters();
	191	* LOG(1, "INFO: L2sum = " << AngleData(model)
	192	* << ", LMax = " << AngleData(model) << ".");
	193	* \endcode
	194	*
	195	* The evaluation of the fitted potential is then trivial, e.g.
	196	* \code
	197	* // constructed someplace
	198	* PairPotential_Angle angle(...);
	199	*
	200	* // evaluate
	201	* FunctionModel::arguments_t args;
[48d20d]	202	* // .. initialise args to the desired distances
[201199]	203	* const double value = angle(args)[0]; // output is a vector!
	204	* \endcode
	205	*
[48d20d]	206	* \section potentials-stability-of-fit note in stability of fit
	207	*
	208	* As we always start from random initial parameters (within a certain sensible
[0932c2]	209	* range at least), the non-linear fit does not always converge. Note that the
	210	* random values are drawn from the defined distribution and the uniform distributionm
	211	* engine is obtained from the currently set, see \ref randomnumbers. Hence, you
	212	* can manipulate both in order to get different results or to set the seed such that
	213	* some "randomly" drawn value always work well (e.g. for testing).
	214	*
	215	* In any case, For this case the FragmentationFitPotentialAction has the option
	216	* "take-best-of" to allow for multiple fits where the best (in terms of l2 error)
	217	* is taken eventually. Furthermore, you can use the "set-threshold" option to
	218	* stop restarting the fit procedure first when the L2 error has dropped below the
	219	* given threshold.
[48d20d]	220	*
	221	* \section potentials-howto-add Howto add new potentials
	222	*
	223	* Adding a new potential requires the following:
	224	* -# Add the new modules to Potentials/Specifics
	225	* -# Add a unit test on the potential in Potentials/Specifics/unittests
	226	* -# Give the potential a type name and add it to PotentialTypes.def. Note
	227	* that the name must not contain white space.
	228	* -# Add the potential name as case to PotentialFactory such that it knows
	229	* how to instantiate your new potential when requested.
[0932c2]	230	* -# Remember to use the the RandomNumberGenerator for getting random starting
	231	* values!
[48d20d]	232	*
	233	* PotentialTypes.def contains a boost::preprocessor sequence of all
	234	* potential names. PotentialFactory uses this sequence to build its enum to
	235	* type map and inverse which the user sees when specifying the potential to
	236	* fit via PotentialTypeValidator.
	237	*
	238	*
	239	* \date 2013-04-09
[201199]	240	*/

Note: See TracBrowser for help on using the repository browser.

Download in other formats:

Original Format