Levenberg–Marquardt algorithm

From Wikipedia, the free encyclopedia

In mathematics and computing, the Levenberg–Marquardt algorithm (or LMA) provides a numerical solution to the problem of minimizing a function, generally nonlinear, over a space of parameters of the function. These minimization problems arise especially in least squares curve fitting and nonlinear programming.

The LMA interpolates between the Gauss-Newton algorithm (GNA) and the method of gradient descent. The LMA is more robust than the GNA, which means that in many cases it finds a solution even if it starts very far off the final minimum. On the other hand, for well-behaved functions and reasonable starting parameters, the LMA tends to be a bit slower than the GNA.

The LMA is a very popular curve-fitting algorithm; most software with generic curve-fitting capabilities provide an implementation of it.

1 The problem
2 The solution
- 2.1 Choice of damping parameter
3 Example
4 References
5 External links

[edit] The problem

Its main application is in the least squares curve fitting problem: given a set of empirical data pairs of independent and dependent variables, (x_i, y_i), optimize the parameters β of the model curve f(x,β) so that the sum of the squares of the deviations

$S(\boldsymbol \beta) = \sum_{i=1}^m [y_i - f(x_i, \ \boldsymbol \beta) ]^2$

becomes minimal.

[edit] The solution

Like other numeric minimization algorithms, the Levenberg-Marquardt algorithm is an iterative procedure. To start a minimization, the user has to provide an initial guess for the parameter vector β. In many cases, an uninformed standard guess like β^T=(1,1,...,1) will work fine; in other cases, the algorithm converges only if the initial guess is already somewhat close to the final solution.

In each iteration step, the parameter vector β is replaced by a new estimate β + δ. To determine δ, the functions f(β + δ) are approximated by their linearizations

$f(x_i,\boldsymbol \beta+\boldsymbol \delta) \approx f(x _i,\boldsymbol \beta) + J_i \boldsymbol \delta \!$

where $J_i=\frac{\partial f(x_i,\beta)}{\partial \beta}$ is the gradient (row-vector in this case) of f with respect to β.

At a minimum of the sum of squares $S$ , the gradient of $S$ with respect to δ is 0. Differentiating the squares in the definition of $S$ , using the above first-order approximation of $f(x_i,\boldsymbol \beta+\boldsymbol \delta)$ , and setting the result to zero leads to:

$\mathbf{(J^{T}J)\boldsymbol \delta = J^{T} [y - f(\boldsymbol \beta)]} \!$

where $\mathbf{J}$ is the Jacobian matrix whose i-th row equals $J i$ , and where $\mathbf{f}$ and $\mathbf{y}$ are vectors with i-th component $f(x_i,\boldsymbol \beta)$ and $y i$ , respectively. This is a set of linear equations which can be solved for δ.

Levenberg's contribution is to replace this equation by a 'damped version'

$\mathbf{(J^{T}J + \lambda I)\boldsymbol \delta = J^{T} [y - f(\boldsymbol \beta)]}\!$

where I is the identity matrix, giving as the increment δ to the estimated parameter vector β.

The (non-negative) damping factor λ is adjusted at each iteration. If reduction of S is rapid a smaller value can be used bringing the algorithm closer to the GNA, whereas if an iteration gives insufficient reduction in the residual λ can be increased giving a step closer to the gradient descent direction. Note that the gradient of S with respect to β equals $(-2)(J^{T} [y - f(\boldsymbol \beta) ] )^{T}$ , therefore, for large values of λ, the step will be taken approximately in the direction of the gradient. If either the calculated step length δ, or the reduction of sum of squares reduction from the latest parameter vector β + δ, fall below predefined limits, the iteration is aborted and the last parameter vector β is considered to be the solution.

Levenberg's algorithm has the disadvantage that if the value of damping factor λ is large, inverting J^TJ+λ I is not used at all. Marquardt provided the insight that we can scale each component of the gradient according to the curvature so that there is larger movement along the directions where the gradient is smaller. This avoids slow convergence in the direction of small gradient. Therefore, Marquardt replaced the identity matrix I with the diagonal of the Hessian matrix J^TJ, resulting in the Levenberg-Marquardt algorithm

$\mathbf{(J^{T}J + \lambda\, diag(J^{T}J))\boldsymbol \delta = J^{T} [y - f(\boldsymbol \beta)]}\!$ .

A similar damping factor appears in Tikhonov regularization, which is used to solve linear ill-posed problems, as well as in ridge regression, an estimation technique in statistics.

[edit] Choice of damping parameter

Various more or less heuristic arguments have been put forward for the best choice for the damping parameter λ. Theoretical arguments exist showing why some of these choices guaranteed local convergence of the algorithm; however these choices can make the global convergence of the algorithm suffer from the undesirable properties of steepest-descent, in particular very slow convergence close to the optimum.

The absolute values of any choice depends on how well-scaled the initial problem is. Marquardt recommended starting with a value λ₀ and a factor ν>1. Initially setting λ=λ₀ and computing the residual sum of squares S(β) after one step from the starting point with the damping factor of λ=λ₀ and secondly with λ/ν. If both of these are worse than the initial point then the damping is increased by successive multiplication by ν until a better point is found with a new damping factor of λν^k for some k.

If use of the damping factor λ/ν results in a reduction in squared residual then this is taken as the new value of λ (and the new optimum location is taken as that obtained with this damping factor) and the process continues; if using λ/ν resulted in a worse residual, but using λ resulted in a better residual then λ is left unchanged and the new optimum is taken as the value obtained with λ as damping factor.

[edit] Example

In this example we try to fit the function $y = a cos(b X) + b sin(a X)$ using the Levenberg–Marquardt algorithm implemented in GNU Octave as the leasqr function. The 3 graphs Fig 1,2,3 show progressively better fitting for the parameters a=100, b=102 used in the initial curve. Only when the parameters in Fig 3 are chosen closest to the original,are the curves fitting exactly. This equation is an example of very sensitive initial conditions for the Levenberg-Marquardt algorithm. One reason for this sensitivity is the existence of multiple minima - the function $cos(β x)$ has minima at parameter value $\hat \beta$ and $\hat \beta +2n \pi.$

[edit] References

The algorithm was first published by Kenneth Levenberg, while working at the Frankford Army Arsenal. It was rediscovered by Donald Marquardt who worked as a statistician at DuPont and independently by Girard, Wynn and Morrison.

Kenneth Levenberg (1944). "A Method for the Solution of Certain Non-Linear Problems in Least Squares". The Quarterly of Applied Mathematics 2: 164–168.
A. Girard (1958). "". Rev. Opt 37: 225, 397.
C.G. Wynne (1959). "". Proc. Phys. Soc. London 73: 777.
D.D. Morrison (1960). "". Jet Propulsion Laboratory Seminar proceedings.
Donald Marquardt (1963). "An Algorithm for Least-Squares Estimation of Nonlinear Parameters". SIAM Journal on Applied Mathematics 11: 431–441.
Philip E. Gill and Walter Murray (1978). "Algorithms for the solution of the nonlinear least-squares problem". SIAM Journal on Numerical Analysis 15 (5): 977–992. doi:10.1137/0715063.

[edit] External links

Detailed description of the algorithm can be found in Numerical Recipes in C, Chapter 15.5: Nonlinear models
C. T. Kelley, Iterative Methods for Optimization, SIAM Frontiers in Applied Mathematics, no 18, 1999, ISBN: 0-89871-433-8. Online copy
History of the algorithm in SIAM news
A tutorial by Ananth Ranganathan

Available implementations:

M. Lourakis' levmar is a GPL C/C++ programming language implementation. Matlab, Perl (PDL) and python interfaces to levmar are also available; see PDL::Fit::Levmar and pylevmar.
MINPACK has a FORTRAN implementation lmdif.
A C programming language implementation can be found at sourceforge::lmfit.
Open source, multidimensional Java programming language implementations are offered by J.P. Lewis and J. Holopainen.
gnuplot uses its own implementation gnuplot.info.
ALGLIB has implementations in C# / C++ / Delphi / Visual Basic / etc.
The MINPACK routines are wrapped in the Python library scipy as scipy.optimize.leastsq.
The GNU Scientific Library library has a C implementation.
Fityk has a C++ implementation.
An IDL implementation called MPFIT (based on MINPACK) is offered by Craig Markwardt
FitOO is an old proof of concept using the algorithm in an OpenOffice.org macro
Python Equations Implementation in Python. BSD license.

Categories: Optimization algorithms

See also ebooksgratis.com: no banners, no cookies, totally FREE.

Levenberg–Marquardt algorithm

From Wikipedia, the free encyclopedia

Contents

[edit] The problem

[edit] The solution

[edit] Choice of damping parameter

[edit] Example

[edit] References

[edit] External links

Views

Navigation

Interaction

Search

Languages