
Nonlinear Least Squares Fitting

Given a function $f(x)$ of a variable $x$ tabulated at $m$ values $y_1=f(x_1)$, ..., $y_m=f(x_m)$, assume the function is of known analytic form depending on $n$ parameters $f(x; \lambda_1, \ldots, \lambda_n)$, and consider the overdetermined set of $m$ equations

\begin{displaymath}
\begin{array}{rcll}
y_1 &=& f(x_1; \lambda_1, \lambda_2, \ldots, \lambda_n) & \quad(1)\\
&\vdots& &\\
y_m &=& f(x_m; \lambda_1, \lambda_2, \ldots, \lambda_n). & \quad(2)
\end{array}
\end{displaymath}

We wish to find the values $\lambda_1$, ..., $\lambda_n$ that best satisfy this system of equations. Pick an initial guess for the $\lambda_i$ and then define
\begin{displaymath}
d\beta_i=y_i-f(x_i; \lambda_1, \ldots, \lambda_n).
\end{displaymath} (3)

Now obtain a linearized estimate for the changes $d\lambda_i$ needed to reduce $d\beta_i$ to 0,
\begin{displaymath}
d\beta_i=\sum_{j=1}^n \left.{{\partial f\over \partial \lambda_j} \,d\lambda_j}\right\vert _{x_i,\boldsymbol{\lambda}}
\end{displaymath} (4)

for $i=1$, ..., $m$. This can be written in component form as
\begin{displaymath}
d\beta_i=A_{ij}\,d\lambda_j,
\end{displaymath} (5)

where summation over the repeated index $j$ is implied and ${\hbox{\sf A}}$ is the $m\times n$ Matrix
\begin{displaymath}
{\hbox{\sf A}}=\left[{\matrix{
\left.{\partial f\over\partial\lambda_1}\right\vert _{x_1,\boldsymbol{\lambda}} & \left.{\partial f\over\partial\lambda_2}\right\vert _{x_1,\boldsymbol{\lambda}} & \cdots & \left.{\partial f\over\partial\lambda_n}\right\vert _{x_1,\boldsymbol{\lambda}}\cr
\left.{\partial f\over\partial\lambda_1}\right\vert _{x_2,\boldsymbol{\lambda}} & \left.{\partial f\over\partial\lambda_2}\right\vert _{x_2,\boldsymbol{\lambda}} & \cdots & \left.{\partial f\over\partial\lambda_n}\right\vert _{x_2,\boldsymbol{\lambda}}\cr
\vdots & \vdots & \ddots & \vdots\cr
\left.{\partial f\over\partial\lambda_1}\right\vert _{x_m,\boldsymbol{\lambda}} & \left.{\partial f\over\partial\lambda_2}\right\vert _{x_m,\boldsymbol{\lambda}} & \cdots & \left.{\partial f\over\partial\lambda_n}\right\vert _{x_m,\boldsymbol{\lambda}}\cr}}\right].
\end{displaymath} (6)

In more concise Matrix form,
\begin{displaymath}
d\boldsymbol{\beta}= {\hbox{\sf A}}\,d\boldsymbol{\lambda},
\end{displaymath} (7)

where $d\boldsymbol{\beta}$ is an $m$-Vector and $d\boldsymbol{\lambda}$ is an $n$-Vector. Applying the Matrix Transpose of ${\hbox{\sf A}}$ to both sides gives
\begin{displaymath}
{\hbox{\sf A}}^{\rm T}\,d\boldsymbol{\beta}=({\hbox{\sf A}}^{\rm T}{\hbox{\sf A}})\,d\boldsymbol{\lambda}.
\end{displaymath} (8)
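
Equation (8) is simply the normal-equations form of the linearized problem: the same system follows from choosing $d\boldsymbol{\lambda}$ to minimize the squared norm of the linearized residual,
\begin{displaymath}
{\partial\over\partial(d\boldsymbol{\lambda})}\left\Vert d\boldsymbol{\beta}-{\hbox{\sf A}}\,d\boldsymbol{\lambda}\right\Vert^2
=-2{\hbox{\sf A}}^{\rm T}\left(d\boldsymbol{\beta}-{\hbox{\sf A}}\,d\boldsymbol{\lambda}\right)={\bf 0}.
\end{displaymath}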

Defining
\begin{displaymath}
\begin{array}{rcll}
{\hbox{\sf a}} &\equiv& {\hbox{\sf A}}^{\rm T}{\hbox{\sf A}} & \quad(9)\\
{\bf b} &\equiv& {\hbox{\sf A}}^{\rm T}\,d\boldsymbol{\beta} & \quad(10)
\end{array}
\end{displaymath}

in terms of the known quantities ${\hbox{\sf A}}$ and $d\boldsymbol{\beta}$ then gives the Matrix Equation
\begin{displaymath}
{\hbox{\sf a}}d\boldsymbol{\lambda}={\bf b},
\end{displaymath} (11)

which can be solved for $d\boldsymbol{\lambda}$ using standard matrix techniques such as Gaussian Elimination. This offset is then applied to $\boldsymbol{\lambda}$ and a new $d\boldsymbol{\beta}$ is calculated. By iteratively applying this procedure until the elements of $d\boldsymbol{\lambda}$ become smaller than some prescribed limit, a solution is obtained. Note that the procedure may converge poorly for some functions, and that convergence is often greatly improved by picking initial values close to the best-fit values. The sum of squared residuals is given by $R^2=d\boldsymbol{\beta}\cdot d\boldsymbol{\beta}$ after the final iteration.
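
The iteration can be summarized in a short routine. The following is a minimal sketch in Python using NumPy; the names gauss_newton, f, and jac are illustrative rather than part of the original text, with f(x, lam) returning the model values and jac(x, lam) returning the $m\times n$ matrix of partials of equation (6).

\begin{verbatim}
import numpy as np

def gauss_newton(f, jac, x, y, lam, tol=1e-8, max_iter=100):
    """Iteratively solve (A^T A) d_lambda = A^T d_beta and update lambda.

    f(x, lam)   -> model values f(x_i; lambda) at the tabulated points
    jac(x, lam) -> m x n matrix A of partials df/dlambda_j at (x_i, lambda)
    """
    lam = np.asarray(lam, dtype=float)
    for _ in range(max_iter):
        d_beta = y - f(x, lam)              # residuals, eq. (3)
        A = jac(x, lam)                     # design matrix, eq. (6)
        a = A.T @ A                         # eq. (9)
        b = A.T @ d_beta                    # eq. (10)
        d_lam = np.linalg.solve(a, b)       # solve eq. (11) for d_lambda
        lam = lam + d_lam                   # apply the offset to lambda
        if np.max(np.abs(d_lam)) < tol:     # stop once the step is below the limit
            break
    r2 = float(np.sum((y - f(x, lam))**2))  # sum of squared residuals R^2
    return lam, r2
\end{verbatim}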


\begin{figure}\begin{center}\BoxedEPSF{NonlinearLeastSquares.epsf}\end{center}\end{figure}

An example of a nonlinear least squares fit to a noisy Gaussian Function

\begin{displaymath}
f(A, x_0, \sigma; x)=A e^{-(x-x_0)^2/(2\sigma^2)}
\end{displaymath} (12)

is shown above, where the thin solid curve is the initial guess, the dotted curves are intermediate iterations, and the heavy solid curve is the fit to which the solution converges. The actual parameters are $(A, x_0, \sigma)= (1, 20, 5)$, the initial guess was (0.8, 15, 4), and the converged values are (1.03105, 20.1369, 4.86022), with $R^2=0.148461$. The Partial Derivatives used to construct the matrix ${\hbox{\sf A}}$ are
\begin{displaymath}
\begin{array}{rcll}
\displaystyle{\partial f\over\partial A} &=& e^{-(x-x_0)^2/(2\sigma^2)} & \quad(13)\\[2ex]
\displaystyle{\partial f\over\partial x_0} &=& \displaystyle{A(x-x_0)\over\sigma^2}\,e^{-(x-x_0)^2/(2\sigma^2)} & \quad(14)\\[2ex]
\displaystyle{\partial f\over\partial\sigma} &=& \displaystyle{A(x-x_0)^2\over\sigma^3}\,e^{-(x-x_0)^2/(2\sigma^2)}. & \quad(15)
\end{array}
\end{displaymath}
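
As a concrete illustration, the model (12) and the partials (13)-(15) can be supplied to the sketch above. The data below are synthetic and the helper names are again illustrative, so the fitted numbers will not reproduce those quoted for the figure.

\begin{verbatim}
import numpy as np

def gaussian(x, lam):
    A, x0, sigma = lam
    return A * np.exp(-(x - x0)**2 / (2 * sigma**2))       # eq. (12)

def gaussian_jac(x, lam):
    A, x0, sigma = lam
    e = np.exp(-(x - x0)**2 / (2 * sigma**2))
    # columns are eqs. (13)-(15): df/dA, df/dx0, df/dsigma
    return np.column_stack([e,
                            A * (x - x0) / sigma**2 * e,
                            A * (x - x0)**2 / sigma**3 * e])

rng = np.random.default_rng(0)
x = np.linspace(0.0, 40.0, 200)
y = gaussian(x, (1.0, 20.0, 5.0)) + 0.05 * rng.standard_normal(x.size)  # noisy data
lam_fit, r2 = gauss_newton(gaussian, gaussian_jac, x, y, lam=(0.8, 15.0, 4.0))
\end{verbatim}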

The technique could obviously be generalized to multiple Gaussians, to include slopes, etc., although the convergence properties generally worsen as the number of free parameters is increased.


An analogous technique can be used to solve an overdetermined set of equations. This problem might, for example, arise when solving for the best-fit Euler Angles corresponding to a noisy Rotation Matrix, in which case there are three unknown angles but nine correlated matrix elements. In such a case, write the $m$ different functions as $f_i(\lambda_1, \ldots, \lambda_n)$ for $i=1$, ..., $m$, call their actual values $y_i$, and define

\begin{displaymath}
{\hbox{\sf A}}=\left[{\matrix{
\left.{\partial f_1\over\partial\lambda_1}\right\vert _{\boldsymbol{\lambda}_i} & \cdots & \left.{\partial f_1\over\partial\lambda_n}\right\vert _{\boldsymbol{\lambda}_i}\cr
\vdots & \ddots & \vdots\cr
\left.{\partial f_m\over\partial\lambda_1}\right\vert _{\boldsymbol{\lambda}_i} & \cdots & \left.{\partial f_m\over\partial\lambda_n}\right\vert _{\boldsymbol{\lambda}_i}\cr}}\right],
\end{displaymath} (16)

and
\begin{displaymath}
d\boldsymbol{\beta}={\bf y}-{\bf f}(\lambda_1, \ldots, \lambda_n),
\end{displaymath} (17)

where ${\bf f}=(f_1, \ldots, f_m)^{\rm T}$ and $\boldsymbol{\lambda}_i$ are the numerical values obtained after the $i$th iteration. Again, set up the equations as
\begin{displaymath}
{\hbox{\sf A}}\,d\boldsymbol{\lambda}=d\boldsymbol{\beta},
\end{displaymath} (18)

and proceed exactly as before.
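
A brief sketch of how the Euler-angle case might look with the routine above, assuming a $z$-$x$-$z$ rotation convention and a finite-difference Jacobian in place of analytic partials; the convention, helper names, and test angles are all illustrative assumptions rather than part of the original text.

\begin{verbatim}
import numpy as np

def rotation(lam):
    """z-x-z Euler rotation matrix, flattened to the nine correlated elements."""
    phi, theta, psi = lam
    c, s = np.cos, np.sin
    Rz1 = np.array([[c(phi),  s(phi), 0], [-s(phi), c(phi), 0], [0, 0, 1]])
    Rx  = np.array([[1, 0, 0], [0, c(theta), s(theta)], [0, -s(theta), c(theta)]])
    Rz2 = np.array([[c(psi),  s(psi), 0], [-s(psi), c(psi), 0], [0, 0, 1]])
    return (Rz2 @ Rx @ Rz1).ravel()

def rotation_jac(lam, h=1e-6):
    """9 x 3 finite-difference approximation to the matrix A of eq. (16)."""
    lam = np.asarray(lam, dtype=float)
    cols = []
    for j in range(lam.size):
        dl = np.zeros_like(lam)
        dl[j] = h
        cols.append((rotation(lam + dl) - rotation(lam - dl)) / (2 * h))
    return np.column_stack(cols)

# Nine noisy matrix elements, three unknown angles; reuse gauss_newton from above.
y = rotation((0.3, 1.1, -0.4)) + 0.01 * np.random.default_rng(1).standard_normal(9)
angles, r2 = gauss_newton(lambda _, lam: rotation(lam),
                          lambda _, lam: rotation_jac(lam),
                          x=None, y=y, lam=(0.2, 1.0, -0.3))
\end{verbatim}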

See also Least Squares Fitting, Linear Regression, Moore-Penrose Generalized Matrix Inverse




© 1996-9 Eric W. Weisstein
1999-05-25