# Stochastic approximation

Stochastic approximation methods are a family of iterative stochastic optimization algorithms that attempt to find zeroes or extrema of functions which cannot be computed directly, but only estimated via noisy observations. The first, and prototypical, algorithms of this kind were the Robbins-Monro and Kiefer-Wolfowitz algorithms. __NOTOC__

Robbins-Monro algorithm

In the Robbins-Monro algorithm, introduced in 1951A Stochastic Approximation Method, Herbert Robbins and Sutton Monro, "Annals of Mathematical Statistics" 22, #3 (September 1951), pp. 400&ndash;407.] , one has a function $M\left(x\right)$ for which one wishes to find the value of $x$, $x_0$, satisfying $M\left(x_0\right)=alpha$. However, what is observable is not $M\left(x\right)$, but rather a random variable $N\left(x\right)$ such that $E\left(N\left(x\right)|x\right)=M\left(x\right)$. The algorithm is then to construct a sequence $x_1, x_2, dots$ which satisfies::$x_\left\{n+1\right\}=x_n+a_n\left(alpha-N\left(x_n\right)\right)$.Here, $a_1, a_2, dots$ is a sequence of positive step sizes. Robbins and Monro proved that, if $N\left(x\right)$ is uniformly bounded, $M\left(x\right)$ is nondecreasing, $M\text{'}\left(x_0\right)$ exists and is positive, and if $a_n$ satisfies a set of bounds (fulfilled if one takes $a_n=1/n$), then $x_n$ converges in $L^2$ (and hence also in probability) to $x_0$., Theorem 2. In general, the $a_n\text{'}s$ need not equal $1/n$. However, to ensure convergence, they should converge to zero, and in order to average out the noise in $N\left(x\right)$, they should converge slowly.

Kiefer-Wolfowitz algorithm

In the Kiefer-Wolfowitz algorithm [Stochastic Estimation of the Maximum of a Regression Function, J. Kiefer and J. Wolfowitz, "Annals of Mathematical Statistics" 23, #3 (September 1952), pp. 462&ndash;466.] , introduced a year after the Robbins-Monro algorithm, one wishes to find the maximum, $x_0$, of the unknown $M\left(x\right)$and constructs a sequence $x_1,x_2,dots$ such that::$x_\left\{n+1\right\}=x_n+a_nfrac\left\{N\left(x_n+c_n\right)-N\left(x_n-c_n\right)\right\}\left\{c_n\right\}$.Here, $a_1, a_2, dots$ is a sequence of positive step sizes which serve the same function as in the Robbins-Monro algorithm, and $c_1, c_2, dots$ is a sequence of positive step sizes which are used to estimate, via finite differences, the derivative of $M$. Kiefer and Wolfowitz showed that, if $a_n$ and $c_n$ satisfy various bounds (fulfilled by taking $a_n=1/n$, $c_n=\left(1/n\right)^\left\{1/3\right\}$), and $M\left(x\right)$ and $N\left(x\right)$ satisfy some technical conditions, then the sequence $x_n$ converges in probability to $x_0$.

ubsequent developments

An extensive theoretical literature has grown up around these algorithms, concerning conditions for convergence, rates of convergence, multivariate and other generalizations, proper choice of step size, possible noise models, and so on."Stochastic Approximation Algorithms and Applications", Harold J. Kushner and G. George Yin, New York: Springer-Verlag, 1997. ISBN 038794916X; 2nd ed., titled "Stochastic Approximation and Recursive Algorithms and Applications", 2003, ISBN 0387008942.] , ["Stochastic Approximation and Recursive Estimation", Mikhail Borisovich Nevel'son and Rafail Zalmanovich Has'minski, translated by Israel Program for Scientific Translations and B. Silver, Providence, RI: American Mathematical Society, 1973, 1976. ISBN 0821815970.] These methods are also applied in control theory, in which case the unknown function which we wish to optimize or find the zero of may vary in time. In this case, the step size $a_n$ should not converge to zero but should be chosen so as to track the function., 2nd ed., chapter 3

ee also

* Stochastic gradient descent
* Stochastic optimization

References

Wikimedia Foundation. 2010.

### Look at other dictionaries:

• stochastic approximation — tikimybinė aproksimacija statusas T sritis automatika atitikmenys: angl. stochastic approximation vok. stochastische Approximation rus. стохастическая аппроксимация, f pranc. approximation stochastique, f ryšiai: sinonimas – stochastinė… …   Automatikos terminų žodynas

• Stochastic optimization — (SO) methods are optimization algorithms which incorporate probabilistic (random) elements, either in the problem data (the objective function, the constraints, etc.), or in the algorithm itself (through random parameter values, random choices,… …   Wikipedia

• Stochastic gradient descent — is a general optimization algorithm, but is typically used to fit the parameters of a machine learning model.In standard (or batch ) gradient descent, the true gradient is used to update the parameters of the model. The true gradient is usually… …   Wikipedia

• approximation stochastique — tikimybinė aproksimacija statusas T sritis automatika atitikmenys: angl. stochastic approximation vok. stochastische Approximation rus. стохастическая аппроксимация, f pranc. approximation stochastique, f ryšiai: sinonimas – stochastinė… …   Automatikos terminų žodynas

• Stochastic context-free grammar — A stochastic context free grammar (SCFG; also probabilistic context free grammar, PCFG) is a context free grammar in which each production is augmented with a probability. The probability of a derivation (parse) is then the product of the… …   Wikipedia

• stochastische Approximation — tikimybinė aproksimacija statusas T sritis automatika atitikmenys: angl. stochastic approximation vok. stochastische Approximation rus. стохастическая аппроксимация, f pranc. approximation stochastique, f ryšiai: sinonimas – stochastinė… …   Automatikos terminų žodynas

• Polynomial-time approximation scheme — In computer science, a polynomial time approximation scheme (abbreviated PTAS) is a type of approximation algorithm for optimization problems (most often, NP hard optimization problems).A PTAS is an algorithm which takes an instance of an… …   Wikipedia

• Local-density approximation — The local density approximation (LDA) is an approximation of the exchange correlation (XC) energy functional in density functional theory (DFT) by taking the XC energy of an electron in a homogeneous electron gas of a density equal to the density …   Wikipedia

• List of numerical analysis topics — This is a list of numerical analysis topics, by Wikipedia page. Contents 1 General 2 Error 3 Elementary and special functions 4 Numerical linear algebra …   Wikipedia

• List of mathematics articles (S) — NOTOC S S duality S matrix S plane S transform S unit S.O.S. Mathematics SA subgroup Saccheri quadrilateral Sacks spiral Sacred geometry Saddle node bifurcation Saddle point Saddle surface Sadleirian Professor of Pure Mathematics Safe prime Safe… …   Wikipedia