# Lagrange multipliers

In mathematical optimization problems, the method of

**Lagrange multipliers**, named afterJoseph Louis Lagrange , is a method for finding the extrema of a function of several variables subject to one or more constraints; it is the basic tool in nonlinear constrained optimization.Simply put, the technique is able to determine where on a particular set of points (such as a

circle ,sphere , or plane) a particular function is the smallest (or largest).More formally, Lagrange multipliers compute the

stationary point s of the constrained function. By Fermat's theorem, extrema occur either at these points, or on the boundary, or at points where the function is not differentiable.It reduces finding

stationary point s of a constrained function in "n" variables with "k" constraints to finding stationary points of an unconstrained function in "n+k" variables. The method introduces a new unknown scalar variable (called the Lagrange multiplier) for each constraint, and defines a new function (called the**Lagrangian**) in terms of the original function, the constraints, and the Lagrange multipliers.**Introduction**Consider a two-dimensional case. Suppose we have a function $f(x,y)$ we wish to maximize or minimize subject to the constraint

:$gleft(\; x,\; y\; ight)\; =\; c,$

where "c" is a constant. We can visualize contours of $f$ given by

:$f\; left(\; x,\; y\; ight)=d\_n$

for various values of $d\_n$, and the contour of $g$ given by $g\; (\; x,\; y\; )\; =\; c$.

Suppose we walk along the contour line with $g\; =\; c$. In general the contour lines of $f$ and $g$ may be distinct, so traversing the contour line for $g\; =\; c$ could intersect with or cross the contour lines of $f$. This is equivalent to saying that while moving along the contour line for $g\; =\; c$ the value of $f$ can vary. Only when the contour line for $g\; =\; c$ touches contour lines of $f$ tangentially, we do not increase or decrease the value of $f$ - that is, when the contour lines touch but do not cross.

This occurs exactly when the

tangential component of thetotal derivative vanishes: $df\_parallel\; =\; 0$, which is at the constrainedstationary points of $f$ (which include the constrained local extrema, assuming $f$ is differentiable). Computationally, this is when the gradient of $f$ is normal to the constraint(s): when$abla\; f\; =\; lambda\; abla\; g$ for some scalar $lambda$ (where $abla$ is the gradient). Note that the constant $lambda$ is required because, even though the directions of both gradient vectors are equal, the magnitudes of the gradient vectors are most likely not equal.A familiar example can be obtained from weather maps, with their

contour line s for temperature and pressure: the constrained extrema will occur where the superposed maps show touching lines (isopleths).Geometrically we translate the tangency condition to saying that the

gradient s of $f$ and $g$ are parallel vectors at the maximum, since the gradients are always normal to the contour lines. Thus we want points $(x,y)$ where $g(x,y)\; =\; c$ and :$abla\_\{x,y\}\; f\; =\; lambda\; abla\_\{x,y\}\; g$, where:$abla\_\{x,y\}\; =\; left(\; frac\{partial\}\{partial\; x\},\; frac\{partial\}\{partial\; y\}\; ight).$To incorporate these conditions into one equation, we introduce an auxiliary function:$F\; left(\; x\; ,\; y,\; lambda\; ight)\; =\; f\; left(x,\; y\; ight)\; +\; lambda\; left(g\; left(x,\; y\; ight)\; -\; c\; ight),$and solve:$abla\_\{x,y,lambda\}\; F\; left(\; x\; ,\; y,\; lambda\; ight)=0$.

**Justification**As discussed above, we are looking for stationary points of $f$ seen while travelling on the level set $g(x,y)=c$. This occurs just when the gradient of $f$ has no component tangential to the level sets of $g$. This condition is equivalent to $abla\_\{x,y\}\; f(x,y)\; =\; lambda\; abla\_\{x,y\}\; g(x,y)$ for some $lambda$. Stationary points $(x,y,lambda)$ of $F$ also satisfy $g(x,y)\; =\; c$ as can be seen by considering the derivative with respect to $lambda$.

**Caveat: extrema versus stationary points**Be aware that the solutions are the "

stationary points " of the Lagrangian $F$, and aresaddle point s: they are not necessarily "extrema" of $F$. $F$ is unbounded: given a point $(x,y)$ that doesn't lie on the constraint, letting $lambda\; o\; pm\; infty$ makes $F$ arbitrarily large or small.However, under certain stronger assumptions, as we shall see , the**strong Lagrangian principle**holds, which states that the maxima of $f$ maximize the Lagrangian globally.**A more general formulation: The weak Lagrangian principle**Denote the objective function by $f(mathbf\; x)$ and let the constraints be given by $g\_k(mathbf\; x)=0$, perhaps by moving constants to the left, as in $h\_k(mathbf\; x)-c\_k=g\_k(mathbf\; x)$. The domain of "f" should be an open set containing all points satisfying the constraints. Furthermore, $f$ and the $g\_k$ must have continuous first partial derivatives and the gradients of the $g\_k$ must not be zero on the domain.MathWorld |title=Lagrange Multiplier |urlname=LagrangeMultiplier |author=Gluss, David and Weisstein, Eric W.] Now, define the Lagrangian, $Lambda$, as

:$Lambda(mathbf\; x,\; oldsymbol\; lambda)\; =\; f\; +\; sum\_k\; lambda\_k\; g\_k.$::$k$ is an index for variables and functions associated with a particular constraint, $k$.::$oldsymbollambda$ without a subscript indicates the vector with elements $mathbf\; lambda\_k$, which are taken to be independent variables.

Observe that both the optimization criteria and constraints $g\_k(x)$ are compactly encoded as stationary points of the Lagrangian:

:$abla\_\{mathbf\; x\}\; Lambda\; =\; mathbf\{0\}$

if and only if $abla\_\{mathbf\; x\}\; f\; =\; -\; sum\_k\; lambda\_k\; abla\_\{mathbf\; x\}\; g\_k,$:$abla\_\{mathbf\; x\}$ means to take the gradient only with respect to each element in the vector $mathbf\; x$, instead of all variables.and

:$abla\_\{mathbf\; lambda\}\; Lambda\; =\; mathbf\{0\}$ implies $g\_k\; =\; 0.$

Collectively, the stationary points of the Lagrangian,

:$abla\; Lambda\; =\; mathbf\{0\}$,

give a number of unique equations totaling the length of $mathbf\; x$ plus the length of $mathbf\; lambda$.

**Interpretation of $lambda\_i$**Often the Lagrange multipliers have an interpretation as some salient quantity of interest. To see whythis might be the case, observe that:

:$frac\{partial\; Lambda\}\{partial\; \{g\_k\; =\; lambda\_k.$

So, "λ"

_{"k"}is the rate of change of the quantity being optimized as a function of the constraint variable. As examples, inLagrangian mechanics the equations of motion are derived by finding stationary points of the action, the time integral of the difference between kinetic and potential energy. Thus, the force on a particle due to a scalar potential, $F=-\; abla\; V$, can be interpreted as a Lagrange multiplier determining the change in action (transfer of potential to kinetic energy) following a variation in the particle's constrained trajectory. In economics, the optimal profit to a player is calculated subject to a constrained space of actions, where a Lagrange multiplier is the value of relaxing a given constraint (e.g. through bribery or other means).The method of Lagrange multipliers is generalized by the

Karush-Kuhn-Tucker conditions .**Examples****Very simple example**Suppose you wish to maximize $f(x,y)=x+y$ subject to the constraint $x^2+y^2=1$. The constraint is the unit circle, and the

level set s of "f" are diagonal lines (with slope -1), so one can see graphically that the maximum occurs at$(sqrt\{2\}/2,sqrt\{2\}/2)$ (and the minimum occurs at $(-sqrt\{2\}/2,-sqrt\{2\}/2)$Formally, set $g(x,y)=x^2+y^2-1$, and:$Lambda(x,\; y,\; lambda)\; =\; f(x,y)\; +\; lambda\; g(x,y)\; =\; x+y\; +\; lambda\; (x^2\; +\; y^2\; -\; 1)$

Set the derivative $dLambda=0$, which yields the system of equations:

:$egin\{align\}frac\{partial\; Lambda\}\{partial\; x\}\; =\; 1\; +\; 2\; lambda\; x\; =\; 0,\; qquad\; ext\{(i)\}\; \backslash frac\{partial\; Lambda\}\{partial\; y\}\; =\; 1\; +\; 2\; lambda\; y\; =\; 0,\; qquad\; ext\{(ii)\}\; \backslash frac\{partial\; Lambda\}\{partial\; lambda\}\; =\; x^2\; +\; y^2\; -\; 1\; =\; 0,\; qquad\; ext\{(iii)\}\; end\{align\}$As always, the $partial\; lambda$ equation is the original constraint.

Combining the first two equations yields $x=y$ (explicitly, $x\; eq\; 0$ ,otherwise (i) yields 1 = 0), so one can solve for $lambda$, yielding $lambda=-1/(2x)$, which one can substitute into (ii)).

Substituting into (iii) yields $2x^2=1$, so $x=pm\; sqrt\{2\}/2$ and the stationary points are $(sqrt\{2\}/2,sqrt\{2\}/2)$ and $(-sqrt\{2\}/2,-sqrt\{2\}/2)$. Evaluating the objective function "f" on these yields

:$f(sqrt\{2\}/2,sqrt\{2\}/2)=sqrt\{2\}mbox\{\; and\; \}\; f(-sqrt\{2\}/2,\; -sqrt\{2\}/2)=-sqrt\{2\},$

thus the maximum is $sqrt\{2\}$, which is attained at $(sqrt\{2\}/2,sqrt\{2\}/2)$ and the minimum is $-sqrt\{2\}$, which is attained at $(-sqrt\{2\}/2,-sqrt\{2\}/2)$.

**Simple example**Suppose you want to find the maximum values for

:$f(x,\; y)\; =\; x^2\; y\; ,$

with the condition that the "x" and "y" coordinates lie on the circle around the origin with radius √3, that is,

:$x^2\; +\; y^2\; =\; 3.\; ,$

As there is just a single condition, we will use only one multiplier, say λ.

Use the constraint to define a function "g"("x", "y"):

:$g\; (x,\; y)\; =\; x^2\; +y^2\; -3.\; ,$

The function "g" is identically zero on the circle of radius √3. So any multiple of "g"("x", "y") may be added to "f"("x", "y") leaving "f"("x", "y") unchanged in the region of interest (above the circle where our original constraint is satisfied). Let

:$Lambda(x,\; y,\; lambda)\; =\; f(x,y)\; +\; lambda\; g(x,\; y)\; =\; x^2y\; +\; lambda\; (x^2\; +\; y^2\; -\; 3).\; ,$

The critical values of $Lambda$ occur when its gradient is zero. The partial derivatives are

:$egin\{align\}frac\{partial\; Lambda\}\{partial\; x\}\; =\; 2\; x\; y\; +\; 2\; lambda\; x\; =\; 0,\; qquad\; ext\{(i)\}\; \backslash frac\{partial\; Lambda\}\{partial\; y\}\; =\; x^2\; +\; 2\; lambda\; y\; =\; 0,\; qquad\; ext\{(ii)\}\; \backslash frac\{partial\; Lambda\}\{partial\; lambda\}\; =\; x^2\; +\; y^2\; -\; 3\; =\; 0.\; qquad\; ext\{(iii)\}end\{align\}$

Equation (iii) is just the original constraint. Equation (i) implies $x\; =\; 0$ "or" λ = −"y". In the first case, if $x=0$ then we must have $y\; =\; pm\; sqrt\{3\}$ by (iii) and then by (ii) λ=0. In the second case, if λ = −"y" and substituting into equation (ii) we have that,

:$x^2\; -\; 2y^2\; =\; 0.\; ,$

Then "x"

^{2}= 2"y"^{2}. Substituting into equation (iii) and solving for "y" gives this value of "y"::$y\; =\; pm\; 1.\; ,$

Thus there are six critical points:

:$(sqrt\{2\},1);\; quad\; (-sqrt\{2\},1);\; quad\; (sqrt\{2\},-1);\; quad\; (-sqrt\{2\},-1);\; quad\; (0,sqrt\{3\});\; quad\; (0,-sqrt\{3\}).$

Evaluating the objective at these points, we find

:$f(pmsqrt\{2\},1)\; =\; 2;\; quad\; f(pmsqrt\{2\},-1)\; =\; -2;\; quad\; f(0,pm\; sqrt\{3\})=0.$

Therefore, the objective function attains a

global maximum (with respect to the constraints) at $(pmsqrt\{2\},1)$ and aglobal minimum at $(pmsqrt\{2\},-1).$ The point $(0,sqrt\{3\})$ is alocal minimum and $(0,-sqrt\{3\})$ is alocal maximum .**Example: entropy**Suppose we wish to find the discrete

probability distribution with maximalinformation entropy . Then:$f(p\_1,p\_2,ldots,p\_n)\; =\; -sum\_\{k=1\}^n\; p\_klog\_2\; p\_k.$

Of course, the sum of these probabilities equals 1, so our constraint is "g"(

**p**) = 1 with:$g(p\_1,p\_2,ldots,p\_n)=sum\_\{k=1\}^n\; p\_k.$

We can use Lagrange multipliers to find the point of maximum entropy (depending on the probabilities). For all "k" from 1 to "n", we require that

:$frac\{partial\}\{partial\; p\_k\}(f+lambda\; (g-1))=0,$

which gives

:$frac\{partial\}\{partial\; p\_k\}left(-sum\_\{k=1\}^n\; p\_k\; log\_2\; p\_k\; +\; lambda\; (sum\_\{k=1\}^n\; p\_k\; -\; 1)\; ight)\; =\; 0.$

Carrying out the differentiation of these "n" equations, we get

:$-left(frac\{1\}\{ln\; 2\}+log\_2\; p\_k\; ight)\; +\; lambda\; =\; 0.$

This shows that all "p"

_{"i"}are equal (because they depend on λ only).By using the constraint ∑_{"k"}"p"_{"k"}= 1, we find:$p\_k\; =\; frac\{1\}\{n\}.$

Hence, the uniform distribution is the distribution with the greatest entropy.

**Economics**Constrained optimization plays a central role in

economics . For example, the choice problem for a consumer is represented as one of maximizing autility function subject to abudget constraint . The Lagrange multiplier has an economic interpretation as theshadow price associated with the constraint, in this case themarginal utility ofincome .**The strong Lagrangian principle: Lagrange duality**Given a

convex optimization problem in standard form:$egin\{align\}\; ext\{minimize\; \}\; f\_0(x)\; \backslash \; ext\{subject\; to\; \}\; f\_i(x)\; leq\; 0,\; i\; in\; left\; \{1,dots,m\; ight\; \}\; \backslash \; h\_i(x)\; =\; 0,\; i\; in\; left\; \{1,dots,p\; ight\; \}end\{align\}$

with the domain $mathcal\{D\}\; subset\; mathbb\{R\}^n$ having non-empty interior, the

**Lagrangian function**$L:\; mathbb\{R\}^n\; imes\; mathbb\{R\}^m\; imes\; mathbb\{R\}^p\; o\; mathbb\{R\}$ is defined as: $L(x,lambda,\; u)\; =\; f\_0(x)\; +\; sum\_\{i=1\}^m\; lambda\_i\; f\_i(x)\; +\; sum\_\{i=1\}^p\; u\_i\; h\_i(x).$

The vectors $lambda$ and $u$ are called the "dual variables" or "Lagrange multiplier vectors" associated with the problem. The

**Lagrange dual function**$g:mathbb\{R\}^m\; imes\; mathbb\{R\}^p\; o\; mathbb\{R\}$ is defined as: $g(lambda,\; u)\; =\; inf\_\{xinmathcal\{D\; L(x,lambda,\; u)\; =\; inf\_\{xinmathcal\{D\; left\; (\; f\_0(x)\; +\; sum\_\{i=1\}^m\; lambda\_i\; f\_i(x)\; +\; sum\_\{i=1\}^p\; u\_i\; h\_i(x)\; ight\; ).$

The dual function $g$ is concave, even when the initial problem is not convex. The dual function yields lower bounds on the optimal value $p^*$ of the initial problem; for any $lambda\; geq\; 0$ and any $u$ we have $g(lambda,\; u)\; leq\; p^*$. If a

constraint qualification such asSlater's condition holds and the original problem is convex, then we have strong duality, i.e. $d^*\; =\; max\_\{lambda\; ge\; 0,\; u\}\; g(lambda,\; u)\; =\; inf\; f\_0\; =\; p^*$.**ee also***

Karush-Kuhn-Tucker conditions : generalization of the method of Lagrange multipliers.

*Lagrange multipliers on Banach spaces : another generalization of the method of Lagrange multipliers.**References****External links**For references to Lagrange's original work and for an account of the terminology see the Lagrange Multiplier entry in

* [*http://members.aol.com/jeff570/l.html Earliest known uses of some of the words of mathematics: L*]Exposition

* [*http://www.slimy.com/~steuard/teaching/tutorials/Lagrange.html Conceptual introduction*] (plus a brief discussion of Lagrange multipliers in thecalculus of variations as used in physics)

* [*http://www.cs.berkeley.edu/~klein/papers/lagrange-multipliers.pdf Lagrange Multipliers without Permanent Scarring*] (tutorial by Dan Klein)For additional text and interactive applets

* [*http://www.umiacs.umd.edu/~resnik/ling848_fa2004/lagrange.html Simple explanation with an example of governments using taxes as Lagrange multipliers*]

* [*http://www-math.mit.edu/18.02/applets/LagrangeMultipliersTwoVariables.html Applet*]

* [*http://www.math.gatech.edu/~carlen/2507/notes/lagMultipliers.html Tutorial and applet*]

* [*http://midnighttutor.com/Lagrange_multiplier.html Good Video Lecture of Lagrange Multipliers*]

*Wikimedia Foundation.
2010.*

### Look at other dictionaries:

**Lagrange multipliers on Banach spaces**— In the field of calculus of variations in mathematics, the method of Lagrange multipliers on Banach spaces can be used to solve certain infinite dimensional constrained optimization problems. The method is a generalization of the classical method … Wikipedia**Constrained optimization and Lagrange multipliers**— This tutorial presents an introduction to optimization problems that involve finding a maximum or a minimum value of an objective function f(x 1,x 2,ldots, x n) subject to a constraint of the form g(x 1,x 2,ldots, x n)=k.Maximum and… … Wikipedia**Lagrange multiplier**— Figure 1: Find x and y to maximize f(x,y) subject to a constraint (shown in red) g(x,y) = c … Wikipedia**Joseph Louis Lagrange**— Lagrange redirects here. For other uses, see Lagrange (disambiguation). Joseph Louis Lagrange Joseph Louis (Giuseppe Lodovico), comte de Lagrange … Wikipedia**Multiplicateur De Lagrange**— Pour les articles homonymes, voir Théorème de Lagrange. La méthode des multiplicateurs de Lagrange permet de trouver un optimum, sur la figure le point le … Wikipédia en Français**Multiplicateur de Lagrange**— Pour les articles homonymes, voir Théorème de Lagrange. La méthode des multiplicateurs de Lagrange permet de trouver un optimum, sur la figure le point le plus élevé possible, tout en satisfaisant une contrainte, sur la figure un … Wikipédia en Français**Constraint algorithm**— In mechanics, a constraint algorithm is a method for satisfying constraints for bodies that obey Newton s equations of motion. There are three basic approaches to satisfying such constraints: choosing novel unconstrained coordinates ( internal… … Wikipedia**Multibody system**— A multibody system is used to model the dynamic behavior of interconnected rigid or flexible bodies, each of which may undergo large translational and rotational displacements. Contents 1 Introduction 2 Applications 3 Example 4 Concept … Wikipedia**Dualité dans les programmes d'optimisation**— Multiplicateur de Lagrange Pour les articles homonymes, voir Théorème de Lagrange. La méthode des multiplicateurs de Lagrange permet de trouver un optimum, sur la figure le point le … Wikipédia en Français**Fermi–Dirac statistics**— Statistical mechanics Thermodynamics · … Wikipedia