Chapter 5: Duality

Duality is a central concept in optimization theory. It associates with every optimization problem (the primal) a companion dual problem whose optimal value provides a lower bound on the primal optimal value. Under convexity and suitable regularity conditions, the two optimal values coincide (strong duality), giving rise to powerful optimality conditions known as the KKT conditions.


5.1 Motivation: From Constrained to Unconstrained

Consider the single-inequality problem

$$p^\star = \min_x f(x) \quad \text{s.t.} \quad g(x) \le 0.$$

Constrained problems are difficult to optimize directly. The key idea is to absorb the constraint into the objective using an indicator function:

$$I_{g \le 0}(x) = \begin{cases} 0, & g(x) \le 0, \\ +\infty, & g(x) > 0. \end{cases}$$

Then the problem becomes unconstrained:

$$p^\star = \min_x \bigl( f(x) + I_{g \le 0}(x) \bigr).$$

5.1.1 Linearizing the Indicator

Key Claim

$$I_{g \le 0}(x) = \max_{\lambda \ge 0} \lambda\, g(x).$$

Explanation: If $g(x) \le 0$, then $\lambda g(x) \le 0$ for all $\lambda \ge 0$, and the supremum $0$ is attained at $\lambda = 0$. If $g(x) > 0$, then $\lambda g(x) \to +\infty$ as $\lambda \to \infty$, so the supremum is $+\infty$.

Thus the indicator function can be expressed as a supremum of functions that are linear in $\lambda$.

5.1.2 Min–Max Reformulation

Substituting, we obtain

$$f(x) + I_{g \le 0}(x) = f(x) + \max_{\lambda \ge 0} \lambda g(x) = \max_{\lambda \ge 0} \bigl( f(x) + \lambda g(x) \bigr).$$

Hence the original problem becomes an unconstrained min–max problem:

$$p^\star = \min_x \max_{\lambda \ge 0} \bigl( f(x) + \lambda g(x) \bigr).$$

Game interpretation: Player A (the minimizer) chooses $x$; Player B (the maximizer) chooses $\lambda \ge 0$. The quantity being optimized is

$$L(x, \lambda) = f(x) + \lambda g(x).$$

The order of play matters—this motivates the fundamental inequality of duality.


5.2 The Weak Max–Min Inequality

Theorem (Weak Max–Min Inequality)

For any function $\varphi(x, y)$,

$$\max_y \min_x \varphi(x, y) \;\le\; \min_x \max_y \varphi(x, y).$$

Proof sketch: For any fixed $y_0$ and $x_0$,

$$\min_x \varphi(x, y_0) \;\le\; \varphi(x_0, y_0) \;\le\; \max_y \varphi(x_0, y).$$

Taking the maximum over $y_0$ on the left-hand side and then the minimum over $x_0$ on the right-hand side establishes the inequality.

Applying this inequality to the Lagrangian $L(x, \lambda) = f(x) + \lambda g(x)$ yields:

$$\max_{\lambda \ge 0} \min_x L(x, \lambda) \;\le\; \min_x \max_{\lambda \ge 0} L(x, \lambda) = p^\star.$$
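The weak max–min inequality is easy to see numerically in the finite case. The following sketch uses a hypothetical 2×2 payoff matrix (the data and the row/column player roles are invented for illustration) and checks that playing second is never worse for either player:

```python
import numpy as np

# A tiny matrix "game": phi(i, j) = Phi[i, j]; the row player minimizes
# over i, the column player maximizes over j. (Hypothetical payoff matrix.)
Phi = np.array([[3.0, 1.0],
                [0.0, 2.0]])

# max_j min_i phi(i, j): the maximizer commits first, the minimizer responds.
max_min = Phi.min(axis=0).max()   # min over rows per column, then max

# min_i max_j phi(i, j): the minimizer commits first, the maximizer responds.
min_max = Phi.max(axis=1).min()   # max over columns per row, then min

# Weak max-min inequality: max_min <= min_max (here the gap is strict).
assert max_min <= min_max
print(max_min, min_max)
```

For this matrix the two values differ ($1 \ne 2$), showing that the inequality can be strict: the order of play matters.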

5.3 Definitions: Lagrangian, Dual Function, Dual Problem

For the single-inequality problem, we define:

Definition (Lagrangian, Single-Inequality Case)

$$L(x, \lambda) = f(x) + \lambda g(x), \qquad \lambda \ge 0.$$

$\lambda$ is called the Lagrange multiplier (or dual variable).

Definition (Dual Function)

$$q(\lambda) = \min_x L(x, \lambda).$$

Definition (Dual Problem)

$$d^\star = \max_{\lambda \ge 0} q(\lambda).$$

Definition (Weak Duality)

$$d^\star \le p^\star.$$

The difference $p^\star - d^\star$ is called the duality gap (always $\ge 0$).

Definition (Strong Duality)

$p^\star = d^\star$. This holds for convex problems under appropriate regularity conditions (e.g., Slater's condition).

Example: Unit Circle Constraint

Consider

$$\min_{x_1, x_2}\; x_1 + x_2 \quad \text{s.t.} \quad x_1^2 + x_2^2 \le 1.$$
  • Lagrangian: $L(x, \lambda) = x_1 + x_2 + \lambda (x_1^2 + x_2^2 - 1)$, $\lambda \ge 0$.

  • Dual function: For $\lambda > 0$, minimizing over $x$ gives $x_1 = x_2 = -\frac{1}{2\lambda}$, so
    $$q(\lambda) = \min_x L(x, \lambda) = -\frac{1}{2\lambda} - \lambda.$$

  • Dual problem:
    $$\max_{\lambda \ge 0} q(\lambda) = \max_{\lambda \ge 0} \Bigl( -\frac{1}{2\lambda} - \lambda \Bigr) = -\sqrt{2}, \qquad \lambda^\star = \frac{1}{\sqrt{2}}.$$

  • Verification: The primal solution is $x^\star = \bigl( -\tfrac{1}{\sqrt{2}}, -\tfrac{1}{\sqrt{2}} \bigr)$ on the unit circle, giving $p^\star = -\sqrt{2} = d^\star$. Strong duality holds.
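The example above is easy to check numerically. This sketch hard-codes the dual function $-\frac{1}{2\lambda} - \lambda$ derived above and verifies both the zero duality gap at $\lambda^\star = 1/\sqrt{2}$ and the weak-duality lower bound at other multipliers:

```python
import numpy as np

# Dual function for the unit-circle example, valid for lam > 0 (derived above).
def q(lam):
    return -1 / (2 * lam) - lam

lam_star = 1 / np.sqrt(2)

# Primal optimum: x* = (-1/sqrt(2), -1/sqrt(2)) lies on the unit circle.
x_star = -np.ones(2) / np.sqrt(2)
p_star = x_star.sum()                      # = -sqrt(2)

assert np.isclose(p_star, -np.sqrt(2))
assert np.isclose(q(lam_star), p_star)     # zero duality gap at lam* = 1/sqrt(2)

# Weak duality: every lam > 0 yields a lower bound on p*.
for lam in np.linspace(0.1, 5.0, 50):
    assert q(lam) <= p_star + 1e-12
```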


5.4 The General Optimization Problem

Extend the idea to the standard form:

$$\min_x f(x) \quad \text{s.t.} \quad g_i(x) \le 0, \; i = 1, \dots, m, \qquad h_j(x) = 0, \; j = 1, \dots, p.$$
  • Inequality constraints require $\lambda_i \ge 0$ (nonnegative multipliers).
  • Equality constraints carry no sign restriction: $\nu_j \in \mathbb{R}$.

Definition (Lagrangian, General Case)

Let $\lambda \in \mathbb{R}_+^m$ and $\nu \in \mathbb{R}^p$. The Lagrangian is

$$L(x, \lambda, \nu) = f(x) + \sum_{i=1}^m \lambda_i g_i(x) + \sum_{j=1}^p \nu_j h_j(x) = f(x) + \lambda^\top g(x) + \nu^\top h(x).$$

Definition (Lagrange Dual Function)

$$g(\lambda, \nu) = \inf_{x \in \mathcal{D}} L(x, \lambda, \nu) = \inf_{x \in \mathcal{D}} \Bigl( f(x) + \sum_{i=1}^m \lambda_i g_i(x) + \sum_{j=1}^p \nu_j h_j(x) \Bigr),$$

where $\mathcal{D}$ is the domain of the primal problem.

Properties:

  • $g(\lambda, \nu)$ is a concave function of $(\lambda, \nu)$, even when the primal is nonconvex.
  • $g$ can be $-\infty$ for some $(\lambda, \nu)$; the points where it is finite form its effective domain.
  • Lower bound property: If $\lambda \ge 0$, then $g(\lambda, \nu) \le p^\star$.

Proof of lower bound property: If $\tilde{x}$ is feasible and $\lambda \ge 0$, then $g_i(\tilde{x}) \le 0$ and $h_j(\tilde{x}) = 0$, so

$$g(\lambda, \nu) = \inf_{x \in \mathcal{D}} L(x, \lambda, \nu) \;\le\; L(\tilde{x}, \lambda, \nu) \;\le\; f(\tilde{x}).$$

Minimizing over all feasible $\tilde{x}$ gives $g(\lambda, \nu) \le p^\star$.

Definition (Lagrange Dual Problem)

$$\max_{\lambda, \nu}\; g(\lambda, \nu) \quad \text{s.t.} \quad \lambda \ge 0.$$
  • Always a convex optimization problem (maximization of a concave function), even when the primal is nonconvex.
  • Optimal value denoted $d^\star$.
  • $(\lambda, \nu)$ is dual feasible if $\lambda \ge 0$ and $(\lambda, \nu) \in \operatorname{dom} g$.
  • $d^\star = -\infty$ if the dual problem is infeasible; $d^\star = +\infty$ if it is unbounded above.

Why the Dual Problem is Always Convex

$$g(\lambda, \nu) = \inf_x \Bigl( f(x) + \sum_{i=1}^m \lambda_i g_i(x) + \sum_{j=1}^p \nu_j h_j(x) \Bigr).$$

For each fixed $x$, the expression inside the infimum is affine in $(\lambda, \nu)$, so $g$ is a pointwise infimum of affine functions and hence concave. The constraint $\lambda \ge 0$ is convex. Thus the dual is the maximization of a concave function over a convex set, i.e., a convex optimization problem.
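To see the concavity claim concretely, here is a small numerical check on a deliberately nonconvex one-dimensional primal (the choices $f(x) = x^4 - 3x^2$ and constraint $x - 1 \le 0$ are hypothetical, invented for this sketch). The dual function, approximated by minimizing over a grid of $x$ values, still passes a midpoint-concavity test:

```python
import numpy as np

# Nonconvex 1-D primal (hypothetical): min f(x) s.t. g(x) <= 0 with
# f(x) = x**4 - 3*x**2 and g(x) = x - 1. Grid approximation of the dual.
xs = np.linspace(-3.0, 3.0, 2001)
f = xs**4 - 3 * xs**2
gcon = xs - 1

def q(lam):
    """Dual function q(lam) = min_x (f(x) + lam * g(x)), on the grid."""
    return (f + lam * gcon).min()

# Midpoint concavity: q((a+b)/2) >= (q(a) + q(b)) / 2 for sampled lam >= 0.
# This holds exactly, since q is a pointwise min of finitely many affine
# functions of lam (one per grid point).
lams = np.linspace(0.0, 10.0, 41)
for a in lams:
    for b in lams:
        assert q((a + b) / 2) >= 0.5 * (q(a) + q(b)) - 1e-9
```

Even though the primal objective is nonconvex (two global minima), every sampled midpoint test passes, consistent with concavity of $g$.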


5.5 Weak and Strong Duality

Theorem (Weak Duality)

$$d^\star \le p^\star.$$

This holds always—for convex and nonconvex problems alike. It can be used to find nontrivial lower bounds for difficult problems.

For example, solving the SDP dual of the two-way partitioning problem provides a lower bound on its (NP-hard) primal.

Theorem (Strong Duality)

$d^\star = p^\star$ does not hold in general, but it (usually) holds for convex problems. Sufficient conditions that guarantee strong duality for convex problems are called constraint qualifications.


5.6 Slater's Constraint Qualification

Consider a convex problem:

$$\min_x f_0(x) \quad \text{s.t.} \quad f_i(x) \le 0, \; i = 1, \dots, m, \qquad Ax = b.$$

Definition (Slater's Condition)

The problem is strictly feasible, i.e., there exists $x \in \operatorname{int} \mathcal{D}$ such that

$$f_i(x) < 0, \; i = 1, \dots, m, \qquad Ax = b.$$

Theorem (Slater's Theorem)

If the primal is convex and Slater's condition holds, then:

  1. Strong duality holds: $p^\star = d^\star$.
  2. If $p^\star > -\infty$, the dual optimum is attained: there exist dual optimal $\lambda^\star, \nu^\star$.

Refinements:

  • $\operatorname{int} \mathcal{D}$ can be replaced with $\operatorname{relint} \mathcal{D}$ (relative interior).
  • Affine inequality constraints do not need to hold with strict inequality.
  • Many other constraint qualifications exist.

5.7 Complementary Slackness

Assume $x$ satisfies the primal constraints and $\lambda \ge 0$. Then

$$g(\lambda, \nu) = \inf_{\tilde{x} \in \mathcal{D}} \Bigl( f_0(\tilde{x}) + \sum_{i=1}^m \lambda_i f_i(\tilde{x}) + \sum_{i=1}^p \nu_i h_i(\tilde{x}) \Bigr) \;\le\; f_0(x) + \sum_{i=1}^m \lambda_i f_i(x) + \sum_{i=1}^p \nu_i h_i(x) \;\le\; f_0(x).$$

Equality $f_0(x) = g(\lambda, \nu)$ holds if and only if both inequalities hold with equality:

  1. First inequality: $x$ minimizes $L(\tilde{x}, \lambda, \nu)$ over $\tilde{x} \in \mathcal{D}$.
  2. Second inequality: $\lambda_i f_i(x) = 0$ for $i = 1, \dots, m$.

Definition (Complementary Slackness)

$$\lambda_i^\star > 0 \implies f_i(x^\star) = 0, \qquad f_i(x^\star) < 0 \implies \lambda_i^\star = 0.$$

Equivalently, $\lambda_i^\star f_i(x^\star) = 0$ for all $i = 1, \dots, m$.

The name reflects that the two inequalities $\lambda_i^\star \ge 0$ and $f_i(x^\star) \le 0$ cannot both be slack (hold strictly) at the optimum: at least one of $\lambda_i^\star$ and $f_i(x^\star)$ must equal zero.


5.8 KKT (Karush–Kuhn–Tucker) Conditions

Assume strong duality holds, $x^\star$ is primal optimal, and $(\lambda^\star, \nu^\star)$ is dual optimal. Then the following necessary conditions hold:

  1. Primal feasibility: $f_i(x^\star) \le 0$ for $i = 1, \dots, m$ and $h_i(x^\star) = 0$ for $i = 1, \dots, p$.
  2. Dual feasibility: $\lambda^\star \ge 0$.
  3. Complementary slackness: $\lambda_i^\star f_i(x^\star) = 0$ for $i = 1, \dots, m$.
  4. Stationarity: $x^\star$ is a minimizer of $L(\cdot, \lambda^\star, \nu^\star)$.

Conversely, these four conditions imply optimality of $x^\star$ and $(\lambda^\star, \nu^\star)$, and strong duality.

If the problem is convex and the functions $f_i$, $h_i$ are differentiable, condition 4 can be written as a gradient condition:

Definition (KKT Conditions for Differentiable Problems)

A point $(x^\star, \lambda^\star, \nu^\star)$ satisfies the KKT conditions if:

  1. Primal feasibility: $g_i(x^\star) \le 0$, $h_j(x^\star) = 0$
  2. Dual feasibility: $\lambda_i^\star \ge 0$
  3. Complementary slackness: $\lambda_i^\star g_i(x^\star) = 0$
  4. Stationarity: $\nabla f(x^\star) + \sum_{i=1}^m \lambda_i^\star \nabla g_i(x^\star) + \sum_{j=1}^p \nu_j^\star \nabla h_j(x^\star) = 0$

Role of the KKT Conditions

Theorem (KKT Necessity and Sufficiency)

Necessity: If $x^\star$ is a primal optimal solution and Slater's condition holds, then there exist $(\lambda^\star, \nu^\star)$ such that $(x^\star, \lambda^\star, \nu^\star)$ satisfies all KKT conditions.

Sufficiency: If the problem is convex and $(x^\star, \lambda^\star, \nu^\star)$ satisfies the KKT conditions, then $x^\star$ is a globally optimal solution.

In summary: for a convex problem satisfying Slater's condition,

$x^\star$ is optimal $\iff$ there exist $\lambda^\star, \nu^\star$ satisfying KKT conditions 1–4 (with condition 4 in either the minimizer form or, for differentiable problems, the gradient form).

5.9 Dual of Standard Problem Classes

5.9.1 Linear Program (LP) — Standard Form

Primal:

$$\min_x\; c^\top x \quad \text{s.t.} \quad Ax = b, \; x \ge 0.$$

Lagrangian: $L(x, \lambda, \nu) = c^\top x + \nu^\top (Ax - b) - \lambda^\top x = -b^\top \nu + (c + A^\top \nu - \lambda)^\top x$.

Since $L$ is affine in $x$, the dual function is finite only when $c + A^\top \nu - \lambda = 0$:

$$g(\lambda, \nu) = \begin{cases} -b^\top \nu, & A^\top \nu - \lambda + c = 0, \\ -\infty, & \text{otherwise}. \end{cases}$$

Dual problem (eliminating $\lambda = A^\top \nu + c \ge 0$):

$$\max_\nu\; -b^\top \nu \quad \text{s.t.} \quad A^\top \nu + c \ge 0.$$

From Slater's condition: $p^\star = d^\star$ if there exists $\tilde{x} \ge 0$ with $A\tilde{x} = b$. In fact, for LP, $p^\star = d^\star$ holds except when both the primal and the dual are infeasible.
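As a sanity check, both the standard-form primal and the dual derived above can be solved with scipy.optimize.linprog on a tiny instance (the data $c$, $A$, $b$ below are hypothetical, invented for this sketch); the two optimal values coincide, as LP duality predicts:

```python
import numpy as np
from scipy.optimize import linprog

# Standard-form LP (hypothetical data): min c^T x  s.t.  Ax = b, x >= 0.
c = np.array([1.0, 2.0])
A = np.array([[1.0, 1.0]])
b = np.array([1.0])

primal = linprog(c, A_eq=A, b_eq=b, bounds=[(0, None)] * 2, method="highs")

# Dual: max -b^T nu  s.t.  A^T nu + c >= 0.  As a linprog (which minimizes):
# min b^T nu  s.t.  -A^T nu <= c, with nu free; then d* = -(optimal value).
dual = linprog(b, A_ub=-A.T, b_ub=c, bounds=[(None, None)], method="highs")

p_star = primal.fun        # expect 1.0, attained at x = (1, 0)
d_star = -dual.fun         # expect 1.0, attained at nu = -1
print(p_star, d_star)
```

Here the primal vertex $(1, 0)$ and the dual point $\nu = -1$ both give the value $1$, so there is no duality gap.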

5.9.2 Linear Program (LP) — Inequality Form

Primal:

$$\min_x\; c^\top x \quad \text{s.t.} \quad Ax \le b.$$

Lagrangian: $L(x, \lambda) = c^\top x + \lambda^\top (Ax - b) = (c + A^\top \lambda)^\top x - b^\top \lambda$, $\lambda \ge 0$.

Dual function:

$$g(\lambda) = \inf_x \bigl( (c + A^\top \lambda)^\top x - b^\top \lambda \bigr) = \begin{cases} -b^\top \lambda, & A^\top \lambda + c = 0, \\ -\infty, & \text{otherwise}. \end{cases}$$

Dual problem:

$$\max_\lambda\; -b^\top \lambda \quad \text{s.t.} \quad A^\top \lambda + c = 0, \; \lambda \ge 0.$$

From Slater's condition: $p^\star = d^\star$ if $A\tilde{x} < b$ for some $\tilde{x}$.

5.9.3 Quadratic Program (QP)

Assume $P \in \mathbb{S}_{++}^n$ (positive definite).

Primal:

$$\min_x\; x^\top P x \quad \text{s.t.} \quad Ax \le b.$$

Lagrangian: $L(x, \lambda) = x^\top P x + \lambda^\top (Ax - b)$, $\lambda \ge 0$.

Dual function: Minimizing over $x$ (set the gradient to zero: $2Px + A^\top \lambda = 0 \Rightarrow x = -\tfrac{1}{2} P^{-1} A^\top \lambda$) gives

$$g(\lambda) = -\tfrac{1}{4} \lambda^\top A P^{-1} A^\top \lambda - b^\top \lambda.$$

Dual problem:

$$\max_\lambda\; -\tfrac{1}{4} \lambda^\top A P^{-1} A^\top \lambda - b^\top \lambda \quad \text{s.t.} \quad \lambda \ge 0.$$

From Slater's condition: $p^\star = d^\star$ if $A\tilde{x} < b$ for some $\tilde{x}$. In fact, for this QP, $p^\star = d^\star$ always holds.
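The closed-form dual function above is easy to exercise on a tiny QP chosen so the primal optimum is known by inspection. The data below are hypothetical: $P = I$, $A = -I$, $b = -\mathbf{1}$ encode $\min x^\top x$ s.t. $x \ge 1$, whose optimum is $x^\star = (1, 1)$ with $p^\star = 2$:

```python
import numpy as np

# Tiny QP (hypothetical data): min x^T x  s.t.  x >= 1.
P = np.eye(2)
A = -np.eye(2)
b = -np.ones(2)
p_star = 2.0                       # optimum x* = (1, 1), by inspection

def g(lam):
    """Dual function g(lam) = -1/4 lam^T A P^{-1} A^T lam - b^T lam."""
    return -0.25 * lam @ A @ np.linalg.inv(P) @ A.T @ lam - b @ lam

# Maximizing g over lam >= 0 by calculus gives lam* = (2, 2).
lam_star = 2 * np.ones(2)
assert np.isclose(g(lam_star), p_star)      # strong duality: d* = p*

# Weak duality: any lam >= 0 is a lower bound.
rng = np.random.default_rng(0)
for lam in rng.uniform(0, 5, size=(100, 2)):
    assert g(lam) <= p_star + 1e-9
```

Here $g(\lambda) = -\tfrac{1}{4}\|\lambda\|^2 + \mathbf{1}^\top \lambda$, whose coordinatewise maximum over $\lambda \ge 0$ is $1$ per coordinate, giving $d^\star = 2 = p^\star$.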

5.9.4 Equality-Constrained QP

Primal:

$$\min_x\; \tfrac{1}{2} x^\top Q x + c^\top x \quad \text{s.t.} \quad Ax = b,$$

where $Q \succeq 0$ and $A$ has full row rank ($m \le n$).

KKT system:

$$\begin{bmatrix} Q & A^\top \\ A & 0 \end{bmatrix} \begin{bmatrix} x^\star \\ \nu^\star \end{bmatrix} = \begin{bmatrix} -c \\ b \end{bmatrix}.$$

Since the problem is convex, the KKT solution $(x^\star, \nu^\star)$ is globally optimal.
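The KKT system can be solved as one linear solve. A minimal sketch with hypothetical data $Q$, $c$, $A$, $b$ (invented for illustration):

```python
import numpy as np

# Hypothetical data: Q positive definite, A with full row rank.
Q = np.array([[2.0, 0.5],
              [0.5, 1.0]])
c = np.array([1.0, -1.0])
A = np.array([[1.0, 1.0]])
b = np.array([1.0])

n, m = 2, 1
# Assemble and solve the KKT system  [Q A^T; A 0] [x; nu] = [-c; b].
K = np.block([[Q, A.T],
              [A, np.zeros((m, m))]])
rhs = np.concatenate([-c, b])
sol = np.linalg.solve(K, rhs)
x, nu = sol[:n], sol[n:]

# Check the two KKT equations directly.
assert np.allclose(Q @ x + A.T @ nu, -c)   # stationarity
assert np.allclose(A @ x, b)               # primal feasibility
```

The block system is symmetric but indefinite, so in larger settings a symmetric-indefinite factorization is the usual choice; plain `solve` suffices for this sketch.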

5.9.5 SDP Connection (Two-Way Partitioning)

Primal (nonconvex):

$$\min_x\; x^\top W x \quad \text{s.t.} \quad x_i^2 = 1, \; i = 1, \dots, n.$$

The feasible set is $\{-1, 1\}^n$ ($2^n$ discrete points). The cost can be written as $x^\top W x = \sum_i W_{ii} + 2 \sum_{i > j} W_{ij} x_i x_j$, so moving $i$ and $j$ from the same set to different sets changes the cost by $-4 W_{ij}$.

Lagrangian: $L(x, \nu) = x^\top W x + \sum_{i=1}^n \nu_i (x_i^2 - 1) = x^\top (W + \operatorname{diag}(\nu)) x - \mathbf{1}^\top \nu$.

Dual function:

$$g(\nu) = \begin{cases} -\mathbf{1}^\top \nu, & W + \operatorname{diag}(\nu) \succeq 0, \\ -\infty, & \text{otherwise}. \end{cases}$$

Dual problem (SDP):

$$\max_\nu\; -\mathbf{1}^\top \nu \quad \text{s.t.} \quad W + \operatorname{diag}(\nu) \succeq 0.$$

Lower bound: $p^\star \ge -\mathbf{1}^\top \nu$ whenever $W + \operatorname{diag}(\nu) \succeq 0$. For example, choosing $\nu = -\lambda_{\min}(W)\,\mathbf{1}$ yields $p^\star \ge n\,\lambda_{\min}(W)$.
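For small $n$ the eigenvalue bound can be compared against the exact primal optimum by brute force over all $2^n$ sign vectors; the random symmetric $W$ below is hypothetical:

```python
import numpy as np
from itertools import product

rng = np.random.default_rng(1)
n = 8
W = rng.standard_normal((n, n))
W = (W + W.T) / 2                      # symmetric cost matrix (hypothetical)

# Brute-force primal: enumerate all 2^n points of {-1, 1}^n.
p_star = min(x @ W @ x
             for x in (np.array(s) for s in product([-1.0, 1.0], repeat=n)))

# Dual bound from nu = -lambda_min(W) * 1:  p* >= n * lambda_min(W).
bound = n * np.linalg.eigvalsh(W).min()
assert p_star >= bound - 1e-9
print(p_star, bound)
```

The bound holds because $x^\top W x \ge \lambda_{\min}(W)\,\|x\|^2 = n\,\lambda_{\min}(W)$ for every feasible $x$; solving the full dual SDP generally gives a tighter bound than this single choice of $\nu$.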


5.10 Geometric Interpretation of Duality

The relationship between the primal and dual problems can be understood geometrically as a min–max exchange. The primal problem minimizes over x the pointwise maximum (over λ0) of the Lagrangian; the dual maximizes over λ0 the pointwise minimum (over x) of the Lagrangian:

$$\max_{\lambda \ge 0} \min_x L(x, \lambda) = d^\star \;\le\; \min_x \max_{\lambda \ge 0} L(x, \lambda) = p^\star.$$

The duality gap $p^\star - d^\star \ge 0$ measures the "wedge" between these two saddle-point problems. Strong duality ($p^\star = d^\star$) occurs precisely when a saddle point of $L(x, \lambda)$ exists, i.e., there is a pair $(x^\star, \lambda^\star)$ such that

$$L(x^\star, \lambda) \;\le\; L(x^\star, \lambda^\star) \;\le\; L(x, \lambda^\star)$$

for all $x$ and $\lambda \ge 0$. This is the essence of the KKT conditions.

Epigraph Interpretation

Define the set

$$\mathcal{A} = \bigl\{ (u, t) \in \mathbb{R}^{m+1} \;\big|\; \exists x : f_i(x) \le u_i \; (i = 1, \dots, m), \; f_0(x) \le t \bigr\}.$$

The primal optimal value is $p^\star = \inf\{ t \mid (0, t) \in \mathcal{A} \}$. The dual problem can be viewed as finding the best affine support (hyperplane) to $\mathcal{A}$ from below, parametrized by $\lambda \ge 0$:

$$g(\lambda) = \inf_{(u, t) \in \mathcal{A}} \bigl( t + \lambda^\top u \bigr).$$

When $\mathcal{A}$ is convex (as in convex problems), supporting-hyperplane theorems explain why strong duality tends to hold under mild regularity conditions.


5.11 Sensitivity Analysis

Dual variables carry economic meaning: they measure the sensitivity of the optimal value to perturbations in the constraints.

5.11.1 Single Perturbation

Consider perturbing a single inequality constraint $g(x) \le 0$ to $g(x) \le u$:

$$p^\star(u) = \min_x \{\, f(x) \mid g(x) \le u \,\}.$$

Global Lower Bound

$$p^\star(u) \ge p^\star - \lambda^\star u,$$

where $\lambda^\star$ is the optimal dual variable for the original ($u = 0$) problem, assuming strong duality holds.

  • If $\lambda^\star$ is large and $u < 0$ (tightening the constraint): $p^\star(u)$ increases greatly.
  • If $\lambda^\star$ is small and $u > 0$ (loosening the constraint): $p^\star(u)$ does not decrease much.

Local Sensitivity

$$\left.\frac{d\,p^\star(u)}{du}\right|_{u=0} = -\lambda^\star.$$

For small $|u|$, $p^\star(u) \approx p^\star - \lambda^\star u$.

Economic interpretation: $\lambda^\star$ is the shadow price of the constraint. If the market price for an additional unit of the constrained resource is less than $\lambda^\star$, it pays to buy more (loosen the constraint); if the price exceeds $\lambda^\star$, it is advantageous to sell capacity (tighten the constraint).

5.11.2 General Case (Multiple Perturbations)

Perturb both inequality and equality constraints:

$$p^\star(u, v) = \min_x f(x) \quad \text{s.t.} \quad g_i(x) \le u_i, \; i = 1, \dots, m, \qquad h_i(x) = v_i, \; i = 1, \dots, p.$$

Global Lower Bound

$$p^\star(u, v) \ge q(\lambda^\star, \nu^\star) - u^\top \lambda^\star - v^\top \nu^\star = p^\star(0, 0) - u^\top \lambda^\star - v^\top \nu^\star.$$

Interpretation of dual variables:

| Dual variable | Effect of perturbation |
| --- | --- |
| Large $\lambda_i^\star$ | $p^\star$ increases greatly if $u_i < 0$ (tighten) |
| Small $\lambda_i^\star$ | $p^\star$ decreases little if $u_i > 0$ (loosen) |
| Large positive $\nu_i^\star$ | $p^\star$ increases greatly if $v_i < 0$ |
| Large negative $\nu_i^\star$ | $p^\star$ increases greatly if $v_i > 0$ |
| Small $|\nu_i^\star|$ | $p^\star$ changes little under perturbations of $v_i$ |

Local Sensitivity (General Case)

$$\left.\frac{\partial p^\star(u, v)}{\partial u_i}\right|_{(0,0)} = -\lambda_i^\star, \qquad \left.\frac{\partial p^\star(u, v)}{\partial v_i}\right|_{(0,0)} = -\nu_i^\star.$$
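Both sensitivity statements can be checked on the unit-circle example from earlier in the chapter, where the perturbed problem $\min x_1 + x_2$ s.t. $x_1^2 + x_2^2 - 1 \le u$ has the closed form $p^\star(u) = -\sqrt{2(1+u)}$ and $\lambda^\star = 1/\sqrt{2}$:

```python
import numpy as np

# Perturbed unit-circle problem: p*(u) = -sqrt(2 * (1 + u)) for u > -1.
def p(u):
    return -np.sqrt(2 * (1 + u))

lam_star = 1 / np.sqrt(2)

# Global lower bound: p*(u) >= p*(0) - lam* * u for all u.
for u in np.linspace(-0.9, 2.0, 60):
    assert p(u) >= p(0) - lam_star * u - 1e-12

# Local sensitivity: dp*/du at u = 0 equals -lam* (finite-difference check).
h = 1e-6
deriv = (p(h) - p(-h)) / (2 * h)
assert np.isclose(deriv, -lam_star, atol=1e-5)
```

The global bound here is the concavity inequality $\sqrt{1+u} \le 1 + u/2$ in disguise, and the derivative at $u = 0$ is exactly $-1/\sqrt{2} = -\lambda^\star$.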

5.12 Worked Examples

5.12.1 Least-Norm Solution of Linear Equations

Primal:

$$\min_x\; x^\top x \quad \text{s.t.} \quad Ax = b.$$

Lagrangian: $L(x, \nu) = x^\top x + \nu^\top (Ax - b)$.

Set the gradient to zero: $\nabla_x L = 2x + A^\top \nu = 0 \Rightarrow x = -\tfrac{1}{2} A^\top \nu$.

Plug into $L$:

$$g(\nu) = L\bigl( -\tfrac{1}{2} A^\top \nu, \nu \bigr) = -\tfrac{1}{4} \nu^\top A A^\top \nu - b^\top \nu.$$

This is a concave function of ν. By the lower bound property:

$$p^\star \ge -\tfrac{1}{4} \nu^\top A A^\top \nu - b^\top \nu \quad \text{for all } \nu.$$
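A numerical check with a random hypothetical $A$ and $b$: the closed-form primal solution $x^\star = A^\top (A A^\top)^{-1} b$ attains the same value as the dual maximum, while arbitrary $\nu$ give lower bounds:

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((3, 5))       # hypothetical wide matrix, full row rank
b = rng.standard_normal(3)

# Primal optimum in closed form: x* = A^T (A A^T)^{-1} b.
x_star = A.T @ np.linalg.solve(A @ A.T, b)
p_star = x_star @ x_star

def g(nu):
    """Dual function g(nu) = -1/4 nu^T A A^T nu - b^T nu."""
    return -0.25 * nu @ A @ A.T @ nu - b @ nu

# Maximizing g: gradient -1/2 A A^T nu - b = 0 gives nu* = -2 (A A^T)^{-1} b.
nu_star = -2 * np.linalg.solve(A @ A.T, b)
assert np.isclose(g(nu_star), p_star)          # zero duality gap

for nu in rng.standard_normal((100, 3)):       # weak duality for arbitrary nu
    assert g(nu) <= p_star + 1e-9
```

Both values reduce algebraically to $b^\top (A A^\top)^{-1} b$, which is why the gap is exactly zero.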

5.12.2 Water-Filling (Power Allocation)

Primal (reformulated):

$$\min_x\; -\sum_{i=1}^n \ln(x_i + \alpha_i) \quad \text{s.t.} \quad x \ge 0, \; \mathbf{1}^\top x = 1,$$

where $\alpha_i > 0$.

Lagrangian: $L(x, \lambda, \nu) = -\sum_i \ln(x_i + \alpha_i) - \lambda^\top x + \nu (\mathbf{1}^\top x - 1)$.

KKT conditions: $x$ is optimal iff there exist $\lambda \in \mathbb{R}^n$, $\nu \in \mathbb{R}$ such that:

  1. $x \ge 0$, $\mathbf{1}^\top x = 1$
  2. $\lambda \ge 0$
  3. $\lambda_i x_i = 0$, $i = 1, \dots, n$
  4. Stationarity: $-\dfrac{1}{x_i + \alpha_i} - \lambda_i + \nu = 0$, $i = 1, \dots, n$.

Solution:

  • If $\nu < 1/\alpha_i$: $\lambda_i = 0$ and $x_i = 1/\nu - \alpha_i$.
  • If $\nu \ge 1/\alpha_i$: $x_i = 0$ and $\lambda_i = \nu - 1/\alpha_i$.

These combine to

$$x_i^\star = \max\Bigl\{ 0, \frac{1}{\nu} - \alpha_i \Bigr\}, \qquad \lambda_i^\star = \max\Bigl\{ 0, \nu - \frac{1}{\alpha_i} \Bigr\}.$$

Determine $\nu$ from $\mathbf{1}^\top x = 1$:

$$\sum_{i=1}^n \max\Bigl\{ 0, \frac{1}{\nu} - \alpha_i \Bigr\} = 1.$$

Interpretation: Think of $n$ patches with ground level $\alpha_i$. Flood the region with a unit amount of water. The resulting water level is $1/\nu$; patch $i$ receives water of depth $x_i = \max\{0, 1/\nu - \alpha_i\}$.
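The equation determining $\nu$ can be solved by bisection on the water level $w = 1/\nu$, since the total amount of allocated water is monotone increasing in $w$. A minimal sketch (the function name and the bracketing choice are this sketch's own):

```python
import numpy as np

def water_filling(alpha, total=1.0):
    """Solve sum_i max(0, w - alpha_i) = total by bisection on the
    water level w = 1/nu; returns the allocation x and the level w."""
    alpha = np.asarray(alpha, dtype=float)
    lo, hi = alpha.min(), alpha.max() + total   # bracket: sum(lo)=0, sum(hi)>=total
    for _ in range(100):
        w = (lo + hi) / 2
        if np.maximum(0.0, w - alpha).sum() > total:
            hi = w
        else:
            lo = w
    x = np.maximum(0.0, w - alpha)
    return x, w

alpha = np.array([0.5, 1.0, 2.0])
x, w = water_filling(alpha)
assert np.isclose(x.sum(), 1.0, atol=1e-8)     # budget constraint 1^T x = 1
assert (x >= -1e-12).all()
print(x, w)
```

For these levels the water settles at $w = 1.25$: the deepest patch gets $0.75$, the middle one $0.25$, and the tallest stays dry.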

5.12.3 Projection onto the 1-Norm Ball

Primal:

$$\min_x\; \tfrac{1}{2} \|x - a\|_2^2 \quad \text{s.t.} \quad \|x\|_1 \le 1.$$

KKT conditions:

  1. $\|x\|_1 \le 1$
  2. $\lambda \ge 0$
  3. $\lambda (1 - \|x\|_1) = 0$
  4. $x$ minimizes $L(\tilde{x}, \lambda) = \tfrac{1}{2} \|\tilde{x} - a\|_2^2 + \lambda (\|\tilde{x}\|_1 - 1) = \sum_{k=1}^n \bigl( \tfrac{1}{2} (\tilde{x}_k - a_k)^2 + \lambda |\tilde{x}_k| \bigr) - \lambda$.

The problem is separable across coordinates. For $\lambda \ge 0$, the minimizer is the soft-thresholding operator:

$$x_k^\star = \begin{cases} a_k - \lambda, & a_k \ge \lambda, \\ 0, & -\lambda \le a_k \le \lambda, \\ a_k + \lambda, & a_k \le -\lambda. \end{cases}$$

Hence $\|x^\star\|_1 = \sum_k |x_k^\star| = \sum_k \max\{0, |a_k| - \lambda\}$.

  • If $\|a\|_1 \le 1$, the solution is $\lambda = 0$, $x^\star = a$.
  • Otherwise, solve the piecewise-linear equation in $\lambda$:
    $$\sum_{k=1}^n \max\{0, |a_k| - \lambda\} = 1.$$
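The same bisection idea used for water-filling solves the projection: $\lambda = 0$ works when $a$ is already inside the ball, and otherwise $\lambda$ is found from the monotone piecewise-linear equation above. A minimal sketch (the function name is this sketch's own):

```python
import numpy as np

def project_l1_ball(a, radius=1.0):
    """Project a onto {x : ||x||_1 <= radius} by bisection on the
    soft-threshold level lambda (one standard approach; a sketch)."""
    a = np.asarray(a, dtype=float)
    if np.abs(a).sum() <= radius:
        return a.copy()                       # already inside: lambda = 0
    lo, hi = 0.0, np.abs(a).max()             # at hi the thresholded sum is 0
    for _ in range(100):
        lam = (lo + hi) / 2
        if np.maximum(0.0, np.abs(a) - lam).sum() > radius:
            lo = lam                          # threshold too small
        else:
            hi = lam
    # Soft-thresholding with the final level lam.
    return np.sign(a) * np.maximum(0.0, np.abs(a) - lam)

x = project_l1_ball(np.array([2.0, -1.0, 0.25]))
assert np.isclose(np.abs(x).sum(), 1.0, atol=1e-8)   # lands on the ball surface
print(x)
```

For this input the bisection converges to $\lambda = 1$, so only the largest coordinate survives the threshold.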

5.13 Summary

| Concept | Key idea |
| --- | --- |
| Lagrangian | Weighted sum of objective and constraints |
| Dual function | Infimum of Lagrangian over primal variables; always concave |
| Dual problem | Maximize dual function over $\lambda \ge 0$; always convex |
| Weak duality | $d^\star \le p^\star$ (always holds) |
| Strong duality | $d^\star = p^\star$ (holds for convex problems under constraint qualifications) |
| Slater's condition | Existence of a strictly feasible point guarantees strong duality |
| Complementary slackness | $\lambda_i^\star g_i(x^\star) = 0$: at the optimum, at least one of $\lambda_i^\star$, $g_i(x^\star)$ is zero |
| KKT conditions | Necessary and sufficient optimality conditions for convex problems under Slater |
| Sensitivity | $\partial p^\star / \partial u_i = -\lambda_i^\star$: dual variables as shadow prices |