Skip to content

Cheatsheet: Convexity, Smoothness & Strong Convexity Equivalences


1 Convexity Equivalences

For a twice-differentiable f:DR on convex DRn, the following are equivalent:

  1. Basic Convexity: f(θx+(1θ)y)θf(x)+(1θ)f(y), θ[0,1].
  2. First-Order Condition (FOC): f(y)f(x)+f(x)(yx), x,yD.
  3. Gradient Monotonicity (GM): f(x)f(y),xy0, x,yD.
  4. Second-Order Condition (SOC): 2f(x)0 (Hessian PSD).

Convexity FOC

() Assume f convex. For θ(0,1]:

f(x+θ(yx))(1θ)f(x)+θf(y)f(x+θ(yx))f(x)θf(y)f(x)

Take θ0: f(x)(yx)f(y)f(x).

() Assume FOC holds. Let z=θx+(1θ)y:

f(x)f(z)+f(z)(xz)f(y)f(z)+f(z)(yz)

Multiply by θ and (1θ) respectively and add:

θf(x)+(1θ)f(y)f(z)+f(z)(θx+(1θ)yz=0)=f(z)

FOC GM

() Write FOC twice swapping x,y and add:

f(y)f(x)+f(x)(yx)f(x)f(y)+f(y)(xy)0f(x)(yx)+f(y)(xy)f(x)f(y),xy0

() By FTC along x+t(yx):

f(y)f(x)=01f(x+t(yx))(yx)dt

Subtract the tangent at x:

f(y)f(x)f(x)(yx)=01(f(xt)f(x))(yx)dt

By GM, integrand is 0. Thus f(y)f(x)+f(x)(yx).

GM SOC

() For y=x+ϵd, GM implies (f(x+ϵd)f(x))dϵ0. Take ϵ0: d2f(x)d0.

() By FTC:

f(y)f(x)=012f(x+t(yx))(yx)dt(f(y)f(x))(yx)=01(yx)2f(xt)(yx)dt0

2 L-Smoothness Equivalences

For a twice-differentiable f, the following are equivalent definitions of L-smoothness:

  1. Lipschitz Continuous Gradient: f(x)f(y)Lxy, x,yD.
  2. Alternative (Convexity): g(x)=L2x2f(x) is convex.
  3. Descent Lemma: f(y)f(x)+f(x)(yx)+L2yx2.
  4. Hessian Bound: 2f(x)LI, xD.

Lipschitz Gradient Hessian Bound

() Let v be a unit vector. By definition of Hessian:

2f(x)v=limh0f(x+hv)f(x)h

Taking norms and using Lipschitz:

2f(x)vlimh0Lhvh=L2f(x)LI

() By FTC on gradient:

f(y)f(x)=012f(x+t(yx))(yx)dt012f(xt)2yxdtLyx

Hessian Bound Alternative Definition

g(x)=L2x2f(x). Compute Hessian:

2g(x)=LI2f(x)

g is convex 2g(x)02f(x)LI.

Alternative Descent Lemma

If g is convex, by standard FOC: g(y)g(x)+g(x)(yx). Substitute g(x)=L2x2f(x), g(x)=Lxf(x):

L2y2f(y)L2x2f(x)+(Lxf(x))(yx)

Rearrange:

f(y)f(x)+f(x)(yx)+L2(y2x22x(yx))=f(x)+f(x)(yx)+L2yx2

Direct: Lipschitz Gradient Descent Lemma

By FTC:

f(y)f(x)f(x)(yx)=01(f(xt)f(x))(yx)dt

Apply Cauchy-Schwarz and Lipschitz:

01f(xt)f(x)yxdt01Ltyxyxdt=L2yx2

3 Gradient Descent Convergence

Gradient descent update: xt+1=xtηf(xt).

3.1 Smooth Functions (L-smooth, step size η=1/L)

Descent Lemma:

f(xt+1)f(xt)12Lf(xt)2

Convergence to stationary point (O(1/T)):

min0tT1f(xt)22L(f(x0)f(x))T

3.2 Smooth & Convex Functions (η=1/L)

Convergence rate (O(1/T)):

f(xT)f(x)L2Tx0x2

Derivation sketch: From convexity FOC, f(xt)(xtx)f(xt)f(x). Combined with distance contraction:

xt+1x2xtx22L(f(xt+1)f(x))

Telescoping sum yields the bound.

3.3 Smooth & Strongly Convex Functions (η=1/L, m>0)

Linear convergence:

f(xT)f(x)(1mL)T[f(x0)f(x)]

Key inequalities:

  • Descent Lemma: f(xt+1)f(xt)12Lf(xt)2
  • Gradient Lower Bound (from m-strong convexity): f(x)22m(f(x)f(x))
  • Substitute: f(xt+1)f(x)(f(xt)f(x))mL(f(xt)f(x))
  • Condition number κ=L/m; smaller κ gives faster convergence.

4 Strong Convexity Equivalences

For a twice-differentiable f:DR on convex DRn, the following are equivalent (m>0):

  1. Alternative Definition: h(x)=f(x)m2x2 is convex.
  2. Strong FOC: f(y)f(x)+f(x)(yx)+m2yx2.
  3. Strong GM: f(x)f(y),xymxy2.
  4. Strong SOC: 2f(x)mI, xD.

Alternative Strong SOC

Define h(x)=f(x)m2x2. Then:

h(x)=f(x)mx2h(x)=2f(x)mI

h convex 2h(x)02f(x)mI.

Alternative Strong FOC

() h convex standard FOC for h:

h(y)h(x)+h(x)(yx)

Substitute h(x)=f(x)m2x2, h(x)=f(x)mx:

f(y)m2y2f(x)m2x2+(f(x)mx)(yx)

Rearrange quadratic terms:

m2y2m2x2mx(yx)=m2yx2f(y)f(x)+f(x)(yx)+m2yx2

() Assume strong FOC holds. Compute:

h(y)h(x)h(x)(yx)=[f(y)f(x)f(x)(yx)]m2yx2

By strong FOC, the bracket m2yx2. Thus:

h(y)h(x)h(x)(yx)0h convex

Strong FOC Strong GM

() Write strong FOC twice:

f(y)f(x)+f(x)(yx)+m2yx2f(x)f(y)+f(y)(xy)+m2xy2

Add and cancel f(x)+f(y):

0f(x)(yx)+f(y)(xy)+mxy2f(x)f(y),xymxy2

() Fix x,y, define xt=x+t(yx). By FTC:

f(y)f(x)=01f(xt)(yx)dt

Add/subtract f(x):

=f(x)(yx)+01(f(xt)f(x))(yx)dt

Apply strong GM with u=xt, v=x:

(f(xt)f(x))(t(yx))mt(yx)2=mt2yx2

Divide by t, integrate:

01(f(xt)f(x))(yx)dt01mtyx2dt=m2yx2f(y)f(x)+f(x)(yx)+m2yx2