Chapter 3: Convex Functions
Convex functions are the fundamental objects of study in convex optimization. This chapter develops the definition of convex functions, their properties, various characterizations (first-order, second-order, and gradient monotonicity), as well as operations that preserve convexity and related geometric concepts.
3.1 Operations That Preserve Convexity of Sets
Before studying convex functions per se, it is useful to review how convexity of sets can be established through operations that preserve it. Given a set $C \subseteq \mathbb{R}^n$, there are three main strategies for proving that $C$ is convex:
- Apply the definition directly: show that for all $x_1, x_2 \in C$ and $\theta \in [0, 1]$,
$$\theta x_1 + (1 - \theta) x_2 \in C.$$
- Show that $C$ is obtained from simple convex sets (hyperplanes, halfspaces, norm balls, etc.) by operations that preserve convexity:
  - Intersection
  - Affine function
  - Perspective function
  - Linear-fractional function
- Show that $C$ is a sublevel set of a convex function (discussed later in this chapter).
3.1.1 Intersection
Property (Intersection of Convex Sets). The intersection of any number (including infinitely many) of convex sets is convex: if each $S_\alpha$, $\alpha \in \mathcal{A}$, is convex, then $\bigcap_{\alpha \in \mathcal{A}} S_\alpha$ is convex.
Examples:
- Affine space: An affine space defined by linear equalities, $\{x : Ax = b\}$, can be expressed as an intersection of hyperplanes $\{x : a_i^T x = b_i\}$. Each equality constraint is the intersection of two halfspaces $\{x : a_i^T x \le b_i\}$ and $\{x : a_i^T x \ge b_i\}$.
- $\ell_\infty$-ball: $\{x : \|x\|_\infty \le 1\} = \bigcap_{i=1}^n \{x : -1 \le x_i \le 1\}$. Each constraint $x_i \le 1$ or $-x_i \le 1$ is a halfspace, so the intersection is convex.
- PSD cone: The cone of positive semidefinite matrices
$$S_+^n = \{X \in S^n : X \succeq 0\} = \bigcap_{z \neq 0} \{X \in S^n : z^T X z \ge 0\}$$
is the intersection of infinitely many halfspaces (one for each direction $z$).
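To make the halfspace picture concrete, here is a minimal numerical sketch (assuming NumPy; the helper names are ours): it compares a sampled version of the infinitely-many-halfspaces test $z^T X z \ge 0$ against the exact eigenvalue test for membership in $S_+^n$.

```python
import numpy as np

rng = np.random.default_rng(0)

def in_random_halfspaces(X, n_dirs=10_000):
    """Check z^T X z >= 0 for many random directions z -- a sampled
    version of the infinite halfspace intersection defining the PSD cone."""
    n = X.shape[0]
    Z = rng.standard_normal((n_dirs, n))
    return np.all(np.einsum('ij,jk,ik->i', Z, X, Z) >= -1e-12)

def is_psd(X):
    """Exact test: all eigenvalues of the symmetric matrix X are >= 0."""
    return np.all(np.linalg.eigvalsh(X) >= -1e-12)

A = rng.standard_normal((4, 4))
psd_mat = A @ A.T                      # PSD by construction
shifted = psd_mat - 2.0 * np.eye(4)    # shifted down: typically indefinite

for X in (psd_mat, shifted):
    print(in_random_halfspaces(X), is_psd(X))
```

A passing sampled test is only heuristic evidence: finitely many random directions can miss a thin violating direction, while the eigenvalue test is exact.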
3.1.2 Affine Mapping
Property (Affine Image). If $S \subseteq \mathbb{R}^n$ is convex and $A \in \mathbb{R}^{m \times n}$, $b \in \mathbb{R}^m$, then the image of $S$ under the affine map $f(x) = Ax + b$,
$$f(S) = \{Ax + b : x \in S\},$$
is convex.
Example (Scaling and Translation): Scaling and translating a convex set preserves convexity: for $\alpha \in \mathbb{R}$ and $a \in \mathbb{R}^n$, the sets $\alpha S = \{\alpha x : x \in S\}$ and $S + a = \{x + a : x \in S\}$ are convex.
Interpretation: Affine transformations do not "bend" the set.
3.1.3 Inverse Affine Mapping
Property (Affine Preimage). If $S \subseteq \mathbb{R}^m$ is convex and $f : \mathbb{R}^n \to \mathbb{R}^m$, $f(x) = Ax + b$, is affine, then the preimage of $S$ under $f$ is also convex:
$$f^{-1}(S) = \{x \in \mathbb{R}^n : Ax + b \in S\}.$$
Application: Linear Matrix Inequality (LMI). Consider the constraint
$$A(x) = x_1 A_1 + \cdots + x_n A_n \preceq B,$$
where $A_i, B \in S^m$ are symmetric matrices. The solution set $\{x : B - A(x) \succeq 0\}$ is the preimage of the PSD cone $S_+^m$ under the affine map $x \mapsto B - A(x)$, and is therefore convex by the affine preimage property.
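As a sanity check on the preimage property, the following sketch (hypothetical random matrices, NumPy assumed) samples two feasible points of a small LMI and verifies that every convex combination of them remains feasible.

```python
import numpy as np

rng = np.random.default_rng(1)

def sym(M):
    return (M + M.T) / 2

n = 3
A1, A2 = sym(rng.standard_normal((n, n))), sym(rng.standard_normal((n, n)))
B = 5.0 * np.eye(n)  # generous right-hand side so the feasible set is nonempty

def feasible(x):
    """x is feasible iff B - (x1*A1 + x2*A2) is PSD (the LMI holds)."""
    S = B - (x[0] * A1 + x[1] * A2)
    return np.linalg.eigvalsh(S).min() >= -1e-10

# Rejection-sample two feasible points, then test convex combinations.
pts = []
while len(pts) < 2:
    x = rng.uniform(-2, 2, size=2)
    if feasible(x):
        pts.append(x)
x, y = pts
assert all(feasible(t * x + (1 - t) * y) for t in np.linspace(0, 1, 21))
print("all convex combinations of two feasible points are feasible")
```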
3.2 Fundamental Concepts in Set Topology
We briefly review essential topological concepts that are foundational in convex analysis.
- A point $x \in \mathbb{R}^n$ is an $n$-tuple $(x_1, \dots, x_n)$.
- A set $S \subseteq \mathbb{R}^n$ is a collection of points.
- The open ball (neighborhood) centered at $x$ with radius $\epsilon > 0$:
$$B(x, \epsilon) = \{y \in \mathbb{R}^n : \|y - x\| < \epsilon\}.$$
3.2.1 Open and Closed Sets
Open Set. Every point has a neighborhood contained in the set:
$$\forall x \in S, \; \exists \epsilon > 0 \text{ such that } B(x, \epsilon) \subseteq S.$$
Example: the interval $(0, 1) \subset \mathbb{R}$.
Closed Set. Contains all its limit points; equivalently, its complement is open. Example: the interval $[0, 1] \subset \mathbb{R}$.
3.2.2 Interior and Relative Interior
Interior. $\operatorname{int} S = \{x \in S : \exists \epsilon > 0, \; B(x, \epsilon) \subseteq S\}$: the points with a neighborhood contained in $S$. Example: $\operatorname{int} [0, 1] = (0, 1)$.
Relative Interior (relint). If $S$ lies in an affine subspace $\operatorname{aff} S$ (its affine hull), then $x \in S$ is a relative interior point if
$$B(x, \epsilon) \cap \operatorname{aff} S \subseteq S$$
for some $\epsilon > 0$. The set of all such points is $\operatorname{relint} S$. Example: The line segment $\{(x_1, 0) : 0 \le x_1 \le 1\}$ in $\mathbb{R}^2$ has empty interior, but its relative interior, $\{(x_1, 0) : 0 < x_1 < 1\}$, is nonempty within the line it spans.
3.2.3 Limit Points and Closure
Limit Point. Every neighborhood of $x$ contains other points from the set:
$$\forall \epsilon > 0, \quad (B(x, \epsilon) \setminus \{x\}) \cap S \neq \emptyset.$$
That is, points arbitrarily close to $x$ are in $S$.
Closure. $\operatorname{cl} S$ (also written $\bar{S}$) is $S$ together with all its limit points; it is the smallest closed set containing $S$.
3.2.4 Boundary
Boundary. $\partial S = \operatorname{cl} S \setminus \operatorname{int} S$: every neighborhood of a boundary point contains points from both $S$ and its complement.
Example: for $S = (0, 1] \subset \mathbb{R}$: $\operatorname{int} S = (0, 1)$, $\operatorname{cl} S = [0, 1]$, $\partial S = \{0, 1\}$.
Key Relationships:
$$\operatorname{cl} S = \operatorname{int} S \cup \partial S, \qquad \operatorname{int} S = \operatorname{cl} S \setminus \partial S, \qquad \partial S = \operatorname{cl} S \cap \operatorname{cl}(S^c).$$
3.3 Definition of Convex Functions
Definition (Convex Function). A function $f : \mathbb{R}^n \to \mathbb{R}$ defined on a convex set $\operatorname{dom} f$ is called convex if for all $x, y \in \operatorname{dom} f$ and all $\theta \in [0, 1]$:
$$f(\theta x + (1 - \theta) y) \le \theta f(x) + (1 - \theta) f(y).$$
Definition (Strictly Convex). $f$ is strictly convex if the inequality is strict whenever $x \neq y$ and $\theta \in (0, 1)$:
$$f(\theta x + (1 - \theta) y) < \theta f(x) + (1 - \theta) f(y).$$
Definition (Concave Function). $f$ is concave if $-f$ is convex, i.e.,
$$f(\theta x + (1 - \theta) y) \ge \theta f(x) + (1 - \theta) f(y).$$
Geometric meaning: For a convex function, the chord connecting any two points on the graph lies on or above the graph between those points.
Note: Linear (affine) functions are both convex and concave, but neither strictly convex nor strictly concave.
3.4 Common Examples of Convex Functions
Univariate examples:
$ax + b$ (affine); $e^{ax}$ for any $a \in \mathbb{R}$; $-\log x$ on $(0, \infty)$; $x^p$ for $p \ge 1$ on $[0, \infty)$.
Multivariate examples:
$a^T x + b$ (affine); $x^T P x$ with $P \succeq 0$ (quadratic); $\|x\|$ (any norm); $\log \sum_{i=1}^n e^{x_i}$ (log-sum-exp).
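A quick way to gain confidence that a candidate function is convex is to spot-check the defining inequality at random points. The sketch below (NumPy assumed; `convexity_spot_check` is our own helper, not a library routine) applies this to log-sum-exp, the Euclidean norm, and a deliberately non-convex function.

```python
import numpy as np

rng = np.random.default_rng(2)

def logsumexp(x):
    """Numerically stable log(sum(exp(x)))."""
    m = np.max(x)
    return m + np.log(np.sum(np.exp(x - m)))

def convexity_spot_check(f, dim, trials=10_000, tol=1e-9):
    """Test f(t*x + (1-t)*y) <= t*f(x) + (1-t)*f(y) at random points;
    returns True if no violation of the convexity inequality is found."""
    for _ in range(trials):
        x, y = rng.standard_normal(dim), rng.standard_normal(dim)
        t = rng.uniform()
        if f(t * x + (1 - t) * y) > t * f(x) + (1 - t) * f(y) + tol:
            return False
    return True

print(convexity_spot_check(logsumexp, 5))                # True  (convex)
print(convexity_spot_check(np.linalg.norm, 5))           # True  (convex)
print(convexity_spot_check(lambda x: -np.exp(x[0]), 1))  # False (concave)
```

A passing spot-check is of course not a proof; it can only ever refute convexity, as it does here for $-e^x$.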
3.5 Convexity via Restriction to Lines
Theorem (Restriction to Lines). A function $f : \mathbb{R}^n \to \mathbb{R}$ with $\operatorname{dom} f$ convex is convex if and only if for every $x \in \operatorname{dom} f$ and every direction $v \in \mathbb{R}^n$, the one-dimensional function $g(t) = f(x + tv)$ is convex on its domain
$$\{t \in \mathbb{R} : x + tv \in \operatorname{dom} f\}.$$
Intuition: This theorem reduces multivariate convexity to one-dimensional verification.
Examples:
$f(x) = x^T P x$ with $P \succeq 0$: the restriction $g(t) = f(x + tv) = t^2 \, v^T P v + 2t \, x^T P v + x^T P x$ is a univariate quadratic in $t$ with leading coefficient $v^T P v \ge 0$, hence convex.
Proposition. Let $f : \mathbb{R}^n \to \mathbb{R}$ and let $\operatorname{dom} f$ be convex. Then $f$ is convex if and only if for every line $\{x + tv : t \in \mathbb{R}\}$ meeting $\operatorname{dom} f$, the restriction $g(t) = f(x + tv)$ is convex on $\{t : x + tv \in \operatorname{dom} f\}$.
Proof.
($\Rightarrow$) Suppose $f$ is convex. For $t_1, t_2$ in the domain of $g$ and $\theta \in [0, 1]$,
$$g(\theta t_1 + (1 - \theta) t_2) = f\big( \theta (x + t_1 v) + (1 - \theta)(x + t_2 v) \big) \le \theta f(x + t_1 v) + (1 - \theta) f(x + t_2 v),$$
which is the same as
$$g(\theta t_1 + (1 - \theta) t_2) \le \theta g(t_1) + (1 - \theta) g(t_2),$$
so $g$ is convex.
($\Leftarrow$) Conversely, given $x, y \in \operatorname{dom} f$, set $v = y - x$ and $g(t) = f(x + tv)$. Convexity of $g$ on $[0, 1]$ gives
$$f(\theta y + (1 - \theta) x) = g(\theta \cdot 1 + (1 - \theta) \cdot 0) \le \theta g(1) + (1 - \theta) g(0) = \theta f(y) + (1 - \theta) f(x),$$
so $f$ is convex.
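The theorem also suggests a practical test: restrict to random lines and check one-dimensional convexity, e.g., via second differences. A minimal sketch for the quadratic example above (NumPy assumed):

```python
import numpy as np

rng = np.random.default_rng(3)

# f(x) = x^T P x with P PSD; its restriction g(t) = f(x + t v) should be
# a convex parabola in t for every base point x and direction v.
n = 4
M = rng.standard_normal((n, n))
P = M @ M.T  # PSD by construction

def f(x):
    return x @ P @ x

for _ in range(5):
    x, v = rng.standard_normal(n), rng.standard_normal(n)
    lead = v @ P @ v  # leading coefficient of the quadratic g(t)
    # second difference along the line: nonnegative for a convex g
    t, h = rng.standard_normal(), 0.1
    second_diff = f(x + (t + h) * v) - 2 * f(x + t * v) + f(x + (t - h) * v)
    print(lead >= 0, second_diff >= -1e-9)
```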
3.6 Epigraph and Sublevel Sets
3.6.1 Epigraph
Definition (Epigraph). For a function $f : \mathbb{R}^n \to \mathbb{R}$, the epigraph of $f$ is the set
$$\operatorname{epi} f = \{(x, t) \in \mathbb{R}^{n+1} : x \in \operatorname{dom} f, \; f(x) \le t\}.$$
Geometric meaning: The epigraph is the region on or above the graph of $f$.
Key Property. $f$ is convex $\iff$ $\operatorname{epi} f$ is a convex set.
Proposition. Let $f : \mathbb{R}^n \to \mathbb{R}$ with $\operatorname{dom} f$ convex. Then $f$ is convex on its domain if and only if $\operatorname{epi} f$ is a convex set.
Proof.
($\Rightarrow$) Let $(x_1, t_1), (x_2, t_2) \in \operatorname{epi} f$ and $\theta \in [0, 1]$. Then $f(x_1) \le t_1$ and $f(x_2) \le t_2$, hence
$$f(\theta x_1 + (1 - \theta) x_2) \le \theta f(x_1) + (1 - \theta) f(x_2) \le \theta t_1 + (1 - \theta) t_2,$$
so $\theta (x_1, t_1) + (1 - \theta)(x_2, t_2) \in \operatorname{epi} f$, i.e., $\operatorname{epi} f$ is convex.
($\Leftarrow$) Conversely, take $x_1, x_2 \in \operatorname{dom} f$; then $(x_1, f(x_1)), (x_2, f(x_2)) \in \operatorname{epi} f$. Convexity of $\operatorname{epi} f$ gives $\big( \theta x_1 + (1 - \theta) x_2, \; \theta f(x_1) + (1 - \theta) f(x_2) \big) \in \operatorname{epi} f$, i.e.,
$$f(\theta x_1 + (1 - \theta) x_2) \le \theta f(x_1) + (1 - \theta) f(x_2),$$
so $f$ is convex.
3.6.2 Sublevel Sets
Definition (Sublevel Set). For $f : \mathbb{R}^n \to \mathbb{R}$ and $\alpha \in \mathbb{R}$, the $\alpha$-sublevel set is
$$C_\alpha = \{x \in \operatorname{dom} f : f(x) \le \alpha\},$$
the set of points where $f$ takes values at most $\alpha$.
Key Property. If $f$ is convex, then all sublevel sets $C_\alpha$ are convex.
Warning: The converse is not true. For example, $f(x) = -e^x$ has convex sublevel sets but is not convex.
Examples:
- $f(x) = x^2$: $C_\alpha = [-\sqrt{\alpha}, \sqrt{\alpha}]$ for $\alpha \ge 0$ (convex intervals).
- $f(x) = -e^x$: $C_\alpha = [\log(-\alpha), \infty)$ for $\alpha < 0$ and $C_\alpha = \mathbb{R}$ for $\alpha \ge 0$ (convex intervals, but $f$ is not convex).
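The warning above is easy to see numerically. In this sketch (NumPy assumed), $f(x) = -e^x$ passes the sublevel-set test on a grid but fails midpoint convexity:

```python
import numpy as np

# f(x) = -exp(x): every sublevel set {x : f(x) <= alpha} is an interval
# (hence convex), yet f itself violates the convexity inequality.
f = lambda x: -np.exp(x)

# Sublevel set for alpha < 0 is [log(-alpha), inf): an interval.
alpha = -2.0
xs = np.linspace(-5, 5, 10_001)
members = xs[f(xs) <= alpha]
# Members form one contiguous block of grid points (equal spacing).
print(np.allclose(np.diff(members), np.diff(members)[0]))

# Midpoint convexity fails: f((x+y)/2) > (f(x)+f(y))/2 for x != y.
x, y = 0.0, 2.0
print(f((x + y) / 2), (f(x) + f(y)) / 2)  # about -2.72 > -4.19: violation
```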
3.7 Jensen's Inequality
Theorem (Jensen's Inequality — Discrete Form). If $f$ is convex and $\theta_1, \dots, \theta_k \ge 0$ with $\sum_{i=1}^k \theta_i = 1$, then
$$f\!\left( \sum_{i=1}^k \theta_i x_i \right) \le \sum_{i=1}^k \theta_i f(x_i).$$
Integral Form. For convex $f$ and a probability measure $\mu$ on $\operatorname{dom} f$:
$$f\!\left( \int x \, d\mu(x) \right) \le \int f(x) \, d\mu(x).$$
Expectation Form. For convex $f$ and a random variable $X$:
$$f(\mathbb{E}[X]) \le \mathbb{E}[f(X)].$$
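The expectation form is easy to illustrate by Monte Carlo. In the sketch below (NumPy assumed; the choice of an exponential distribution is arbitrary), $f(\mathbb{E}[X])$ is compared with $\mathbb{E}[f(X)]$ for a few convex functions:

```python
import numpy as np

rng = np.random.default_rng(4)

# Jensen's inequality in expectation form: f(E[X]) <= E[f(X)] for convex f.
X = rng.exponential(scale=1.0, size=1_000_000)

for f in (np.exp, np.square, lambda t: np.maximum(t - 1, 0.0)):
    lhs = f(X.mean())   # f(E[X])
    rhs = f(X).mean()   # E[f(X)]  (Monte Carlo estimate)
    print(f"{lhs:.4f} <= {rhs:.4f}: {lhs <= rhs}")
```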
3.8 First-Order Condition for Convexity
3.8.1 Gradient and Differentiability
Precondition: $f$ is differentiable, i.e., the gradient
$$\nabla f(x) = \left( \frac{\partial f}{\partial x_1}, \dots, \frac{\partial f}{\partial x_n} \right)^T$$
exists at each $x \in \operatorname{dom} f$ (with $\operatorname{dom} f$ open).
Theorem (First-Order Condition). A differentiable function $f$ with convex domain is convex if and only if for all $x, y \in \operatorname{dom} f$:
$$f(y) \ge f(x) + \nabla f(x)^T (y - x).$$
Implication: If $\nabla f(x) = 0$, then $f(y) \ge f(x)$ for all $y \in \operatorname{dom} f$, so every stationary point of a convex function is a global minimizer.
Example: For $f(x) = x^2$, the condition reads $y^2 \ge x^2 + 2x(y - x)$, which holds because $y^2 - x^2 - 2x(y - x) = (y - x)^2 \ge 0$.
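For a multivariate illustration, the sketch below (NumPy assumed) checks numerically that the tangent plane of log-sum-exp underestimates the function at random pairs of points, using the fact that the gradient of log-sum-exp is the softmax map:

```python
import numpy as np

rng = np.random.default_rng(5)

def f(x):
    """Numerically stable log-sum-exp."""
    m = x.max()
    return m + np.log(np.exp(x - m).sum())

def grad_f(x):
    e = np.exp(x - x.max())
    return e / e.sum()  # softmax is the gradient of log-sum-exp

ok = True
for _ in range(10_000):
    x, y = rng.standard_normal(5), rng.standard_normal(5)
    ok &= f(y) >= f(x) + grad_f(x) @ (y - x) - 1e-9
print(ok)  # True: the tangent underestimates everywhere
```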
3.8.2 Proof of Equivalence
Direction 1: Basic convexity ⇒ first-order condition.
Assume $f$ is convex. For $\theta \in (0, 1]$:
$$f(x + \theta(y - x)) = f((1 - \theta) x + \theta y) \le (1 - \theta) f(x) + \theta f(y).$$
Subtract $f(x)$ from both sides and divide by $\theta$:
$$\frac{f(x + \theta(y - x)) - f(x)}{\theta} \le f(y) - f(x).$$
Take the limit $\theta \to 0^+$; the left-hand side converges to the directional derivative $\nabla f(x)^T (y - x)$.
Hence:
$$f(y) \ge f(x) + \nabla f(x)^T (y - x).$$
Direction 2: First-order condition ⇒ basic convexity.
Assume $f(y) \ge f(x) + \nabla f(x)^T (y - x)$ for all $x, y \in \operatorname{dom} f$.
Let $z = \theta x + (1 - \theta) y$ for $\theta \in [0, 1]$, and apply the first-order condition at $z$ to both $x$ and $y$:
$$f(x) \ge f(z) + \nabla f(z)^T (x - z), \qquad f(y) \ge f(z) + \nabla f(z)^T (y - z).$$
Multiply the two inequalities by $\theta$ and $1 - \theta$ respectively and add them:
$$\theta f(x) + (1 - \theta) f(y) \ge f(z) + \nabla f(z)^T \big( \theta x + (1 - \theta) y - z \big).$$
Since $\theta x + (1 - \theta) y - z = 0$, the right-hand side equals $f(\theta x + (1 - \theta) y)$, so $f$ is convex.
3.9 Gradient Monotonicity
Definition (Monotone Gradient). A differentiable function $f$ has a monotone gradient if for all $x, y \in \operatorname{dom} f$:
$$\langle \nabla f(x) - \nabla f(y), \; x - y \rangle \ge 0.$$
Connection to Convexity:
- If $f$ is convex, its gradient $\nabla f$ is monotone.
- Conversely, if $\nabla f$ is monotone (and $\operatorname{dom} f$ is convex), then $f$ is convex.
Example: For $f(x) = \frac{1}{2} x^T P x$ with $P \succeq 0$, $\nabla f(x) = Px$ and $\langle \nabla f(x) - \nabla f(y), \; x - y \rangle = (x - y)^T P (x - y) \ge 0$.
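The same example can be checked numerically (NumPy assumed; $P$ is a random PSD matrix built as $MM^T$):

```python
import numpy as np

rng = np.random.default_rng(6)

# Monotone gradient of the convex quadratic f(x) = 0.5 x^T P x:
# <grad f(x) - grad f(y), x - y> = (x-y)^T P (x-y) >= 0 when P is PSD.
n = 4
M = rng.standard_normal((n, n))
P = M @ M.T

grad = lambda x: P @ x
for _ in range(5):
    x, y = rng.standard_normal(n), rng.standard_normal(n)
    print((grad(x) - grad(y)) @ (x - y) >= 0)  # always True
```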
3.9.1 Proof of Equivalence between First-Order Condition and Gradient Monotonicity
Direction 1: First-order condition ⇒ gradient monotonicity.
Assume the first-order condition holds.
Write this inequality twice, swapping $x$ and $y$:
$$f(y) \ge f(x) + \nabla f(x)^T (y - x), \qquad f(x) \ge f(y) + \nabla f(y)^T (x - y).$$
Adding the two inequalities yields:
$$f(x) + f(y) \ge f(x) + f(y) + (\nabla f(x) - \nabla f(y))^T (y - x),$$
which simplifies to
$$\langle \nabla f(x) - \nabla f(y), \; x - y \rangle \ge 0.$$
Direction 2: Gradient monotonicity ⇒ first-order condition.
Assume $\nabla f$ is monotone.
For any $x, y$, define $g(t) = f(x + t(y - x))$, so that $g'(t) = \nabla f(x + t(y - x))^T (y - x)$. Writing $z_t = x + t(y - x)$, monotonicity gives, for $t \in (0, 1]$:
$$g'(t) - g'(0) = (\nabla f(z_t) - \nabla f(x))^T (y - x) = \frac{1}{t} (\nabla f(z_t) - \nabla f(x))^T (z_t - x) \ge 0.$$
Subtract the tangent at $x$ by integrating $g'$ over $[0, 1]$:
$$f(y) - f(x) = \int_0^1 g'(t) \, dt \ge \int_0^1 g'(0) \, dt = \nabla f(x)^T (y - x).$$
This gives the first-order condition:
$$f(y) \ge f(x) + \nabla f(x)^T (y - x).$$
3.10 Second-Order Condition for Convexity
Precondition: $f$ is twice differentiable, i.e., the Hessian
$$\nabla^2 f(x) \in S^n, \qquad [\nabla^2 f(x)]_{ij} = \frac{\partial^2 f}{\partial x_i \partial x_j},$$
exists at each $x \in \operatorname{dom} f$.
Theorem (Second-Order Condition). A twice-differentiable function $f$ with convex (open) domain is convex if and only if its Hessian is positive semidefinite for all $x \in \operatorname{dom} f$:
$$\nabla^2 f(x) \succeq 0.$$
Example: For the quadratic $f(x) = \frac{1}{2} x^T P x + q^T x + r$ with $P \in S^n$, we have $\nabla^2 f(x) = P$, so $f$ is convex if and only if $P \succeq 0$.
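For a non-quadratic example, one can check the second-order condition by computing the Hessian explicitly. The sketch below (NumPy assumed) uses the known Hessian of log-sum-exp, $\nabla^2 f(x) = \operatorname{diag}(s) - s s^T$ with $s = \operatorname{softmax}(x)$, and verifies positive semidefiniteness at random points:

```python
import numpy as np

rng = np.random.default_rng(7)

def hessian_lse(x):
    """Hessian of log-sum-exp at x: diag(s) - s s^T, s = softmax(x)."""
    e = np.exp(x - x.max())
    s = e / e.sum()
    return np.diag(s) - np.outer(s, s)

for _ in range(5):
    x = rng.standard_normal(6)
    print(np.linalg.eigvalsh(hessian_lse(x)).min() >= -1e-12)  # True
```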
3.10.1 Proof of Equivalence between Gradient Monotonicity and Second-Order Condition
Direction 1: Gradient monotonicity ⇒ Hessian PSD.
Assume $\nabla f$ is monotone. Fix $x$ and a direction $v$, and set $y = x + tv$ with $t > 0$. Monotonicity gives
$$(\nabla f(x + tv) - \nabla f(x))^T (tv) \ge 0 \quad \Longrightarrow \quad \frac{(\nabla f(x + tv) - \nabla f(x))^T v}{t} \ge 0.$$
Take the limit as $t \to 0^+$:
$$v^T \nabla^2 f(x) v \ge 0 \quad \text{for all } v,$$
so $\nabla^2 f(x) \succeq 0$.
Direction 2: Hessian PSD ⇒ gradient monotonicity.
Assume $\nabla^2 f(z) \succeq 0$ for all $z \in \operatorname{dom} f$. By the Fundamental Theorem of Calculus applied to $t \mapsto \nabla f(x + t(y - x))$:
$$\nabla f(y) - \nabla f(x) = \int_0^1 \nabla^2 f(x + t(y - x)) \, (y - x) \, dt.$$
Take the inner product with $y - x$:
$$(\nabla f(y) - \nabla f(x))^T (y - x) = \int_0^1 (y - x)^T \nabla^2 f(x + t(y - x)) \, (y - x) \, dt \ge 0,$$
so the gradient is monotone.
3.11 Chain of Equivalences
For twice-differentiable functions, the following four statements are equivalent characterizations of convexity:
Convexity
f(θx + (1−θ)y) ≤ θf(x) + (1−θ)f(y)
⇕
First-order Condition Gradient Monotonicity
f(y) ≥ f(x) + ∇f(x)ᵀ(y−x) ⇔ ⟨∇f(x)−∇f(y), x−y⟩ ≥ 0
⇕
Second-order Condition
∇²f(x) ⪰ 0

In summary:
- Convexity (Definition): $f(\theta x + (1 - \theta) y) \le \theta f(x) + (1 - \theta) f(y)$
- First-order Condition: $f(y) \ge f(x) + \nabla f(x)^T (y - x)$
- Gradient Monotonicity: $\langle \nabla f(x) - \nabla f(y), \; x - y \rangle \ge 0$
- Second-order Condition: $\nabla^2 f(x) \succeq 0$
Each provides a different practical tool for verifying or exploiting convexity in analysis and algorithm design.
3.12 Smooth and Strongly Convex Functions
Beyond mere convexity, many optimization algorithms rely on stronger structural assumptions—smoothness and strong convexity—to obtain quantitative convergence guarantees. This section develops L-smoothness (Lipschitz gradient) and m-strong convexity, their equivalent characterizations, and the convergence rates of gradient descent under these conditions.
3.12.1 L-Smooth Functions
Definition (L-smooth function). A differentiable function $f : \mathbb{R}^n \to \mathbb{R}$ is called L-smooth (or has an L-Lipschitz continuous gradient) for some $L > 0$ if for all $x, y \in \mathbb{R}^n$,
$$\|\nabla f(x) - \nabla f(y)\| \le L \|x - y\|.$$
Smoothness bounds how fast the gradient can change. Unlike convexity, it does not guarantee that stationary points are global minima; it only guarantees a quadratic upper bound on the function.
Chain of Equivalences. For a twice-differentiable function $f : \mathbb{R}^n \to \mathbb{R}$, the following conditions are equivalent:
(I) $\|\nabla f(x) - \nabla f(y)\| \le L \|x - y\|$ for all $x, y$ (L-smoothness);
(II) $-LI \preceq \nabla^2 f(x) \preceq LI$ for all $x$ (bounded Hessian);
(III) $|f(y) - f(x) - \nabla f(x)^T (y - x)| \le \frac{L}{2} \|y - x\|^2$ for all $x, y$ (quadratic bound);
(IV) $|(\nabla f(x) - \nabla f(y))^T (x - y)| \le L \|x - y\|^2$ for all $x, y$.
The upper half of (III), $f(y) \le f(x) + \nabla f(x)^T (y - x) + \frac{L}{2} \|y - x\|^2$, is the quadratic upper bound used repeatedly below. We now prove the full cycle of equivalences (I) ⇒ (III) ⇒ (II) ⇒ (IV) ⇒ (I).
Proof: (I) ⇒ (III).
Using the Fundamental Theorem of Calculus along the line segment from $x$ to $y$:
$$f(y) = f(x) + \int_0^1 \nabla f(x + t(y - x))^T (y - x) \, dt.$$
Subtract the linear approximation at $x$ and apply Cauchy–Schwarz together with (I):
$$\big| f(y) - f(x) - \nabla f(x)^T (y - x) \big| = \left| \int_0^1 (\nabla f(x + t(y - x)) - \nabla f(x))^T (y - x) \, dt \right| \le \int_0^1 L t \|y - x\|^2 \, dt = \frac{L}{2} \|y - x\|^2.$$
Thus condition (III) holds.
Proof: (III) ⇒ (II).
Define $\varphi(t) = f(x + tv)$ for a fixed $x$ and direction $v$. By the second-order Taylor expansion,
$$f(x + tv) - f(x) - t \nabla f(x)^T v = \frac{t^2}{2} v^T \nabla^2 f(x) v + o(t^2),$$
and therefore
$$\left| \frac{t^2}{2} v^T \nabla^2 f(x) v + o(t^2) \right| \le \frac{L t^2}{2} \|v\|^2,$$
where the inequality uses condition (III). Hence, dividing by $t^2 / 2$ and letting $t \to 0$, $|v^T \nabla^2 f(x) v| \le L \|v\|^2$ for every $v$, which is condition (II).
Proof: (II) ⇒ (IV).
If (II) holds, then by the Fundamental Theorem of Calculus applied to $t \mapsto \nabla f(y + t(x - y))$:
$$(\nabla f(x) - \nabla f(y))^T (x - y) = \int_0^1 (x - y)^T \nabla^2 f(y + t(x - y)) \, (x - y) \, dt.$$
Thus, since $|(x - y)^T \nabla^2 f(z) (x - y)| \le L \|x - y\|^2$ for every $z$ by (II), we get $|(\nabla f(x) - \nabla f(y))^T (x - y)| \le L \|x - y\|^2$, which is condition (IV).
Proof: (IV) ⇒ (I).
Assume (IV). Setting $y = x + tv$, dividing by $t^2$, and letting $t \to 0^+$ gives $|v^T \nabla^2 f(x) v| \le L \|v\|^2$ for every direction $v$.
Since for any vector $v$ the symmetric matrix $\nabla^2 f(x)$ satisfies $\|\nabla^2 f(x)\|_2 = \max_{\|v\| = 1} |v^T \nabla^2 f(x) v| \le L$, the Fundamental Theorem of Calculus and Cauchy–Schwarz give
$$\|\nabla f(x) - \nabla f(y)\| = \left\| \int_0^1 \nabla^2 f(y + t(x - y)) \, (x - y) \, dt \right\| \le L \|x - y\|.$$
Thus condition (I) holds.
This completes the chain of equivalences for L-smoothness.
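For functions with an explicit Hessian, the smallest valid $L$ can be read off directly. The sketch below (NumPy assumed) does this for a least-squares objective and spot-checks condition (III):

```python
import numpy as np

rng = np.random.default_rng(8)

# For f(x) = 0.5 ||Ax - b||^2 the Hessian is A^T A, so the smallest valid
# smoothness constant is L = lambda_max(A^T A).
m_rows, n = 20, 5
A = rng.standard_normal((m_rows, n))
b = rng.standard_normal(m_rows)

f = lambda x: 0.5 * np.sum((A @ x - b) ** 2)
grad = lambda x: A.T @ (A @ x - b)
L = np.linalg.eigvalsh(A.T @ A).max()

# Check condition (III): |f(y) - f(x) - grad(x)^T (y-x)| <= (L/2)||y-x||^2.
ok = True
for _ in range(10_000):
    x, y = rng.standard_normal(n), rng.standard_normal(n)
    gap = abs(f(y) - f(x) - grad(x) @ (y - x))
    ok &= gap <= L / 2 * np.sum((y - x) ** 2) + 1e-9
print(ok)  # True
```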
3.12.2 Gradient Descent Convergence under Smoothness
Gradient descent updates:
$$x_{k+1} = x_k - \alpha \nabla f(x_k),$$
where $\alpha > 0$ is the step size.
Under L-smoothness alone, we can guarantee that the gradient norm converges to zero.
Lemma (Descent Lemma for gradient descent). If $f$ is L-smooth and we take $\alpha = 1/L$, then for each iteration:
$$f(x_{k+1}) \le f(x_k) - \frac{1}{2L} \|\nabla f(x_k)\|^2.$$
Proof. Applying the quadratic upper bound from condition (III) with $x = x_k$ and $y = x_{k+1} = x_k - \frac{1}{L} \nabla f(x_k)$:
$$f(x_{k+1}) \le f(x_k) + \nabla f(x_k)^T (x_{k+1} - x_k) + \frac{L}{2} \|x_{k+1} - x_k\|^2 = f(x_k) - \frac{1}{L} \|\nabla f(x_k)\|^2 + \frac{1}{2L} \|\nabla f(x_k)\|^2.$$
Rearranging gives the claimed bound.
Theorem (Convergence to stationary point). Let $f$ be L-smooth and bounded below, $f^* = \inf_x f(x) > -\infty$. After $T$ iterations of gradient descent with $\alpha = 1/L$:
$$\min_{0 \le k < T} \|\nabla f(x_k)\|^2 \le \frac{2L \big( f(x_0) - f^* \big)}{T}.$$
Consequently, $\min_{0 \le k < T} \|\nabla f(x_k)\| \to 0$ at a rate $O(1/\sqrt{T})$.
Proof. Summing the Descent Lemma, in the form $f(x_k) - f(x_{k+1}) \ge \frac{1}{2L} \|\nabla f(x_k)\|^2$, over $k = 0, \dots, T - 1$:
$$\sum_{k=0}^{T-1} \big( f(x_k) - f(x_{k+1}) \big) \ge \frac{1}{2L} \sum_{k=0}^{T-1} \|\nabla f(x_k)\|^2.$$
The left-hand side telescopes:
$$f(x_0) - f(x_T) \ge \frac{1}{2L} \sum_{k=0}^{T-1} \|\nabla f(x_k)\|^2 \ge \frac{T}{2L} \min_{0 \le k < T} \|\nabla f(x_k)\|^2.$$
Since $f(x_T) \ge f^*$:
$$\frac{T}{2L} \min_{0 \le k < T} \|\nabla f(x_k)\|^2 \le f(x_0) - f^*,$$
which rearranges to the stated bound.
This shows gradient descent converges to a stationary point (where $\nabla f(x) = 0$); without convexity, such a point need not be a global minimum.
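A minimal experiment matching this theorem (NumPy assumed; the test function $f(x) = \sum_i \cos(x_i)$ is our choice, nonconvex and 1-smooth since every second derivative is bounded by 1 in magnitude):

```python
import numpy as np

rng = np.random.default_rng(9)

# Nonconvex but 1-smooth test function: f(x) = sum(cos(x_i)), L = 1.
f = lambda x: np.cos(x).sum()
grad = lambda x: -np.sin(x)

L, T = 1.0, 200
x = rng.uniform(-3, 3, size=10)
f0, grad_norms = f(x), []
for _ in range(T):
    g = grad(x)
    grad_norms.append(np.dot(g, g))
    x -= g / L

f_star = -10.0  # any lower bound on f works (f >= -dim); the bound just loosens
bound = 2 * L * (f0 - f_star) / T
print(min(grad_norms) <= bound)  # True: matches the O(1/T) guarantee
```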
3.12.3 Smooth + Convex Convergence
If $f$ is additionally convex, every stationary point is a global minimizer, and gradient descent converges in function value at the rate $O(1/T)$.
Theorem (Convergence under smoothness + convexity). Let $f$ be L-smooth and convex, and let $x^*$ be a global minimizer of $f$. After $T$ iterations of gradient descent with $\alpha = 1/L$:
$$f(x_T) - f(x^*) \le \frac{L \|x_0 - x^*\|^2}{2T}.$$
Proof. Write $r_k = \|x_k - x^*\|$.
By convexity (first-order condition), $f(x^*) \ge f(x_k) + \nabla f(x_k)^T (x^* - x_k)$, i.e.,
$$f(x_k) \le f(x^*) + \nabla f(x_k)^T (x_k - x^*).$$
Using the Descent Lemma (Section 3.12.2),
$$f(x_{k+1}) \le f(x_k) - \frac{1}{2L} \|\nabla f(x_k)\|^2 \le f(x^*) + \nabla f(x_k)^T (x_k - x^*) - \frac{1}{2L} \|\nabla f(x_k)\|^2.$$
Let $x_{k+1} = x_k - \frac{1}{L} \nabla f(x_k)$; completing the square gives
$$\nabla f(x_k)^T (x_k - x^*) - \frac{1}{2L} \|\nabla f(x_k)\|^2 = \frac{L}{2} \big( \|x_k - x^*\|^2 - \|x_{k+1} - x^*\|^2 \big),$$
so that
$$f(x_{k+1}) - f(x^*) \le \frac{L}{2} \big( r_k^2 - r_{k+1}^2 \big).$$
Summing this inequality from $k = 0$ to $T - 1$, the right-hand side telescopes:
$$\sum_{k=0}^{T-1} \big( f(x_{k+1}) - f(x^*) \big) \le \frac{L}{2} \big( r_0^2 - r_T^2 \big) \le \frac{L}{2} \|x_0 - x^*\|^2.$$
Since $f(x_k)$ is nonincreasing, each term on the left is at least $f(x_T) - f(x^*)$, hence
$$T \big( f(x_T) - f(x^*) \big) \le \frac{L}{2} \|x_0 - x^*\|^2 \quad \Longrightarrow \quad f(x_T) - f(x^*) \le \frac{L \|x_0 - x^*\|^2}{2T}.$$
This is the $O(1/T)$ sublinear convergence rate for smooth convex functions.
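The following sketch (NumPy assumed) runs gradient descent on a least-squares problem, which is L-smooth and convex, and checks the $O(1/T)$ bound:

```python
import numpy as np

rng = np.random.default_rng(10)

# Least squares is L-smooth and convex; gradient descent with step 1/L
# should satisfy f(x_T) - f(x*) <= L ||x0 - x*||^2 / (2T).
m_rows, n = 30, 8
A = rng.standard_normal((m_rows, n))
b = rng.standard_normal(m_rows)

f = lambda x: 0.5 * np.sum((A @ x - b) ** 2)
grad = lambda x: A.T @ (A @ x - b)
L = np.linalg.eigvalsh(A.T @ A).max()

x_star, *_ = np.linalg.lstsq(A, b, rcond=None)  # global minimizer
x0 = x = np.zeros(n)
T = 500
for _ in range(T):
    x = x - grad(x) / L

lhs = f(x) - f(x_star)
rhs = L * np.sum((x0 - x_star) ** 2) / (2 * T)
print(f"{lhs:.2e} <= {rhs:.2e}: {lhs <= rhs}")
```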
3.12.4 m-Strongly Convex Functions
Definition (m-strongly convex function). A differentiable function $f : \mathbb{R}^n \to \mathbb{R}$ is called m-strongly convex for some $m > 0$ if for all $x, y \in \mathbb{R}^n$,
$$f(y) \ge f(x) + \nabla f(x)^T (y - x) + \frac{m}{2} \|y - x\|^2.$$
Strong convexity is a strengthened form of convexity that guarantees the function grows at least quadratically away from any point, ensuring a unique global minimum.
Remark. The constant $m$ is often denoted $\mu$ in many texts. We keep $m$ for consistency with the lecture notation.
Chain of Equivalences. For a twice-differentiable function $f : \mathbb{R}^n \to \mathbb{R}$, the following conditions are equivalent:
(I) $f(y) \ge f(x) + \nabla f(x)^T (y - x) + \frac{m}{2} \|y - x\|^2$ for all $x, y$ (m-strong convexity);
(II) $\nabla^2 f(x) \succeq m I$ for all $x$;
(III) $(\nabla f(x) - \nabla f(y))^T (x - y) \ge m \|x - y\|^2$ for all $x, y$ (strong monotonicity);
(IV) $h(x) = f(x) - \frac{m}{2} \|x\|^2$ is convex.
We prove the cycle (I) ⇒ (II) ⇒ (IV) ⇒ (III) ⇒ (I).
Proof: (I) ⇒ (II).
Define $\varphi(t) = f(x + tv)$ for a fixed $x$ and direction $v$. By the second-order Taylor expansion,
$$f(x + tv) - f(x) - t \nabla f(x)^T v = \frac{t^2}{2} v^T \nabla^2 f(x) v + o(t^2) \ge \frac{m t^2}{2} \|v\|^2,$$
where the last line uses condition (I). Hence, dividing by $t^2 / 2$ and letting $t \to 0$, $v^T \nabla^2 f(x) v \ge m \|v\|^2$ for every $v$, i.e., $\nabla^2 f(x) \succeq m I$.
Proof: (II) ⇒ (IV).
If $\nabla^2 f(x) \succeq m I$ for all $x$, then $h(x) = f(x) - \frac{m}{2} \|x\|^2$ has Hessian $\nabla^2 h(x) = \nabla^2 f(x) - m I \succeq 0$.
Thus $h$ is convex by the second-order condition, which is condition (IV).
Proof: (IV) ⇒ (III).
Assume $h(x) = f(x) - \frac{m}{2} \|x\|^2$ is convex. Its gradient $\nabla h(x) = \nabla f(x) - m x$ is then monotone (Section 3.9):
$$(\nabla f(x) - m x - \nabla f(y) + m y)^T (x - y) \ge 0 \quad \Longrightarrow \quad (\nabla f(x) - \nabla f(y))^T (x - y) \ge m \|x - y\|^2.$$
This establishes condition (III).
Proof: (III) ⇒ (I).
Assume $(\nabla f(x) - \nabla f(y))^T (x - y) \ge m \|x - y\|^2$ for all $x, y$.
Let $g(t) = f(x + t(y - x))$, so that $g'(t) = \nabla f(x + t(y - x))^T (y - x)$.
Note that, with $z_t = x + t(y - x)$, applying (III) to the pair $(z_t, x)$ gives, for $t \in (0, 1]$:
$$g'(t) - g'(0) = (\nabla f(z_t) - \nabla f(x))^T (y - x) = \frac{1}{t} (\nabla f(z_t) - \nabla f(x))^T (z_t - x) \ge m t \|y - x\|^2.$$
Integrating over $t \in [0, 1]$:
$$f(y) - f(x) - \nabla f(x)^T (y - x) = \int_0^1 \big( g'(t) - g'(0) \big) \, dt \ge \frac{m}{2} \|y - x\|^2.$$
Thus condition (I) holds.
This completes the chain of equivalences for m-strong convexity.
3.12.5 Linear Convergence under Smooth + Strong Convexity
When a function is both L-smooth and m-strongly convex, gradient descent achieves linear (exponential) convergence to the unique global minimizer.
Theorem (Linear convergence). Let $f$ be L-smooth and m-strongly convex. Let $x^*$ be the unique global minimizer. Gradient descent with step size $\alpha = 1/L$ satisfies:
$$f(x_T) - f(x^*) \le \left( 1 - \frac{m}{L} \right)^T \big( f(x_0) - f(x^*) \big).$$
Proof. From the Descent Lemma with $\alpha = 1/L$:
$$f(x_{k+1}) \le f(x_k) - \frac{1}{2L} \|\nabla f(x_k)\|^2.$$
By m-strong convexity, we have a lower bound relating the gradient norm to the optimality gap. For any $x, y$:
$$f(y) \ge f(x) + \nabla f(x)^T (y - x) + \frac{m}{2} \|y - x\|^2.$$
Minimizing the right-hand side over $y$ (attained at $y = x - \frac{1}{m} \nabla f(x)$) gives
$$f(y) \ge f(x) - \frac{1}{2m} \|\nabla f(x)\|^2 \quad \text{for all } y;$$
since this holds in particular for $y = x^*$:
$$f(x^*) \ge f(x) - \frac{1}{2m} \|\nabla f(x)\|^2.$$
Rearranging yields the Polyak–Łojasiewicz (PL) inequality:
$$\|\nabla f(x)\|^2 \ge 2m \big( f(x) - f(x^*) \big).$$
Plug this into the Descent Lemma:
$$f(x_{k+1}) - f(x^*) \le f(x_k) - f(x^*) - \frac{m}{L} \big( f(x_k) - f(x^*) \big) = \left( 1 - \frac{m}{L} \right) \big( f(x_k) - f(x^*) \big).$$
Unrolling this inequality over $k = 0, \dots, T - 1$ gives the stated linear rate.
Definition (Condition number). The ratio $\kappa = L/m \ge 1$ is called the condition number of $f$. It measures how "elongated" the sublevel sets of $f$ are.
Since $1 - \frac{m}{L} = 1 - \frac{1}{\kappa}$, the convergence factor depends only on the condition number.
Using the inequality $\left( 1 - \frac{1}{\kappa} \right)^T \le e^{-T/\kappa}$:
$$f(x_T) - f(x^*) \le e^{-T/\kappa} \big( f(x_0) - f(x^*) \big).$$
To achieve $f(x_T) - f(x^*) \le \epsilon$, it therefore suffices to take
$$T \ge \kappa \log \frac{f(x_0) - f(x^*)}{\epsilon},$$
which is linear in the condition number $\kappa$ and logarithmic in $1/\epsilon$.
Summary. The interplay between smoothness and strong convexity is fundamental to first-order optimization:
| Property | Guarantee | Rate |
|---|---|---|
| L-smooth only | Sublinear convergence to a stationary point | $\min_k \|\nabla f(x_k)\| = O(1/\sqrt{T})$ |
| L-smooth + convex | Sublinear convergence to the global minimum | $f(x_T) - f(x^*) = O(1/T)$ |
| L-smooth + m-strongly convex | Linear (exponential) convergence | $f(x_T) - f(x^*) = O\big( (1 - m/L)^T \big)$ |
Strong convexity is what transforms the sublinear rate of gradient descent into a linear (geometric) rate, with the condition number $\kappa = L/m$ governing how fast that geometric convergence is.
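To close, a small experiment (NumPy assumed) on a strongly convex quadratic, checking the per-iteration linear rate proved above:

```python
import numpy as np

rng = np.random.default_rng(11)

# Strongly convex quadratic f(x) = 0.5 x^T P x - q^T x with
# m = lambda_min(P) > 0 and L = lambda_max(P); gradient descent with
# step 1/L should satisfy f(x_k) - f* <= (1 - m/L)^k (f(x_0) - f*).
n = 6
M = rng.standard_normal((n, n))
P = M @ M.T + np.eye(n)          # identity shift guarantees m >= 1 > 0
q = rng.standard_normal(n)

f = lambda x: 0.5 * x @ P @ x - q @ x
grad = lambda x: P @ x - q
eigs = np.linalg.eigvalsh(P)
m, L = eigs.min(), eigs.max()

x_star = np.linalg.solve(P, q)   # unique global minimizer
x = np.zeros(n)
gap0 = f(x) - f(x_star)
for k in range(1, 51):
    x = x - grad(x) / L
    assert f(x) - f(x_star) <= (1 - m / L) ** k * gap0 + 1e-12
print(f"condition number kappa = {L/m:.1f}; linear rate bound verified")
```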