Chapter 1: Mathematical Background
This chapter establishes all mathematical preliminaries needed throughout the course, from basic linear algebra and calculus to foundational optimization concepts.
1.1 What Is Optimization?
Optimization is the process of finding the best decision among a set of alternatives, subject to restrictions.
The Optimization Model
The standard formulation of an optimization problem is:

$$\min_{x} \; f(x) \quad \text{subject to} \quad x \in \mathcal{X}$$

where:
- Decision variable(s) $x \in \mathbb{R}^n$: represents some action or choice we control.
- Objective function $f : \mathbb{R}^n \to \mathbb{R}$: represents total cost, risk, or negative profit (the quantity we wish to minimize or maximize).
- Constraint set $\mathcal{X} \subseteq \mathbb{R}^n$: puts restrictions on the permissible choices of $x$.
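To make the model concrete, here is a minimal numerical sketch, assuming an illustrative quadratic objective and box constraint set (these choices are not from the text), solved with SciPy's general-purpose `minimize` routine:

```python
import numpy as np
from scipy.optimize import minimize

# Illustrative instance: minimize f(x) = (x1 - 3)^2 + (x2 + 1)^2
# over the box constraint set X = [0, 2] x [-2, 0].
def f(x):
    return (x[0] - 3.0) ** 2 + (x[1] + 1.0) ** 2

x0 = np.array([0.0, 0.0])              # initial guess
bounds = [(0.0, 2.0), (-2.0, 0.0)]     # the constraint set X
result = minimize(f, x0, bounds=bounds)

print(result.x)    # optimal decision: approximately [2, -1]
print(result.fun)  # optimal objective value: approximately 1.0
```

The unconstrained minimizer $(3, -1)$ is infeasible here, so the optimum lands on the boundary of $\mathcal{X}$, a first hint of why constrained and unconstrained problems behave differently.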
Why Optimization Matters
Optimization appears across virtually every domain:
- Applied science, engineering, economics, finance, medicine, statistics, business
- General decision and policy making
- Image inpainting (filling missing regions)
- Recommendation systems (predicting user preferences)
- Linear and logistic regression (predictive modeling)
- Neural networks (deep learning)
- Portfolio design, logistics, diet planning
As Maupertuis (1698–1759) proclaimed:
"...nothing at all takes place in the Universe in which some rule of maximum or minimum does not appear."
Key Conceptual Distinctions
Global vs. local optimum: A global optimum is the best among all feasible points; a local optimum is best only within a neighborhood.
Feasible vs. infeasible point: A feasible point satisfies all constraints; an infeasible point violates at least one constraint.
Constrained vs. unconstrained optimization: A constrained problem has $\mathcal{X} \subsetneq \mathbb{R}^n$; an unconstrained problem has $\mathcal{X} = \mathbb{R}^n$.
Linear vs. nonlinear optimization: A linear problem has a linear objective and linear constraints; a nonlinear problem has at least one nonlinear function.
Convex vs. nonconvex optimization: A convex problem has a convex objective minimized over a convex feasible set; nonconvex problems may have multiple local optima.
1.2 Inner Products
Inner Product of Vectors
Definition: For vectors $x, y \in \mathbb{R}^n$, the inner product (dot product) is

$$\langle x, y \rangle = x^T y = \sum_{i=1}^{n} x_i y_i.$$
Geometric Interpretation
The angle $\theta$ between nonzero vectors $x$ and $y$ is defined through

$$\cos \theta = \frac{\langle x, y \rangle}{\|x\|_2 \, \|y\|_2}.$$

- $\cos \theta = 1$: vectors point in the same direction.
- $\cos \theta = 0$: vectors are orthogonal (perpendicular).
- $\cos \theta = -1$: vectors point in opposite directions.
Example: For $x = (1, 2)$ and $y = (3, 4)$, $\langle x, y \rangle = 1 \cdot 3 + 2 \cdot 4 = 11$.
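The same computation in NumPy, as a quick check of the definition and the angle formula (the vectors repeat the example above):

```python
import numpy as np

x = np.array([1.0, 2.0])
y = np.array([3.0, 4.0])

inner = x @ y                        # <x, y> = sum_i x_i * y_i
cos_theta = inner / (np.linalg.norm(x) * np.linalg.norm(y))
theta_deg = np.degrees(np.arccos(cos_theta))

print(inner)      # 11.0
print(theta_deg)  # about 10.3 degrees: nearly the same direction
```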
Frobenius Inner Product of Matrices
Definition: For matrices $A, B \in \mathbb{R}^{m \times n}$, the Frobenius inner product is

$$\langle A, B \rangle = \operatorname{tr}(A^T B) = \sum_{i=1}^{m} \sum_{j=1}^{n} A_{ij} B_{ij}.$$
The angle between matrices is defined analogously:

$$\cos \theta = \frac{\langle A, B \rangle}{\|A\|_F \, \|B\|_F}.$$
Example: For $A = I_2$ (the $2 \times 2$ identity) and $B = \begin{pmatrix} 2 & 1 \\ 1 & 2 \end{pmatrix}$, $\langle A, B \rangle = 2 + 0 + 0 + 2 = 4$.
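A short NumPy sketch verifying this example and the trace identity (matrices as above):

```python
import numpy as np

A = np.eye(2)                          # 2x2 identity
B = np.array([[2.0, 1.0], [1.0, 2.0]])

# Frobenius inner product two ways: entrywise sum and trace identity
entrywise = np.sum(A * B)              # sum_ij A_ij * B_ij
via_trace = np.trace(A.T @ B)          # tr(A^T B)

print(entrywise, via_trace)            # 4.0 4.0
```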
1.3 Norms
A norm is a function $\|\cdot\| : \mathbb{R}^n \to \mathbb{R}$ satisfying, for all $x, y \in \mathbb{R}^n$ and $\alpha \in \mathbb{R}$:
- $\|x\| \ge 0$, and $\|x\| = 0$ if and only if $x = 0$ (nonnegativity and definiteness)
- $\|\alpha x\| = |\alpha| \, \|x\|$ (homogeneity)
- $\|x + y\| \le \|x\| + \|y\|$ (triangle inequality)
Common Vector Norms
Euclidean Norm ($\ell_2$ norm)

$$\|x\|_2 = \sqrt{\sum_{i=1}^{n} x_i^2} = \sqrt{x^T x}$$

The most familiar norm, corresponding to geometric length.
$\ell_1$ Norm

$$\|x\|_1 = \sum_{i=1}^{n} |x_i|$$

Sum of absolute values. Promotes sparsity in optimization (e.g., LASSO, basis pursuit).
Chebyshev Norm ($\ell_\infty$ norm)

$$\|x\|_\infty = \max_{1 \le i \le n} |x_i|$$

Maximum absolute component. Used in worst-case analysis.
$\ell_p$ Norm (general, $p \ge 1$)

$$\|x\|_p = \left( \sum_{i=1}^{n} |x_i|^p \right)^{1/p}$$

Recovers the $\ell_1$ and $\ell_2$ norms at $p = 1$ and $p = 2$, and the Chebyshev norm in the limit $p \to \infty$.
Frobenius Norm (Matrix Norm)

$$\|A\|_F = \sqrt{\langle A, A \rangle} = \sqrt{\sum_{i=1}^{m} \sum_{j=1}^{n} A_{ij}^2}$$

The Frobenius norm treats a matrix $A \in \mathbb{R}^{m \times n}$ as a vector in $\mathbb{R}^{mn}$ and applies the Euclidean norm.
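All of these norms are available through `np.linalg.norm`; a minimal sketch with arbitrary illustrative values:

```python
import numpy as np

x = np.array([3.0, -4.0, 0.0])

print(np.linalg.norm(x, 2))       # Euclidean norm: 5.0
print(np.linalg.norm(x, 1))       # l1 norm: 7.0
print(np.linalg.norm(x, np.inf))  # Chebyshev norm: 4.0
print(np.linalg.norm(x, 3))       # general l_p norm, p = 3: 91**(1/3)

A = np.array([[1.0, 2.0], [3.0, 4.0]])
print(np.linalg.norm(A, 'fro'))   # Frobenius norm: sqrt(30)
```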
1.4 Symmetric Matrices
Definition: A square matrix $A \in \mathbb{R}^{n \times n}$ is symmetric if $A = A^T$, i.e., $A_{ij} = A_{ji}$ for all $i, j$.
Key Properties
- All eigenvalues are real. This is a consequence of symmetry and guarantees that eigenvalue-based analysis is well-defined.
- Eigenvectors corresponding to distinct eigenvalues are orthogonal.
- Symmetric matrices are diagonalizable by an orthogonal matrix. There exists an orthogonal $U$ with $U^T U = U U^T = I$ and a diagonal $\Lambda$ such that $A = U \Lambda U^T$ (the spectral theorem).
Eigenvalue Decomposition
If $A \in \mathbb{R}^{n \times n}$ is symmetric, then

$$A = U \Lambda U^T = \sum_{i=1}^{n} \lambda_i \, u_i u_i^T$$

where:
- $U = [u_1 \ \cdots \ u_n]$ is orthonormal: $U^T U = U U^T = I$, with $\Lambda = \operatorname{diag}(\lambda_1, \dots, \lambda_n)$.
- Each term $u_i u_i^T$ is a rank-1 symmetric matrix.
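A small NumPy sketch of the decomposition, using `np.linalg.eigh` (the symmetric-matrix eigensolver) on an illustrative matrix:

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])  # symmetric

# eigh is specialized for symmetric matrices: real eigenvalues,
# orthonormal eigenvectors returned as the columns of U
lam, U = np.linalg.eigh(A)

print(lam)      # eigenvalues: [1. 3.]
print(U.T @ U)  # identity: the columns are orthonormal

# Rebuild A as the sum of rank-1 terms lambda_i * u_i u_i^T
A_rebuilt = sum(lam[i] * np.outer(U[:, i], U[:, i]) for i in range(2))
print(np.allclose(A, A_rebuilt))  # True
```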
1.5 Positive Semidefinite (PSD) Matrices
Definition: A symmetric matrix $A \in \mathbb{R}^{n \times n}$ is positive semidefinite (PSD), denoted $A \succeq 0$, if

$$x^T A x \ge 0 \quad \text{for all } x \in \mathbb{R}^n.$$
Key Properties
- Eigenvalue characterization: All eigenvalues of $A$ are nonnegative: $\lambda_i(A) \ge 0$ for all $i$.
- PSD matrices are symmetric, hence diagonalizable by orthogonal matrices.
- Cholesky factorization: If $A \succeq 0$, then $A = B^T B$ for some $B$ (a matrix square root).
- The set of PSD matrices forms a convex cone. That is, if $A_1, A_2 \succeq 0$ and $\theta_1, \theta_2 \ge 0$, then $\theta_1 A_1 + \theta_2 A_2 \succeq 0$.
PSD Cone
Denote:
- $\mathbb{S}^n$ — the set of $n \times n$ symmetric matrices
- $\mathbb{S}^n_{+}$ — the positive semidefinite cone
- $\mathbb{S}^n_{++}$ — the positive definite cone (strict inequality $x^T A x > 0$ for all $x \neq 0$; all eigenvalues $> 0$)
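A minimal sketch of the eigenvalue characterization and the cone property; the helper `is_psd` is an illustrative function written here, not a library routine:

```python
import numpy as np

def is_psd(A, tol=1e-10):
    """Check A >= 0 (PSD) via the eigenvalue characterization."""
    A = np.asarray(A)
    if not np.allclose(A, A.T):
        return False                       # PSD requires symmetry
    return bool(np.all(np.linalg.eigvalsh(A) >= -tol))

A = np.array([[2.0, 1.0], [1.0, 2.0]])     # eigenvalues 1, 3 -> PSD
B = np.array([[1.0, 2.0], [2.0, 1.0]])     # eigenvalues -1, 3 -> not PSD
C = np.array([[1.0, 0.0], [0.0, 0.0]])     # eigenvalues 0, 1 -> PSD

print(is_psd(A), is_psd(B), is_psd(C))     # True False True

# Convex cone: a nonnegative combination of PSD matrices stays PSD
print(is_psd(0.5 * A + 2.0 * C))           # True
```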
1.6 Gradient and Hessian
For a multivariate function $f : \mathbb{R}^n \to \mathbb{R}$, first- and second-order derivative information is collected in the gradient and the Hessian.
Gradient (First-Order Information)
Gradient: the vector of first-order partial derivatives. It points in the direction of steepest increase of $f$ at $x$:

$$\nabla f(x) = \left( \frac{\partial f}{\partial x_1}(x), \dots, \frac{\partial f}{\partial x_n}(x) \right)^T \in \mathbb{R}^n$$
Hessian (Second-Order Information)
Hessian: The matrix of second-order partial derivatives describing local curvature:

$$\nabla^2 f(x) = \left[ \frac{\partial^2 f}{\partial x_i \partial x_j}(x) \right]_{i,j=1}^{n} \in \mathbb{R}^{n \times n}$$

The Hessian is symmetric when $f$ is twice continuously differentiable (equality of mixed partial derivatives).
Quadratic Form Example
Let $f(x) = \frac{1}{2} x^T A x + b^T x + c$ with $A$ symmetric. Then $\nabla f(x) = A x + b$ and $\nabla^2 f(x) = A$: a quadratic has the same curvature everywhere.
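A sketch checking the quadratic's analytic gradient against finite differences ($A$, $b$, and the test point are illustrative; $c$ is dropped since it does not affect derivatives):

```python
import numpy as np

# f(x) = 0.5 x^T A x + b^T x, with symmetric A,
# so grad f(x) = A x + b and Hess f(x) = A
A = np.array([[2.0, 1.0], [1.0, 3.0]])
b = np.array([1.0, -1.0])

def f(x):
    return 0.5 * x @ A @ x + b @ x

def grad_f(x):
    return A @ x + b

# Central finite differences at a test point
x0 = np.array([0.5, -0.2])
eps = 1e-6
fd = np.array([(f(x0 + eps * e) - f(x0 - eps * e)) / (2 * eps)
               for e in np.eye(2)])

print(np.allclose(fd, grad_f(x0), atol=1e-6))  # True
```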
1.7 Lines, Line Segments, and Rays
Given two points $x, y \in \mathbb{R}^n$ with $x \neq y$, we can form three primitive sets.
Parametric Representations
Line through $x$ and $y$:

$$\{ (1 - \theta)\, x + \theta\, y : \theta \in \mathbb{R} \}$$

Line segment from $x$ to $y$:

$$\{ (1 - \theta)\, x + \theta\, y : \theta \in [0, 1] \}$$

Ray starting at $x$ going through $y$:

$$\{ (1 - \theta)\, x + \theta\, y : \theta \ge 0 \} = \{ x + \theta (y - x) : \theta \ge 0 \}$$
Geometric Intuition
- The line extends infinitely in both directions.
- The line segment is the set of convex combinations of $x$ and $y$, bounded between them.
- The ray extends infinitely in the direction from $x$ toward $y$.
These primitive sets are the building blocks of convex geometry. The line segment representation $(1 - \theta)\, x + \theta\, y$ with $\theta \in [0, 1]$ is called a convex combination of $x$ and $y$.
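All three sets share one parametrization and differ only in the allowed range of $\theta$; a tiny sketch with illustrative points:

```python
import numpy as np

x = np.array([0.0, 0.0])
y = np.array([2.0, 1.0])

def point(theta):
    """(1 - theta) * x + theta * y: theta in R traces the line,
    [0, 1] the segment, and [0, inf) the ray from x through y."""
    return (1 - theta) * x + theta * y

print(point(0.0))   # x itself
print(point(0.5))   # midpoint of the segment: [1.  0.5]
print(point(1.0))   # y itself
print(point(2.0))   # on the ray, beyond y
print(point(-1.0))  # on the line, but outside segment and ray
```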
1.8 Summary of Mathematical Preliminaries
| Concept | Key Formula / Property |
|---|---|
| Inner product (vectors) | $\langle x, y \rangle = x^T y = \sum_{i} x_i y_i$ |
| Frobenius inner product | $\langle A, B \rangle = \operatorname{tr}(A^T B) = \sum_{i,j} A_{ij} B_{ij}$ |
| Orthogonality | $\langle x, y \rangle = 0$ |
| Frobenius norm | $\lVert A \rVert_F = \sqrt{\langle A, A \rangle}$ |
| Symmetric matrix | $A = A^T$ |
| Eigenvalue decomposition | $A = U \Lambda U^T = \sum_i \lambda_i u_i u_i^T$ |
| PSD matrix | $x^T A x \ge 0$ for all $x \in \mathbb{R}^n$ |
| PSD convex cone | $A_1, A_2 \succeq 0$ and $\theta_1, \theta_2 \ge 0$ imply $\theta_1 A_1 + \theta_2 A_2 \succeq 0$ |
| Gradient | $\nabla f(x) = \left( \partial f / \partial x_1, \dots, \partial f / \partial x_n \right)^T$ |
| Hessian | $\nabla^2 f(x) = \left[ \partial^2 f / \partial x_i \partial x_j \right]_{i,j}$ |
| Line through $x$ and $y$ | $(1 - \theta)\, x + \theta\, y$, $\theta \in \mathbb{R}$ |
| Line segment | $(1 - \theta)\, x + \theta\, y$, $\theta \in [0, 1]$ |
| Ray | $(1 - \theta)\, x + \theta\, y$, $\theta \ge 0$ |
These mathematical tools form the foundation for the study of convex sets, convex functions, and optimization algorithms in the chapters that follow.