Chapter 1: Mathematical Background

This chapter establishes all mathematical preliminaries needed throughout the course, from basic linear algebra and calculus to foundational optimization concepts.


1.1 What Is Optimization?

Optimization is the process of finding the best decision among a set of alternatives, subject to restrictions.

The Optimization Model

The standard formulation of an optimization problem is:

$$
\begin{array}{ll}
\text{minimize} & f(x) \\
\text{subject to} & x \in \mathcal{X}
\end{array}
$$

where:

Decision variable(s) $x$: represents some action or choice we control.

Objective function $f(\cdot)$: represents total cost, risk, or negative profit (the quantity we wish to minimize or maximize).

Constraint set $\mathcal{X}$: puts restrictions on the permissible choices of $x$.
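
As a concrete illustration of this standard form, the small sketch below solves one invented instance with SciPy; the particular objective, constraint set, and starting point are chosen here purely for demonstration.

```python
# Minimal sketch of  minimize f(x)  subject to  x in X,  for an invented example:
# f(x) = (x1 - 1)^2 + (x2 - 2)^2,  X = {x : x1 + x2 <= 2, x >= 0}.
import numpy as np
from scipy.optimize import minimize

f = lambda x: (x[0] - 1.0) ** 2 + (x[1] - 2.0) ** 2            # objective f(x)
cons = [{"type": "ineq", "fun": lambda x: 2.0 - x[0] - x[1]}]  # x1 + x2 <= 2
bnds = [(0.0, None), (0.0, None)]                              # x >= 0

res = minimize(f, x0=np.array([0.0, 0.0]), bounds=bnds, constraints=cons)
print(res.x)  # approximately [0.5, 1.5]
```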

Why Optimization Matters

Optimization appears across virtually every domain:

  • Applied science, engineering, economics, finance, medicine, statistics, business
  • General decision and policy making
  • Image inpainting (filling missing regions)
  • Recommendation systems (predicting user preferences)
  • Linear and logistic regression (predictive modeling)
  • Neural networks (deep learning)
  • Portfolio design, logistics, diet planning

As Maupertuis (1698–1759) proclaimed:

"...nothing at all takes place in the Universe in which some rule of maximum or minimum does not appear."

Key Conceptual Distinctions

Global vs. local optimum: A global optimum is the best among all feasible points; a local optimum is best only within a neighborhood.

Feasible vs. infeasible point: A feasible point satisfies all constraints; an infeasible point violates at least one constraint.

Constrained vs. unconstrained optimization: A constrained problem has $\mathcal{X} \subsetneq \mathbb{R}^n$; an unconstrained problem has $\mathcal{X} = \mathbb{R}^n$.

Linear vs. nonlinear optimization: A linear problem has a linear objective and linear constraints; a nonlinear problem has at least one nonlinear function.

Convex vs. nonconvex optimization: A convex problem has a convex objective minimized over a convex feasible set; nonconvex problems may have multiple local optima.


1.2 Inner Products

Inner Product of Vectors

Definition: For vectors $x, y \in \mathbb{R}^n$, the inner product (dot product) is

$$
\langle x, y \rangle = x^\top y = \sum_{i=1}^{n} x_i y_i
$$

Geometric Interpretation

The angle $\theta$ between $x$ and $y$ is defined by

$$
\cos\theta = \frac{\langle x, y \rangle}{\|x\| \, \|y\|}, \qquad \|x\| = \sqrt{\langle x, x \rangle}
$$

  • $\theta = 0^\circ$: vectors point in the same direction
  • $\theta = 90^\circ$: vectors are orthogonal (perpendicular)
  • $\theta = 180^\circ$: vectors point in opposite directions

Example:

$$
x = \begin{bmatrix} 1 \\ 2 \end{bmatrix}, \quad
y = \begin{bmatrix} 3 \\ 1 \end{bmatrix}
\quad\Longrightarrow\quad
\langle x, y \rangle = 1 \cdot 3 + 2 \cdot 1 = 5, \qquad
\theta = \cos^{-1}\!\left(\frac{5}{\sqrt{5}\,\sqrt{10}}\right)
$$
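
As a quick numerical check, the following NumPy sketch reproduces the inner product and angle for the vectors in the example above:

```python
# Sketch: inner product and angle for the vectors in the example above.
import numpy as np

x = np.array([1.0, 2.0])
y = np.array([3.0, 1.0])

inner = x @ y                                            # 1*3 + 2*1 = 5
cos_theta = inner / (np.linalg.norm(x) * np.linalg.norm(y))
print(inner, np.degrees(np.arccos(cos_theta)))           # 5.0  45.0
```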

Frobenius Inner Product of Matrices

Definition: For matrices $A, B \in \mathbb{R}^{m \times n}$, the Frobenius inner product is

$$
\langle A, B \rangle_F = \sum_{i=1}^{m} \sum_{j=1}^{n} A_{ij} B_{ij} = \operatorname{trace}(A^\top B)
$$

The angle between matrices is defined analogously:

$$
\cos\theta = \frac{\langle A, B \rangle_F}{\|A\|_F \, \|B\|_F}
$$

Example:

$$
A = \begin{bmatrix} 1 & 2 \\ 0 & 3 \end{bmatrix}, \quad
B = \begin{bmatrix} 2 & 1 \\ 1 & 0 \end{bmatrix}
\quad\Longrightarrow\quad
\langle A, B \rangle_F = 1 \cdot 2 + 2 \cdot 1 + 0 \cdot 1 + 3 \cdot 0 = 4
$$
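
The same computation can be checked numerically; this short sketch uses the matrices from the example above:

```python
# Sketch: Frobenius inner product <A, B>_F = trace(A^T B) = sum of elementwise products.
import numpy as np

A = np.array([[1.0, 2.0], [0.0, 3.0]])
B = np.array([[2.0, 1.0], [1.0, 0.0]])

print(np.trace(A.T @ B))   # 4.0
print(np.sum(A * B))       # 4.0 (equivalent elementwise form)
```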

1.3 Norms

A norm measures the length or magnitude of a vector (or matrix). It satisfies:

  1. $\|x\| \ge 0$, and $\|x\| = 0 \iff x = 0$
  2. $\|\alpha x\| = |\alpha| \, \|x\|$ (homogeneity)
  3. $\|x + y\| \le \|x\| + \|y\|$ (triangle inequality)

Common Vector Norms

Euclidean Norm ($\ell_2$ norm)

$$
\|x\|_2 = \sqrt{\sum_{i=1}^{n} x_i^2}
$$

The most familiar norm, corresponding to geometric length.

$\ell_1$ Norm

$$
\|x\|_1 = \sum_{i=1}^{n} |x_i|
$$

Sum of absolute values. Promotes sparsity in optimization (e.g., LASSO, basis pursuit).

Chebyshev Norm ($\ell_\infty$ norm)

$$
\|x\|_\infty = \max_i |x_i|
$$

Maximum absolute component. Used in worst-case analysis.

$\ell_p$ Norm (general, $p \ge 1$)

$$
\|x\|_p = \left( \sum_{i=1}^{n} |x_i|^p \right)^{1/p}
$$

Recovers $\ell_1$ when $p = 1$, $\ell_2$ when $p = 2$, and approaches $\ell_\infty$ as $p \to \infty$.

Frobenius Norm (Matrix Norm)

$$
\|A\|_F = \sqrt{\langle A, A \rangle_F} = \sqrt{\sum_{i,j} A_{ij}^2}
$$

The Frobenius norm treats a matrix as a vector in $\mathbb{R}^{mn}$; it is the $\ell_2$ norm of the vectorized matrix.
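
A small NumPy sketch evaluating these norms; the example vector and matrix are chosen here only for illustration:

```python
# Sketch: common vector norms and the Frobenius norm via numpy.linalg.norm.
import numpy as np

x = np.array([3.0, -4.0, 1.0])
print(np.linalg.norm(x, 2))        # Euclidean norm: sqrt(9 + 16 + 1)
print(np.linalg.norm(x, 1))        # l1 norm: 3 + 4 + 1 = 8
print(np.linalg.norm(x, np.inf))   # Chebyshev norm: max |x_i| = 4
print(np.linalg.norm(x, 3))        # general l_p norm with p = 3

A = np.array([[1.0, 2.0], [0.0, 3.0]])
print(np.linalg.norm(A, "fro"))    # Frobenius norm: sqrt(1 + 4 + 0 + 9)
```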


1.4 Symmetric Matrices

Definition: A square matrix $A \in \mathbb{R}^{n \times n}$ is symmetric if

$$
A = A^\top \quad \text{or equivalently} \quad A_{ij} = A_{ji} \ \text{ for all } i, j
$$

Key Properties

  • All eigenvalues are real. This is a consequence of symmetry and guarantees that eigenvalue-based analysis is well-defined.
  • Eigenvectors corresponding to distinct eigenvalues are orthogonal.
  • Symmetric matrices are diagonalizable by an orthogonal matrix: there exists $Q$ with $Q^\top Q = I$ and a diagonal $\Lambda$ such that $A = Q \Lambda Q^\top$.

Eigenvalue Decomposition

If $A$ is symmetric, it admits the decomposition

$$
A = Q \Lambda Q^\top = \sum_{i=1}^{n} \lambda_i q_i q_i^\top
$$

where:

  • $Q = [\, q_1 \; q_2 \; \cdots \; q_n \,]$ is orthonormal: $Q^\top Q = I$
  • $\Lambda = \operatorname{diag}(\lambda_1, \lambda_2, \ldots, \lambda_n)$ with $\lambda_i \in \mathbb{R}$
  • Each term $\lambda_i q_i q_i^\top$ is a rank-1 symmetric matrix
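
To see the decomposition concretely, the sketch below (using an invented $2 \times 2$ symmetric matrix) computes it with NumPy's symmetric eigensolver and rebuilds $A$ from its rank-1 terms:

```python
# Sketch: eigenvalue decomposition A = Q Lambda Q^T of a symmetric matrix.
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])                       # symmetric example matrix

lam, Q = np.linalg.eigh(A)                       # eigh is for symmetric/Hermitian matrices
print(lam)                                       # real eigenvalues: [1. 3.]
print(np.allclose(Q.T @ Q, np.eye(2)))           # columns of Q are orthonormal

# Reconstruct A as a sum of rank-1 terms lambda_i * q_i q_i^T.
A_rebuilt = sum(lam[i] * np.outer(Q[:, i], Q[:, i]) for i in range(2))
print(np.allclose(A, A_rebuilt))                 # True
```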

1.5 Positive Semidefinite (PSD) Matrices

Definition: A symmetric matrix $A \in \mathbb{R}^{n \times n}$ is positive semidefinite (PSD), denoted $A \succeq 0$, if

$$
x^\top A x \ge 0 \quad \forall x \in \mathbb{R}^n
$$

Key Properties

  • Eigenvalue characterization: All eigenvalues of $A$ are nonnegative: $\lambda_i \ge 0$.
  • PSD matrices are symmetric, hence diagonalizable by orthogonal matrices.
  • Cholesky factorization: If $A \succeq 0$, then $A = B B^\top$ for some $B \in \mathbb{R}^{n \times n}$ (matrix square root).
  • The set of PSD matrices forms a convex cone. That is, if $A, B \succeq 0$ and $\alpha, \beta \ge 0$, then $\alpha A + \beta B \succeq 0$.

PSD Cone

Denote:

  • $\mathbb{S}^n = \{ X \in \mathbb{R}^{n \times n} : X = X^\top \}$ — the set of $n \times n$ symmetric matrices
  • $\mathbb{S}^n_+ = \{ X \in \mathbb{S}^n : X \succeq 0 \}$ — the positive semidefinite cone
  • $\mathbb{S}^n_{++} = \{ X \in \mathbb{S}^n : X \succ 0 \}$ — the positive definite cone (strict inequality, all eigenvalues $> 0$)
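
The following sketch (with invented matrices) shows how these properties are typically checked numerically, via eigenvalues and a Cholesky factorization:

```python
# Sketch: numerical checks for positive semidefiniteness.
import numpy as np

A = np.array([[2.0, -1.0],
              [-1.0, 2.0]])                       # symmetric, eigenvalues 1 and 3

print(np.all(np.linalg.eigvalsh(A) >= 0))        # PSD iff all eigenvalues >= 0

L = np.linalg.cholesky(A)                        # factor A = L L^T (A here is positive definite)
print(np.allclose(L @ L.T, A))                   # True

# The PSD matrices form a convex cone: nonnegative combinations stay PSD.
B = np.array([[1.0, 0.0],
              [0.0, 0.0]])                        # PSD (eigenvalues 1 and 0)
C = 0.5 * A + 2.0 * B
print(np.all(np.linalg.eigvalsh(C) >= 0))        # True
```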

1.6 Gradient and Hessian

For a multivariate function $f : \mathbb{R}^n \to \mathbb{R}$:

Gradient (First-Order Information)

Gradient: The gradient points in the direction of steepest increase:

$$
\nabla f(x) = \begin{bmatrix}
\dfrac{\partial f}{\partial x_1} \\
\dfrac{\partial f}{\partial x_2} \\
\vdots \\
\dfrac{\partial f}{\partial x_n}
\end{bmatrix} \in \mathbb{R}^n
$$

Hessian (Second-Order Information)

Hessian: The matrix of second-order partial derivatives describing local curvature:

$$
\nabla^2 f(x) = \begin{bmatrix}
\dfrac{\partial^2 f}{\partial x_1^2} & \dfrac{\partial^2 f}{\partial x_1 \partial x_2} & \cdots & \dfrac{\partial^2 f}{\partial x_1 \partial x_n} \\
\dfrac{\partial^2 f}{\partial x_2 \partial x_1} & \dfrac{\partial^2 f}{\partial x_2^2} & \cdots & \dfrac{\partial^2 f}{\partial x_2 \partial x_n} \\
\vdots & \vdots & \ddots & \vdots \\
\dfrac{\partial^2 f}{\partial x_n \partial x_1} & \dfrac{\partial^2 f}{\partial x_n \partial x_2} & \cdots & \dfrac{\partial^2 f}{\partial x_n^2}
\end{bmatrix} \in \mathbb{R}^{n \times n}
$$

The Hessian is symmetric when f is twice continuously differentiable (by Clairaut's theorem).

Quadratic Form Example

Let $f(x) = x^\top A x + b^\top x + c$, with $A \in \mathbb{R}^{n \times n}$ symmetric, $b \in \mathbb{R}^n$, $c \in \mathbb{R}$. Then:

$$
\nabla f(x) = 2Ax + b, \qquad \nabla^2 f(x) = 2A
$$
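
The sketch below (with an invented symmetric $A$, vector $b$, and constant $c$) verifies the gradient formula $\nabla f(x) = 2Ax + b$ against central finite differences:

```python
# Sketch: checking the quadratic-form gradient 2Ax + b numerically.
import numpy as np

A = np.array([[2.0, 0.5],
              [0.5, 1.0]])                        # symmetric
b = np.array([1.0, -1.0])
c = 3.0

f = lambda x: x @ A @ x + b @ x + c
grad = lambda x: 2.0 * A @ x + b                  # analytic gradient; the Hessian is 2A

x0 = np.array([0.3, -0.7])
eps = 1e-6
num_grad = np.array([(f(x0 + eps * e) - f(x0 - eps * e)) / (2 * eps)
                     for e in np.eye(2)])
print(np.allclose(grad(x0), num_grad, atol=1e-5))  # True
```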

1.7 Lines, Line Segments, and Rays

Given two points $x, y \in \mathbb{R}^n$, define the direction vector $d := y - x$.

Parametric Representations

Line through $x$ and $y$:

$$
\overline{xy} = \{\, x + \theta d \mid \theta \in \mathbb{R} \,\}
$$

Line segment from $x$ to $y$:

$$
[x, y] = \{\, x + \theta d \mid \theta \in [0, 1] \,\} = \{\, (1 - \theta) x + \theta y \mid \theta \in [0, 1] \,\}
$$

Ray starting at $x$ going through $y$:

$$
\overrightarrow{xy} = \{\, x + \theta d \mid \theta \ge 0 \,\}
$$

Geometric Intuition

  • The line extends infinitely in both directions.
  • The line segment is the set of convex combinations of $x$ and $y$, bounded between them.
  • The ray extends infinitely in the direction from $x$ toward $y$.

These primitive sets are the building blocks of convex geometry. The line segment representation

$$
z = (1 - \theta) x + \theta y, \qquad \theta \in [0, 1]
$$

is called a convex combination of $x$ and $y$, and it is the central operation in the definition of convex sets.
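
A tiny sketch (with invented endpoints) of the parametric form $x + \theta d$, showing which values of $\theta$ land on the segment, the ray, and the rest of the line:

```python
# Sketch: points on the line, segment, and ray through x and y via x + theta*(y - x).
import numpy as np

x = np.array([0.0, 0.0])
y = np.array([2.0, 1.0])
d = y - x

point = lambda theta: x + theta * d

print(point(0.5))    # theta in [0, 1]: on the segment [x, y] (here the midpoint)
print(point(2.0))    # theta > 1: on the ray from x through y, beyond y
print(point(-1.0))   # theta < 0: on the line, but outside both segment and ray
```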


1.8 Summary of Mathematical Preliminaries

| Concept | Key Formula / Property |
| --- | --- |
| Inner product (vectors) | $\langle x, y \rangle = x^\top y = \sum_i x_i y_i$ |
| Frobenius inner product | $\langle A, B \rangle_F = \operatorname{trace}(A^\top B)$ |
| Orthogonality | $\langle x, y \rangle = 0 \iff \theta = 90^\circ$ |
| $\ell_2$ norm | $\lVert x \rVert_2 = \sqrt{\sum_i x_i^2}$ |
| $\ell_1$ norm | $\lVert x \rVert_1 = \sum_i \lvert x_i \rvert$ |
| $\ell_\infty$ norm | $\lVert x \rVert_\infty = \max_i \lvert x_i \rvert$ |
| $\ell_p$ norm | $\lVert x \rVert_p = \left( \sum_i \lvert x_i \rvert^p \right)^{1/p}$ |
| Frobenius norm | $\lVert A \rVert_F = \sqrt{\sum_{i,j} A_{ij}^2}$ |
| Symmetric matrix | $A = A^\top$, eigenvalues real |
| Eigenvalue decomposition | $A = Q \Lambda Q^\top$ |
| PSD matrix | $x^\top A x \ge 0 \ \forall x$, $\lambda_i \ge 0$ |
| PSD convex cone | $\alpha A + \beta B \succeq 0$ for $\alpha, \beta \ge 0$ |
| Gradient | $\nabla f(x) \in \mathbb{R}^n$, direction of steepest increase |
| Hessian | $\nabla^2 f(x) \in \mathbb{R}^{n \times n}$, local curvature |
| Line through $x, y$ | $\{\, x + \theta (y - x) \mid \theta \in \mathbb{R} \,\}$ |
| Line segment $[x, y]$ | $\{\, (1 - \theta) x + \theta y \mid \theta \in [0, 1] \,\}$ |
| Ray $\overrightarrow{xy}$ | $\{\, x + \theta (y - x) \mid \theta \ge 0 \,\}$ |

These mathematical tools form the foundation for the study of convex sets, convex functions, and optimization algorithms in the chapters that follow.