Chapter 1: Mathematical Background

This chapter establishes all mathematical preliminaries needed throughout the course, from basic linear algebra and calculus to foundational optimization concepts.


1.1 What Is Optimization?

Optimization is the process of finding the best decision among a set of alternatives, subject to restrictions.

The Optimization Model

The standard formulation of an optimization problem is:

$$
\begin{array}{ll}
\text{minimize} & f(x) \\
\text{subject to} & x \in \mathcal{X}
\end{array}
$$

where:

Decision variable(s) $x$: represents some action or choice we control.

Objective function $f(\cdot)$: represents total cost, risk, or negative profit (the quantity we wish to minimize or maximize).

Constraint set $\mathcal{X}$: puts restrictions on the permissible choices of $x$.
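
As a concrete illustration of this standard form, the small sketch below solves one invented instance with SciPy; the particular objective, constraint set, and starting point are chosen here purely for demonstration.

```python
# Minimal sketch of  minimize f(x)  subject to  x in X,  for an invented example:
# f(x) = (x1 - 1)^2 + (x2 - 2)^2,  X = {x : x1 + x2 <= 2, x >= 0}.
import numpy as np
from scipy.optimize import minimize

f = lambda x: (x[0] - 1.0) ** 2 + (x[1] - 2.0) ** 2            # objective f(x)
cons = [{"type": "ineq", "fun": lambda x: 2.0 - x[0] - x[1]}]  # x1 + x2 <= 2
bnds = [(0.0, None), (0.0, None)]                              # x >= 0

res = minimize(f, x0=np.array([0.0, 0.0]), bounds=bnds, constraints=cons)
print(res.x)  # approximately [0.5, 1.5]
```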

Why Optimization Matters

Optimization appears across virtually every domain:

  • Applied science, engineering, economics, finance, medicine, statistics, business
  • General decision and policy making
  • Image inpainting (filling missing regions)
  • Recommendation systems (predicting user preferences)
  • Linear and logistic regression (predictive modeling)
  • Neural networks (deep learning)
  • Portfolio design, logistics, diet planning

As Maupertuis (1698–1759) proclaimed:

"...nothing at all takes place in the Universe in which some rule of maximum or minimum does not appear."

Key Conceptual Distinctions

Global vs. local optimum: A global optimum is the best among all feasible points; a local optimum is best only within a neighborhood.

Feasible vs. infeasible point: A feasible point satisfies all constraints; an infeasible point violates at least one constraint.

Constrained vs. unconstrained optimization: A constrained problem has $\mathcal{X} \subsetneq \mathbb{R}^n$; an unconstrained problem has $\mathcal{X} = \mathbb{R}^n$.

Linear vs. nonlinear optimization: A linear problem has a linear objective and linear constraints; a nonlinear problem has at least one nonlinear function.

Convex vs. nonconvex optimization: A convex problem has a convex objective minimized over a convex feasible set; nonconvex problems may have multiple local optima.


1.2 Inner Products

Inner Product of Vectors

Definition: For vectors $x, y \in \mathbb{R}^n$, the inner product (dot product) is

$$
\langle x, y \rangle = x^\top y = \sum_{i=1}^{n} x_i y_i
$$

Geometric Interpretation

The angle $\theta$ between $x$ and $y$ is defined by

$$
\cos\theta = \frac{\langle x, y \rangle}{\|x\| \, \|y\|}, \qquad \|x\| = \sqrt{\langle x, x \rangle}
$$

  • $\theta = 0^\circ$: vectors point in the same direction
  • $\theta = 90^\circ$: vectors are orthogonal (perpendicular)
  • $\theta = 180^\circ$: vectors point in opposite directions

Example:

$$
x = \begin{bmatrix} 1 \\ 2 \end{bmatrix}, \quad
y = \begin{bmatrix} 3 \\ 1 \end{bmatrix}
\quad\Longrightarrow\quad
\langle x, y \rangle = 1 \cdot 3 + 2 \cdot 1 = 5, \qquad
\theta = \cos^{-1}\!\left(\frac{5}{\sqrt{5}\,\sqrt{10}}\right)
$$
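
As a quick numerical check, the following NumPy sketch reproduces the inner product and angle for the vectors in the example above:

```python
# Sketch: inner product and angle for the vectors in the example above.
import numpy as np

x = np.array([1.0, 2.0])
y = np.array([3.0, 1.0])

inner = x @ y                                            # 1*3 + 2*1 = 5
cos_theta = inner / (np.linalg.norm(x) * np.linalg.norm(y))
print(inner, np.degrees(np.arccos(cos_theta)))           # 5.0  45.0
```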

Frobenius Inner Product of Matrices

Definition: For matrices $A, B \in \mathbb{R}^{m \times n}$, the Frobenius inner product is

$$
\langle A, B \rangle_F = \sum_{i=1}^{m} \sum_{j=1}^{n} A_{ij} B_{ij} = \operatorname{trace}(A^\top B)
$$

The angle between matrices is defined analogously:

$$
\cos\theta = \frac{\langle A, B \rangle_F}{\|A\|_F \, \|B\|_F}
$$

Example:

$$
A = \begin{bmatrix} 1 & 2 \\ 0 & 3 \end{bmatrix}, \quad
B = \begin{bmatrix} 2 & 1 \\ 1 & 0 \end{bmatrix}
\quad\Longrightarrow\quad
\langle A, B \rangle_F = 1 \cdot 2 + 2 \cdot 1 + 0 \cdot 1 + 3 \cdot 0 = 4
$$
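
The same computation can be checked numerically; this short sketch uses the matrices from the example above:

```python
# Sketch: Frobenius inner product <A, B>_F = trace(A^T B) = sum of elementwise products.
import numpy as np

A = np.array([[1.0, 2.0], [0.0, 3.0]])
B = np.array([[2.0, 1.0], [1.0, 0.0]])

print(np.trace(A.T @ B))   # 4.0
print(np.sum(A * B))       # 4.0 (equivalent elementwise form)
```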

1.3 Norms

A norm measures the length or magnitude of a vector (or matrix). It satisfies:

  1. $\|x\| \ge 0$, and $\|x\| = 0 \iff x = 0$
  2. $\|\alpha x\| = |\alpha| \, \|x\|$ (homogeneity)
  3. $\|x + y\| \le \|x\| + \|y\|$ (triangle inequality)

Common Vector Norms

Euclidean Norm ($\ell_2$ norm)

$$
\|x\|_2 = \sqrt{\sum_{i=1}^{n} x_i^2}
$$

The most familiar norm, corresponding to geometric length.

$\ell_1$ Norm

$$
\|x\|_1 = \sum_{i=1}^{n} |x_i|
$$

Sum of absolute values. Promotes sparsity in optimization (e.g., LASSO, basis pursuit).

Chebyshev Norm ($\ell_\infty$ norm)

$$
\|x\|_\infty = \max_i |x_i|
$$

Maximum absolute component. Used in worst-case analysis.

$\ell_p$ Norm (general, $p \ge 1$)

$$
\|x\|_p = \left( \sum_{i=1}^{n} |x_i|^p \right)^{1/p}
$$

Recovers $\ell_1$ when $p = 1$, $\ell_2$ when $p = 2$, and approaches $\ell_\infty$ as $p \to \infty$.

Frobenius Norm (Matrix Norm)

$$
\|A\|_F = \sqrt{\langle A, A \rangle_F} = \sqrt{\sum_{i,j} A_{ij}^2}
$$

The Frobenius norm treats a matrix as a vector in $\mathbb{R}^{mn}$; it is the $\ell_2$ norm of the vectorized matrix.
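
A small NumPy sketch evaluating these norms; the example vector and matrix are chosen here only for illustration:

```python
# Sketch: common vector norms and the Frobenius norm via numpy.linalg.norm.
import numpy as np

x = np.array([3.0, -4.0, 1.0])
print(np.linalg.norm(x, 2))        # Euclidean norm: sqrt(9 + 16 + 1)
print(np.linalg.norm(x, 1))        # l1 norm: 3 + 4 + 1 = 8
print(np.linalg.norm(x, np.inf))   # Chebyshev norm: max |x_i| = 4
print(np.linalg.norm(x, 3))        # general l_p norm with p = 3

A = np.array([[1.0, 2.0], [0.0, 3.0]])
print(np.linalg.norm(A, "fro"))    # Frobenius norm: sqrt(1 + 4 + 0 + 9)
```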


1.4 Symmetric Matrices

Definition: A square matrix $A \in \mathbb{R}^{n \times n}$ is symmetric if

$$
A = A^\top \quad \text{or equivalently} \quad A_{ij} = A_{ji} \ \text{ for all } i, j
$$

Key Properties

  • All eigenvalues are real. This is a consequence of symmetry and guarantees that eigenvalue-based analysis is well-defined.
  • Eigenvectors corresponding to distinct eigenvalues are orthogonal.
  • Symmetric matrices are diagonalizable by an orthogonal matrix: there exists $Q$ with $Q^\top Q = I$ and a diagonal $\Lambda$ such that $A = Q \Lambda Q^\top$.

Eigenvalue Decomposition

If $A$ is symmetric, it admits the decomposition

$$
A = Q \Lambda Q^\top = \sum_{i=1}^{n} \lambda_i q_i q_i^\top
$$

where:

  • $Q = [\, q_1 \; q_2 \; \cdots \; q_n \,]$ is orthonormal: $Q^\top Q = I$
  • $\Lambda = \operatorname{diag}(\lambda_1, \lambda_2, \ldots, \lambda_n)$ with $\lambda_i \in \mathbb{R}$
  • Each term $\lambda_i q_i q_i^\top$ is a rank-1 symmetric matrix
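
To see the decomposition concretely, the sketch below (using an invented $2 \times 2$ symmetric matrix) computes it with NumPy's symmetric eigensolver and rebuilds $A$ from its rank-1 terms:

```python
# Sketch: eigenvalue decomposition A = Q Lambda Q^T of a symmetric matrix.
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])                       # symmetric example matrix

lam, Q = np.linalg.eigh(A)                       # eigh is for symmetric/Hermitian matrices
print(lam)                                       # real eigenvalues: [1. 3.]
print(np.allclose(Q.T @ Q, np.eye(2)))           # columns of Q are orthonormal

# Reconstruct A as a sum of rank-1 terms lambda_i * q_i q_i^T.
A_rebuilt = sum(lam[i] * np.outer(Q[:, i], Q[:, i]) for i in range(2))
print(np.allclose(A, A_rebuilt))                 # True
```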

1.5 Positive Semidefinite (PSD) Matrices

Definition: A symmetric matrix $A \in \mathbb{R}^{n \times n}$ is positive semidefinite (PSD), denoted $A \succeq 0$, if

$$
x^\top A x \ge 0 \quad \forall x \in \mathbb{R}^n
$$

Key Properties

  • Eigenvalue characterization: All eigenvalues of $A$ are nonnegative: $\lambda_i \ge 0$.
  • PSD matrices are symmetric, hence diagonalizable by orthogonal matrices.
  • Cholesky factorization: If $A \succeq 0$, then $A = B B^\top$ for some $B \in \mathbb{R}^{n \times n}$ (matrix square root).
  • The set of PSD matrices forms a convex cone. That is, if $A, B \succeq 0$ and $\alpha, \beta \ge 0$, then $\alpha A + \beta B \succeq 0$.

PSD Cone

Denote:

  • $\mathbb{S}^n = \{ X \in \mathbb{R}^{n \times n} : X = X^\top \}$ — the set of $n \times n$ symmetric matrices
  • $\mathbb{S}^n_+ = \{ X \in \mathbb{S}^n : X \succeq 0 \}$ — the positive semidefinite cone
  • $\mathbb{S}^n_{++} = \{ X \in \mathbb{S}^n : X \succ 0 \}$ — the positive definite cone (strict inequality, all eigenvalues $> 0$)
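
The following sketch (with invented matrices) shows how these properties are typically checked numerically, via eigenvalues and a Cholesky factorization:

```python
# Sketch: numerical checks for positive semidefiniteness.
import numpy as np

A = np.array([[2.0, -1.0],
              [-1.0, 2.0]])                       # symmetric, eigenvalues 1 and 3

print(np.all(np.linalg.eigvalsh(A) >= 0))        # PSD iff all eigenvalues >= 0

L = np.linalg.cholesky(A)                        # factor A = L L^T (A here is positive definite)
print(np.allclose(L @ L.T, A))                   # True

# The PSD matrices form a convex cone: nonnegative combinations stay PSD.
B = np.array([[1.0, 0.0],
              [0.0, 0.0]])                        # PSD (eigenvalues 1 and 0)
C = 0.5 * A + 2.0 * B
print(np.all(np.linalg.eigvalsh(C) >= 0))        # True
```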

1.6 Gradient and Hessian

For a multivariate function $f : \mathbb{R}^n \to \mathbb{R}$:

Gradient (First-Order Information)

Gradient: The gradient points in the direction of steepest increase:

$$
\nabla f(x) = \begin{bmatrix}
\dfrac{\partial f}{\partial x_1} \\
\dfrac{\partial f}{\partial x_2} \\
\vdots \\
\dfrac{\partial f}{\partial x_n}
\end{bmatrix} \in \mathbb{R}^n
$$

Hessian (Second-Order Information)

Hessian: The matrix of second-order partial derivatives describing local curvature:

$$
\nabla^2 f(x) = \begin{bmatrix}
\dfrac{\partial^2 f}{\partial x_1^2} & \dfrac{\partial^2 f}{\partial x_1 \partial x_2} & \cdots & \dfrac{\partial^2 f}{\partial x_1 \partial x_n} \\
\dfrac{\partial^2 f}{\partial x_2 \partial x_1} & \dfrac{\partial^2 f}{\partial x_2^2} & \cdots & \dfrac{\partial^2 f}{\partial x_2 \partial x_n} \\
\vdots & \vdots & \ddots & \vdots \\
\dfrac{\partial^2 f}{\partial x_n \partial x_1} & \dfrac{\partial^2 f}{\partial x_n \partial x_2} & \cdots & \dfrac{\partial^2 f}{\partial x_n^2}
\end{bmatrix} \in \mathbb{R}^{n \times n}
$$

The Hessian is symmetric when f is twice continuously differentiable (by Clairaut's theorem).

Quadratic Form Example

Let $f(x) = x^\top A x + b^\top x + c$, with $A \in \mathbb{R}^{n \times n}$ symmetric, $b \in \mathbb{R}^n$, $c \in \mathbb{R}$. Then:

$$
\nabla f(x) = 2Ax + b, \qquad \nabla^2 f(x) = 2A
$$
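
The sketch below (with an invented symmetric $A$, vector $b$, and constant $c$) verifies the gradient formula $\nabla f(x) = 2Ax + b$ against central finite differences:

```python
# Sketch: checking the quadratic-form gradient 2Ax + b numerically.
import numpy as np

A = np.array([[2.0, 0.5],
              [0.5, 1.0]])                        # symmetric
b = np.array([1.0, -1.0])
c = 3.0

f = lambda x: x @ A @ x + b @ x + c
grad = lambda x: 2.0 * A @ x + b                  # analytic gradient; the Hessian is 2A

x0 = np.array([0.3, -0.7])
eps = 1e-6
num_grad = np.array([(f(x0 + eps * e) - f(x0 - eps * e)) / (2 * eps)
                     for e in np.eye(2)])
print(np.allclose(grad(x0), num_grad, atol=1e-5))  # True
```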

1.7 Lines, Line Segments, and Rays

Given two points $x, y \in \mathbb{R}^n$, define the direction vector $d := y - x$.

Parametric Representations

Line through $x$ and $y$:

$$
\overline{xy} = \{\, x + \theta d \mid \theta \in \mathbb{R} \,\}
$$

Line segment from $x$ to $y$:

$$
[x, y] = \{\, x + \theta d \mid \theta \in [0, 1] \,\} = \{\, (1 - \theta) x + \theta y \mid \theta \in [0, 1] \,\}
$$

Ray starting at $x$ going through $y$:

$$
\overrightarrow{xy} = \{\, x + \theta d \mid \theta \ge 0 \,\}
$$

Geometric Intuition

  • The line extends infinitely in both directions.
  • The line segment is the set of convex combinations of $x$ and $y$, bounded between them.
  • The ray extends infinitely in the direction from $x$ toward $y$.

These primitive sets are the building blocks of convex geometry. The line segment representation

$$
z = (1 - \theta) x + \theta y, \qquad \theta \in [0, 1]
$$

is called a convex combination of $x$ and $y$, and it is the central operation in the definition of convex sets.
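
A tiny sketch (with invented endpoints) of the parametric form $x + \theta d$, showing which values of $\theta$ land on the segment, the ray, and the rest of the line:

```python
# Sketch: points on the line, segment, and ray through x and y via x + theta*(y - x).
import numpy as np

x = np.array([0.0, 0.0])
y = np.array([2.0, 1.0])
d = y - x

point = lambda theta: x + theta * d

print(point(0.5))    # theta in [0, 1]: on the segment [x, y] (here the midpoint)
print(point(2.0))    # theta > 1: on the ray from x through y, beyond y
print(point(-1.0))   # theta < 0: on the line, but outside both segment and ray
```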


1.8 Summary of Mathematical Preliminaries

| Concept | Key Formula / Property |
| --- | --- |
| Inner product (vectors) | $\langle x, y \rangle = x^\top y = \sum_i x_i y_i$ |
| Frobenius inner product | $\langle A, B \rangle_F = \operatorname{trace}(A^\top B)$ |
| Orthogonality | $\langle x, y \rangle = 0 \iff \theta = 90^\circ$ |
| $\ell_2$ norm | $\lVert x \rVert_2 = \sqrt{\sum_i x_i^2}$ |
| $\ell_1$ norm | $\lVert x \rVert_1 = \sum_i \lvert x_i \rvert$ |
| $\ell_\infty$ norm | $\lVert x \rVert_\infty = \max_i \lvert x_i \rvert$ |
| $\ell_p$ norm | $\lVert x \rVert_p = \left( \sum_i \lvert x_i \rvert^p \right)^{1/p}$ |
| Frobenius norm | $\lVert A \rVert_F = \sqrt{\sum_{i,j} A_{ij}^2}$ |
| Symmetric matrix | $A = A^\top$, eigenvalues real |
| Eigenvalue decomposition | $A = Q \Lambda Q^\top$ |
| PSD matrix | $x^\top A x \ge 0 \ \forall x$, $\lambda_i \ge 0$ |
| PSD convex cone | $\alpha A + \beta B \succeq 0$ for $\alpha, \beta \ge 0$ |
| Gradient | $\nabla f(x) \in \mathbb{R}^n$, direction of steepest increase |
| Hessian | $\nabla^2 f(x) \in \mathbb{R}^{n \times n}$, local curvature |
| Line through $x, y$ | $\{\, x + \theta (y - x) \mid \theta \in \mathbb{R} \,\}$ |
| Line segment $[x, y]$ | $\{\, (1 - \theta) x + \theta y \mid \theta \in [0, 1] \,\}$ |
| Ray $\overrightarrow{xy}$ | $\{\, x + \theta (y - x) \mid \theta \ge 0 \,\}$ |

These mathematical tools form the foundation for the study of convex sets, convex functions, and optimization algorithms in the chapters that follow.