Optimization Methods Notes (1)
Optimization, Infimum, Norm, Positive definite matrices, Taylor's theorem.

Optimization

Optimization finds the “best” possible solutions from a set of feasible points.

An optimization problem takes the form of minimizing (or maximizing) an objective function subject to constraints:

$$ \begin{array}{ll}\mathrm{Minimize}&f(x) \\ \mathrm{subject~to}&x\in\Omega.\end{array} $$

  • $x\in\mathbb{R}^n$ is the vector of decision variables.
  • $f$ is called the objective function.
  • $\Omega\subseteq\mathbb{R}^n$ is called the constraint set / feasible set / feasible region.

The symmetry of $G$ in a quadratic $f(x)=\frac12x^TGx+c^Tx+\beta$ can be assumed without loss of generality:
for any $H\in\mathbb{R}^{n\times n}$ (not necessarily symmetric) and $x\in\mathbb{R}^n$, the scalar $x^THx$ equals its own transpose: $$\lbrack x^THx\rbrack_{1\times 1}=\lbrack x^THx\rbrack^T_{1\times 1}=x^TH^Tx,$$ so: $$x^THx=x^TH^Tx=\frac12x^T(\underbrace{H^T+H}_{\text{symmetric matrix $G$}})x.$$ For $f(x)=\frac12x^TGx+c^Tx+\beta$ with $G\in\mathbb{R}^{n\times n}$ symmetric, the gradient is: $$\nabla f(x)=Gx+c.$$
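As a quick numerical sanity check, here is a minimal numpy sketch (the random matrix and test point are illustrative, not from the notes) verifying both the symmetrization identity and the gradient formula against finite differences:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 3
H = rng.standard_normal((n, n))      # generic, non-symmetric matrix
G = H + H.T                          # symmetrized matrix, so x^T H x = 0.5 x^T G x
c = rng.standard_normal(n)
x = rng.standard_normal(n)

# the quadratic form only "sees" the symmetric part of H
print(np.isclose(x @ H @ x, 0.5 * x @ G @ x))    # True

f = lambda x: 0.5 * x @ G @ x + c @ x            # beta omitted: constants drop out of the gradient
grad = G @ x + c                                 # closed-form gradient Gx + c

# central finite differences as an independent check
eps = 1e-6
fd = np.array([(f(x + eps * e) - f(x - eps * e)) / (2 * eps) for e in np.eye(n)])
print(np.allclose(grad, fd))                     # True
```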

Infimum

Let $S\subseteq\mathbb{R}$ be a nonempty set. We say that $l\in[-\infty, \infty)$ is the infimum of $S$, written $l=\inf S$, if:

  • for every $s\in S$, $s \geq l$;
  • for every $\zeta>l$ ($\zeta\in\mathbb{R}$), one can find $s\in S$ such that $s<\zeta$.

By convention, $\inf\emptyset=\infty$.
The infimum is the greatest lower bound: the largest number that is less than or equal to everything in $S$.
In optimization, we refer to the infimum of the objective over the feasible set as the optimal value.
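For example, $\inf\{1/n : n=1,2,\dots\}=0$: every element is greater than $0$, and any $\zeta>0$ is undercut by $1/n$ for large enough $n$. Since $0$ is not in the set, the infimum need not be attained; this is why the optimal value is stated as an infimum even when no minimizer exists.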

Norm

Common vector norms

  • $l_1$ norm: $\vert\vert x\vert\vert_1:=\sum_{i=1}^{n}|x_i|\quad$ (sum of abs value)
  • $l_2$ norm: $\vert\vert x\vert\vert_2:=\sqrt{\sum_{i=1}^{n}|x_i|^2}\quad$ ($\vert\vert x\vert\vert_2^2=\begin{bmatrix}x_1&x_2\end{bmatrix}\begin{bmatrix}x_1\\x_2\end{bmatrix}=x^Tx$ in $\mathbb{R}^2$)
  • $l_\infty$ norm: $\vert\vert x\vert\vert_\infty:=\max_{1\leq i\leq n}|x_i|\quad$ (max abs value)
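A minimal numpy sketch (the test vector is arbitrary) comparing these three formulas against `np.linalg.norm`:

```python
import numpy as np

x = np.array([3.0, -4.0, 1.0])

l1   = np.sum(np.abs(x))          # sum of absolute values
l2   = np.sqrt(np.sum(x**2))      # Euclidean norm; also np.sqrt(x @ x)
linf = np.max(np.abs(x))          # largest absolute entry

print(np.isclose(l1,   np.linalg.norm(x, 1)))       # True
print(np.isclose(l2,   np.linalg.norm(x, 2)))       # True
print(np.isclose(linf, np.linalg.norm(x, np.inf)))  # True
```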

Common matrix norms

$\vert\vert A\vert\vert_1=\underbrace{\max_j\sum_{i=1}^{n}|a_{ij}|}_{\text{maximum of the $l_1$ norms of columns}}$

  • induced matrix norm
    $\underset{||x||_1=1}{\max}||Ax||_1$

$\vert\vert A\vert\vert_2=\underbrace{\sqrt{\lambda_{\max}(A^TA)}}_{\text{square root of the maximum eigenvalue of $A^TA$}}$

  • induced matrix norm
    $\underset{||x||_2=1}{\max}||Ax||_2$

$\vert\vert A\vert\vert_\infty=\underbrace{\max_i\sum_{j=1}^{n}|a_{ij}|}_{\text{maximum of the $l_1$ norms of rows}}$

  • induced matrix norm
    $\max_{\vert\vert x\vert\vert_\infty=1}||Ax||_\infty$

$\vert\vert A\vert\vert_F=\underbrace{\sqrt{\sum_{i=1}^{n}\sum_{j=1}^{n}|a_{ij}|^2}}_{\text{known as the Frobenius norm}}$
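The following sketch (with an arbitrary random matrix) checks each of these formulas against numpy's built-in matrix norms:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((4, 4))

col_sums = np.abs(A).sum(axis=0).max()                 # max l1 norm of a column
spec     = np.sqrt(np.linalg.eigvalsh(A.T @ A).max())  # sqrt of lambda_max(A^T A)
row_sums = np.abs(A).sum(axis=1).max()                 # max l1 norm of a row
frob     = np.sqrt((A**2).sum())                       # Frobenius norm

print(np.isclose(col_sums, np.linalg.norm(A, 1)))       # True
print(np.isclose(spec,     np.linalg.norm(A, 2)))       # True
print(np.isclose(row_sums, np.linalg.norm(A, np.inf)))  # True
print(np.isclose(frob,     np.linalg.norm(A, 'fro')))   # True
```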

For symmetric $A$:
$$\underset{||x||=1}{\min}x^TAx=\lambda_{\min}(A)$$ $$\underset{||x||=1}{\max}x^TAx=\lambda_{\max}(A)\overset{\text{if }A\succ0}{=}||A||_2$$
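A quick sketch (random symmetric matrix, random unit vectors) illustrating these Rayleigh-quotient bounds: every value of $x^TAx$ over unit vectors lands between $\lambda_{\min}(A)$ and $\lambda_{\max}(A)$:

```python
import numpy as np

rng = np.random.default_rng(2)
B = rng.standard_normal((4, 4))
A = B + B.T                          # symmetric test matrix

lam = np.linalg.eigvalsh(A)          # eigenvalues in ascending order
xs = rng.standard_normal((1000, 4))
xs /= np.linalg.norm(xs, axis=1, keepdims=True)   # random unit vectors
q = np.einsum('ij,jk,ik->i', xs, A, xs)           # x^T A x for each sample

print(lam[0] <= q.min(), q.max() <= lam[-1])      # True True
```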

Positive definite matrices

Positive semidefinite matrices

Let $A\in\mathbb{R}^{n\times n}$ be symmetric. We say that $A$ is positive semidefinite, written $A\succeq0$, if $x^TAx\geq0$ for all $x\in\mathbb{R}^{n}$.
Example:
$$A=\left[\begin{matrix}3&1\\1&2\end{matrix}\right]$$ $$x^TAx=\left[\begin{matrix}x_1&x_2\end{matrix}\right]\left[\begin{matrix}3&1\\1&2\end{matrix}\right]\left[\begin{matrix}x_1\\x_2\end{matrix}\right]$$ $$=3x_1^2+2x_1x_2+2x_2^2$$ $$=2x_1^2+(x_1+x_2)^2+x_2^2\geq0$$
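Equivalently, a symmetric matrix is positive semidefinite iff all of its eigenvalues are nonnegative; checking the example above numerically:

```python
import numpy as np

A = np.array([[3.0, 1.0],
              [1.0, 2.0]])
# eigenvalues (5 +/- sqrt(5))/2: both strictly positive,
# so this A is in fact positive definite
print(np.linalg.eigvalsh(A))   # [1.381..., 3.618...]
```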

Positive definite matrices

Let $A\in\mathbb{R}^{n\times n}$ be symmetric. We say that $A$ is positive definite, written $A\succ0$, if $x^TAx>0$ for all $x\in\mathbb{R}^{n}\setminus\{0\}$.

If $D\succ0$ (positive definite) and $\nabla f(x)\neq0$, then
$d=-D\nabla f(x)$ is a descent direction, since $\nabla f(x)^Td=-\nabla f(x)^TD\nabla f(x)<0$ by positive definiteness.
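A small sketch (the function $f$, the matrix $D$, and the step size below are illustrative choices, not from the notes) confirming the claim: the directional derivative along $d$ is negative, so a small step decreases $f$:

```python
import numpy as np

f     = lambda x: x[0]**2 + 3 * x[1]**2 + np.sin(x[0])   # example smooth function
gradf = lambda x: np.array([2 * x[0] + np.cos(x[0]), 6 * x[1]])

D = np.array([[2.0, 0.5],
              [0.5, 1.0]])        # positive definite (trace and determinant > 0)
x = np.array([1.0, -2.0])

d = -D @ gradf(x)                 # candidate descent direction
print(gradf(x) @ d < 0)           # directional derivative is negative: True
print(f(x + 1e-3 * d) < f(x))     # a small step along d decreases f: True
```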

Taylor’s theorem

with remainder term

Suppose $f$ is $(n+1)$ times differentiable on an open interval containing $[a,b]$. Then: $$ f(b) = f(a) + f'(a)(b-a) + \frac{f''(a)}{2!}(b-a)^2 + \cdots + \frac{f^{(n)}(a)}{n!}(b-a)^n + \frac{f^{(n+1)}(\xi)}{(n+1)!}(b-a)^{n+1} $$ for some $\xi \in (a,b)$.

with remainder term high-dimensional

Let $f\in C^1(\mathbb{R}^n)$, $x$ and $y\in\mathbb{R}^n$. Then there exists $\xi\in\{(1-s)x+sy : s\in(0,1)\}$ such that $$f(y)=f(x)+[\nabla f(\xi)]^T(y-x).$$

Let $f\in C^2(\mathbb{R}^n)$, $x$ and $y\in\mathbb{R}^n$. Then there exists $\xi\in\{(1-s)x + sy : s \in (0,1)\}$ such that $$f(y) = f(x) + [\nabla f(x)]^T (y - x) + \frac{1}{2}(y - x)^T \nabla^2 f(\xi) (y - x). $$
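As a 1-D illustration of the first-order (mean-value) form, the following sketch (the choice $f=\exp$ and the bisection search are illustrative) locates a $\xi$ that makes $f(y)=f(x)+f'(\xi)(y-x)$ hold exactly:

```python
import numpy as np

f  = np.exp          # example function
df = np.exp          # its derivative
x, y = 0.0, 1.0
target = (f(y) - f(x)) / (y - x)   # slope the theorem guarantees is attained

# bisection for xi with df(xi) = target (df is increasing on [x, y])
lo, hi = x, y
for _ in range(60):
    mid = 0.5 * (lo + hi)
    if df(mid) < target:
        lo = mid
    else:
        hi = mid
xi = 0.5 * (lo + hi)
print(xi, df(xi), target)  # xi = log(e - 1) ~ 0.5413, both slopes ~ 1.7183
```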


Last modified on 2023-09-22