Linear Algebra & Probability

orthogonal matrices • projections • eigenvectors • SVD

Matrices

Orthogonal Matrix

→ Column vectors are unit vectors

→ Columns are mutually orthogonal: dot product of any pair = 0

→ Acts as a rotation (or reflection) of the vector space; lengths and angles are preserved

Projection Matrix

→ Example: data in a 2D scatter, projected onto a line

→ Compresses to a lower dimension; each vector moves to the closest point in the subspace
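A quick NumPy sketch of projection onto a line: $P = \frac{\vec{v}\vec{v}^T}{\vec{v}^T\vec{v}}$ (the direction `v` and point `x` here are made-up examples):

```python
import numpy as np

# Projection onto the line spanned by v (hypothetical direction vector).
v = np.array([2.0, 1.0])
P = np.outer(v, v) / (v @ v)

x = np.array([3.0, 4.0])
p = P @ x                        # closest point to x on the line

# Projecting twice changes nothing (idempotent), and P is not invertible.
assert np.allclose(P @ P, P)
assert np.isclose(np.linalg.det(P), 0.0)
# The residual x - p is orthogonal to the line.
assert np.isclose((x - p) @ v, 0.0)
```

The zero determinant is why projection matrices show up below as an example of non-invertible matrices.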

Inverse Matrix

→ Undoes the transformation: if the matrix rotates, the inverse rotates back

→ Not defined for singular matrices, e.g. the zero matrix and (non-identity) projection matrices, since they collapse information that can't be recovered

Symmetric Matrix

→ Always square; its eigenvectors are orthogonal

$$S = S^T \text{ for symmetric matrix } S$$

Transpose

→ Turns columns into rows (and rows into columns)

→ $S = S^T$ for $S$ is symmetric

Orthogonal Q

$$Q^T = Q^{-1} \text{ (transpose = inverse)}$$

→ Note $Q \neq Q^T$ in general

→ But $Q^T = Q^{-1}$ always holds for orthogonal $Q$
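A quick NumPy check on a concrete orthogonal matrix (a 2D rotation, chosen as an example):

```python
import numpy as np

# A 2-D rotation by 30 degrees: a concrete orthogonal matrix.
theta = np.pi / 6
Q = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])

# Columns are unit vectors and mutually orthogonal, so Q^T Q = I.
assert np.allclose(Q.T @ Q, np.eye(2))

# Hence the transpose IS the inverse: rotating back undoes the rotation.
assert np.allclose(Q.T, np.linalg.inv(Q))
```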

Rank & Determinant

Matrix Rank

→ Number of linearly independent columns (or rows).

→ Rank 1 matrix: $A = \vec{u} \vec{v}^T$. Every column is a multiple of $\vec{u}$.

→ Full Rank: $rank(A) = \min(m, n)$.
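A quick check of the rank-1 claim, with made-up vectors `u` and `v`:

```python
import numpy as np

# Rank of an outer product u v^T is 1: every column is a multiple of u.
u = np.array([1.0, 2.0, 3.0])
v = np.array([4.0, 5.0])
A = np.outer(u, v)             # 3x2 matrix

assert np.linalg.matrix_rank(A) == 1
# Column j equals v[j] * u, i.e. a scalar multiple of u.
assert np.allclose(A[:, 0], v[0] * u)
assert np.allclose(A[:, 1], v[1] * u)
```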

Determinant

→ Volume of the parallelepiped formed by column vectors.

→ $\det(AB) = \det(A)\det(B)$.

→ If $\det(A) = 0$, matrix is singular (not invertible, collapses space).

→ $\det(A) = \prod \lambda_i$ (Product of eigenvalues).

Trace

→ Sum of diagonal elements.

→ $Tr(A) = \sum \lambda_i$ (Sum of eigenvalues).

Eigenvectors

Definition

→ Every other vector deviates from its initial direction under the transformation; eigenvectors keep their direction.

$$A \vec{v} = \lambda \vec{v}$$

→ Under the linear transform, an eigenvector stays on its own line, just scaled by its eigenvalue
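Verifying $A\vec{v} = \lambda\vec{v}$ numerically on a small example matrix:

```python
import numpy as np

# Each eigenvector is only scaled by A, never knocked off its line.
A = np.array([[2.0, 1.0],
              [1.0, 2.0]])
eigvals, eigvecs = np.linalg.eig(A)

for i in range(len(eigvals)):
    v = eigvecs[:, i]                       # i-th eigenvector (a column)
    assert np.allclose(A @ v, eigvals[i] * v)
```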

For Symmetric Matrix

$$S = Q \Lambda Q^T$$

$$S = \begin{bmatrix} \vdots & \vdots & \vdots \\ c_1 & c_2 & c_3 \\ \vdots & \vdots & \vdots \end{bmatrix} \begin{bmatrix} \lambda_1 & 0 & 0 \\ 0 & \lambda_2 & 0 \\ 0 & 0 & \lambda_3 \end{bmatrix} \begin{bmatrix} \vdots & \vdots & \vdots \\ c_1 & c_2 & c_3 \\ \vdots & \vdots & \vdots \end{bmatrix}^T$$

→ Rotation → scaling → unrotation (since orthogonal eigenvectors for symmetric matrix $Q^T = Q^{-1}$)

→ Columns $c_i$ are the eigenvectors, $\lambda_i$ the eigenvalues
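The rotation → scaling → un-rotation picture can be verified with `np.linalg.eigh` (the eigensolver for symmetric matrices) on an example matrix:

```python
import numpy as np

S = np.array([[4.0, 1.0],
              [1.0, 3.0]])     # symmetric

lam, Q = np.linalg.eigh(S)     # eigh: for symmetric S, Q comes out orthogonal

# Rotation -> scaling -> un-rotation reproduces S.
assert np.allclose(Q @ np.diag(lam) @ Q.T, S)
# Eigenvectors of a symmetric matrix are orthonormal.
assert np.allclose(Q.T @ Q, np.eye(2))
```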

Orthogonal Matrix

→ The $Q$ in the decomposition is orthogonal, i.e. a pure rotation

→ It rotates the standard basis vectors onto the eigenvectors

SVD

Overview

SVD is for any matrix, no restriction on symmetry. Dimension could be different.

$$A = U \Sigma V^T$$

Rectangular Matrix

→ What a rectangular matrix does: it is the link / transform between spaces of different dimension, e.g. between $R^2$ and $R^3$

$$(2 \times 3) \cdot (3 \times 1) \rightarrow (2 \times 1) = \text{dimension eraser}$$

$$(3 \times 2) \cdot (2 \times 1) \rightarrow (3 \times 1) = \text{dimension adder}$$
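A shape-only sketch of the eraser/adder idea (the matrices here are arbitrary examples):

```python
import numpy as np

A = np.random.default_rng(0).normal(size=(2, 3))   # maps R^3 -> R^2
x = np.array([1.0, 2.0, 3.0])

y = A @ x
assert y.shape == (2,)         # "dimension eraser": 3-D in, 2-D out

B = A.T                        # a 3x2 matrix maps R^2 -> R^3
assert (B @ y).shape == (3,)   # "dimension adder" (shape only; B is not A's inverse)
```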

Left & Right Singular Vectors

→ $A_{m \times n}$ so if we do $A A^T$ this is symmetric

$$S_L = A A^T \text{ (left singular of A)}$$

$$S_R = A^T A \text{ (right singular of A)}$$

→ If $A_{2 \times 3}$:

→ $[S_L]$ 2×2 would have $u_1, u_2$ (left singular vectors)

→ $[S_R]$ 3×3 would have $v_1, v_2, v_3$ (right singular vectors)

→ $S_L$ & $S_R$ are positive semi-definite (all eigenvalues $\geq 0$)

→ They have same (non-zero) eigenvalues

→ Sort in decreasing order: $\lambda_1 \geq \lambda_2 \geq \dots \geq 0$
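Checking the $S_L$ / $S_R$ claims on a concrete $2 \times 3$ example:

```python
import numpy as np

A = np.array([[3.0, 1.0, 1.0],
              [-1.0, 3.0, 1.0]])        # 2x3

SL = A @ A.T                            # 2x2, symmetric
SR = A.T @ A                            # 3x3, symmetric

lam_L = np.sort(np.linalg.eigvalsh(SL))[::-1]
lam_R = np.sort(np.linalg.eigvalsh(SR))[::-1]

# Both are PSD, and they share the same non-zero eigenvalues
# (SR just has one extra zero, since rank(A) = 2).
assert np.all(lam_L >= -1e-10) and np.all(lam_R >= -1e-10)
assert np.allclose(lam_L, lam_R[:2])
```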

Singular Values

→ $\sqrt{\lambda_1}, \sqrt{\lambda_2}, \dots$ are singular values of $A$ ($\sigma_1, \sigma_2, \dots$)

→ $\Sigma$ → Rectangular Diagonal, same shape as $A$

→ $U$ → Orthogonal, normalized eigenvectors of $S_L$ ($A A^T$)

→ $V$ → Orthogonal, normalized eigenvectors of $S_R$ ($A^T A$)

Decomposition Form

$$A = U \Sigma V^T$$

$$A = \begin{bmatrix} | & | \\ u_1 & u_2 \\ | & | \end{bmatrix} \begin{bmatrix} \sigma_1 & 0 & 0 \\ 0 & \sigma_2 & 0 \end{bmatrix} \begin{bmatrix} \text{--- } v_1 \text{ ---} \\ \text{--- } v_2 \text{ ---} \\ \text{--- } v_3 \text{ ---} \end{bmatrix}$$

$$A = \sigma_1 \begin{bmatrix} \vdots \\ u_1 \\ \vdots \end{bmatrix} \begin{bmatrix} \cdots & v_1 & \cdots \end{bmatrix} + \sigma_2 \begin{bmatrix} \vdots \\ u_2 \\ \vdots \end{bmatrix} \begin{bmatrix} \cdots & v_2 & \cdots \end{bmatrix}$$

$$= \sigma_1 u_1 v_1^T + \sigma_2 u_2 v_2^T$$

where $u_i$ is a column vector and $v_i^T$ is a row vector, so each $u_i v_i^T$ is an outer product, i.e. a rank-1 matrix

$$= \sum_{i} \sigma_i u_i v_i^T \text{ (rank-1 matrices)}$$
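The rank-1 sum can be verified with `np.linalg.svd` on the same example matrix used above:

```python
import numpy as np

A = np.array([[3.0, 1.0, 1.0],
              [-1.0, 3.0, 1.0]])

U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Rebuild A as a sum of rank-1 pieces sigma_i * u_i v_i^T.
A_rebuilt = sum(s[i] * np.outer(U[:, i], Vt[i, :]) for i in range(len(s)))
assert np.allclose(A_rebuilt, A)
```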

Image Compression

$$A = U \Sigma V^T = \sum_{i=1}^{rank(A)} \sigma_i u_i v_i^T$$

→ Sum of rank-1 matrices. For image compression, keep only the top $r$ terms (e.g. $r = 10$), discarding the small singular values.
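A truncated-SVD compression sketch, using random data as a stand-in for a grayscale image:

```python
import numpy as np

rng = np.random.default_rng(42)
img = rng.normal(size=(64, 64))       # stand-in for a 64x64 grayscale image

U, s, Vt = np.linalg.svd(img, full_matrices=False)

r = 10                                # keep only the 10 largest singular values
approx = U[:, :r] @ np.diag(s[:r]) @ Vt[:r, :]

assert np.linalg.matrix_rank(approx) == r
# Storage drops from 64*64 values to r*(64 + 64 + 1).
assert r * (64 + 64 + 1) < 64 * 64
```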


Probability

Basic Rules

→ Sample Space ($S$): All possible outcomes.

→ Law of Total Probability: $P(A) = \sum_n P(A|B_n)P(B_n)$.

Bayes' Theorem

$$P(A|B) = \frac{P(B|A)P(A)}{P(B)}$$

→ **Prior** $P(A)$: Belief before evidence.

→ **Likelihood** $P(B|A)$: Prob. of evidence given hypothesis.

→ **Posterior** $P(A|B)$: Belief after evidence.
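A worked Bayes example with made-up numbers: a disease test (1% prevalence, 95% sensitivity, 10% false-positive rate, all hypothetical):

```python
# Prior, likelihood, and false-positive rate (all hypothetical numbers).
p_A = 0.01            # prior P(disease)
p_B_given_A = 0.95    # likelihood P(positive | disease)
p_B_given_notA = 0.10 # P(positive | no disease)

# Law of total probability gives the evidence P(B).
p_B = p_B_given_A * p_A + p_B_given_notA * (1 - p_A)

posterior = p_B_given_A * p_A / p_B
assert 0.08 < posterior < 0.09   # ~8.8%: one positive test is weak evidence alone
```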

Expectations & Variance

$$E[X] = \sum x P(x) \text{ (Mean / balance point)}$$

$$Var(X) = E[(X - \mu)^2] = E[X^2] - (E[X])^2$$

Covariance

→ Measure of how two variables move together.

$$Cov(X, Y) = E[(X - E[X])(Y - E[Y])]$$

→ **Covariance Matrix**: symmetric matrix where $\Sigma_{ij} = Cov(X_i, X_j)$. Connecting back to Lin Alg: This matrix is positive semi-definite!
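The symmetry and positive semi-definiteness of a covariance matrix can be checked on random sample data:

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(size=(3, 100))      # 3 variables, 100 samples each

C = np.cov(data)                      # 3x3 sample covariance matrix

# Symmetric, and positive semi-definite: all eigenvalues >= 0.
assert np.allclose(C, C.T)
assert np.all(np.linalg.eigvalsh(C) >= -1e-10)
```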

Common Distributions

Bernoulli

Single trial, boolean outcome (success/failure).

$$P(X=1) = p, \quad P(X=0) = 1-p$$

Gaussian (Normal)

The "Bell Curve", defined by $\mu$ and $\sigma^2$.

$$f(x) = \frac{1}{\sigma\sqrt{2\pi}} e^{-\frac{1}{2}(\frac{x-\mu}{\sigma})^2}$$
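The density formula can be sanity-checked numerically: it should integrate to 1, and peak at $\frac{1}{\sigma\sqrt{2\pi}}$ (the function name `gaussian_pdf` is mine):

```python
import numpy as np

def gaussian_pdf(x, mu=0.0, sigma=1.0):
    """Normal density, straight from the formula above."""
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

# Crude Riemann-sum integral over [-8, 8] (the tails beyond are negligible).
xs = np.linspace(-8, 8, 10001)
area = np.sum(gaussian_pdf(xs)) * (xs[1] - xs[0])

assert np.isclose(area, 1.0, atol=1e-3)
assert np.isclose(gaussian_pdf(0.0), 1 / np.sqrt(2 * np.pi))
```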