5.1 Orthogonality

5.1.1 Orthogonal Subspaces

Two vectors $\mathbf{u}, \mathbf{v} \in \mathbb{R}^n$ are called orthogonal if $\mathbf{u}^\top\mathbf{v} = 0$. Two subspaces $U$ and $V$ are orthogonal if for all $\mathbf{u} \in U$ and $\mathbf{v} \in V$, the vectors $\mathbf{u}$ and $\mathbf{v}$ are orthogonal.

5.1.2 Subspaces Orthogonal if Basis is Orthogonal

Let $\{\mathbf{u}_1, \dots, \mathbf{u}_k\}$ be a basis of subspace $U$. Let $\{\mathbf{v}_1, \dots, \mathbf{v}_\ell\}$ be a basis of subspace $V$. $U$ and $V$ are orthogonal if and only if $\mathbf{u}_i$ and $\mathbf{v}_j$ are orthogonal for all $i \in \{1, \dots, k\}$ and $j \in \{1, \dots, \ell\}$.

We can thus determine whether two subspaces are orthogonal by comparing only their basis vectors: if all pairs of basis vectors are orthogonal, then so are all linear combinations of them.
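For instance, a minimal numerical sketch in NumPy (the specific subspaces are an illustrative choice, not from the lecture): $U$ is the $x$-axis of $\mathbb{R}^3$, $V$ is the $yz$-plane, and we check all pairwise inner products of the basis vectors.

```python
import numpy as np

# Columns of each matrix form a basis of the respective subspace.
U_basis = np.array([[1.0, 0.0, 0.0]]).T               # basis of U: the x-axis
V_basis = np.array([[0.0, 1.0, 0.0],
                    [0.0, 0.0, 1.0]]).T               # basis of V: the yz-plane

# U and V are orthogonal iff every entry of U_basis^T V_basis is zero.
print(U_basis.T @ V_basis)                            # [[0. 0.]] -> orthogonal
```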

As the basis vectors of two orthogonal subspaces are linearly independent when taken together (see the next lemma), the union of two bases is still a basis of the sum of their subspaces:

5.1.3 Bases of $U$ and $V$ together form a basis of $U + V$

Let $U$ and $V$ be orthogonal subspaces of $\mathbb{R}^n$. Let $\{\mathbf{u}_1, \dots, \mathbf{u}_k\}$ be a basis of subspace $U$. Let $\{\mathbf{v}_1, \dots, \mathbf{v}_\ell\}$ be a basis of subspace $V$. The set of vectors $\{\mathbf{u}_1, \dots, \mathbf{u}_k, \mathbf{v}_1, \dots, \mathbf{v}_\ell\}$ is linearly independent.

We can use this lemma to get a basis of the subspace $U + V = \{\mathbf{u} + \mathbf{v} : \mathbf{u} \in U, \mathbf{v} \in V\}$, which is then indeed also a subspace of $\mathbb{R}^n$. This sum of sets is called the Minkowski sum!
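As a sketch of how this plays out numerically (made-up bases, assuming NumPy): stacking the bases of two orthogonal subspaces of $\mathbb{R}^4$ as columns gives a matrix of full column rank, i.e. the union of the bases is a basis of $U + V$.

```python
import numpy as np

# Orthogonal subspaces of R^4: U spanned by e1, e2 and V spanned by e3 + e4.
U_basis = np.array([[1.0, 0.0, 0.0, 0.0],
                    [0.0, 1.0, 0.0, 0.0]]).T          # dim U = 2
V_basis = np.array([[0.0, 0.0, 1.0, 1.0]]).T          # dim V = 1

combined = np.hstack([U_basis, V_basis])              # union of the two bases
print(np.linalg.matrix_rank(combined))                # 3 = 2 + 1 -> linearly independent
```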

5.1.4 Properties of orthogonal subspaces

Let $U$ and $V$ be orthogonal subspaces. Then $U \cap V = \{\mathbf{0}\}$ (as all subspaces contain the zero vector).

If $\mathbf{x} \in U$ and $\mathbf{x} \in V$, then $\mathbf{x}^\top\mathbf{x} = 0$ and hence $\mathbf{x} = \mathbf{0}$.

Note that two subspaces being orthogonal does not yet mean they are orthogonal complements. Thus the sum of their dimensions is not necessarily $n$.

5.1.5 Orthogonal Complement

Let $U$ be a subspace of $\mathbb{R}^n$. We define the orthogonal complement of $U$ as $U^\perp = \{\mathbf{v} \in \mathbb{R}^n : \mathbf{v}^\top\mathbf{u} = 0 \text{ for all } \mathbf{u} \in U\}$.

$U^\perp$ is also a subspace of $\mathbb{R}^n$. Therefore, the concept of orthogonal subspaces allows us to decompose the space into a subspace and its complement (these being, for example, the nullspace of $A$ and the row space of $A$).

5.1.6 Decomposition of $\mathbb{R}^n$

Let $A \in \mathbb{R}^{m \times n}$ be a matrix. Then $N(A) = C(A^\top)^\perp$, i.e. the nullspace of $A$ is the orthogonal complement of the row space of $A$.

As we know, if $\operatorname{rank}(A) = r$ then $\dim C(A^\top) = r$ and $\dim N(A) = n - r$. Thus they add up to the whole space!
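A small numerical illustration of this decomposition (a sketch with NumPy; the matrix is arbitrary): a basis of $N(A)$ and a basis of $C(A^\top)$ can be read off the SVD, they are orthogonal to each other, and their dimensions add up to $n$.

```python
import numpy as np

A = np.array([[1.0, 2.0, 3.0],
              [2.0, 4.0, 6.0]])              # rank 1, so dim C(A^T) = 1 and dim N(A) = 2
m, n = A.shape

# Rows of Vt with (numerically) nonzero singular values span the row space C(A^T),
# the remaining rows span the nullspace N(A).
_, s, Vt = np.linalg.svd(A)
r = int(np.sum(s > 1e-10))                   # numerical rank
row_space = Vt[:r].T                         # basis of C(A^T) as columns
null_space = Vt[r:].T                        # basis of N(A) as columns

print(row_space.T @ null_space)              # all zeros: C(A^T) is orthogonal to N(A)
print(r + null_space.shape[1] == n)          # True: the dimensions add up to n
```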

5.1.7 Orthogonal subspace properties

Let $U$ and $V$ be orthogonal subspaces of $\mathbb{R}^n$. Then the following statements hold.

  • $\dim(U + V) = \dim(U) + \dim(V)$,
  • Every $\mathbf{x} \in U + V$ can be written as $\mathbf{x} = \mathbf{u} + \mathbf{v}$ with unique vectors $\mathbf{u} \in U$, $\mathbf{v} \in V$.

In words, this means that we can combine two orthogonal subspaces and create a new subspace, whose dimension is the sum of the two dimensions.

Example: We can write $U + V = \{\mathbf{u} + \mathbf{v} : \mathbf{u} \in U, \mathbf{v} \in V\}$ (this is a Minkowski sum). We can also write $V + U$; the sum is symmetric.

5.1.8 Orthogonality cancels out

Let $U$ be a subspace of $\mathbb{R}^n$. Then $(U^\perp)^\perp = U$.

This allows us to rewrite Theorem 5.1.6:

5.1.9 Four fundamental Subspaces

Let $A \in \mathbb{R}^{m \times n}$.

Subspace Orthogonality

We have: $N(A) = C(A^\top)^\perp$ and $N(A^\top) = C(A)^\perp$.

We can visualise the four fundamental subspaces as two orthogonal pairs: $C(A^\top)$ and $N(A)$ decompose $\mathbb{R}^n$, while $C(A)$ and $N(A^\top)$ decompose $\mathbb{R}^m$.

Consequences: Recall that the complete solution to $A\mathbf{x} = \mathbf{b}$ ("all $\mathbf{x}$'s s.t. $A\mathbf{x} = \mathbf{b}$") is $\mathbf{x}_p + \mathbf{x}_n$ with $\mathbf{x}_n \in N(A)$, where $\mathbf{x}_p$ is a particular solution (assuming the system is feasible).

You can show that the solution set is also equal to $\{\mathbf{x}_r + \mathbf{x}_n : \mathbf{x}_n \in N(A)\}$, where $\mathbf{x}_r \in C(A^\top)$ and $A\mathbf{x}_r = \mathbf{b}$, and where $\mathbf{x}_r$ is unique (i.e. such a particular solution in the row space always exists and is unique). (See Sabatino Notes Week 9 for the proof.)
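A numerical sketch of this statement (assuming a feasible system; NumPy's pseudoinverse returns the minimum-norm solution, which is exactly the particular solution lying in the row space):

```python
import numpy as np

A = np.array([[1.0, 2.0, 3.0],
              [2.0, 4.0, 6.0]])
b = np.array([6.0, 12.0])                         # feasible: b = A @ [1, 1, 1]

x_r = np.linalg.pinv(A) @ b                       # minimum-norm solution, lies in C(A^T)
print(np.allclose(A @ x_r, b))                    # True: x_r is a particular solution

# x_r is orthogonal to N(A), i.e. it really lies in the row space:
_, s, Vt = np.linalg.svd(A)
null_space = Vt[int(np.sum(s > 1e-10)):].T        # basis of N(A)
print(np.allclose(null_space.T @ x_r, 0.0))       # True: x_r is in C(A^T)
```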

5.1.10 $N(A^\top A) = N(A)$ and $C(A^\top A) = C(A^\top)$

Let $A \in \mathbb{R}^{m \times n}$. Then $N(A^\top A) = N(A)$ and $C(A^\top A) = C(A^\top)$.

Why is this needed?

  • $N(A^\top A) = N(A)$: if $N(A) = \{\mathbf{0}\}$, i.e. $A$ has lin. indep. columns, then $A^\top A$ is invertible, as $N(A^\top A) = \{\mathbf{0}\}$ and $A^\top A$ is square.
  • $C(A^\top A) = C(A^\top)$: needed so that $A^\top\mathbf{b}$ is in the same subspace as the columns of $A^\top A$, i.e. the normal equations $A^\top A\hat{\mathbf{x}} = A^\top\mathbf{b}$ always have a solution.

This one is important! Learn the proof! $N(A^\top A) = N(A)$ holds because:

  • if $\mathbf{x} \in N(A)$, then $A\mathbf{x} = \mathbf{0}$, so $A^\top A\mathbf{x} = \mathbf{0}$ and $\mathbf{x} \in N(A^\top A)$.
  • if $\mathbf{x} \in N(A^\top A)$, then $\mathbf{x}^\top A^\top A\mathbf{x} = \|A\mathbf{x}\|^2 = 0$, which means $A\mathbf{x} = \mathbf{0}$ and $\mathbf{x} \in N(A)$. $C(A^\top A) = C(A^\top)$ holds because:
  • if $\mathbf{y} \in C(A^\top A)$, then $\mathbf{y} = A^\top A\mathbf{x}$ for some $\mathbf{x}$, and if we set $\mathbf{z} = A\mathbf{x}$ then $\mathbf{y} = A^\top\mathbf{z}$, thus $\mathbf{y} \in C(A^\top)$ and $C(A^\top A) \subseteq C(A^\top)$.
  • $\dim C(A^\top A) = \dim C(A^\top)$, as $N(A^\top A) = N(A)$ (rank-nullity). Then $C(A^\top A) = C(A^\top)$ (by the inclusion above and equal dimensions), and thus the claim holds.
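A quick numerical sanity check of both identities (a sketch; the random matrix is arbitrary and happens to have independent columns, so both nullspaces are $\{\mathbf{0}\}$):

```python
import numpy as np

A = np.random.default_rng(0).normal(size=(5, 3))
G = A.T @ A                                             # the Gram matrix A^T A

# N(A^T A) = N(A) and C(A^T A) = C(A^T) imply equal ranks:
print(np.linalg.matrix_rank(G) == np.linalg.matrix_rank(A))   # True
# Here the columns of A are independent, so A^T A is square with trivial
# nullspace and therefore invertible:
print(np.linalg.matrix_rank(G) == A.shape[1])                  # True
```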

5.2 Projections

If a system $A\mathbf{x} = \mathbf{b}$ has no solutions (because $A$ is not invertible, i.e. $\mathbf{b} \notin C(A)$), can we find an $\hat{\mathbf{x}}$ such that the error $\|A\hat{\mathbf{x}} - \mathbf{b}\|$ is minimal - basically the next best solution?

This usually happens when we have more equations than variables (i.e. the system is "overdetermined") (alternative phrasing: there aren't enough columns to span the space, so $C(A)$ is only a line or a plane in $\mathbb{R}^3$, for example).

5.2.1 Projection of a vector onto a subspace

The projection of a vector $\mathbf{b}$ onto a subspace $U$ (of $\mathbb{R}^n$) is the point $\mathbf{p} \in U$ that is closest to $\mathbf{b}$. In other words $\mathbf{p} = \operatorname{argmin}_{\mathbf{u} \in U} \|\mathbf{e}\|$,

where $\mathbf{e} = \mathbf{b} - \mathbf{u}$, with $\mathbf{e}$ the error.

The one-dimensional case

By geometric intuition we can clearly see that, in 2d, the error vector is orthogonal to the projection.

5.2.2 Projection in 2d formula

Let $\mathbf{a}, \mathbf{b} \in \mathbb{R}^2$ with $\mathbf{a} \neq \mathbf{0}$. The projection of $\mathbf{b}$ on the line spanned by $\mathbf{a}$ is given by $\mathbf{p} = \frac{\mathbf{a}^\top\mathbf{b}}{\mathbf{a}^\top\mathbf{a}}\,\mathbf{a}$. This minimiser is unique.

And if we substitute $\mathbf{p}$ back in, we find that the error vector $\mathbf{e} = \mathbf{b} - \mathbf{p}$ is indeed orthogonal to $\mathbf{a}$.

Intuition: (In the lecture we don’t assume orthogonality of the error vector).

  • Assume $\mathbf{a} \perp (\mathbf{b} - \mathbf{p})$ (define the error vector $\mathbf{e} = \mathbf{b} - \mathbf{p}$).
  • We can write $\mathbf{p} = \lambda\mathbf{a}$, since we know the projection vector is on the line spanned by $\mathbf{a}$ and hence is a scalar multiple of $\mathbf{a}$. \begin{align} \mathbf{a} \perp (\mathbf{b} - \mathbf{p}) &\iff \mathbf{a}^\top(\mathbf{b} - \mathbf{p}) = 0 \\ &\iff \mathbf{a}^\top(\mathbf{b} - \lambda\mathbf{a}) = 0 \\ &\iff \mathbf{a}^\top\mathbf{b} - \mathbf{a}^\top\lambda\mathbf{a} = 0 \\ &\iff \mathbf{a}^\top\mathbf{b} = \mathbf{a}^\top\lambda\mathbf{a} \\ &\iff \mathbf{a}^\top\mathbf{b} = \lambda\mathbf{a}^\top\mathbf{a} \\ &\iff \lambda = \frac{\mathbf{a}^\top\mathbf{b}}{\mathbf{a}^\top\mathbf{a}} \end{align}
  • Where we first used $\mathbf{a} \perp (\mathbf{b} - \mathbf{p}) \iff \mathbf{a}^\top(\mathbf{b} - \mathbf{p}) = 0$, then plugged in $\lambda\mathbf{a}$ for $\mathbf{p}$, then used the distributivity of the vector multiplication.
  • We can divide by $\mathbf{a}^\top\mathbf{a}$ ($\mathbf{a}^\top\mathbf{a} = \|\mathbf{a}\|^2$ is a nonzero real number, as $\mathbf{a}$ is a nonzero vector).

We can then plug $\lambda$ into $\mathbf{p} = \lambda\mathbf{a}$ to get the projection vector $\mathbf{p} = \frac{\mathbf{a}^\top\mathbf{b}}{\mathbf{a}^\top\mathbf{a}}\,\mathbf{a} = \frac{\mathbf{a}\mathbf{a}^\top}{\mathbf{a}^\top\mathbf{a}}\,\mathbf{b}$. We can do this since $\lambda$ is a scalar, so $\lambda\mathbf{a} = \mathbf{a}\lambda$ (commutativity and associativity).
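A minimal numerical sketch of the one-dimensional formula (the two vectors are arbitrary):

```python
import numpy as np

a = np.array([2.0, 1.0])
b = np.array([1.0, 3.0])

lam = (a @ b) / (a @ a)            # lambda = a^T b / a^T a
p = lam * a                        # projection of b onto the line spanned by a
e = b - p                          # error vector

print(p)                           # [2. 1.]  (a^T b = 5, a^T a = 5, so lambda = 1)
print(np.isclose(a @ e, 0.0))      # True: the error is orthogonal to a
```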

Idempotence in 1d: if we want to find the projection of a vector that already lies on the line, e.g. $\mathbf{p} = \lambda\mathbf{a}$, then the projection simply returns $\mathbf{p}$, as $\mathbf{p}$ is already on the line (subspace).

General Case

5.2.3 Projection is well defined

The projection $\mathbf{p}$ of a vector $\mathbf{b}$ to the subspace $C(A)$ is well defined. It can be written as $\mathbf{p} = A\hat{\mathbf{x}}$, where $\hat{\mathbf{x}}$ satisfies the normal equations $A^\top A\hat{\mathbf{x}} = A^\top\mathbf{b}$.

Intuition:

  • In the previous case, we had $\mathbf{a}^\top(\mathbf{b} - \mathbf{p}) = 0$. Here, the same orthogonality condition holds for all columns of $A$ (whose column space we are projecting on).
  • This is the same as stating $A^\top(\mathbf{b} - \mathbf{p}) = \mathbf{0}$, which by substituting $\mathbf{p} = A\hat{\mathbf{x}}$ gives $A^\top(\mathbf{b} - A\hat{\mathbf{x}}) = \mathbf{0}$, which we can restate as $A^\top A\hat{\mathbf{x}} = A^\top\mathbf{b}$, which is the normal equation.

Unique solution

If $A^\top A$ is invertible, we have a unique solution $\hat{\mathbf{x}}$ that satisfies the equation $A^\top A\hat{\mathbf{x}} = A^\top\mathbf{b}$.

From the normal equations we can construct a formula for $\hat{\mathbf{x}}$: $\hat{\mathbf{x}} = (A^\top A)^{-1}A^\top\mathbf{b}$ ($A^\top A$ is invertible if the columns of $A$ are independent), which gives us $\mathbf{p} = A\hat{\mathbf{x}} = A(A^\top A)^{-1}A^\top\mathbf{b}$.
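A small sketch of solving the normal equations in NumPy (the data is made up; for an overdetermined system with independent columns this is exactly the least-squares solution that np.linalg.lstsq would return):

```python
import numpy as np

A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0]])                   # 3 equations, 2 unknowns, independent columns
b = np.array([1.0, 2.0, 2.0])                # b is not in C(A): no exact solution

x_hat = np.linalg.solve(A.T @ A, A.T @ b)    # solve the normal equations A^T A x = A^T b
p = A @ x_hat                                # projection of b onto C(A)

print(np.allclose(x_hat, np.linalg.lstsq(A, b, rcond=None)[0]))  # True
print(np.allclose(A.T @ (b - p), 0.0))                           # True: error orthogonal to C(A)
```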

5.2.4 $A^\top A$ invertible $\iff$ $A$ has indep. columns

$A^\top A$ is invertible if and only if $A$ has linearly independent columns.

5.2.5 Projection Matrix

Let $U$ be a subspace in $\mathbb{R}^m$ and $A$ a matrix whose columns are a basis of $U$. The projection of $\mathbf{b}$ to $U$ is given by $\mathbf{p} = P\mathbf{b}$, where $P = A(A^\top A)^{-1}A^\top$ is the projection matrix.

Note the condition that the columns of $A$ form a basis of $U$: this forces them to be independent, which means $A^\top A$ is invertible by Lemma 5.2.4.

IMPORTANT: It may look like we can simplify the expression for the projection matrix via $(A^\top A)^{-1} = A^{-1}(A^\top)^{-1}$. This is not the case, as this identity only holds if $A$ itself is invertible. But if $A$ is invertible, its columns span the whole space anyway and any projection is simply the point itself. This is beautifully reflected in the fact that if we do simplify, then we simply get $P = A A^{-1}(A^\top)^{-1}A^\top = I$.
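A sketch verifying the projection-matrix formula and its properties numerically (an arbitrary matrix with independent, non-spanning columns, so $P \neq I$):

```python
import numpy as np

A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0]])                          # columns form a basis of U = C(A)
P = A @ np.linalg.inv(A.T @ A) @ A.T                # projection matrix onto U

b = np.array([1.0, 2.0, 2.0])
print(P @ b)                                        # the projection p of b onto U
print(np.allclose(P @ P, P))                        # True: P is idempotent (Sect. 5.2.6)
print(np.allclose(P.T, P))                          # True: P is symmetric
```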

5.2.6 $P$ is idempotent ($P^2 = P$)

If $\mathbf{v} \in U$, then $P\mathbf{v} = \mathbf{v}$ by definition. This requires us to have that $P^2 = P$ (projecting twice gives the same result as projecting once).

We also say $P$ is idempotent.

$P$ is symmetric (reprove in exam)

For $P = A(A^\top A)^{-1}A^\top$ we have that $P^\top = P$.

Proof: We use the fact that for an invertible matrix $B$ we have $(B^{-1})^\top = (B^\top)^{-1}$. Then $P^\top = \left(A(A^\top A)^{-1}A^\top\right)^\top = A\left((A^\top A)^{-1}\right)^\top A^\top = A\left((A^\top A)^\top\right)^{-1}A^\top = A(A^\top A)^{-1}A^\top = P$.

5.2.7 $I - P$ projects onto the orthogonal complement

Let $U^\perp$ be the orthogonal complement of $U$ and $P$ the projection matrix onto $U$. Then $I - P$ is the projection matrix that maps $\mathbf{b}$ to its projection onto $U^\perp$.

Proof idea: We have $\mathbf{b} = \mathbf{p} + \mathbf{e}$ with $\mathbf{p} = P\mathbf{b} \in U$ and $\mathbf{e} \in U^\perp$. Thus $\mathbf{e} = \mathbf{b} - P\mathbf{b} = (I - P)\mathbf{b}$. This is true, since it holds that $I - P$ indeed is also idempotent: $(I - P)^2 = I - 2P + P^2 = I - 2P + P = I - P$.
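A numerical sketch of this last statement (the same arbitrary $A$ as above is re-declared so the snippet is self-contained):

```python
import numpy as np

A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0]])
P = A @ np.linalg.inv(A.T @ A) @ A.T      # projection onto U = C(A)
Q = np.eye(3) - P                         # candidate projection onto the complement of U

b = np.array([1.0, 2.0, 2.0])
print(np.allclose(Q @ Q, Q))              # True: I - P is idempotent
print(np.allclose(A.T @ (Q @ b), 0.0))    # True: (I - P)b is orthogonal to C(A)
print(np.allclose(P @ b + Q @ b, b))      # True: b = p + e
```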