6.1 Least Squares Approximation
One case in which $Ax = b$ usually has no solutions is when trying to fit a line to a set of measurements or datapoints. We can then try to use a least squares approximation to find a best-fitting line.
6.1.1 Minimiser is solution to the normal equations
A minimiser $\hat{x}$ of $\|Ax - b\|$ is also a solution of the normal equations $A^\top A \hat{x} = A^\top b$. When $A$ has independent columns, the unique minimiser of $\|Ax - b\|$ is given by $$\hat{x} = (A^\top A)^{-1} A^\top b.$$
Notice that $\|Ax - b\|$ being minimal is equivalent to asking $A\hat{x}$ to be the projection of $b$ onto $C(A)$ (then we minimise the error). Thus we can solve for $\hat{x}$ (which when multiplied with $A$ gives us the projection) by using the projection matrix $P = A(A^\top A)^{-1}A^\top$ without the final $A$ on the left, which gives us the above equation.

The line here can be represented by the equation $y = c + dt$. We want to find $c$ and $d$ that minimise the sum of squares of the error of the fitted line: $$\sum_{i=1}^{n} (c + d t_i - b_i)^2,$$ which in matrix-vector notation gives $\|Ax - b\|^2$ where $$A = \begin{pmatrix} 1 & t_1 \\ \vdots & \vdots \\ 1 & t_n \end{pmatrix}, \quad x = \begin{pmatrix} c \\ d \end{pmatrix} \quad \text{and} \quad b = \begin{pmatrix} b_1 \\ \vdots \\ b_n \end{pmatrix}.$$
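As a quick numerical illustration of the normal equations (a minimal NumPy sketch; the measurement values are made up):

```python
import numpy as np

# Hypothetical measurements b_i taken at times t_i (invented data).
t = np.array([0.0, 1.0, 2.0, 3.0])
b = np.array([1.1, 1.9, 3.2, 3.8])

# Build A with a column of ones (intercept c) and a column of times (slope d).
A = np.column_stack([np.ones_like(t), t])

# Solve the normal equations A^T A x = A^T b for x = (c, d).
x_hat = np.linalg.solve(A.T @ A, A.T @ b)
print("c, d =", x_hat)                          # best-fitting line y = c + d t
print("error =", np.linalg.norm(A @ x_hat - b))  # norm of the residual
```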
6.1.2 Independent columns of $A$
The columns of the matrix $A$ defined before are linearly dependent if and only if $t_i = t_j$ for all $i, j$, i.e. all measurements were taken at the same time.
As all datapoints are unique in time (we can't have two points at one time), the columns of $A$ are always linearly independent.
6.1.3 Orthogonal columns
If the columns of $A$ are pairwise orthogonal, $A^\top A$ is a diagonal matrix which is very easy to invert. We can convert any such $A$ to have orthogonal columns by making sure that the sum of all the $t_i$ is $0$, which can be achieved by shifting the graph along the time axis (replacing each $t_i$ by $t_i - \bar{t}$, where $\bar{t}$ is the mean).
We can also try to fit any other equation, not just a line. A parabola $y = c + dt + et^2$ could be used by adding a third column to $A$ which contains the $t_i^2$ and adding a third unknown $e$ to $x$. A sketch of both ideas follows below.
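Both remarks can be checked numerically (again a sketch with invented data): centring the times makes $A^\top A$ diagonal, and an extra $t_i^2$ column fits a parabola.

```python
import numpy as np

t = np.array([0.0, 1.0, 2.0, 3.0])
b = np.array([1.1, 1.9, 3.2, 3.8])

# Shift the times so they sum to zero: the two columns become orthogonal.
t_c = t - t.mean()
A = np.column_stack([np.ones_like(t_c), t_c])
print(A.T @ A)                 # diagonal matrix -> trivial to invert

# Fit a parabola y = c + d t + e t^2 by adding a third column with t_i^2.
A_par = np.column_stack([np.ones_like(t), t, t**2])
x_hat = np.linalg.solve(A_par.T @ A_par, A_par.T @ b)
print("c, d, e =", x_hat)
```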
6.2 The set of all solutions to a system of linear equations
We know that for every $x \in \mathbb{R}^n$ there exist $x_r \in C(A^\top)$ and $x_n \in N(A)$ such that $x = x_r + x_n$, as $C(A^\top)$ and $N(A)$ are orthogonal complements.
6.2.1 Unique solutions
Let $A \in \mathbb{R}^{m \times n}$. Let $x, y \in \mathbb{R}^n$ with $Ax = Ay$. We have that $x_r = y_r$.
Proof This is because $x$ and $y$ have unique decompositions into the two fundamental subspaces, $x = x_r + x_n$ and $y = y_r + y_n$. From $Ax = Ay$ it follows that $A(x_r - y_r) = 0$, so $x_r - y_r \in N(A) \cap C(A^\top) = \{0\}$ and thus $x_r = y_r$.
6.2.2 Unique solution to a system
Suppose that $Ax = b$ has a solution. Then the set of all solutions is $\{x_r + x_n \mid x_n \in N(A)\}$, where $x_r \in C(A^\top)$ is unique such that $Ax_r = b$.
This means that if there's more than one solution to the system (i.e. the nullspace is not $\{0\}$), then the set of all solutions is a specific solution + the entire nullspace, as the small example below illustrates.
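A small NumPy sketch of this structure (the matrix and right-hand side are invented; the third column of $A$ is the sum of the first two, so the nullspace is non-trivial):

```python
import numpy as np

# A has a dependent third column (a1 + a2), so N(A) is non-trivial.
A = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0]])
b = np.array([3.0, 5.0])

x_particular = np.array([3.0, 5.0, 0.0])   # one solution of Ax = b
x_null = np.array([1.0, 1.0, -1.0])        # A @ x_null = 0

# Any x_particular + c * x_null is again a solution.
for c in [0.0, 1.0, -2.5]:
    print(A @ (x_particular + c * x_null))  # always [3. 5.]
```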
6.2.3 Unique solution in $C(A^\top)$
Suppose that $b \in C(A)$. Then there exists a unique vector $x_r \in C(A^\top)$ such that $Ax_r = b$.
The empty solution
We now only need to analyse the case where the given system of linear equations has no solutions: $b \notin C(A)$.
This is difficult as proving a negative in this case means exhausting the entire search space (which is infinite) or proving it by using something smarter.
We can issue a “certificate” which proves that a system has no solution.
6.2.4 Proving no solutions
$Ax = b$ has no solution if and only if there exists a vector $z \in \mathbb{R}^m$ such that $A^\top z = 0$ and $z^\top b \neq 0$.
Note that we don't need $z^\top b = 1$; it just has to be $\neq 0$.
In words: our LSE does not have any solutions if and only if there exists a vector $z$ that is orthogonal to all columns of $A$ but not orthogonal to $b$.
The vector $z$ is orthogonal to everything in the subspace $C(A)$.
If $z$ is not orthogonal to $b$, this means that $b$ cannot possibly be in the subspace $C(A)$; it must lie slightly above/below it. Therefore $b \notin C(A)$ and thus there's no solution.
Proof:
- Verify that a solution $x$ and a certificate $z$ cannot exist at the same time:
- If $Ax = b$ and $A^\top z = 0$, then $z^\top b = z^\top A x = (A^\top z)^\top x = 0$, contradicting $z^\top b \neq 0$.
- We now want to show that such a $z$ exists if there is no solution:
- (there are no solutions) $\Rightarrow$ (there is an error, $e = b - A\hat{x} \neq 0$, where $\hat{x}$ is the least squares solution)
- Thus choose $z = e$; by the normal equations $A^\top e = A^\top(b - A\hat{x}) = 0$, and
- $z^\top b = e^\top b$, but we can rewrite $b = A\hat{x} + e$ and thus $e^\top b = e^\top A\hat{x} + e^\top e$
- we know the first term is $0$ as $e \in N(A^\top)$ and $A\hat{x} \in C(A)$, so they are in orthogonal subspaces
- the second term is $> 0$ as it's $\|e\|^2$ and $e \neq 0$.
- Thus $z^\top b \neq 0$ and $z$ is a valid certificate.
Example:
- Consider $Ax = b$ with columns $a_1 = (1, 2)^\top$, $a_2 = (2, 4)^\top$, $a_3 = (-1, -2)^\top$ and $b = (1, 0)^\top$ (the second row of $A$ is twice the first, but $b_2 \neq 2b_1$, so $P = \emptyset$). The certificate system is then $$ D = \{ z \in \mathbb{R}^2 \ | \ z_1 + 2z_2 = 0,\ 2z_1 + 4z_2 = 0,\ -z_1 - 2z_2 = 0,\ z_1 = 1 \} $$ and $D \neq \emptyset$, since for example $z = (1, -\frac{1}{2})^\top \in D$; this is checked numerically below.
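A quick NumPy check of this example, together with the residual construction used in the proof (`np.linalg.lstsq` computes the least squares solution $\hat{x}$):

```python
import numpy as np

A = np.array([[1.0, 2.0, -1.0],
              [2.0, 4.0, -2.0]])   # second row is twice the first
b = np.array([1.0, 0.0])           # but b2 != 2*b1, so Ax = b is unsolvable

z = np.array([1.0, -0.5])
print(A.T @ z)        # [0. 0. 0.] -> z is orthogonal to all columns of A
print(z @ b)          # 1.0 != 0  -> certificate that no solution exists

# The proof suggests taking the least squares residual as certificate:
x_hat, *_ = np.linalg.lstsq(A, b, rcond=None)
e = b - A @ x_hat
print(A.T @ e, e @ b)  # A^T e ~ 0 and e^T b != 0
```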
Applications:
- If our matrix $A$ has linearly independent rows, then there are solutions for all $b \in \mathbb{R}^m$. Since the rows are linearly independent, the only solution to $A^\top z = 0$ is $z = 0$. Hence there is no $z$ with $A^\top z = 0$ and $z^\top b \neq 0$. Thus there is always a solution for every $b \in \mathbb{R}^m$.
- We can also show that a vector $b$ is linearly independent from a set of vectors $a_1, \dots, a_n$. We just put them into the matrix equation $Ax = b$ (with the $a_i$ as the columns of $A$). If there is no solution, $b$ is independent of the $a_i$. But to show the system has no solutions, we need our new certificate.
6.3 Orthonormal Bases and Gram-Schmidt
6.3.1 Orthonormal vectors
Vectors $q_1, \dots, q_n$ are orthonormal if they are orthogonal and have norm $1$. In other words, $q_i^\top q_j = \delta_{ij}$ for all $i, j$, where $\delta_{ij}$ is the Kronecker delta ($\delta_{ij} = 1$ if $i = j$, and $0$ otherwise).
Thus all vectors are pairwise orthogonal ($q_i^\top q_j = 0$ for $i \neq j$) and all of them have norm $1$: $q_i^\top q_i = \|q_i\|^2 = 1$.
6.3.3 Orthogonal Matrix
A square matrix $Q$ is orthogonal when $Q^\top Q = QQ^\top = I$, i.e. $Q^{-1} = Q^\top$. In this case
- The columns of $Q$ form an orthonormal basis for $\mathbb{R}^n$.
Note that when $Q$ (with orthonormal columns) is not square, $Q^\top Q = I$ still holds, but $QQ^\top = I$ doesn't necessarily.

Examples:
- 2x2 Rotation matrices are orthogonal
- Permutation matrices are orthogonal
6.3.6 Orthogonal matrices preserve norm and inner product
Orthogonal matrices preserve the norm and inner product of vectors. In other words, if $Q$ is orthogonal, then for all $x, y \in \mathbb{R}^n$ $$\|Qx\| = \|x\| \quad \text{and} \quad (Qx)^\top(Qy) = x^\top y.$$
Proof: $(Qx)^\top(Qy) = x^\top Q^\top Q y = x^\top y$ since $Q^\top Q = I$. We can use this same argument for the first equality: $\|Qx\|^2 = (Qx)^\top(Qx) = x^\top x = \|x\|^2$, and note that norms are non-negative, and thus it suffices to show that the squares are equal.
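A quick numerical check with a $2 \times 2$ rotation matrix, one of the examples below (a sketch; the angle and vectors are arbitrary):

```python
import numpy as np

theta = 0.7
Q = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])   # rotation matrix, orthogonal

x = np.array([3.0, -1.0])
y = np.array([0.5,  2.0])

print(np.allclose(Q.T @ Q, np.eye(2)))            # True: Q^T Q = I
print(np.linalg.norm(Q @ x), np.linalg.norm(x))   # equal norms
print((Q @ x) @ (Q @ y), x @ y)                   # equal inner products
```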
6.3.7 Projections with Orthogonal matrices
Let $V$ be a subspace of $\mathbb{R}^m$ and $q_1, \dots, q_n$ be an orthonormal basis for $V$. Let $Q$ be the matrix whose columns are the $q_i$'s. Then the projection matrix that projects onto $V$ is given by $P = QQ^\top$, and the least squares solution to $Qx = b$ is given by $\hat{x} = Q^\top b$.
This is the case because $A^\top A$ simplifies to $Q^\top Q = I$ in the case where our $A = Q$ has orthonormal columns. Thus $P = A(A^\top A)^{-1}A^\top$ simplifies to $QQ^\top$ and $\hat{x} = (A^\top A)^{-1}A^\top b$ simplifies to $Q^\top b$.
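A minimal sketch (the orthonormal basis and the vector $b$ are chosen by hand): with the basis stored in the columns of $Q$, projecting and solving least squares require no matrix inversion.

```python
import numpy as np

# Orthonormal basis of a 2-dimensional subspace of R^3 (chosen by hand).
Q = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [0.0, 0.0]])
b = np.array([2.0, -1.0, 4.0])

P = Q @ Q.T                 # projection matrix onto V = C(Q)
x_hat = Q.T @ b             # least squares solution of Qx = b
print(P @ b)                # projection of b onto V: [ 2. -1.  0.]
print(Q @ x_hat)            # same point, reached via x_hat
```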
How to get an Orthonormal Basis: Gram-Schmidt
Algorithm to orthonormalise two vectors $a_1, a_2$ (Gram-Schmidt with 2 vectors only):
- Normalise $a_1$ to get $q_1 = \frac{a_1}{\|a_1\|}$.
- Project $a_2$ onto $q_1$ using $P = q_1(q_1^\top q_1)^{-1}q_1^\top$. Since $q_1$ is normalised (norm 1), we have $Pa_2 = (q_1^\top a_2)q_1$.
- Subtract the projection of $a_2$ on $q_1$ we just calculated from $a_2$ to get $q_2' = a_2 - (q_1^\top a_2)q_1$.
- Normalise $q_2'$ to get $q_2 = \frac{q_2'}{\|q_2'\|}$.
We project $a_2$ onto $q_1$ and then subtract that projection since we want to remove the part of $a_2$ that is in the direction of $q_1$, to get orthogonal vectors.
Algorithm (Gram-Schmidt):
- For $k = 1, \dots, n$ set \begin{align*} q'_k =& a_k - \sum_{i = 1}^{k - 1} (a_k^\top q_i)q_i \\ q_k =& \frac{q'_k}{\|q'_k\|} \end{align*}
Linearly dependent case (not in lecture) (i.e. the vectors don't form a basis): Since in a linearly dependent set of vectors one of them is a linear combination of the previous ones, you'd get $q'_k = 0$ in the subtraction step for it. By excluding those $q_k$'s you'd still get an orthonormal basis of the span.
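A straightforward NumPy version of the algorithm (a sketch, not an optimised implementation; it also skips near-zero $q'_k$, which handles the linearly dependent case mentioned above):

```python
import numpy as np

def gram_schmidt(vectors, tol=1e-12):
    """Return an orthonormal basis for the span of the given vectors."""
    qs = []
    for a in vectors:
        # Subtract the components of a along the already-built q_i's.
        q = a - sum((a @ qi) * qi for qi in qs)
        norm = np.linalg.norm(q)
        if norm > tol:            # skip (near-)zero vectors: dependent input
            qs.append(q / norm)
    return np.array(qs)

a1 = np.array([1.0, 1.0, 0.0])
a2 = np.array([1.0, 0.0, 1.0])
Q = gram_schmidt([a1, a2])
print(Q @ Q.T)                    # ~ identity: the q_i are orthonormal
```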
QR-Decomposition
For an $A \in \mathbb{R}^{m \times n}$ with linearly independent columns, let $q_1, \dots, q_n$ be the result of Gram-Schmidt applied to the columns $a_1, \dots, a_n$. Then define $R = Q^\top A$. $R$ is upper triangular because each $a_k$ is orthogonal to every $q_i$ for $i > k$ (all $q_i$ constructed after it). Note that $Q$ is not necessarily square and thus not necessarily invertible.
$$R = Q^\top A = \begin{pmatrix} q_1^\top a_1 & q_1^\top a_2 & \cdots & q_1^\top a_n \\ q_2^\top a_1 & q_2^\top a_2 & \cdots & q_2^\top a_n \\ \vdots & \vdots & \ddots & \vdots \\ q_n^\top a_1 & q_n^\top a_2 & \cdots & q_n^\top a_n \end{pmatrix}$$
You can see here that, since $q_2, \dots, q_n$ are by construction orthogonal to $a_1$, all entries below $q_1^\top a_1$ in the first column are $0$. The same goes for all entries below $q_2^\top a_2$ in the second column, and so on.
$QQ^\top$ is the projection onto the span of the $q_i$'s and thus also onto the span of the $a_i$'s ($C(A) = C(Q)$). Thus $QQ^\top A = A$ and therefore $A = Q(Q^\top A) = QR$.
6.3.10 QR-Decomposition
Let $A$ be an $m \times n$ matrix with linearly independent columns. The QR decomposition is given by $A = QR$ where
- $Q$ is an $m \times n$ matrix with orthonormal columns (they are the output of Gram-Schmidt)
- $R$ is an $n \times n$ upper triangular matrix given by $R = Q^\top A$.
6.3.11 R is upper triangular and invertible
The matrix $R$ defined in 6.3.10 is upper triangular and invertible. Moreover, $A = QR$ and hence the QR decomposition is well defined.
We have $\operatorname{rank}(A) = n$ since $A$ has independent columns, and since $A = QR$, $\operatorname{rank}(R) \geq \operatorname{rank}(A) = n$. Thus $R$ (a square $n \times n$ matrix of full rank) must be invertible.
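A quick check using NumPy's built-in QR (a sketch with an invented matrix; note that `np.linalg.qr` may flip the signs of some columns of $Q$ and rows of $R$ compared to the Gram-Schmidt construction above):

```python
import numpy as np

A = np.array([[1.0, 1.0],
              [1.0, 0.0],
              [0.0, 1.0]])        # 3x2, linearly independent columns

Q, R = np.linalg.qr(A)            # "reduced" QR: Q is 3x2, R is 2x2
print(np.allclose(Q.T @ Q, np.eye(2)))   # True: orthonormal columns
print(R)                                  # upper triangular, invertible
print(np.allclose(Q @ R, A))              # True: A = QR
print(np.allclose(R, Q.T @ A))            # True: R = Q^T A
```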
Fact 6.3.12 The QR decomposition greatly simplifies calculations involving Projections and Least Squares:
- Since $C(A) = C(Q)$, projections onto $C(A)$ can be done with $Q$: $P = QQ^\top$.
- The least squares solution to $Ax = b$, denoted $\hat{x}$, is defined as a solution of the normal equations $A^\top A\hat{x} = A^\top b$. Furthermore $A = QR$ and thus $$R^\top R\hat{x} = R^\top Q^\top b.$$ Since $R^\top$ is invertible we can simplify this to $$R\hat{x} = Q^\top b,$$ which can efficiently be solved by back substitution since $R$ is a triangular matrix (see the sketch below).
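A sketch of this recipe with invented data; the back substitution is written out by hand to emphasise why a triangular $R$ is cheap to solve:

```python
import numpy as np

def back_substitution(R, y):
    """Solve R x = y for an upper triangular, invertible R."""
    n = len(y)
    x = np.zeros(n)
    for i in range(n - 1, -1, -1):
        x[i] = (y[i] - R[i, i + 1:] @ x[i + 1:]) / R[i, i]
    return x

A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0]])
b = np.array([1.0, 2.0, 2.0])

Q, R = np.linalg.qr(A)
x_hat = back_substitution(R, Q.T @ b)         # solve R x = Q^T b
print(x_hat)
print(np.allclose(A.T @ A @ x_hat, A.T @ b))  # satisfies the normal equations
```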

6.4 Pseudoinverse
Given $Ax = b$, we can find $x$ by applying $A^{-1}$ if $A$ is invertible: $x = A^{-1}b$. But if $A$ is not invertible, we want to find a matrix $A^+$ which accomplishes a similar job as the actual inverse: $A^+b$ should give us the best possible $x$.
Prelude
$A \in \mathbb{R}^{m \times n}$ is a linear transformation with $x \mapsto Ax$, mapping $\mathbb{R}^n$ to $\mathbb{R}^m$. The inverse should be a function from $\mathbb{R}^m$ to $\mathbb{R}^n$ with $Ax \mapsto x$.
If $A$ is invertible, then $A$ must be square, so we have a mapping from $\mathbb{R}^n$ to $\mathbb{R}^n$:
This is bijective; $A^{-1}$ perfectly reverses $A$.
If $A \in \mathbb{R}^{m \times n}$ with independent columns (full column rank), we can visualise it as such:
The column space $C(A)$ is just a part of the whole $\mathbb{R}^m$.
We first have to project $b$ into $C(A)$ before we can invert it.
By inverting, we map back to $\mathbb{R}^n$, i.e. we find an $x$ whose image $Ax$ brings us closest to $b$, which is exactly Least Squares.
If $A \in \mathbb{R}^{m \times n}$ with independent rows (full row rank), we can visualise it as such:
There are multiple $x$ that map to $b$ via $A$, and we have to pick one to which we invert. We pick the one with the smallest norm, i.e. $\|x\|$ minimal.
By Lemma 6.4.5 we know that the smallest such $x$ is an $x_r \in C(A^\top)$. Since $A$ does not have full column rank, it has a non-trivial nullspace, and thus the solution space of $Ax = b$ is $\{x_r + x_n \mid x_n \in N(A)\}$. If we take the unique vector in $C(A^\top)$, we basically set $x_n$ to $0$, thus making sure the norm is minimal.

If $A$ has neither full column nor full row rank, we have to solve both problems at once. We then decompose $A = CR$ (with $C$ of full column rank and $R$ of full row rank). Then $A^+ = R^+C^+$.
Definitions
6.4.1 Pseudoinverse for matrices of full column rank
For $A \in \mathbb{R}^{m \times n}$ with $\operatorname{rank}(A) = n$ (full column rank), we define the pseudoinverse as $$A^+ = (A^\top A)^{-1}A^\top.$$
6.4.2 Left Inverse
For $A \in \mathbb{R}^{m \times n}$ with $\operatorname{rank}(A) = n$, the pseudoinverse $A^+$ is a left inverse of $A$, meaning $A^+A = I_n$.
Proof: Since $A$ has full column rank, $A^\top A$ is invertible and then $A^+A = (A^\top A)^{-1}A^\top A = I_n$.
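A quick check with an invented tall matrix of full column rank (and a comparison against `np.linalg.pinv`):

```python
import numpy as np

A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [0.0, 2.0]])                     # rank 2 = number of columns

A_plus = np.linalg.inv(A.T @ A) @ A.T          # pseudoinverse for full column rank
print(np.allclose(A_plus @ A, np.eye(2)))      # True: A^+ is a left inverse
print(np.allclose(A_plus, np.linalg.pinv(A)))  # matches NumPy's pinv
```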
6.4.3 Pseudoinverse for matrices with full row rank
For $A \in \mathbb{R}^{m \times n}$ with $\operatorname{rank}(A) = m$ (full row rank), we define the pseudoinverse as $$A^+ = A^\top(AA^\top)^{-1}.$$
For an $A$ with full row rank, we basically define $A^+$ as the transpose of the pseudoinverse of the transpose: $A^+ = ((A^\top)^+)^\top$, since $A^\top$ has full column rank.
6.4.4 Right inverse
For $A \in \mathbb{R}^{m \times n}$ with $\operatorname{rank}(A) = m$, the pseudoinverse $A^+$ is a right inverse of $A$: $AA^+ = I_m$.
Proof: Since $A$ has full row rank, $AA^\top$ is invertible: $AA^+ = AA^\top(AA^\top)^{-1} = I_m$.
Since for full row rank there are many possible solutions, the pseudoinverse chooses the one with the smallest norm such that $Ax = b$. Since each solution is $x = x_r + x_n$ with $x_r \in C(A^\top)$ and $x_n \in N(A)$, the pseudoinverse chooses an $x$ with no nullspace component to get the smallest norm. Notice the $A^\top$ at the front of the definition. This means that $A^+b = A^\top(AA^\top)^{-1}b$ is in the row space $C(A^\top)$: exactly what we want!
6.4.5 & 6.4.6 Unique Solution for full row rank
For any matrix $A$ and a vector $b \in C(A)$, the unique minimal norm solution to $Ax = b$ is given by the vector $x^+$ that satisfies $Ax^+ = b$ and $x^+ \in C(A^\top)$.
For a full row rank matrix $A$, the unique minimal norm solution is given by the vector $x^+ = A^+b = A^\top(AA^\top)^{-1}b$.
Proof By Lemma 6.4.5 we only need to show that $x^+ = A^+b$ satisfies $Ax^+ = b$ and that $x^+ \in C(A^\top)$.
- $Ax^+ = AA^\top(AA^\top)^{-1}b = b$, and $x^+ = A^\top\big((AA^\top)^{-1}b\big) = A^\top z$ for some $z$, thus $x^+ \in C(A^\top)$.
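A sketch with an invented wide matrix of full row rank: the pseudoinverse solution solves $Ax = b$ exactly and has a smaller norm than any other solution.

```python
import numpy as np

A = np.array([[1.0, 1.0, 0.0],
              [0.0, 1.0, 1.0]])                 # full row rank (rank 2)
b = np.array([2.0, 3.0])

x_plus = A.T @ np.linalg.inv(A @ A.T) @ b       # x^+ = A^T (A A^T)^{-1} b
print(np.allclose(A @ x_plus, b))               # True: it solves the system

# Adding any nullspace vector gives another solution, but with a larger norm.
x_null = np.array([1.0, -1.0, 1.0])             # A @ x_null = 0
other = x_plus + x_null
print(np.allclose(A @ other, b))                # True as well
print(np.linalg.norm(x_plus), np.linalg.norm(other))  # x^+ has the smaller norm
```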
6.4.7 Pseudoinverse for all matrices
For $A \in \mathbb{R}^{m \times n}$ with $\operatorname{rank}(A) = r$ and CR decomposition $A = CR$, we define the pseudoinverse as $$A^+ = R^+C^+.$$
We can rewrite this as $$A^+ = R^\top(RR^\top)^{-1}(C^\top C)^{-1}C^\top.$$
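A sketch with an invented rank-1 matrix (neither full column nor full row rank) and a hand-picked CR factorisation, compared against `np.linalg.pinv`:

```python
import numpy as np

# Invented rank-1 matrix with neither full column nor full row rank.
A = np.array([[1.0, 2.0],
              [2.0, 4.0],
              [3.0, 6.0]])

# A hand-picked CR factorisation: C has full column rank, R full row rank.
C = np.array([[1.0], [2.0], [3.0]])
R = np.array([[1.0, 2.0]])

C_plus = np.linalg.inv(C.T @ C) @ C.T        # left-inverse form
R_plus = R.T @ np.linalg.inv(R @ R.T)        # right-inverse form
A_plus = R_plus @ C_plus

print(A_plus)
print(np.allclose(A_plus, np.linalg.pinv(A)))   # matches NumPy's pseudoinverse
```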
6.4.8 Unique solution for any $A$
Given $A \in \mathbb{R}^{m \times n}$ and a vector $b \in C(A)$, the unique solution to $Ax = b$ such that $x \in C(A^\top)$ (the minimal norm solution) is given by $x = A^+b$.
6.4.9 Full Rank Factorisation
For $A \in \mathbb{R}^{m \times n}$ with $\operatorname{rank}(A) = r$, let $C \in \mathbb{R}^{m \times r}$ and $R \in \mathbb{R}^{r \times n}$ be matrices of full column and full row rank respectively such that $A = CR$. Then $$A^+ = R^+C^+.$$
6.4.10 Pseudoinverse Conditions
$A^+$ is a pseudoinverse of $A$ if it satisfies the following conditions:
- $AA^+$ is symmetric (i.e. $(AA^+)^\top = AA^+$). It is the projection matrix on $C(A)$.
- $A^+A$ is symmetric (i.e. $(A^+A)^\top = A^+A$). It is the projection matrix on $C(A^\top)$.
Nullspace of $A^+$ and $A^\top$: We have $N(A^+) = N(A^\top)$. Intuitively this holds as $A^+b \in C(A^\top)$ and only depends on the projection of $b$ onto $C(A)$ (we pick the minimal $x$). Thus anything orthogonal to the subspace $C(A)$ is mapped to $0$: for $b \in N(A^\top) = C(A)^\perp$ we have $A^+b = 0$. We conclude that $N(A^\top) \subseteq N(A^+)$.
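These conditions and the nullspace claim can be verified numerically; the rank-deficient matrix below is the same invented one as in the previous sketch:

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [2.0, 4.0],
              [3.0, 6.0]])
A_plus = np.linalg.pinv(A)

AAp = A @ A_plus      # should be the symmetric projection onto C(A)
ApA = A_plus @ A      # should be the symmetric projection onto C(A^T)
print(np.allclose(AAp, AAp.T), np.allclose(AAp @ AAp, AAp))
print(np.allclose(ApA, ApA.T), np.allclose(ApA @ ApA, ApA))

# Anything in N(A^T) (orthogonal to C(A)) is mapped to 0 by A^+.
z = np.array([2.0, -1.0, 0.0])     # A.T @ z = 0
print(A_plus @ z)                  # ~ [0. 0.]
```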