In [1]:
#
import numpy as np
import scipy as sp
import pandas as pd
import matplotlib as mp
import matplotlib.pyplot as plt
import seaborn as sns
import sklearn
import laUtilities as ut
import slideUtilities as sl
import demoUtilities as dm
from matplotlib import animation
from importlib import reload
from datetime import datetime
from IPython.display import Image, display_html, display, Math, HTML;
qr_setting = None

mp.rcParams['animation.html'] = 'jshtml';

Announcements¶

  • Homework 6 is due Friday, 3/24 at 8pm

  • Class cancelled on Monday, 3/27.

    • I will release notes and a prerecorded lecture before Monday's class.
  • Weekly reading and viewing assignments

    • Boyd-Vandenberghe Chapter 12

Recap¶

Basis vectors¶

For any basis for $H$, each vector in $H$ can be written in only one way as a linear combination of the basis vectors.

Example. In $\mathbb{R}^2$, let's look at the point $\mathbf{x} = \begin{bmatrix} 1 \\ 6 \end{bmatrix}$. We will plot this point in the standard basis and also relative to a new basis:

$$\mathcal{B} = \left\{ \begin{bmatrix} 1 \\ 0 \end{bmatrix}, \begin{bmatrix} 1 \\ 2 \end{bmatrix} \right\}.$$
In [2]:
# standard basis
xmin = -6.0 
xmax = 6.0 
ymin = -2.0
ymax = 8.0

b0 = [1, 0]
b1 = [1, 2]

fig = ut.two_d_figure('Dummylabel', xmin = xmin, xmax = xmax, ymin = ymin, ymax = ymax, size=(6,5))
for x in np.linspace(ymin, ymax, int(ymax-ymin+1)):
    fig.plotLinEqn(0., 1., x, alpha=0.3)
for y in np.linspace(xmin, xmax, int(xmax-xmin+1)):
    fig.plotLinEqn(1., 0., y, alpha=0.3)
fig.plotLinEqn(1., 0, 0, color = 'k')
fig.plotLinEqn(0, 1, 0, color = 'k')
fig.plotPoint(0, 0, 'k')
fig.ax.text(0+.1, 0-.1, r'$\bf 0$', size = 12, horizontalalignment='left', verticalalignment='top')
fig.plotPoint(1, 0, 'r')
fig.ax.text(1+.1, 0-.1, r'${\bf e}_1$', size = 12, horizontalalignment='left', verticalalignment='top')
fig.plotPoint(0, 1, 'r')
fig.ax.text(0+.1, 1-.1, r'${\bf e}_2$', size = 12, horizontalalignment='left', verticalalignment='top')
fig.plotPoint(b1[0], b1[1], 'g')
fig.ax.text(b1[0]+.1, b1[1]-.1, r'${\bf b}_2$', size = 12, horizontalalignment='left', verticalalignment='top')
fig.plotPoint(1, 6, 'b')
fig.ax.text(1+.1, 6-.1, r'${\bf x}$', size = 12, horizontalalignment='left', verticalalignment='top')
fig.ax.axis('off')
fig.ax.set_title(r'Standard Basis.  $\mathbf{x}$ = (1, 6)', size = 16);
In [3]:
# B-basis
fig = ut.two_d_figure('Dummylabel', xmin = xmin, xmax = xmax, ymin = ymin, ymax = ymax, size=(6,5))
m = b1[1]/b1[0]
upper_intercept = ymax - m * xmin
upper_intercept = b1[1] * np.ceil(upper_intercept / b1[1])
lower_intercept = ymin - m * xmax
lower_intercept = b1[1] * np.floor(lower_intercept / b1[1])
for yint in np.linspace(lower_intercept, upper_intercept, int((upper_intercept-lower_intercept)/b1[1])+1):
    fig.plotLinEqn(-b1[1], b1[0], yint, color = 'g', alpha=0.3)
for y in np.linspace(ymin, ymax, int(((ymax-ymin)/b1[1])+1)):
    fig.plotLinEqn(0., 1., y, color = 'g', alpha=0.3)
fig.plotLinEqn(b1[1], -b1[0], 0, color = 'k')
fig.plotLinEqn(b0[1], -b0[0], 0, color = 'k')
fig.plotPoint(0, 0, 'k')
fig.ax.text(0+.1, 0-.1, r'$\bf 0$', size = 12, horizontalalignment='left', verticalalignment='top')
fig.plotPoint(1, 0, 'g')
fig.ax.text(1+.1, 0-.1, r'${\bf b}_1$', size = 12, horizontalalignment='left', verticalalignment='top')
fig.plotPoint(b1[0], b1[1], 'g')
fig.ax.text(b1[0]+.1, b1[1]-.1, r'${\bf b}_2$', size = 12, horizontalalignment='left', verticalalignment='top')
fig.plotPoint(1, 6, 'b')
fig.ax.text(1+.1, 6-.1, r'${\bf x}$', size = 12, horizontalalignment='left', verticalalignment='top')
fig.ax.axis('off')
fig.ax.set_title(r'B-Basis. $[\mathbf{x}]_\mathcal{B}$ = (-2, 3)', size = 16);  # raw string avoids invalid-escape warnings

Notice that the location of $\mathbf{x}$ relative to the origin does not change.

However, using the $\mathcal{B}$-basis, it has different coordinates. The new coordinates are $[\mathbf{x}]_\mathcal{B} = \begin{bmatrix} -2 \\ 3 \end{bmatrix}$.
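As a quick check (a minimal sketch using NumPy, which was imported at the top of the notebook), we can recover these coordinates by solving the linear system whose coefficient matrix has the basis vectors as its columns:

In [ ]:
# solve B c = x, where the columns of B are the basis vectors b1 and b2
B = np.array([[1, 1],
              [0, 2]])
x = np.array([1, 6])
c = np.linalg.solve(B, x)
print(c)    # expect [-2.  3.]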

Orthogonal Sets¶

Definition. A set of vectors $\{\mathbf{u}_1, \ldots, \mathbf{u}_p\}$ in $\mathbb{R}^n$ is said to be an orthogonal set if each pair of distinct vectors from the set is orthogonal, i.e.,

$$\mathbf{u}_i^\top \mathbf{u}_j = 0 \quad \text{whenever} \quad i \neq j.$$
In [3]:
#
fig = ut.three_d_figure((23, 2), fig_desc = 'An orthogonal set of vectors',
                        xmin = -3, xmax = 3, ymin = -3, ymax = 3, zmin = -3, zmax = 3, 
                        figsize = (12, 8), qr = qr_setting)
u1 = np.array([3, 1, 1])
u2 = np.array([-1, 2, 1])
u3 = np.array([-1/2, -2, 7/2])
origin = np.array([0, 0, 0])
fig.plotLine([origin, u1], 'r', '--')
fig.plotPoint(u1[0], u1[1], u1[2], 'r')
fig.text(u1[0]+.1, u1[1]+.1, u1[2]+.1, r'$\bf u_1$', 'u1', size=16, color='k')
fig.plotLine([origin, u2], 'r', '--')
fig.plotPoint(u2[0], u2[1], u2[2], 'r')
fig.text(u2[0]+.1, u2[1]+.1, u2[2]+.1, r'$\bf u_2$', 'u2', size=16, color='k')
fig.plotLine([origin, u3], 'r', '--')
fig.plotPoint(u3[0], u3[1], u3[2], 'r')
fig.text(u3[0]+.1, u3[1]+.1, u3[2]+.1, r'$\bf u_3$', 'u3', size=16, color = 'k')
fig.text(origin[0]-.45, origin[1]-.45, origin[2]-.45, r'$\bf 0$', 0, size = 16)
fig.plotPerpSym(origin, u1, u2, 0.5)
fig.plotPerpSym(origin, u3, u2, 0.5)
fig.plotPerpSym(origin, u3, u1, 0.5)
fig.set_title(r'An orthogonal set of vectors in $\mathbb{R}^3$', 'An orthogonal set of vectors in R3', size = 16)
fig.save();
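We can confirm numerically that each pair of these vectors is orthogonal (a small check reusing the three vectors defined in the plotting cell above):

In [ ]:
# every pairwise inner product should be 0
u1 = np.array([3, 1, 1])
u2 = np.array([-1, 2, 1])
u3 = np.array([-1/2, -2, 7/2])
print(u1 @ u2, u1 @ u3, u2 @ u3)    # expect all zeros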

Lecture 23: QR Decomposition¶

[This lecture is based on Prof. Crovella's CS 132 lecture notes and Fast.ai Lecture 8.]

23.1 Orthogonal Basis¶

Definition. An orthogonal basis for a subspace $H$ of $\mathbb{R}^n$ is a basis for $H$ that is also an orthogonal set.

For example, consider

$$\mathbf{u}_1 = \begin{bmatrix} -1/2 \\ 2 \\ 1 \end{bmatrix}, \quad \mathbf{u}_2 = \begin{bmatrix} 8/3 \\ 1/3 \\ 2/3 \end{bmatrix}.$$

Note that $\mathbf{u}_1^\top \mathbf{u}_2 = 0$. Hence they form an orthogonal basis for their span.

Here is the subspace $W = \text{Span}\{\mathbf{u}_1, \mathbf{u}_2\}$:

In [4]:
#
fig = ut.three_d_figure((23, 3), fig_desc = 'Orthogonal Basis on the subspace H', figsize = (8, 8),
                        xmin = -2, xmax = 10, ymin = -1, ymax = 10, zmin = -1, zmax = 10, qr = qr_setting)

v = 1/2 * np.array([-1, 4, 2])
u = 1/3 * np.array([8, 1, 2])
vpos = v + 0.4 * v - 0.5 * u
upos = u - 0.5 * v + 0.15 * u
fig.text(vpos[0], vpos[1], vpos[2], r'$\bf u_2$', 'v', size=16)
fig.text(upos[0], upos[1], upos[2], r'$\bf u_1$', 'u', size=16)
# fig.text(3*u[0]+2*v[0], 3*u[1]+2*v[1], 3*u[2]+2*v[2]+1, r'$\bf 2v_1+3v_2$', '2 v1 + 3 v2', size=16)
# plotting the span of v
fig.plotSpan(u, v, 'Green')
# blue grid lines
fig.plotPoint(0, 0, 0, 'y')
fig.plotPoint(u[0], u[1], u[2], 'b')
fig.plotPoint(2*u[0], 2*u[1], 2*u[2],'b')
fig.plotPoint(3*u[0], 3*u[1], 3*u[2], 'b')
fig.plotLine([[0, 0, 0], 4*u], color='b')
fig.plotLine([v, v+4*u], color='b')
fig.plotLine([2*v, 2*v+3*u], color='b')
fig.plotLine([3*v, 3*v+2.5*u], color='b')
# red grid lines
fig.plotPoint(v[0], v[1], v[2], 'r')
fig.plotPoint(2*v[0], 2*v[1], 2*v[2], 'r')
fig.plotPoint(3*v[0], 3*v[1], 3*v[2], 'r')
fig.plotLine([[0, 0, 0], 3.5*v], color='r')
fig.plotLine([u, u+3.5*v], color='r')
fig.plotLine([2*u, 2*u+3.5*v], color='r')
fig.plotLine([3*u, 3*u+2*v], color='r')
#
# fig.plotPoint(3*u[0]+2*v[0], 3*u[1]+2*v[1], 3*u[2]+2*v[2], color='m')
# plotting the axes
#fig.plotIntersection([0,0,1,0], [0,1,0,0], color='Black')
#fig.plotIntersection([0,0,1,0], [1,0,0,0], color='Black')
#fig.plotIntersection([0,1,0,0], [1,0,0,0], color='Black')
#
fig.plotPerpSym(np.array([0, 0, 0]), v, u, 1)
fig.plotPerpSym(u, v+u, u+u, 1)
fig.plotPerpSym(2*u, v+2*u, 3*u, 1)
#
fig.plotPerpSym(np.array([0, 0, 0])+v, 2*v, v+u, 1)
fig.plotPerpSym(u+v, 2*v+u, v+2*u, 1)
#
fig.set_title(r'Orthogonal Basis on the subspace $H$', 'Orthogonal Basis on the subspace H', size=16)
fig.save()

Coordinates in an orthogonal basis¶

We have seen that for any subspace $H$, there may be many different sets of vectors that can serve as a basis for $H$.

For example, let's say we have a basis $\mathcal{B} = \{\mathbf{u}_1, \mathbf{u}_2, \mathbf{u}_3\}.$

We know that to compute the coordinates of $\mathbf{y}$ in this basis, denoted $[\mathbf{y}]_\mathcal{B}$, we need to solve the linear system:

$$c_1\mathbf{u}_1 + c_2\mathbf{u}_2 + c_3\mathbf{u}_3 = \mathbf{y}$$

or

$$U\mathbf{c} = \mathbf{y}.$$

In general, we'd need to perform Gaussian elimination, or matrix inversion, or some other computationally slow method to do this.

However, an orthogonal basis is a particularly nice basis, because the coordinates of any point can be computed easily and simply. Let's see how.

Theorem. Let $\{\mathbf{u}_1, \ldots, \mathbf{u}_p\}$ be an orthogonal basis for a subspace $H$ of $\mathbb{R}^n$. For each $\mathbf{y}$ in $H$, the weights of the linear combination

$$c_1\mathbf{u}_1 + \cdots + c_p\mathbf{u}_p = \mathbf{y}$$

are given by

$$c_j = \frac{\mathbf{y}^\top \mathbf{u}_j}{\mathbf{u}_j^\top \mathbf{u}_j} \qquad j = 1, \ldots, p.$$

Proof. Let's consider the inner product of $\mathbf{y}$ and one of the $\mathbf{u}$ vectors --- say, $\mathbf{u}_1$.

As we saw in the last proof, the orthogonality of $\{\mathbf{u}_1, \ldots, \mathbf{u}_p\}$ means that

$$\mathbf{y}^\top \mathbf{u}_1 = (c_1\mathbf{u}_1 + c_2\mathbf{u}_2 + \cdots + c_p\mathbf{u}_p)^\top \mathbf{u}_1 = c_1(\mathbf{u}_1^\top \mathbf{u}_1).$$

Since $\mathbf{u}_1^\top \mathbf{u}_1$ is not zero (why?), the equation above can be solved for $c_1$:

$$c_1 = \frac{\mathbf{y}^\top \mathbf{u}_1}{\mathbf{u}_1^\top \mathbf{u}_1}$$

To find any other $c_j$, compute $\mathbf{y}^\top \mathbf{u}_j$ and solve for $c_j$.

Example. The set $S$ that we saw earlier, i.e.,

$$\mathbf{u}_1 = \begin{bmatrix} 3 \\ 1 \\ 1 \end{bmatrix}, \quad \mathbf{u}_2 = \begin{bmatrix} -1 \\ 2 \\ 1 \end{bmatrix}, \quad \mathbf{u}_3 = \begin{bmatrix} -1/2 \\ -2 \\ 7/2 \end{bmatrix},$$

is an orthogonal basis for $\mathbb{R}^3$.

Let's express the vector $\mathbf{y} = \begin{bmatrix} 6 \\ 1 \\ -8 \end{bmatrix}$ as a linear combination of the vectors in $S$.

That is, find $\mathbf{y}$'s coordinates in the basis $S$ --- i.e., in the coordinate system $S$.

In [5]:
#
fig = ut.three_d_figure((23, 4), fig_desc = 'y in an orthogonal basis',
                        xmin = -3, xmax = 7, ymin = -5, ymax = 5, zmin = -8, zmax = 4, 
                        figsize = (12, 8), qr = qr_setting, equalAxes = False)
u1 = np.array([3, 1, 1])
u2 = np.array([-1, 2, 1])
u3 = np.array([-1/2, -2, 7/2])
origin = np.array([0, 0, 0])
#
fig.plotLine([origin, u1], 'r', '--')
fig.plotPoint(u1[0], u1[1], u1[2], 'r')
fig.text(u1[0]+.1, u1[1]+.1, u1[2]+.1, r'$\bf u_1$', 'u1', size=16, color='k')
#
fig.plotLine([origin, u2], 'r', '--')
fig.plotPoint(u2[0], u2[1], u2[2], 'r')
fig.text(u2[0]+.1, u2[1]+.1, u2[2]+.1, r'$\bf u_2$', 'u2', size=16, color='k')
#
fig.plotLine([origin, u3], 'r', '--')
fig.plotPoint(u3[0], u3[1], u3[2], 'r')
fig.text(u3[0]+.1, u3[1]+.1, u3[2]+.1, r'$\bf u_3$', 'u3', size=16, color = 'k')
#
fig.text(origin[0]-.45, origin[1]-.45, origin[2]-.45, r'$\bf 0$', 0, size = 16)
#
fig.plotPerpSym(origin, u1, u2, 0.5)
fig.plotPerpSym(origin, u3, u2, 0.5)
fig.plotPerpSym(origin, u3, u1, 0.5)
#
y = u1 - 2 * u2 - 2 * u3
# print(y)
fig.plotPoint(y[0], y[1], y[2], 'b')
fig.text(y[0]-2, y[1]+.1, y[2]+.1, r'$\bf y$ = (6, 1, -8)', 'y = (6, 1, -8)', size=16, color = 'b')
fig.text(y[0]-2, y[1]+.1, y[2]-2.5, r'${\bf y} = 1{\bf u}_1 -2 {\bf u}_2 -2 {\bf u}_3$', 'y = (6, 1, -8)', size=16, color = 'b')
#
fig.set_title(r'${\bf y}$ in an Orthogonal Basis', 'y in an Orthogonal Basis', size = 16)
fig.save();

Solution. Compute

$$\mathbf{y}^\top\mathbf{u}_1 = 11, \quad \mathbf{y}^\top\mathbf{u}_2 = -12, \quad \mathbf{y}^\top\mathbf{u}_3 = -33,$$
$$\mathbf{u}_1^\top\mathbf{u}_1 = 11, \quad \mathbf{u}_2^\top\mathbf{u}_2 = 6, \quad \mathbf{u}_3^\top\mathbf{u}_3 = 33/2.$$

So

$$\mathbf{y} = \frac{\mathbf{y}^\top\mathbf{u}_1}{\mathbf{u}_1^\top\mathbf{u}_1}\mathbf{u}_1 + \frac{\mathbf{y}^\top\mathbf{u}_2}{\mathbf{u}_2^\top\mathbf{u}_2}\mathbf{u}_2 + \frac{\mathbf{y}^\top\mathbf{u}_3}{\mathbf{u}_3^\top\mathbf{u}_3}\mathbf{u}_3$$
$$= \frac{11}{11}\mathbf{u}_1 + \frac{-12}{6}\mathbf{u}_2 + \frac{-33}{33/2}\mathbf{u}_3$$
$$= \mathbf{u}_1 - 2\mathbf{u}_2 - 2\mathbf{u}_3.$$

As a result, $[\mathbf{y}]_\mathcal{B} = \begin{bmatrix} 1 \\ -2 \\ -2 \end{bmatrix}$.
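Here is a small check of this computation in NumPy, reusing the vectors defined above; each coefficient is just one inner product divided by another:

In [ ]:
# c_j = (y . u_j) / (u_j . u_j) for each basis vector
u1 = np.array([3, 1, 1])
u2 = np.array([-1, 2, 1])
u3 = np.array([-1/2, -2, 7/2])
y  = np.array([6, 1, -8])
print([(y @ u) / (u @ u) for u in (u1, u2, u3)])    # expect [1.0, -2.0, -2.0]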

Note how much simpler it is to find the coordinates of $\mathbf{y}$ in the orthogonal basis, because:

  • There are no matrix operations involved, so the overall computational work is quadratic rather than cubic ($n$ inner products, each of which takes time $2n$).
  • Each coefficient $c_i$ can be found separately, so with enough computers available you can perform the calculation in linear time.

23.2 Orthogonal Projection¶

Now let's turn to the notion of projection. In general, a projection happens when we decompose a vector into the sum of other vectors.

Here is the central idea. Suppose we start with two nonzero vectors $\mathbf{y}$ and $\mathbf{u}$ in $\mathbb{R}^n$.

In [38]:
#
ax = ut.plotSetup(-1,10,-1,4,(9,6))
ut.centerAxes(ax)
pt = [4., 3.]
plt.plot([0,pt[0]],[0,pt[1]],'b-',lw=2)
plt.plot([0,2*pt[0]],[0,0],'r-',lw=3)
ut.plotVec(ax,pt,'b')
u = np.array([pt[0],0])
ut.plotVec(ax,2*u)
ax.text(2*pt[0],-0.75,r'$\mathbf{u}$',size=20)
ax.text(pt[0]+0.1,pt[1]+0.2,r'$\mathbf{y}$',size=20);

Our goal is to "decompose" $\mathbf{y}$ into the sum of two vectors $\mathbf{y} = \hat{\mathbf{y}} + \mathbf{z}$ such that

  • $\hat{\mathbf{y}}$ is a multiple of $\mathbf{u}$; that is, $\hat{\mathbf{y}} = \alpha\mathbf{u}$ for some scalar $\alpha$. We call this the orthogonal projection of $\mathbf{y}$ onto $\mathbf{u}$.

  • $\mathbf{z}$ is orthogonal to $\mathbf{u}$. We call this the component of $\mathbf{y}$ orthogonal to $\mathbf{u}$.

In [3]:
#
ax = ut.plotSetup(-1,10,-1,4,(9,6))
ut.centerAxes(ax)
pt = [4., 3.]
plt.plot([0,pt[0]],[0,pt[1]],'b-',lw=2)
plt.plot([pt[0],pt[0]],[0,pt[1]],'b--',lw=2)
plt.plot([0,pt[0]],[0,0],'r-',lw=3)
plt.plot([0,0],[0,pt[1]],'g-',lw=3)
ut.plotVec(ax,pt,'b')
u = np.array([pt[0],0])
v = [0,pt[1]]
ut.plotVec(ax,u)
ut.plotVec(ax,2*u)
ut.plotVec(ax,v,'g')
ax.text(pt[0],-0.75,r'${\bf \hat{y}}=\alpha{\bf u}$',size=20)
ax.text(2*pt[0],-0.75,r'$\mathbf{u}$',size=20)
ax.text(pt[0]+0.1,pt[1]+0.2,r'$\mathbf{y}$',size=20)
ax.text(0+0.1,pt[1]+0.2,r'$\mathbf{z}$',size=20)
ax.text(0+0.1,pt[1]+0.8,r'Component of $\mathbf{y}$ orthogonal to $\mathbf{u}$',size=16)
ax.text(pt[0],-1.25,r'Orthogonal projection of $\mathbf{y}$ onto $\mathbf{u}$',size=16);
perpline1, perpline2 = ut.perp_sym(np.array([0,0]), u, v, 0.5)
plt.plot(perpline1[0], perpline1[1], 'k', lw = 1)
plt.plot(perpline2[0], perpline2[1], 'k', lw = 1);
# ax.text(0+0.1,pt[1]+0.2,r'$\mathbf{z = y -\hat{y}}$',size=20);

To solve this, assume that we have some $\alpha$, and with it we compute $\mathbf{y} - \alpha\mathbf{u} = \mathbf{y} - \hat{\mathbf{y}} = \mathbf{z}.$ We want $\mathbf{z}$ to be orthogonal to $\mathbf{u}.$

Now $\mathbf{z} = \mathbf{y} - \alpha\mathbf{u}$ is orthogonal to $\mathbf{u}$ if and only if

$$0 = (\mathbf{y} - \alpha\mathbf{u})^\top \mathbf{u} = \mathbf{y}^\top\mathbf{u} - (\alpha\mathbf{u})^\top\mathbf{u} = \mathbf{y}^\top\mathbf{u} - \alpha(\mathbf{u}^\top\mathbf{u})$$

Hence, the solution in which $\mathbf{z}$ is orthogonal to $\mathbf{u}$ happens if and only if

$$\alpha = \frac{\mathbf{y}^\top\mathbf{u}}{\mathbf{u}^\top\mathbf{u}}$$

and since $\hat{\mathbf{y}} = \alpha\mathbf{u}$,

$$\hat{\mathbf{y}} = \frac{\mathbf{y}^\top\mathbf{u}}{\mathbf{u}^\top\mathbf{u}}\,\mathbf{u}.$$

Projections onto Subspaces¶

In [5]:
#
ax = ut.plotSetup(-1,10,-1,4,(9,6))
ut.centerAxes(ax)
pt = [4., 3.]
plt.plot([0,pt[0]],[0,pt[1]],'b-',lw=2)
plt.plot([pt[0],pt[0]],[0,pt[1]],'b--',lw=2)
plt.plot([0,pt[0]],[0,0],'r-',lw=3)
plt.plot([0,0],[0,pt[1]],'g-',lw=3)
ut.plotVec(ax,pt,'b')
u = np.array([pt[0],0])
v = [0,pt[1]]
ut.plotVec(ax,u)
ut.plotVec(ax,2*u)
ut.plotVec(ax,v,'g')
ax.text(pt[0],-0.75,r'${\bf \hat{y}}=\alpha{\bf u}$',size=20)
ax.text(2*pt[0],-0.75,r'$\mathbf{u}$',size=20)
ax.text(pt[0]+0.1,pt[1]+0.2,r'$\mathbf{y}$',size=20)
ax.text(0+0.1,pt[1]+0.2,r'$\mathbf{z}$',size=20)
ax.text(0+0.1,pt[1]+0.8,r'Component of $\mathbf{y}$ orthogonal to $\mathbf{u}$',size=16)
ax.text(pt[0],-1.25,r'Orthogonal projection of $\mathbf{y}$ onto $\mathbf{u}$',size=16);
perpline1, perpline2 = ut.perp_sym(np.array([0,0]), u, v, 0.5)
plt.plot(perpline1[0], perpline1[1], 'k', lw = 1)
plt.plot(perpline2[0], perpline2[1], 'k', lw = 1);

Now, note that if we had scaled $\mathbf{u}$ by any amount (i.e., moved it to the right or the left along its line), we would not have changed the location of $\hat{\mathbf{y}}.$

This can be seen as well by replacing $\mathbf{u}$ with $c\mathbf{u}$ and recomputing $\hat{\mathbf{y}}$:

$$\hat{\mathbf{y}} = \frac{\mathbf{y}^\top c\mathbf{u}}{(c\mathbf{u})^\top c\mathbf{u}}\,c\mathbf{u} = \frac{\mathbf{y}^\top\mathbf{u}}{\mathbf{u}^\top\mathbf{u}}\,\mathbf{u}.$$

Thus, the projection of $\mathbf{y}$ is determined by the subspace $L$ that is spanned by $\mathbf{u}$ -- in other words, the line through $\mathbf{u}$ and the origin.

Hence $\hat{\mathbf{y}}$ is sometimes denoted by $\text{proj}_L \mathbf{y}$ and is called the orthogonal projection of $\mathbf{y}$ onto $L$.

Specifically:

$$\hat{\mathbf{y}} = \text{proj}_L \mathbf{y} = \frac{\mathbf{y}^\top\mathbf{u}}{\mathbf{u}^\top\mathbf{u}}\,\mathbf{u}$$

Example. Let $\mathbf{y} = \begin{bmatrix} 7 \\ 6 \end{bmatrix}$ and $\mathbf{u} = \begin{bmatrix} 4 \\ 2 \end{bmatrix}.$

Find the orthogonal projection of $\mathbf{y}$ onto $\mathbf{u}.$ Then write $\mathbf{y}$ as the sum of two orthogonal vectors, one in $\text{Span}\{\mathbf{u}\}$, and one orthogonal to $\mathbf{u}.$

In [13]:
#
ax = ut.plotSetup(-3,11,-1,7,(8,6))
ut.centerAxes(ax)
plt.axis('equal')
u = np.array([4.,2])
y = np.array([7.,6])
yhat = (y.T.dot(u)/u.T.dot(u))*u
z = y-yhat
ut.plotLinEqn(1.,-2.,0.)
ut.plotVec(ax,u)
ut.plotVec(ax,z)
ut.plotVec(ax,y)
ut.plotVec(ax,yhat)
ax.text(u[0]+0.3,u[1]-0.5,r'$\mathbf{u}$',size=20)
ax.text(yhat[0]+0.3,yhat[1]-0.5,r'$\mathbf{\hat{y}}$',size=20)
ax.text(y[0],y[1]+0.8,r'$\mathbf{y}$',size=20)
ax.text(z[0]-2,z[1],r'$\mathbf{y - \hat{y}}$',size=20)
ax.text(10,4.5,r'$L = $Span$\{\mathbf{u}\}$',size=20)
perpline1, perpline2 = ut.perp_sym(yhat, y, np.array([0,0]), 0.75)
plt.plot(perpline1[0], perpline1[1], 'k', lw = 1)
plt.plot(perpline2[0], perpline2[1], 'k', lw = 1)
ax.plot([y[0],yhat[0]],[y[1],yhat[1]],'b--')
ax.plot([0,y[0]],[0,y[1]],'b-')
ax.plot([0,z[0]],[0,z[1]],'b-');

Solution. Compute

$$\mathbf{y}^\top\mathbf{u} = \begin{bmatrix} 7 & 6 \end{bmatrix}\begin{bmatrix} 4 \\ 2 \end{bmatrix} = 40 \qquad \mathbf{u}^\top\mathbf{u} = \begin{bmatrix} 4 & 2 \end{bmatrix}\begin{bmatrix} 4 \\ 2 \end{bmatrix} = 20$$

The orthogonal projection of $\mathbf{y}$ onto $\mathbf{u}$ is

$$\hat{\mathbf{y}} = \frac{\mathbf{y}^\top\mathbf{u}}{\mathbf{u}^\top\mathbf{u}}\,\mathbf{u} = \frac{40}{20}\,\mathbf{u} = 2\begin{bmatrix} 4 \\ 2 \end{bmatrix} = \begin{bmatrix} 8 \\ 4 \end{bmatrix}$$

And the component of $\mathbf{y}$ orthogonal to $\mathbf{u}$ is

$$\mathbf{y} - \hat{\mathbf{y}} = \begin{bmatrix} 7 \\ 6 \end{bmatrix} - \begin{bmatrix} 8 \\ 4 \end{bmatrix} = \begin{bmatrix} -1 \\ 2 \end{bmatrix}.$$

So

$$\mathbf{y} = \hat{\mathbf{y}} + \mathbf{z} \qquad\qquad \begin{bmatrix} 7 \\ 6 \end{bmatrix} = \begin{bmatrix} 8 \\ 4 \end{bmatrix} + \begin{bmatrix} -1 \\ 2 \end{bmatrix}$$

Two Important Properties of $\hat{\mathbf{y}}$¶

  1. The closest point.

    Recall from geometry that given a line and a point $P$, the closest point on the line to $P$ is given by the perpendicular from $P$ to the line.

    So this gives an important interpretation of $\hat{\mathbf{y}}$: it is the closest point to $\mathbf{y}$ in the subspace $L$.

In [13]:
#
ax = ut.plotSetup(-3,11,-1,7,(8,6))
ut.centerAxes(ax)
plt.axis('equal')
u = np.array([4.,2])
y = np.array([7.,6])
yhat = (y.T.dot(u)/u.T.dot(u))*u
z = y-yhat
ut.plotLinEqn(1.,-2.,0.)
ut.plotVec(ax,u)
ut.plotVec(ax,z)
ut.plotVec(ax,y)
ut.plotVec(ax,yhat)
ax.text(u[0]+0.3,u[1]-0.5,r'$\mathbf{u}$',size=20)
ax.text(yhat[0]+0.3,yhat[1]-0.5,r'$\mathbf{\hat{y}}$',size=20)
ax.text(y[0],y[1]+0.8,r'$\mathbf{y}$',size=20)
ax.text(z[0]-2,z[1],r'$\mathbf{y - \hat{y}}$',size=20)
ax.text(10,4.5,r'$L = $Span$\{\mathbf{u}\}$',size=20)
perpline1, perpline2 = ut.perp_sym(yhat, y, np.array([0,0]), 0.75)
plt.plot(perpline1[0], perpline1[1], 'k', lw = 1)
plt.plot(perpline2[0], perpline2[1], 'k', lw = 1)
ax.plot([y[0],yhat[0]],[y[1],yhat[1]],'b--')
ax.plot([0,y[0]],[0,y[1]],'b-')
ax.plot([0,z[0]],[0,z[1]],'b-');
  2. The distance from $\mathbf{y}$ to $L$.

    The distance from $\mathbf{y}$ to $L$ is the length of the perpendicular from $\mathbf{y}$ to its orthogonal projection on $L$, namely $\hat{\mathbf{y}}$.

    This distance equals the length of $\mathbf{y} - \hat{\mathbf{y}}$, or in other words the norm of $\mathbf{z}$.

    In the previous example, the distance is

$$\|\mathbf{y} - \hat{\mathbf{y}}\| = \|\mathbf{z}\| = \sqrt{(-1)^2 + 2^2} = \sqrt{5}.$$
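We can reproduce the whole example with a few lines of NumPy (a minimal sketch using the vectors from the example above):

In [ ]:
# project y onto u, then check the orthogonal decomposition and the distance to L
u = np.array([4., 2.])
y = np.array([7., 6.])
yhat = (y @ u) / (u @ u) * u
z = y - yhat
print(yhat, z)                # expect [8. 4.] [-1.  2.]
print(z @ u)                  # expect 0.0 (z is orthogonal to u)
print(np.linalg.norm(z))      # expect sqrt(5), about 2.236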

Projections find Coordinates in an Orthogonal Basis¶

Let's combine two observations that we have made today.

  1. To decompose a vector $\mathbf{y}$ into a linear combination of vectors $\{\mathbf{u}_1, \ldots, \mathbf{u}_p\}$ in an orthogonal basis, we compute

    $$\mathbf{y} = c_1\mathbf{u}_1 + \cdots + c_p\mathbf{u}_p, \quad \text{where} \quad c_j = \frac{\mathbf{y}^\top\mathbf{u}_j}{\mathbf{u}_j^\top\mathbf{u}_j}$$

  2. The projection of $\mathbf{y}$ onto the subspace spanned by any $\mathbf{u}$ is

    $$\text{proj}_L \mathbf{y} = \frac{\mathbf{y}^\top\mathbf{u}}{\mathbf{u}^\top\mathbf{u}}\,\mathbf{u}.$$

So a decomposition like $\mathbf{y} = c_1\mathbf{u}_1 + \cdots + c_p\mathbf{u}_p$ is really decomposing $\mathbf{y}$ into a sum of orthogonal projections onto one-dimensional subspaces.

In [5]:
#
ax = ut.plotSetup(-1,10,-1,4,(9,6))
ut.centerAxes(ax)
pt = [4., 3.]
plt.plot([0,pt[0]],[0,pt[1]],'b-',lw=2)
plt.plot([pt[0],pt[0]],[0,pt[1]],'b--',lw=2)
plt.plot([0,pt[0]],[0,0],'r-',lw=3)
plt.plot([0,0],[0,pt[1]],'g-',lw=3)
ut.plotVec(ax,pt,'b')
u = np.array([pt[0],0])
v = [0,pt[1]]
ut.plotVec(ax,u)
ut.plotVec(ax,2*u)
ut.plotVec(ax,v,'g')
ax.text(pt[0],-0.75,r'${\bf \hat{y}}=\alpha{\bf u}$',size=20)
ax.text(2*pt[0],-0.75,r'$\mathbf{u}$',size=20)
ax.text(pt[0]+0.1,pt[1]+0.2,r'$\mathbf{y}$',size=20)
ax.text(0+0.1,pt[1]+0.2,r'$\mathbf{z}$',size=20)
ax.text(0+0.1,pt[1]+0.8,r'Component of $\mathbf{y}$ orthogonal to $\mathbf{u}$',size=16)
ax.text(pt[0],-1.25,r'Orthogonal projection of $\mathbf{y}$ onto $\mathbf{u}$',size=16);
perpline1, perpline2 = ut.perp_sym(np.array([0,0]), u, v, 0.5)
plt.plot(perpline1[0], perpline1[1], 'k', lw = 1)
plt.plot(perpline2[0], perpline2[1], 'k', lw = 1);

For example, let's take the case where $\mathbf{y} \in \mathbb{R}^2.$

Let's say we are given $\mathbf{u}_1, \mathbf{u}_2$ such that $\mathbf{u}_1$ is orthogonal to $\mathbf{u}_2$, and so together they span $\mathbb{R}^2.$

Then $\mathbf{y}$ can be written in the form

$$\mathbf{y} = \underbrace{\frac{\mathbf{y}^\top\mathbf{u}_1}{\mathbf{u}_1^\top\mathbf{u}_1}\mathbf{u}_1}_{\text{projection of }\mathbf{y}\text{ onto the subspace spanned by }\mathbf{u}_1} + \underbrace{\frac{\mathbf{y}^\top\mathbf{u}_2}{\mathbf{u}_2^\top\mathbf{u}_2}\mathbf{u}_2}_{\text{projection of }\mathbf{y}\text{ onto the subspace spanned by }\mathbf{u}_2}$$

So this equation expresses $\mathbf{y}$ as the sum of its projections onto the (orthogonal) axes determined by $\mathbf{u}_1$ and $\mathbf{u}_2$.

This is a useful way of thinking about coordinates in an orthogonal basis:

coordinates are projections onto the axes!

(Even if they are not the standard axes determined by the standard basis vectors.)
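Here is a minimal sketch of this idea in NumPy; the vectors $\mathbf{u}_1$ and $\mathbf{u}_2$ below are an illustrative orthogonal pair (not taken from the lecture), and the two projections sum back to $\mathbf{y}$:

In [ ]:
# decompose y into its projections onto two orthogonal directions u1 and u2
u1 = np.array([4., 2.])
u2 = np.array([-2., 4.])    # orthogonal to u1
y  = np.array([7., 6.])
proj1 = (y @ u1) / (u1 @ u1) * u1
proj2 = (y @ u2) / (u2 @ u2) * u2
print(proj1, proj2)         # the projections of y onto each axis
print(proj1 + proj2)        # expect [7. 6.] -- the projections sum to y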

In [5]:
# Source: Lay 4th Edition, Chapter 6.2, Figure 4
display(Image("images/10-proj.jpg", width=600))

23.3 QR Decomposition¶

Given a subspace $H$, remember that there are many possible basis sets that can be used to represent $H$. In this lecture, we have seen that an orthogonal basis is particularly nice to work with.

So, let's use the concept of projection to transform any basis into an orthogonal one.

Gram-Schmidt Orthogonalization¶

The Gram-Schmidt process is an algorithm that transforms a basis $\{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_n\}$ for a subspace $H$ into an orthogonal basis $\{\mathbf{u}_1, \mathbf{u}_2, \ldots, \mathbf{u}_n\}$ for the same subspace $H$.

It operates as follows.

$$\begin{aligned}
\mathbf{u}_1 &= \mathbf{v}_1 \\
\mathbf{u}_2 &= \mathbf{v}_2 - \text{proj}_{\mathbf{u}_1}(\mathbf{v}_2) \\
\mathbf{u}_3 &= \mathbf{v}_3 - \text{proj}_{\mathbf{u}_1}(\mathbf{v}_3) - \text{proj}_{\mathbf{u}_2}(\mathbf{v}_3) \\
&\;\;\vdots \\
\mathbf{u}_n &= \mathbf{v}_n - \sum_{i=1}^{n-1} \text{proj}_{\mathbf{u}_i}(\mathbf{v}_n)
\end{aligned}$$

In other words, we can find an orthogonal basis by iteratively removing from each vector its projections onto the previously constructed vectors, keeping only the component that is orthogonal to all the vectors before it.

Here is another way to think about the Gram-Schmidt process (a code sketch of this view follows the list below).

  • Set $\mathbf{u}_1 = \mathbf{v}_1$.
  • Overwrite each of $\mathbf{v}_2, \mathbf{v}_3, \ldots, \mathbf{v}_n$ with the component of that vector that is orthogonal to $\mathbf{u}_1$.
  • At this point, $\mathbf{v}_2$ is now orthogonal to the subspace spanned by $\mathbf{u}_1$. So, set $\mathbf{u}_2 = \mathbf{v}_2$.
  • Overwrite each of $\mathbf{v}_3, \mathbf{v}_4, \ldots, \mathbf{v}_n$ with the component of that vector that is orthogonal to $\mathbf{u}_2$.
  • At this point, $\mathbf{v}_3$ is now orthogonal to the subspace spanned by $\mathbf{u}_1$ and $\mathbf{u}_2$. So, set $\mathbf{u}_3 = \mathbf{v}_3$.
  • Overwrite each of $\mathbf{v}_4, \mathbf{v}_5, \ldots, \mathbf{v}_n$ with the component of that vector that is orthogonal to $\mathbf{u}_3$.
  • $\ldots$
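Here is a minimal sketch of this in-place "overwrite" view (essentially the modified Gram-Schmidt ordering), written to mirror the steps above; the example vectors are illustrative, not taken from the lecture:

In [ ]:
def gram_schmidt_in_place(V):
    """Orthogonalize the columns of V by repeatedly overwriting the later
    columns with their components orthogonal to the most recently fixed column."""
    V = np.array(V, dtype=np.float64)
    m, n = V.shape
    for j in range(n):
        u = V[:, j]                                    # u_j is now fixed
        for k in range(j + 1, n):
            V[:, k] -= (V[:, k] @ u) / (u @ u) * u     # remove the projection onto u_j
    return V

V = np.array([[3., -1., 2.],
              [1.,  2., 0.],
              [1.,  1., 5.]])
U = gram_schmidt_in_place(V)
print(np.round(U.T @ U, 6))    # off-diagonal entries should be (essentially) zero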

Theorem. If we apply Gram-Schmidt starting from a basis $\{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_n\}$ for a subspace $H$, then the resulting set $B = \{\mathbf{u}_1, \mathbf{u}_2, \ldots, \mathbf{u}_n\}$ satisfies the following properties:

  1. The vectors in $B$ are orthogonal.
  2. $B$ spans the same subspace $H$, and hence is a basis for $H$.

Proof. The proof of orthogonality involves applying the inner product to the equations for the $\mathbf{u}_i$ vectors above; I recommend that you try writing it out yourself. Let's focus on the second property, and we will prove it by induction.

First, it is clear that $\mathbf{u}_1 \in \text{span}(\{\mathbf{v}_1\})$.

Next, let's consider $\mathbf{u}_2$, and how this vector relates to the smaller subspace $H_2 = \text{span}(\{\mathbf{v}_1, \mathbf{v}_2\})$.

  • $\mathbf{u}_2 \in \text{span}(\{\mathbf{v}_1, \mathbf{v}_2\})$ because we wrote $\mathbf{u}_2$ as a linear combination of the vectors $\mathbf{v}_1$ and $\mathbf{v}_2$.

  • Moreover, $\mathbf{u}_2$ is not contained in the span of $\mathbf{u}_1$ (indeed, we removed exactly the component of $\mathbf{v}_2$ that lies along $\mathbf{u}_1$), so $\mathbf{u}_1$ and $\mathbf{u}_2$ are linearly independent.

  • Hence, $\{\mathbf{u}_1, \mathbf{u}_2\}$ is a set of two linearly independent vectors in the two-dimensional subspace $H_2$, and therefore is a basis for $H_2$.

Both of these statements generalize to hold for each vector $\mathbf{u}_i$:

  • $\mathbf{u}_i$ is in $\text{span}(\{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_i\})$ because we wrote it as a linear combination of $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_i$.
  • Moreover, $\mathbf{u}_i$ is not linearly dependent on $\mathbf{u}_1, \mathbf{u}_2, \ldots, \mathbf{u}_{i-1}$; in fact, $\mathbf{u}_i$ is orthogonal to all of them.
  • As a result, $\text{span}(\{\mathbf{u}_1, \mathbf{u}_2, \ldots, \mathbf{u}_i\}) = \text{span}(\{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_i\})$.

QR Decomposition¶

This procedure leads to a powerful new form of matrix factorization!

Let's first recall what matrix factorization (or decomposition) is. The objective is to write a single matrix $A$ as a product $A = BC$ of two or more "simpler" matrices.

So far we have seen one kind of matrix factorization, the LU decomposition, which decomposes an $m \times n$ matrix $A$ into the factors $A = LU$ where:

  • $L$ is an $m \times m$ unit lower triangular matrix,
  • $U$ is an $m \times n$ matrix in row echelon form (so in particular it is upper triangular).

Question. How did we compute the LU factorization?

Answer. Perform Gaussian elimination to convert $A$ to $U$. Keep track of the elementary row operations performed along the way, and use them to construct $L$.
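As a reminder, scipy provides a built-in LU routine; note that it uses partial pivoting, so it also returns a permutation matrix $P$ with $A = PLU$. (A quick illustration; the matrix here is the $4 \times 4$ example factored later in this lecture.)

In [ ]:
import scipy.linalg
A = np.array([[ 3., -7., -2.,  2.],
              [-3.,  5.,  1.,  0.],
              [ 6., -4.,  0., -5.],
              [-9.,  5., -5., 12.]])
P, L, U = scipy.linalg.lu(A)
print(np.allclose(A, P @ L @ U))    # expect True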

In a similar fashion, if we apply Gram-Schmidt to the column vectors of a matrix and keep track of the operations performed along the way, we obtain a new method to factor matrices!

Definition. Let $A$ be an $m \times n$ matrix (not necessarily square) with linearly independent columns. Then the QR decomposition of $A$ is a factorization $A = QR$ where:

  • $Q$ is an $m \times n$ matrix whose column vectors are orthogonal,
  • $R$ is an upper triangular $n \times n$ matrix.

Rather than showing the procedure for "keeping track of the operations performed along the way," let me just skip ahead to show you some code that computes the QR decomposition of a matrix based on the Gram-Schmidt algorithm.

In [1]:
# source: Fast.ai lecture 8 from https://nbviewer.org/github/fastai/numerical-linear-algebra/blob/master/nbs/8.%20Implementing%20QR%20Factorization.ipynb
def GramSchmidt(A):
    m, n = A.shape
    Q = np.zeros([m,n], dtype=np.float64)
    R = np.zeros([n,n], dtype=np.float64)
    for j in range(n):
        v = A[:,j]                          # take each column vector in A
        for i in range(j):
            R[i,j] = np.dot(Q[:,i], A[:,j])
            v = v - (R[i,j] * Q[:,i])       # subtract the projection from previous vectors
        R[j,j] = np.linalg.norm(v)
        Q[:, j] = v / R[j,j]                # insert as a column of Q
    return Q, R
In [3]:
import numpy as np
A = np.matrix([[0, 1, 1],
              [1, 1, 1],
              [1, 2, 1]])
np.linalg.qr(A)
Out[3]:
(matrix([[ 0.        ,  0.81649658, -0.57735027],
         [-0.70710678, -0.40824829, -0.57735027],
         [-0.70710678,  0.40824829,  0.57735027]]),
 matrix([[-1.41421356, -2.12132034, -1.41421356],
         [ 0.        ,  1.22474487,  0.81649658],
         [ 0.        ,  0.        , -0.57735027]]))

Let's use this code to factor the matrix

$$A = \begin{bmatrix} 3 & -7 & -2 & 2 \\ -3 & 5 & 1 & 0 \\ 6 & -4 & 0 & -5 \\ -9 & 5 & -5 & 12 \end{bmatrix}$$

for which we computed the LU factorization last week.

In [3]:
A = np.array([[3,-7,-2,2],
             [-3,5,1,0],
             [6,-4,0,-5],
             [-9,5,-5,12]])
Q, R = GramSchmidt(A)
print("Q =",Q)
print("R =",R)
Q = [[ 0.25819889 -0.80829038  0.11547005  0.51639778]
 [-0.25819889  0.46188022  0.01649572  0.84836778]
 [ 0.51639778  0.11547005 -0.84128182  0.11065667]
 [-0.77459667 -0.34641016 -0.5278631  -0.03688556]]
R = [[ 11.61895004  -9.03696114   3.09838668 -11.36075115]
 [  0.           5.77350269   3.81051178  -6.35085296]
 [  0.           0.           2.42487113  -1.89700803]
 [  0.           0.           0.           0.03688556]]

QR matrix factorization is a valuable tool in data science.

"One algorithm in numerical linear algebra is more important than all the others: QR factorization." -Trefethen, page 48

For this reason, the scipy package includes a built-in command to calculate it. (Note that Gram-Schmidt is just one way to compute a QR decomposition; there are others. Also, some QR decompositions factor $A$ into an $m \times m$ matrix $Q$ followed by an $m \times n$ matrix $R$ instead.)

In [9]:
import scipy
Qnew, Rnew = scipy.linalg.qr(A)
print(Qnew)
print(Rnew)
[[-0.25819889  0.80829038  0.11547005  0.51639778]
 [ 0.25819889 -0.46188022  0.01649572  0.84836778]
 [-0.51639778 -0.11547005 -0.84128182  0.11065667]
 [ 0.77459667  0.34641016 -0.5278631  -0.03688556]]
[[-11.61895004   9.03696114  -3.09838668  11.36075115]
 [  0.          -5.77350269  -3.81051178   6.35085296]
 [  0.           0.           2.42487113  -1.89700803]
 [  0.           0.           0.           0.03688556]]

We can use NumPy to check that the columns of $Q$ are orthogonal.

Also, both of these code samples actually perform one final step of Gram-Schmidt: after calculating the vectors $\mathbf{u}_1, \mathbf{u}_2, \ldots, \mathbf{u}_n$, the code normalizes each column vector to be of length 1.
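For instance, the column norms of the $Q$ computed above should all equal 1:

In [ ]:
# each column of Q should be a unit vector
print(np.linalg.norm(Q, axis=0))    # expect [1. 1. 1. 1.]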

In [18]:
for i in range(Q.shape[1]):
    for j in range(i,Q.shape[1]):
        print("the inner product between columns", i, "and", j,"is", Q[:,i].dot(Q[:,j]))

# remember that Python's rows and columns are 0-indexed!
the inner product between columns 0 and 0 is 1.0
the inner product between columns 0 and 1 is 4.440892098500626e-16
the inner product between columns 0 and 2 is -9.43689570931383e-16
the inner product between columns 0 and 3 is 6.752931547282515e-14
the inner product between columns 1 and 1 is 1.0
the inner product between columns 1 and 2 is -6.106226635438361e-16
the inner product between columns 1 and 3 is 9.314771176605063e-14
the inner product between columns 2 and 2 is 0.9999999999999999
the inner product between columns 2 and 3 is -4.0610570462007445e-13
the inner product between columns 3 and 3 is 1.0000000000000002

Another way to perform this same check is to compute $Q^\top Q$! (For easier reading, this time let's also round to the nearest integer.)

In [21]:
print((Q.T @ Q).round(0))
[[ 1.  0. -0.  0.]
 [ 0.  1. -0.  0.]
 [-0. -0.  1. -0.]
 [ 0.  0. -0.  1.]]

This is a very cool property! It means we can take the inverse of $Q$ very efficiently, without using the typical (slow) matrix inverse algorithm. Instead, we only need to take the matrix transpose. Let's make this idea more formal.
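As a taste of why this matters (a hedged sketch, not part of the original lecture code): once we have $A = QR$ with square $Q$, we can solve $A\mathbf{x} = \mathbf{b}$ without ever inverting $A$, because $Q^{-1} = Q^\top$ and $R$ is triangular.

In [ ]:
# A x = b  =>  Q R x = b  =>  R x = Q.T b, solved by back substitution
from scipy.linalg import solve_triangular
b = np.array([1., 2., 3., 4.])
x = solve_triangular(R, Q.T @ b)    # R is upper triangular
print(np.allclose(A @ x, b))        # expect True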

23.4 Orthonormal Sets¶

Orthogonal sets are therefore very useful. However, they become even more useful if we normalize the vectors in the set.

Definition. A set $\{\mathbf{u}_1, \ldots, \mathbf{u}_p\}$ is an orthonormal set if it is an orthogonal set of unit vectors (that is, vectors of norm 1).

If $W$ is the subspace spanned by such a set, then $\{\mathbf{u}_1, \ldots, \mathbf{u}_p\}$ is an orthonormal basis for $W$, since the set is automatically linearly independent.

Example. The simplest example of an orthonormal set is the standard basis $\{\mathbf{e}_1, \ldots, \mathbf{e}_n\}$ for $\mathbb{R}^n$.

Keep the terms clear in your head:

  • orthogonal is (just) perpendicular, while
  • orthonormal is perpendicular and unit length.

(You can see the word "normalized" inside "orthonormal").

Orthonormal Matrices¶

Matrices with orthonormal columns are particularly important.

Theorem. An $m \times n$ matrix $U$ has orthonormal columns if and only if $U^\top U = I$ (where $I$ denotes the $n \times n$ identity matrix).

Proof. Let $U = [\mathbf{u}_1 \; \mathbf{u}_2 \; \cdots \; \mathbf{u}_n]$, where the $\mathbf{u}_i$ are the column vectors of $U$. Then:

$$U^\top U = \begin{bmatrix} \mathbf{u}_1^\top \\ \mathbf{u}_2^\top \\ \vdots \\ \mathbf{u}_n^\top \end{bmatrix} \begin{bmatrix} \mathbf{u}_1 & \mathbf{u}_2 & \cdots & \mathbf{u}_n \end{bmatrix} = \begin{bmatrix} \mathbf{u}_1^\top\mathbf{u}_1 & \mathbf{u}_1^\top\mathbf{u}_2 & \cdots & \mathbf{u}_1^\top\mathbf{u}_n \\ \mathbf{u}_2^\top\mathbf{u}_1 & \mathbf{u}_2^\top\mathbf{u}_2 & \cdots & \mathbf{u}_2^\top\mathbf{u}_n \\ \vdots & \vdots & \ddots & \vdots \\ \mathbf{u}_n^\top\mathbf{u}_1 & \mathbf{u}_n^\top\mathbf{u}_2 & \cdots & \mathbf{u}_n^\top\mathbf{u}_n \end{bmatrix}$$

Every diagonal entry of this matrix equals 1, because each $\mathbf{u}_i$ is a unit vector, so $\mathbf{u}_i^\top\mathbf{u}_i = 1$.

Every off-diagonal entry of this matrix equals 0, because each pair of distinct vectors $\mathbf{u}_i$ and $\mathbf{u}_j$ is orthogonal, so $\mathbf{u}_i^\top\mathbf{u}_j = 0$.

So $U^\top U = I.$

Orthonormal Transformations¶

Theorem. Let $U$ be an $m \times n$ matrix with orthonormal columns, and let $\mathbf{x}$ and $\mathbf{y}$ be in $\mathbb{R}^n.$ Then:

  1. $\|U\mathbf{x}\| = \|\mathbf{x}\|.$
  2. $(U\mathbf{x})^\top(U\mathbf{y}) = \mathbf{x}^\top\mathbf{y}.$
  3. $(U\mathbf{x})^\top(U\mathbf{y}) = 0$ if and only if $\mathbf{x}^\top\mathbf{y} = 0.$

We can combine Properties 1 and 3 to make an important statement.

For $U$ with orthonormal columns, the linear transformation $\mathbf{x} \mapsto U\mathbf{x}$ preserves lengths and orthogonality.

So, viewed as a linear operator, an orthonormal matrix is very special: the lengths of vectors, and therefore the distances between points, are not changed by the action of $U$.

Notice as well that $U$ is $m \times n$ -- it may not be square.

So it may map vectors from one vector space to an entirely different vector space -- but the distances between points will not be changed!

In [6]:
#
display(Image("images/10-ortho1.png", width=600))

... and the orthogonality of vectors will not be changed!

In [8]:
#
display(Image("images/10-ortho2.png", width=600))

Note however that we cannot in general construct an orthonormal map from a higher dimension to a lower one.

For example, three orthogonal vectors in $\mathbb{R}^3$ cannot be mapped to three orthogonal vectors in $\mathbb{R}^2$. Can you see why this is impossible? What is it about the definition of an orthonormal set that prevents this?

Example. Let $U = \begin{bmatrix} 1/\sqrt{2} & 2/3 \\ 1/\sqrt{2} & -2/3 \\ 0 & 1/3 \end{bmatrix}$ and $\mathbf{x} = \begin{bmatrix} \sqrt{2} \\ 3 \end{bmatrix}.$ Notice that $U$ has orthonormal columns, and

$$U^\top U = \begin{bmatrix} 1/\sqrt{2} & 1/\sqrt{2} & 0 \\ 2/3 & -2/3 & 1/3 \end{bmatrix} \begin{bmatrix} 1/\sqrt{2} & 2/3 \\ 1/\sqrt{2} & -2/3 \\ 0 & 1/3 \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}.$$

Let's verify that $\|U\mathbf{x}\| = \|\mathbf{x}\|.$

$$U\mathbf{x} = \begin{bmatrix} 1/\sqrt{2} & 2/3 \\ 1/\sqrt{2} & -2/3 \\ 0 & 1/3 \end{bmatrix} \begin{bmatrix} \sqrt{2} \\ 3 \end{bmatrix} = \begin{bmatrix} 3 \\ -1 \\ 1 \end{bmatrix}$$
$$\|U\mathbf{x}\| = \sqrt{9 + 1 + 1} = \sqrt{11}.$$
$$\|\mathbf{x}\| = \sqrt{2 + 9} = \sqrt{11}.$$
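The same check in NumPy (a small sketch reproducing the example above):

In [ ]:
U = np.array([[1/np.sqrt(2),  2/3],
              [1/np.sqrt(2), -2/3],
              [0.,            1/3]])
x = np.array([np.sqrt(2), 3.])
print((U.T @ U).round(10))                          # expect the 2x2 identity
print(np.linalg.norm(U @ x), np.linalg.norm(x))     # both expect sqrt(11), about 3.3166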

When Orthonormal Matrices are Square¶

One of the most useful transformation matrices is obtained when the columns of the matrix are orthonormal...

... and the matrix is square.

These matrices map vectors in $\mathbb{R}^n$ to new locations in the same space, i.e., $\mathbb{R}^n$.

... in a way that preserves lengths, distances and orthogonality.

Now, consider the case when $U$ is square and has orthonormal columns.

Then the fact that $U^\top U = I$ implies that $U^{-1} = U^\top.$

In that case, $U$ is called an orthogonal matrix.

A good example of an orthogonal matrix is a rotation matrix:

$$R = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}.$$

Using trigonometric identities, you should be able to convince yourself that

$$R^\top R = I$$

and hopefully you can visualize how $R$ preserves lengths and orthogonality.
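A quick numeric confirmation (an illustrative sketch; the angle and test vector are arbitrary choices):

In [ ]:
# a rotation by 30 degrees: check that R.T @ R = I and that lengths are preserved
theta = np.pi / 6
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
print(np.allclose(R.T @ R, np.eye(2)))              # expect True
v = np.array([3., 4.])
print(np.linalg.norm(R @ v), np.linalg.norm(v))     # both expect 5.0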

In [ ]: