In [1]:
#
import numpy as np
import scipy as sp
import pandas as pd
import matplotlib as mp
import matplotlib.pyplot as plt
import seaborn as sns
import sklearn
import laUtilities as ut
import slideUtilities as sl
import demoUtilities as dm
from matplotlib import animation
from importlib import reload
from datetime import datetime
from IPython.display import Image, display_html, display, Math, HTML;
qr_setting = None

mp.rcParams['animation.html'] = 'jshtml';

Announcements¶

  • Homework 5 due Friday 3/17

  • Office hours

    • Today: Prof McDonald from 4:30-6pm in CCDS 1341
    • Tomorrow: Peer tutor Rohan Anand from 1:30-3pm in CCDS 16th floor
  • Weekly reading and viewing assignments

    • Aggarwal sections 2.6-2.7
    • 3Blue1Brown video 7 and video 8

Lecture 20: LU Decomposition + Subspaces¶

$$A = \underbrace{\begin{bmatrix} 1 & 0 & 0 & 0 \\ * & 1 & 0 & 0 \\ * & * & 1 & 0 \\ * & * & * & 1 \end{bmatrix}}_{\large L} \underbrace{\begin{bmatrix} \blacksquare & * & * & * & * \\ 0 & \blacksquare & * & * & * \\ 0 & 0 & 0 & \blacksquare & * \\ 0 & 0 & 0 & 0 & 0 \end{bmatrix}}_{\large U}$$

[This lecture is based on lecture notes from Prof. Crovella's CS 132 and the fast.ai numerical linear algebra course.]

Recap from last lecture¶

We calculated the computational cost of matrix operations in terms of floating point operations (flops)

  • Matrix inversion of an $n \times n$ square matrix takes $2n^3$ operations -- this is three times the cost of Gaussian elimination.
  • Matrix multiplication is faster and nicely parallelizable.

We started the study of matrix factorizations

  • A factorization of a matrix $A$ is an equation that expresses $A$ as a product of two or more matrices, $A = BC$. Typically we want to factor a matrix $A$ into "simpler" matrices that make it easier to work with $A$ or expose important properties of $A$.
  • The motivating problem: given matrices $A$ and $B$, find a matrix $X$ such that $AX = B$. If $X$ has $p$ columns, then this requires solving $p$ different linear systems that all have the same coefficient matrix $A$.

We looked at Gaussian elimination in a new way

  • We saw that every elementary row operation on $A$ is a linear transformation, and therefore can be performed by multiplying $A$ by a suitable matrix.
  • A simple way to compute an elementary matrix $E$ is to apply the row operation starting from the identity matrix $I$.

Elementary matrices¶

Example: find the matrix $E$ that implements the operation 'add $-4$ times row 1 to row 3'.

We can perform this transformation starting from $I$:

$$\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} \rightarrow \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ -4 & 0 & 1 \end{bmatrix}$$
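As a quick numerical check (with an arbitrary sample matrix, not one from the notes), left-multiplying by this $E$ does indeed perform the row operation:

In [ ]:
# E implements "add -4 times row 1 to row 3"; apply it to a sample matrix
E = np.array([[ 1, 0, 0],
              [ 0, 1, 0],
              [-4, 0, 1]])
M = np.array([[1, 2, 3],
              [4, 5, 6],
              [7, 8, 9]])
print(E @ M)   # third row becomes [7-4*1, 8-4*2, 9-4*3] = [3, 0, -3]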

We introduced our first factorization: $A = LU$, where

  • $U$ is in row echelon form, and it has the same $m \times n$ shape as $A$. This is an upper triangular matrix (hence its name $U$).
  • $L$ is a lower triangular square matrix of size $m \times m$, and it has 1s on the diagonal. This is called a unit lower triangular matrix (hence its name $L$).

$$A = \underbrace{\begin{bmatrix} 1 & 0 & 0 & 0 \\ * & 1 & 0 & 0 \\ * & * & 1 & 0 \\ * & * & * & 1 \end{bmatrix}}_{\large L} \underbrace{\begin{bmatrix} \blacksquare & * & * & * & * \\ 0 & \blacksquare & * & * & * \\ 0 & 0 & 0 & \blacksquare & * \\ 0 & 0 & 0 & 0 & 0 \end{bmatrix}}_{\large U}$$

The fact that $U$ is in row echelon form may suggest to you (correctly!) that we could get it from $A$ by a sequence of row operations.

For now, let us suppose that:

  • We never need to interchange (or swap) two rows. Let's only consider the other two elementary operations.
  • The row reductions that convert $A$ to $U$ only add a multiple of one row to another row below it.

Now, if you consider an elementary matrix that implements such a row reduction, you will see that it will have 1s on the diagonal, and an additional entry somewhere below the diagonal.

For example, recall the scaling matrix $E_2$ and addition matrix $E_3$ from last lecture:

$$E_2 = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 5 \end{bmatrix}, \quad E_2A = \begin{bmatrix} a & b & c \\ d & e & f \\ 5g & 5h & 5i \end{bmatrix}, \quad E_3 = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ -4 & 0 & 1 \end{bmatrix}, \quad E_3A = \begin{bmatrix} a & b & c \\ d & e & f \\ g-4a & h-4b & i-4c \end{bmatrix}.$$

These elementary matrices are both lower triangular; $E_3$, which performs a row replacement, is moreover unit lower triangular, with 1s on the diagonal. (We are ignoring row interchanges for now because they are not lower triangular. We'll deal with row interchanges later.)

So if there is a sequence of elementary row operations that converts $A$ to $U$, then there is a set of unit lower triangular elementary matrices $E_1, \ldots, E_p$ such that

$$E_p \cdots E_1 A = U.$$

We know that elementary matrices are invertible, and the product of invertible matrices is invertible, so:

$$A = (E_p \cdots E_1)^{-1} U = LU$$

where $L = (E_p \cdots E_1)^{-1} = E_1^{-1} \cdots E_p^{-1}$. Remember: the inverse of a product equals the product of inverses, except in the opposite order.

Fact. The product of unit lower triangular matrices is unit lower triangular. Additionally, the inverse of a unit lower triangular matrix is unit lower triangular.

(Think about how to prove this statement on your own.)
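(As a small numerical illustration of this Fact, not a proof, here are two arbitrary unit lower triangular matrices; their product and its inverse come out unit lower triangular as well.)

In [ ]:
# Two unit lower triangular matrices: check their product and its inverse
L1 = np.array([[1., 0., 0.],
               [2., 1., 0.],
               [0., 0., 1.]])
L2 = np.array([[ 1., 0., 0.],
               [ 0., 1., 0.],
               [-4., 3., 1.]])
print(L1 @ L2)                  # unit lower triangular
print(np.linalg.inv(L1 @ L2))   # also unit lower triangular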

So we can conclude that $L$, as constructed from $(E_p \cdots E_1)^{-1}$, is unit lower triangular.

Hence, we have defined the LU decomposition based on Gaussian Elimination.

We have rewritten Gaussian Elimination as:

$$U = L^{-1}A$$

and shown that the $L$ so defined is unit lower triangular.

Let's take stock of what this all means: the LU decomposition is a way of capturing the application of Gaussian Elimination to $A$.

It incorporates both the process of performing Gaussian Elimination, and the result:

  • $U$ is the row echelon form of $A$.
  • $L^{-1}$ captures the row reductions that transform $A$ to row echelon form.
  • $L$ is the inverse of $L^{-1}$.

Finding $L$¶

Recall that the motivation for developing the LU decomposition is that it is more efficient than matrix inversion. So we don't want to have to invert $L^{-1}$ in the standard way in order to find $L$.

Here we have some good news:

  • Inverting each elementary row operation is simple, in fact much easier than general matrix inversion. (We have already seen examples of this.)
  • Multiplying elementary row operations is also simple: just apply the elementary row operation indicated by the left matrix to the right matrix. Here are some examples:
$$E_2^{-1}E_3^{-1} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1/5 \end{bmatrix}\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 4 & 0 & 1 \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 4/5 & 0 & 1/5 \end{bmatrix}, \qquad E_3^{-1}E_2^{-1} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 4 & 0 & 1 \end{bmatrix}\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1/5 \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 4 & 0 & 1/5 \end{bmatrix}$$

Once again, we can verify these calculations using Numpy.

In [4]:
E2inv = np.array([[1,0,0],[0,1,0],[0,0,1/5]])
E3inv = np.array([[1,0,0],[0,1,0],[4,0,1]])
print("E2inv * E3inv ="); print(E2inv @ E3inv)
print("\nE3inv * E2inv ="); print(E3inv @ E2inv)
E2inv * E3inv =
[[1.  0.  0. ]
 [0.  1.  0. ]
 [0.8 0.  0.2]]

E3inv * E2inv =
[[1.  0.  0. ]
 [0.  1.  0. ]
 [4.  0.  0.2]]

This gives the following algorithm for LU factorization:

  1. Reduce $A$ to an echelon form $U$ by a sequence of row replacement operations.

    This is just Gaussian Elimination! But: keep track of the elementary row operations you perform along the way.

  2. Place entries in $L$ such that the same sequence of row operations reduces $L$ to $I$.

    If we can do this step efficiently, then the cost of LU factorization will be dominated by Gaussian Elimination itself.

The fact is that constructing $L$ can be done efficiently by a simple modification of Gaussian Elimination. So, LU decomposition takes time only $\sim \frac{2}{3}n^3$.
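Here is a minimal sketch of such a modification (an illustration under the no-row-interchange assumption above, not a library routine): as we eliminate below each pivot, we simply store the multiplier we used in the corresponding entry of $L$. The $4 \times 4$ matrix used to test it is the one from the worked example below.

In [ ]:
def lu_no_pivot(A):
    """Sketch of LU factorization with no row interchanges (assumes nonzero pivots)."""
    U = np.array(A, dtype=float)
    n = U.shape[0]
    L = np.eye(n)
    for j in range(n - 1):                 # for each pivot column
        for i in range(j + 1, n):          # eliminate entries below the pivot
            m = U[i, j] / U[j, j]          # multiplier for this row replacement
            L[i, j] = m                    # record the multiplier in L
            U[i, :] -= m * U[j, :]         # R_i <- R_i - m * R_j
    return L, U

A = np.array([[ 3., -7., -2.,  2.],
              [-3.,  5.,  1.,  0.],
              [ 6., -4.,  0., -5.],
              [-9.,  5., -5., 12.]])
L, U = lu_no_pivot(A)
print(L)
print(U)
print("A == LU?", np.allclose(A, L @ U))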

20.1 Using the LU Factorization¶

Let's return to the motivation for developing the LU factorization.

We've seen that LU decomposition is essentially Gaussian Elimination, and as such, it doesn't take time much longer than Gaussian Elimination.

Now we want to show that, once you have the LU decomposition of a matrix $A = LU$, the system $Ax = b$ can be solved for any $b$ in time that is proportional to $n^2$.

Informally, what we are going to do is to use $L$ to do a special, very efficient version of the forward step of Gaussian Elimination, and then use $U$ in the usual way to do backsubstitution.

We can write these two steps concisely as follows:

When $A = LU$, the equation $Ax = b$ can be written as $L(Ux) = b.$

Let's take this apart, and write $y$ for $Ux$. Then we can find $x$ by solving the pair of equations:

$$Ly = b, \qquad Ux = y.$$

The idea is that we first solve $Ly = b$ for $y$, then solve $Ux = y$ for $x$.

The key observation: each equation is fast to solve because $L$ and $U$ are each triangular.
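Here is a sketch of those two triangular solves in Python (assuming $L$ is lower triangular with nonzero diagonal, as a unit lower triangular matrix is, and $U$ is upper triangular with nonzero diagonal); each solve costs roughly $n^2$ flops.

In [ ]:
def forward_sub(L, b):
    """Solve Ly = b by substitution from the top row down."""
    n = len(b)
    y = np.zeros(n)
    for i in range(n):
        y[i] = (b[i] - L[i, :i] @ y[:i]) / L[i, i]
    return y

def back_sub(U, y):
    """Solve Ux = y by substitution from the bottom row up."""
    n = len(y)
    x = np.zeros(n)
    for i in range(n - 1, -1, -1):
        x[i] = (y[i] - U[i, i+1:] @ x[i+1:]) / U[i, i]
    return x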

Example. Given the following LU decomposition of $A$:

$$A = \begin{bmatrix} 3 & -7 & -2 & 2 \\ -3 & 5 & 1 & 0 \\ 6 & -4 & 0 & -5 \\ -9 & 5 & -5 & 12 \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 & 0 \\ -1 & 1 & 0 & 0 \\ 2 & -5 & 1 & 0 \\ -3 & 8 & 3 & 1 \end{bmatrix}\begin{bmatrix} 3 & -7 & -2 & 2 \\ 0 & -2 & -1 & 2 \\ 0 & 0 & -1 & 1 \\ 0 & 0 & 0 & -1 \end{bmatrix} = LU$$

use this LU factorization of $A$ to solve $Ax = b$, where $b = \begin{bmatrix} -9 \\ 5 \\ 7 \\ 11 \end{bmatrix}.$

Solution. To solve $Ly = b$, note that the arithmetic takes place only in the augmented column (column 5). The zeros below each pivot in $L$ are created automatically by the choice of row operations.

$$[\,L \;\; b\,] = \begin{bmatrix} 1 & 0 & 0 & 0 & -9 \\ -1 & 1 & 0 & 0 & 5 \\ 2 & -5 & 1 & 0 & 7 \\ -3 & 8 & 3 & 1 & 11 \end{bmatrix} \sim \begin{bmatrix} 1 & 0 & 0 & 0 & -9 \\ 0 & 1 & 0 & 0 & -4 \\ 0 & 0 & 1 & 0 & 5 \\ 0 & 0 & 0 & 1 & 1 \end{bmatrix} = [\,I \;\; y\,].$$

Next, for $Ux = y$ (the "backward" phase) the row reduction is again streamlined:

$$[\,U \;\; y\,] = \begin{bmatrix} 3 & -7 & -2 & 2 & -9 \\ 0 & -2 & -1 & 2 & -4 \\ 0 & 0 & -1 & 1 & 5 \\ 0 & 0 & 0 & -1 & 1 \end{bmatrix} \sim \begin{bmatrix} 1 & 0 & 0 & 0 & 3 \\ 0 & 1 & 0 & 0 & 4 \\ 0 & 0 & 1 & 0 & -6 \\ 0 & 0 & 0 & 1 & -1 \end{bmatrix}.$$

So the solution is $x = (3, 4, -6, -1)$.

What we have done is:

  • Perform the forward step of Gaussian Elimination (but in a specially streamlined, efficient way)
  • Then perform the backwards (backsubstitution) step in the usual (efficient) fashion.
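We can verify this example numerically. The snippet below uses SciPy's triangular solver (one convenient way to carry out the two phases, not necessarily the one used in lecture):

In [ ]:
import scipy.linalg as sla
L = np.array([[ 1.,  0., 0., 0.],
              [-1.,  1., 0., 0.],
              [ 2., -5., 1., 0.],
              [-3.,  8., 3., 1.]])
U = np.array([[ 3., -7., -2.,  2.],
              [ 0., -2., -1.,  2.],
              [ 0.,  0., -1.,  1.],
              [ 0.,  0.,  0., -1.]])
b = np.array([-9., 5., 7., 11.])
y = sla.solve_triangular(L, b, lower=True)    # forward phase:  Ly = b
x = sla.solve_triangular(U, y, lower=False)   # backward phase: Ux = y
print("y =", y)    # expect [-9, -4,  5,  1]
print("x =", x)    # expect [ 3,  4, -6, -1]
print("LUx == b?", np.allclose(L @ U @ x, b))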
In [7]:
# image credit: Lay, 4th edition
display(Image("images/07-lu.jpeg", width=800))

Analysis. The forward and backward phases of solving a system as $L(Ux) = b$ together have a flop count of $\sim 2n^2$ (about $n^2$ for each triangular solve).

Therefore the time consuming step is actually doing the factorization, which as we've seen is essentially Gaussian Elimination, and therefore requires $\sim \frac{2}{3}n^3$ flops.

Hence we have found that by using the LU decomposition, one can solve a series of systems all involving the same $A$ at a one-time cost of $\sim \frac{2}{3}n^3$ flops, plus $\sim 2n^2$ flops per system (see the sketch after this list). By contrast:

  • Doing Gaussian Elimination would require $\sim \frac{2}{3}n^3$ flops for each of the $p$ linear systems.
  • Using the matrix inverse would require $\sim 2n^3$ flops to invert the matrix.
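As an illustration of this workflow (the code below is a sketch with arbitrary sizes, not taken from the original notes), SciPy's lu_factor and lu_solve let us pay the factorization cost once and reuse it for many right-hand sides:

In [ ]:
import scipy.linalg as sla
rng = np.random.default_rng(0)
n, p = 500, 20
A = rng.standard_normal((n, n))
B = rng.standard_normal((n, p))        # p right-hand sides
lu, piv = sla.lu_factor(A)             # ~ (2/3) n^3 flops, paid once
X = sla.lu_solve((lu, piv), B)         # ~ 2 n^2 flops per right-hand side
print(np.allclose(A @ X, B))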

20.2 Pivoting¶

Up until now we have assumed that the Gaussian Elimination we used to define $U$ only involves scalar multiplications or adding a multiple of a row to a row below it.

However, in real situations, we sometimes need to exchange two rows.

  • One reason is, as we know, that if the current row has a zero in the pivot position, we need to exchange it with a row that does not have a zero in the pivot position.

$$\begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix} \rightarrow \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}$$
  • But there is another, more subtle reason having to do with numerical accuracy.

    We make the following observation: in general, we would like to avoid dividing by a small number.

Here is why. Consider the problem of computing $a/b$ where $a$ and $b$ are scalars.

Let's say there is some small error in the value of $a$; call it $\epsilon$.

That means that what we are really computing is $(a+\epsilon)/b = a/b + \epsilon/b$.

Note that $a/b$ is the correct value, but that what we compute is off by $\epsilon/b$. Now, if $b$ is a very small number, then the error in the result ($\epsilon/b$) will be large.

Hence we would like to avoid dividing by small numbers whenever possible.
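A tiny numerical illustration of this effect (with made-up numbers: a fixed error of $10^{-6}$ in $a$, divided by smaller and smaller values of $b$):

In [ ]:
a, eps = 1.0, 1e-6             # eps plays the role of a small error in a
for b in [1.0, 1e-3, 1e-6]:
    print(f"b = {b:g}:  error = (a+eps)/b - a/b = {(a + eps)/b - a/b:g}")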

Now: note that in performing Gaussian Elimination, we divide each row by the value of its pivot.

What this suggests is that we would like to avoid having small pivots.

For example, consider the matrix

$$A = \begin{bmatrix} 10^{-20} & 1 \\ 1 & 1 \end{bmatrix} = \underbrace{\begin{bmatrix} 1 & 0 \\ 10^{20} & 1 \end{bmatrix}}_{\large L}\underbrace{\begin{bmatrix} 10^{-20} & 1 \\ 0 & 1 - 10^{20} \end{bmatrix}}_{\large U}$$

This is a valid $LU$ decomposition. But it creates several large and small numbers, which as we know can lead to arithmetic errors. It would be better to interchange the two rows of $A$ first and then perform $LU$ decomposition on that matrix.

$$A' = \begin{bmatrix} 1 & 1 \\ 10^{-20} & 1 \end{bmatrix} = \underbrace{\begin{bmatrix} 1 & 0 \\ 10^{-20} & 1 \end{bmatrix}}_{\large L}\underbrace{\begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix}}_{\large U}$$

Note that this factorization is not quite correct! The bottom right entry of the product $LU$ is now $1 + 10^{-20}$ rather than the correct value of $1$. But this error will vanish in the computer's approximate addition anyway: remember that the computer only remembers numbers to about 16 digits of precision. And this way we have only one really small number in our factorization, and no really large numbers.

This factorization is more stable, even if it is less accurate.

A stable algorithm gives nearly the right answer to nearly the right question (Trefethen, pg 104)

For instance, if we start with the matrix

$$A = \begin{bmatrix} 10^{-20} & 1 \\ 1 & 1 \end{bmatrix}$$

then it is best to interchange the two rows, because it's better to start with $1$ as a pivot rather than $10^{-20}$.
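We can see the effect numerically (a sketch mirroring the example above; the non-pivoted $L$ and $U$ are written out by hand, and SciPy, which always applies partial pivoting, supplies the pivoted version):

In [ ]:
import scipy.linalg as sla
A = np.array([[1e-20, 1.],
              [1.,    1.]])
# LU without pivoting: the multiplier is 1 / 1e-20 = 1e20
L = np.array([[1.,   0.],
              [1e20, 1.]])
U = np.array([[1e-20, 1.],
              [0.,    1. - 1e20]])     # 1 - 1e20 rounds to -1e20 in floating point
print("L @ U =\n", L @ U)              # the bottom-right entry of A (which is 1) is lost
# With the rows interchanged (partial pivoting), the pivot is 1 and A is recovered
P, Lp, Up = sla.lu(A)
print("P @ L @ U =\n", P @ Lp @ Up)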

Permutation Matrices¶

There is a simple way to address this. In processing any particular row, we can avoid having a small pivot by interchanging the current row with one of those below it.

We would like to exchange the current row with whichever row below it has the largest entry (in absolute value) in the pivot column. This algorithmic technique is called "partial pivoting."

Now, a row interchange is an elementary row operation, and can be implemented by an elementary matrix. This elementary matrix is the identity with its corresponding rows interchanged.

An elementary matrix that exchanges rows is called a permutation matrix. The product of permutation matrices is a permutation matrix.

Hence, the net result of all the partial pivoting done during Gaussian Elimination can be expressed in a single permutation matrix $P$.

This means that the final factorization of $A$ is:

$$A = PLU.$$

You can read this equation in three ways:

  1. $A$ is the product of a unit lower triangular matrix $L$ and an echelon form matrix $U$, and the rows of their product have been reordered according to the permutation $P$. This is $A = P(LU).$
  2. $A$ is the product of a permuted lower triangular matrix $PL$ and an echelon form matrix $U$. This is $A = (PL)U.$
  3. There is a permuted version of $A$ that does factor into the product of lower and upper triangular matrices. Namely, $P^TA = LU$.

(Here we use the fact that $P^TP = I$, so $P^T = P^{-1}$.)
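SciPy's scipy.linalg.lu computes a factorization of exactly this form (it always performs partial pivoting), so we can check both facts numerically on the matrix from the earlier worked example:

In [ ]:
import scipy.linalg as sla
A = np.array([[ 3., -7., -2.,  2.],
              [-3.,  5.,  1.,  0.],
              [ 6., -4.,  0., -5.],
              [-9.,  5., -5., 12.]])
P, L, U = sla.lu(A)
print("A == PLU?  ", np.allclose(A, P @ L @ U))
print("P^T P == I?", np.allclose(P.T @ P, np.eye(4)))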

Vector Subspace and Basis¶

In [2]:
#
fig = ut.three_d_figure((0,0), fig_desc = 'H = Span{a1, a2, a3}',
                        xmin = -10, xmax = 10, ymin = -10, ymax = 10, zmin = -10, zmax = 10, qr = qr_setting)
a1 = [-8.0, 8.0, 5.0]
a2 = [3.0,  2.0, -2.0]
a3 = 2.5 * np.array(a2)
fig.text(a1[0]+.5, a1[1]+.5, a1[2]+.5, r'$\bf a_1$', 'a_1', size=18)
fig.text(a3[0]+.5, a3[1]+.5, a3[2]+.5, r'$\bf a_3$', 'a_3', size=18)
fig.text(a2[0]+.5, a2[1]+.5, a2[2]+.5, r'$\bf a_2$', 'a_2', size=18)
fig.plotSpan(a1, a2,'Green')
fig.plotPoint(a1[0], a1[1], a1[2],'r')
fig.plotPoint(a3[0], a3[1], a3[2],'r')
fig.plotPoint(a2[0], a2[1], a2[2],'r')
fig.plotLine([[0, 0, 0], a3], 'r', '--')
fig.plotLine([[0, 0, 0], a1], 'r', '--')
# fig.plotPoint(a3[0], a3[1], a3[2],'r')
fig.text(0.1, 0.1, -3, r'$\bf 0$', '0', size=12)
fig.plotPoint(0, 0, 0, 'b')
# fig.set_title(r'$H$ = Span$\{{\bf a}_1, {\bf a}_2, {\bf a}_3\}$')
fig.text(9, -9, -7, r'H', 'H', size = 16)
img = fig.dont_save()

[This lecture is based on Prof. Crovella's CS 132 lecture notes.]

20.3 The Invertible Matrix Theorem¶

We have seen in this course that invertible matrices are "nicer" to work with when solving systems of linear equations. For instance, if a matrix $A$ is invertible, then $Ax = b$ has a unique solution for any $b$.

This suggests a deep connection between the invertibility of $A$ and the nature of the linear system $Ax = b.$

In fact, we are now at the point where we can collect together in a fairly complete way much of what we have learned about matrices and linear systems.

This remarkable collection of ten interrelated properties is called the Invertible Matrix Theorem (IMT). (And we will grow this list further in future lectures.)

Invertible Matrix Theorem. Let $A$ be a square $n \times n$ matrix.

Then the following statements are equivalent; that is, they are either all true or all false.

  1. $A$ is an invertible matrix.
  2. $A^T$ is an invertible matrix.
    • Proof by direct construction: $(A^T)^{-1} = (A^{-1})^T.$
  3. The equation $Ax = b$ has a unique solution for each $b$ in $\mathbb{R}^n.$
    • We saw in a previous lecture that the solution is $x = A^{-1}b$.
  4. $A$ is row equivalent to the identity matrix.
    • If $Ax = b$ has a unique solution for any $b,$ then the reduced row echelon form of $A$ is $I$.
  5. $A$ has $n$ pivot positions.
    • Follows directly from the previous statement.
  6. The homogeneous equation $Ax = 0$ has only the trivial solution.
    • If $Ax = b$ has a unique solution for any $b,$ then the unique solution for $b = 0$ must be $0.$
  7. The columns of $A$ form a linearly independent set.
    • Follows directly from the previous statement and the definition of linear independence.
  8. The columns of $A$ span $\mathbb{R}^n.$
    • For any $b \in \mathbb{R}^n,$ there is a set of coefficients $x$ which can be used to construct $b$ from the columns of $A.$
  9. The linear transformation $x \mapsto Ax$ maps $\mathbb{R}^n$ onto $\mathbb{R}^n.$
    • Follows directly from the previous statement.
  10. The linear transformation $x \mapsto Ax$ is one-to-one.
    • Follows directly from the fact that $Ax = b$ has a unique solution for any $b.$

We provided arguments above to show that if AA is invertible, then all the other statements are true.

In fact, the converse holds as well: if AA is not invertible, then all the other statements are false. We will skip the proof of the converse, but it's not difficult.

The Invertible Matrix Theorem has wide-ranging implications.

It divides the set of all $n \times n$ matrices into two disjoint classes:

  1. the invertible (nonsingular) matrices, and
  2. the noninvertible (singular) matrices.

The power of the IMT lies in the connections it provides among so many important concepts.

For example, notice how it connects linear independence of the columns of a matrix $A$ to the existence of solutions to equations of the form $Ax = b.$

This allows us to bring many tools to bear as needed to solve a problem.

Example. Let $A = \begin{bmatrix} 3 & 7 \\ -6 & -14 \end{bmatrix}$.

Decide if $Ax = b$ has a solution for all $b$... or in other words, if the linear transformation defined by $A$ is onto.

Solution. The determinant of $A$ is $(3 \cdot -14) - (7 \cdot -6) = 0$.

So $A$ is not invertible, and therefore $Ax = b$ does not have a solution for all $b$.
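A quick numerical confirmation (NumPy's determinant, up to roundoff):

In [ ]:
A = np.array([[ 3.,   7.],
              [-6., -14.]])
print(np.linalg.det(A))    # 0, so A is singular (not invertible)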

Example. Decide if $A$ is invertible:

$$A = \begin{bmatrix} 1 & 0 & -2 \\ 3 & 1 & -2 \\ -5 & -1 & 9 \end{bmatrix}.$$

(If we knew the determinant of $A$, then we could decide whether it is invertible by checking if $\det(A) \neq 0$. But we don't yet know how to take the determinant of a $3 \times 3$ matrix.)

Solution.

$$A \sim \begin{bmatrix} 1 & 0 & -2 \\ 0 & 1 & 4 \\ 0 & -1 & -1 \end{bmatrix} \sim \begin{bmatrix} 1 & 0 & -2 \\ 0 & 1 & 4 \\ 0 & 0 & 3 \end{bmatrix}.$$

$A$ has three pivot positions and hence is invertible, by the IMT.
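We can confirm this numerically, using the matrix rank as a count of pivot positions:

In [ ]:
A = np.array([[ 1.,  0., -2.],
              [ 3.,  1., -2.],
              [-5., -1.,  9.]])
print(np.linalg.matrix_rank(A))    # 3 pivots, so A is invertible by the IMT
print(np.linalg.inv(A) @ A)        # numerically the 3x3 identity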

Note. Keep in mind: while the IMT is quite powerful, it does not completely settle issues that arise with respect to $Ax = b.$

This is because it only applies to square matrices.

So if $A$ is nonsquare, then we can't use the IMT to conclude anything about the existence or nonexistence of solutions to $Ax = b.$

Invertible Linear Transformations¶

A linear transformation $T: \mathbb{R}^n \to \mathbb{R}^n$ is invertible if there exists a function $S: \mathbb{R}^n \to \mathbb{R}^n$ such that

$$S(T(x)) = x \quad \text{for all } x \in \mathbb{R}^n,$$

and

$$T(S(x)) = x \quad \text{for all } x \in \mathbb{R}^n.$$

Theorem. Let $T: \mathbb{R}^n \to \mathbb{R}^n$ be a linear transformation and let $A$ be the standard matrix for $T$. Then,

  • $T$ is invertible if and only if $A$ is an invertible matrix.

  • In that case, the linear transformation $S$ given by $S(x) = A^{-1}x$ is the unique function satisfying the definition.

Let's look at some invertible and non-invertible linear transformations.

Example. Here is a horizontal contraction

$$A = \begin{bmatrix} 0.5 & 0 \\ 0 & 1 \end{bmatrix}.$$
In [9]:
#
square = np.array([[0.0,1,1,0],[1,1,0,0]])
A = np.array(
    [[0.5, 0], 
     [  0, 1]])
# print("A ="); print(A)
dm.plotSetup()
dm.plotSquare(square)
dm.plotSquare(A @ square,'r')
# Latex(r'Horizontal Contraction')

Its determinant is $1(0.5) - 0(0) = 0.5,$ so this linear transformation is invertible. Its inverse is:

$$\frac{1}{0.5}\begin{bmatrix} 1 & 0 \\ 0 & 0.5 \end{bmatrix} = \begin{bmatrix} 2 & 0 \\ 0 & 1 \end{bmatrix}.$$

Clearly, just as $A$ contracted the $x_1$ direction by $0.5$, $A^{-1}$ will expand the $x_1$ direction by $2$.
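A quick check with NumPy:

In [ ]:
A = np.array([[0.5, 0.],
              [0.,  1.]])
print(np.linalg.inv(A))    # [[2, 0], [0, 1]]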

Example. Consider the projection onto the $x_2$ axis

$$A = \begin{bmatrix} 0 & 0 \\ 0 & 1 \end{bmatrix}.$$

The determinant of $A$ is zero, so this linear transformation is not invertible.

In [10]:
#
A = np.array(
    [[0,0],
     [0,1]])
print("A = "); print(A)
ax = dm.plotSetup()
dm.plotSquare(square)
dm.plotSquare(A @ square)
ax.arrow(1.0,1.0,-0.9,0,head_width=0.15, head_length=0.1, length_includes_head=True);
ax.arrow(1.0,0.0,-0.9,0,head_width=0.15, head_length=0.1, length_includes_head=True);
A = 
[[0 0]
 [0 1]]

By the IMT, there are many equivalent ways to look at this:

  • The mapping $T$ is not onto $\mathbb{R}^2.$ (Only a subset of $\mathbb{R}^2$ can be output by $T$.)
  • The mapping $T$ is not one-to-one. (There are many values $x$ that give the same $Ax.$)
  • $A$ does not have 2 pivots.
  • The columns of $A$ do not span $\mathbb{R}^2.$
  • $Ax = 0$ has a non-trivial solution.