In [1]:
#
import numpy as np
import scipy as sp
import pandas as pd
import matplotlib as mp
import matplotlib.pyplot as plt
import seaborn as sns
import sklearn
import laUtilities as ut
import slideUtilities as sl
import demoUtilities as dm
from matplotlib import animation
from importlib import reload
from datetime import datetime
from IPython.display import Image, display_html, display, Math, HTML;
qr_setting = None

mp.rcParams['animation.html'] = 'jshtml';

Announcements¶

Homework

  • Homework 5 due Friday 3/17

Office hours

  • Tomorrow: Peer tutor Rohan Anand from 1:30-3pm on the CCDS 16th floor
  • Tomorrow: Abhishek Tiwari from 3:30-4:30pm on the CCDS 13th floor

Weekly reading and viewing assignments

  • Aggarwal sections 2.6-2.7
  • 3Blue1Brown video 7 and video 8

Lecture 19: LU Decomposition¶

$$
A =
\underbrace{\begin{bmatrix}
1 & 0 & 0 & 0 \\
* & 1 & 0 & 0 \\
* & * & 1 & 0 \\
* & * & * & 1
\end{bmatrix}}_{L}
\underbrace{\begin{bmatrix}
\blacksquare & * & * & * & * \\
0 & \blacksquare & * & * & * \\
0 & 0 & 0 & \blacksquare & * \\
0 & 0 & 0 & 0 & 0
\end{bmatrix}}_{U}
$$

[This lecture is based on lecture notes from Prof. Crovella's CS 132 and the fast.ai numerical linear algebra course.]

Recap¶

We've done a lot in this class so far.

  • We started from the idea of solving linear systems.

  • From there, we've developed an algebra of matrices and vectors.

  • This has led us to a view of a matrix as a linear operator: something that acts on a vector to create a new vector.

Before break, we calculated the computational cost of matrix operations in terms of floating point operations (flops).

  • Multiplying two $n \times n$ square matrices takes $2n^3$ flops.
  • Matrix multiplication is fast in practice and nicely parallelizable.

Order matters!¶

When you look at a mathematical expression involving matrices, think carefully about what it means and how you might efficiently compute it.

Example. What is the most efficient way to compute $A^2 x$, where $A \in \mathbb{R}^{n \times n}$ and $x \in \mathbb{R}^n$?

Here are your choices:

  1. First compute $A^2$, then compute $(A^2)x$.
  2. First compute $Ax$, then compute $A(Ax)$.

Let's compare the costs:

  1. First compute $A^2$, then compute $(A^2)x$:

    • Complexity: $2n^3 + 2n^2$
    • Cost for $n = 10{,}000$: about 2 trillion flops

  2. First compute $Ax$, then compute $A(Ax)$:

    • Complexity: $2 \cdot 2n^2 = 4n^2$
    • Cost for $n = 10{,}000$: about 400 million flops ✓
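We can see this difference directly in NumPy. The sketch below (using a smaller, arbitrarily chosen $n$ so that the slow ordering finishes quickly) times both evaluation orders; the exact numbers depend on your machine, but A @ (A @ x) should be dramatically faster.

In [ ]:
import time
import numpy as np

n = 2000                      # arbitrary size, chosen so the slow ordering still finishes quickly
A = np.random.rand(n, n)
x = np.random.rand(n)

t0 = time.time()
y_slow = (A @ A) @ x          # option 1: ~2n^3 flops for A @ A, then 2n^2 more
t1 = time.time()
y_fast = A @ (A @ x)          # option 2: two matrix-vector products, ~4n^2 flops total
t2 = time.time()

print(f"(A @ A) @ x took {t1 - t0:.3f} seconds")
print(f"A @ (A @ x) took {t2 - t1:.3f} seconds")
print("Results agree:", np.allclose(y_slow, y_fast))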

19.1 Computational Complexity of Matrix Inverse¶

Recall that a matrix $A$ is called invertible if there exists a matrix $C$ such that

$$AC = I \quad \text{and} \quad CA = I.$$

In that case, $C$ is called the inverse of $A$. Inverses are only defined for square matrices.

In [5]:
#
display(Image("images/06-inverse.png", width=1000))

Matrix inversion has many uses. For instance, if a matrix $A$ is invertible then we can solve the linear system $Ax = b$ by multiplying by $A^{-1}$ on the left:

$$
\begin{aligned}
Ax &= b \\
A^{-1}(Ax) &= A^{-1}b \\
(A^{-1}A)x &= A^{-1}b \\
Ix &= A^{-1}b \\
x &= A^{-1}b \quad \text{(and this solution is unique)}
\end{aligned}
$$

NumPy provides a function, np.linalg.inv(), that can compute the inverse of a square matrix of any size. For example:

In [87]:
import numpy as np
A = np.array(
    [[ 2.0, 5.0],
     [-3.0,-7.0]])
print('A =\n',A)
B = np.linalg.inv(A)
print('B = \n',B)
A =
 [[ 2.  5.]
 [-3. -7.]]
B = 
 [[-7. -5.]
 [ 3.  2.]]

Question: How would you write a computer program to perform this inverse operation? How long will it take to run?

In Lecture 18, we derived a special formula for the inverse of a $2 \times 2$ matrix $A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}$:

$$A^{-1} = \frac{1}{\det(A)} \begin{bmatrix} d & -b \\ -c & a \end{bmatrix}$$

We also found a general algorithm for inverting any matrix: perform Gaussian Elimination on the larger augmented matrix $[A \mid I]$. The result in reduced row echelon form will be $[I \mid A^{-1}]$.
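To make this algorithm concrete, here is a minimal sketch of it in NumPy (the function name inverse_by_row_reduction is ours, for illustration only). It assumes every pivot it encounters is nonzero, so it omits the row interchanges that a robust implementation would need.

In [ ]:
import numpy as np

def inverse_by_row_reduction(A):
    """Invert A by reducing the augmented matrix [A | I] to [I | A^{-1}].
    Sketch only: assumes every pivot encountered is nonzero (no row swaps)."""
    n = A.shape[0]
    M = np.hstack([A.astype(float), np.eye(n)])   # the n x 2n augmented matrix [A | I]
    for i in range(n):
        M[i] = M[i] / M[i, i]                     # scale row i so its pivot is 1
        for j in range(n):
            if j != i:
                M[j] = M[j] - M[j, i] * M[i]      # eliminate column i from every other row
    return M[:, n:]                               # the right half is now A^{-1}

A = np.array([[2.0, 5.0], [-3.0, -7.0]])
print(inverse_by_row_reduction(A))                # should match np.linalg.inv(A) above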

We've seen how to calculate the cost of Gaussian elimination for an $n \times n$ square matrix $A$ and column vector $b$.

Here, the augmented matrix is of size $n \times (n+1)$. For this matrix, let's consider the cost of performing an elementary row operation like adding a scalar multiple of row 1 into row 2. We need

  • $n+1$ multiplications
  • $n+1$ additions

So the total cost is $2(n+1)$ floating point operations (or flops).

In [7]:
# Image credit: Prof. Mark Crovella
display(Image("images/03-ge1.jpg", width=800))

As you perform more elimination steps, you can work with smaller sub-matrices. In total, we calculated previously that Gaussian elimination on $[A \mid b]$ costs

  • $\frac{2}{3}n^3$ floating point operations for the Elimination stage
  • $n^2$ floating point operations for the Backsubstitution stage

For matrix inversion, we need to use a larger augmented matrix $[A \mid I]$ whose size is $n \times 2n$. For this matrix, an elementary row operation now costs $2n$ multiplications and $2n$ additions, for a total of $4n$ flops.

If you go back to the derivation of the cost of Gaussian Elimination and recalculate the results for the wider augmented matrix $[A \mid I]$, you will find that the costs are now:

  • $\frac{5}{3}n^3$ floating point operations for the Elimination stage
  • $\frac{1}{3}n^3$ floating point operations for the Backsubstitution stage

As a result, the total cost to calculate the inverse is $2n^3$ flops. This is three times the cost of an ordinary Gaussian elimination.

Hence, if I give you a matrix $A$ then you can:

  1. Solve a single linear system $Ax = b$ at a cost of $\frac{2}{3}n^3$ flops.
  2. Calculate $A^{-1}$ at a cost of $2n^3$ flops, and then solve $Ax = b$ for any vector $b$ you receive in the future at a cost of only $2n^2$ flops each (the cost of the matrix-vector multiplication $A^{-1}b$).

Which option is better depends on how many linear systems you will solve using the same matrix $A$.

Today, we will find a new way to solve matrix equations that is the "best of both options": given a matrix $A$, you pay a one-time cost of $\frac{2}{3}n^3$ flops and can then quickly solve any matrix equation of the form $Ax = b$.

19.2 Matrix Factorizations¶

Just as multiplication can be generalized from scalars to matrices, the notion of a factorization can also be generalized from scalars to matrices.

A factorization of a matrix AA is an equation that expresses AA as a product of two or more matrices.

$$A = BC.$$

The essential difference from what we have done so far is that previously we were given the factors ($B$ and $C$) and computed their product $A$.

In a factorization problem, you are given $A$, and you want to find $B$ and $C$ that meet some conditions.

There are a number of reasons one may want to factor a matrix.

  • Recasting $A$ into a form that makes computing with $A$ faster.
  • Recasting $A$ into a form that makes working with $A$ easier.
  • Recasting $A$ into a form that exposes important properties of $A$.

Today we'll work with one particular factorization that addresses the first case. In future lectures, we'll study factorizations that address the other two cases.

In [9]:
#
display(Image("images/01-svd.png", width=800))

The factorization we will study is called the LU Factorization. It is worth studying in its own right, and it also introduces the idea of factorizations, which we will study again later on.

19.3 The LU Factorization Problem¶

Consider the following problem. You are given a square $n \times n$ matrix $A$ and an $n \times p$ matrix $B$.

You seek $X \in \mathbb{R}^{n \times p}$ such that:

$$AX = B.$$

In other words, instead of the usual matrix equation $Ax = b$ that we have studied so far in this course, now $X$ and $B$ are matrices.

Question. Given $A$ and $B$, how can we solve for $X$ using techniques we already know? And how long does it take?

By the rules of matrix multiplication, we can break this problem up.

Let $X = \begin{bmatrix} x_1 & x_2 & \dots & x_p \end{bmatrix}$, and $B = \begin{bmatrix} b_1 & b_2 & \dots & b_p \end{bmatrix}$.

Then:

$$Ax_1 = b_1, \quad Ax_2 = b_2, \quad \dots, \quad Ax_p = b_p$$

In other words, there are $p$ linear systems to solve. Each linear system is conceptually a separate problem.

If we perform Gaussian Elimination on each of the separate systems, the total cost is $\sim p \cdot \frac{2}{3}n^3$ flops.

As we saw earlier today, you could solve these systems by first computing $A^{-1}$ and then computing:

$$x_1 = A^{-1}b_1, \quad x_2 = A^{-1}b_2, \quad \dots, \quad x_p = A^{-1}b_p$$

Or, more concisely:

$$X = A^{-1}B$$

This is a faster technique if $p > 3$, because it exploits the fact that every linear system has the same $A$ matrix.

The biggest cost is computing the inverse matrix $A^{-1}$, which requires $\sim 2n^3$ flops (three times as many as a single Gaussian Elimination).
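Here is a small sketch of the two approaches in NumPy, with arbitrarily chosen sizes $n$ and $p$: solving the $p$ systems one column at a time, versus inverting $A$ once and multiplying. Both produce the same $X$ up to floating point error.

In [ ]:
import numpy as np

n, p = 500, 20                       # arbitrary sizes for illustration
A = np.random.rand(n, n)
B = np.random.rand(n, p)

# Option 1: Gaussian Elimination (np.linalg.solve) on each column of B separately
X_columns = np.column_stack([np.linalg.solve(A, B[:, j]) for j in range(p)])

# Option 2: invert A once, then a single matrix-matrix product
X_inverse = np.linalg.inv(A) @ B

print("Same answer:", np.allclose(X_columns, X_inverse))

(In fact, np.linalg.solve accepts a whole matrix of right-hand sides at once, and internally it uses an LU factorization, which is exactly the idea we develop next.)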

What if we could solve all these systems while performing Gaussian Elimination only once? That would be a win, as it would cut our running time by a factor of 3.

Today we will explore the LU factorization, which will allow us to do exactly this. We will see that LU factorization has a close connection to Gaussian Elimination.

In fact, I hope that when we are done, you will see Gaussian Elimination in a new way, namely:

Gaussian Elimination is really a matrix factorization!

Before we start to discuss the LU factorization, we need to introduce a powerful tool for performing factorizations, called elementary matrices.

19.4 Elementary Matrices¶

Within Gaussian elimination, recall that the row reduction process consists of repeated applications of elementary row operations:

  • Interchange operation: Swap two rows.
  • Scaling operation: Multiply a row by a nonzero scalar.
  • Addition operation: Add a multiple of one row to another.
$$
\begin{bmatrix}
0 & \blacksquare & * & * & * & * & * \\
0 & * & * & * & * & * & * \\
0 & * & * & * & * & * & * \\
0 & * & * & * & * & * & * \\
0 & * & * & * & * & * & *
\end{bmatrix}
\;\Rightarrow\;
\begin{bmatrix}
0 & \blacksquare & * & * & * & * & * \\
0 & 0 & 0 & \blacksquare & * & * & * \\
0 & 0 & 0 & 0 & \blacksquare & * & * \\
0 & 0 & 0 & 0 & 0 & \blacksquare & * \\
0 & 0 & 0 & 0 & 0 & 0 & 0
\end{bmatrix}
$$

Now that we have much more theoretical machinery in our toolbox, we can make an important observation:

Every elementary row operation on $A$ is a linear transformation, and therefore can be performed by multiplying $A$ by a suitable matrix.

Recall that linear transformations respect addition and scalar multiplication.

To see that swapping two rows is a linear transformation: given two matrices $A$ and $B$, it doesn't matter whether you swap rows in the individual matrices and then add them, or add them and then swap the rows. In other words:

$$\operatorname{swap}(A+B) = \operatorname{swap}(A) + \operatorname{swap}(B).$$

Check for yourself that swaps respect scalar multiplication in the same way, and that the other two elementary operations also satisfy linearity.

As a result, each row operation has some matrix associated with it. Furthermore, the matrices that implement elementary row operations are particularly simple. They are called elementary matrices.

An elementary matrix is one that is obtained by performing a single elementary row operation on the identity matrix.

Example. Consider the following $3 \times 3$ matrices:

$$E_1 = \begin{bmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{bmatrix}, \quad E_2 = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 5 \end{bmatrix}, \quad E_3 = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ -4 & 0 & 1 \end{bmatrix}.$$

Let's see what each matrix does to an arbitrary $3 \times 3$ matrix $A = \begin{bmatrix} a & b & c \\ d & e & f \\ g & h & i \end{bmatrix}$.

Left-multiplication by $E_1$ swaps the first two rows of $A$ (for any matrix $A$).

$$E_1 A = \begin{bmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} a & b & c \\ d & e & f \\ g & h & i \end{bmatrix} = \begin{bmatrix} d & e & f \\ a & b & c \\ g & h & i \end{bmatrix}.$$

Left-multiplication by $E_2$ corresponds to a scalar multiplication of the third row by the scalar 5.

$$E_2 A = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 5 \end{bmatrix} \begin{bmatrix} a & b & c \\ d & e & f \\ g & h & i \end{bmatrix} = \begin{bmatrix} a & b & c \\ d & e & f \\ 5g & 5h & 5i \end{bmatrix}.$$

Left-multiplication by $E_3$ adds $-4$ times row 1 to row 3.

$$E_3 A = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ -4 & 0 & 1 \end{bmatrix} \begin{bmatrix} a & b & c \\ d & e & f \\ g & h & i \end{bmatrix} = \begin{bmatrix} a & b & c \\ d & e & f \\ g-4a & h-4b & i-4c \end{bmatrix}.$$
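We can check these three products numerically; the matrix $A$ below is an arbitrary numeric stand-in for the symbolic entries $a$ through $i$.

In [ ]:
import numpy as np

E1 = np.array([[0, 1, 0], [1, 0, 0], [0, 0, 1]])    # swap rows 1 and 2
E2 = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 5]])    # scale row 3 by 5
E3 = np.array([[1, 0, 0], [0, 1, 0], [-4, 0, 1]])   # add -4 times row 1 to row 3

A = np.array([[1, 2, 3],
              [4, 5, 6],
              [7, 8, 9]])

print("E1 @ A =\n", E1 @ A)   # rows 1 and 2 of A are swapped
print("E2 @ A =\n", E2 @ A)   # row 3 of A is multiplied by 5
print("E3 @ A =\n", E3 @ A)   # row 3 of A becomes row 3 minus 4 times row 1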

Finding the Elementary Matrix¶

Here is a simple way to compute these elementary matrices.

Let $E$ be the matrix that implements the operation "add $-4$ times row 1 to row 3." (Suppose we don't yet know what $E$ is.)

Now, $EI = E$ by the definition of $I$ (this holds for any matrix, not just $E$).

But note that this equation also says: "the matrix $E$ that implements the operation 'add $-4$ times row 1 to row 3' is the one you get by performing this operation on $I$."

Let's perform this transformation starting from $I$:

$$\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} \rightarrow \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ -4 & 0 & 1 \end{bmatrix}$$

Thus we have the following:

Fact. If an elementary row operation is performed on an $m \times n$ matrix $A$, the resulting matrix can be written as $EA$, where the $m \times m$ matrix $E$ is created by performing the same row operation on $I_m$.

Question. Is an elementary matrix invertible?

Recall the types of elementary row operations:

  • Swap two rows.
  • Multiply a row by a nonzero scalar.
  • Add a multiple of one row to another.

Answer: yes, any row reduction operation can be reversed by another (related) row reduction operation.

Every row reduction is an invertible linear transformation -- so every elementary matrix is invertible.

Examples. Let's invert $E_1$, $E_2$, and $E_3$ from earlier.

$$E_1^{-1} = \begin{bmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{bmatrix}^{-1} = \begin{bmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{bmatrix}, \quad
E_2^{-1} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 5 \end{bmatrix}^{-1} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1/5 \end{bmatrix}, \quad
E_3^{-1} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ -4 & 0 & 1 \end{bmatrix}^{-1} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 4 & 0 & 1 \end{bmatrix}$$

We can verify these operations using Python with NumPy.

In [16]:
E1 = np.array([[0,1,0],[1,0,0],[0,0,1]])
E2 = np.array([[1,0,0],[0,1,0],[0,0,5]])
E3 = np.array([[1,0,0],[0,1,0],[-4,0,1]])
print("E1 inverse = "); print(np.linalg.inv(E1))
print("E2 inverse = "); print(np.linalg.inv(E2))
print("E3 inverse = "); print(np.linalg.inv(E3))
E1 inverse = 
[[0. 1. 0.]
 [1. 0. 0.]
 [0. 0. 1.]]
E2 inverse = 
[[1.  0.  0. ]
 [0.  1.  0. ]
 [0.  0.  0.2]]
E3 inverse = 
[[ 1. -0. -0.]
 [ 0.  1.  0.]
 [ 4.  0.  1.]]

19.5 The LU Factorization¶

Now, we will introduce the factorization

$$A = LU.$$

Note that we are not restricted only to square matrices $A$. LU decomposition (like Gaussian Elimination) works for a matrix $A$ having any $m \times n$ shape.

An LU factorization of $A$ constructs two matrices that have this structure:

$$
A =
\underbrace{\begin{bmatrix}
1 & 0 & 0 & 0 \\
* & 1 & 0 & 0 \\
* & * & 1 & 0 \\
* & * & * & 1
\end{bmatrix}}_{L}
\underbrace{\begin{bmatrix}
\blacksquare & * & * & * & * \\
0 & \blacksquare & * & * & * \\
0 & 0 & 0 & \blacksquare & * \\
0 & 0 & 0 & 0 & 0
\end{bmatrix}}_{U}
$$

Stars ($*$) denote arbitrary entries, and blocks ($\blacksquare$) denote nonzero entries.

These two matrices each have a special structure.

  • $U$ is in row echelon form, and it has the same $m \times n$ shape as $A$. This is an upper triangular matrix (hence its name, $U$).

  • $L$ is a lower triangular square matrix of size $m \times m$, and it has 1s on the diagonal. This is called a unit lower triangular matrix (hence its name, $L$).

The fact that $U$ is in row echelon form may suggest to you (correctly!) that we could get it from $A$ by a sequence of row operations.

For now, let us suppose that:

  • We never need to interchange (or swap) two rows. Let's only consider the other two elementary operations.
  • The row reductions that convert $A$ to $U$ only add a multiple of one row to another row below it.

Now, if you consider an elementary matrix that implements such a row reduction, you will see that it will have 1s on the diagonal, and an additional entry somewhere below the diagonal.

For example, recall the scaling matrix $E_2$ and addition matrix $E_3$ from earlier:

$$E_2 = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 5 \end{bmatrix}, \quad E_2 A = \begin{bmatrix} a & b & c \\ d & e & f \\ 5g & 5h & 5i \end{bmatrix}, \qquad E_3 = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ -4 & 0 & 1 \end{bmatrix}, \quad E_3 A = \begin{bmatrix} a & b & c \\ d & e & f \\ g-4a & h-4b & i-4c \end{bmatrix}.$$

These elementary matrices are both lower triangular! In fact, the addition matrix $E_3$ is unit lower triangular (1s on the diagonal), and under our assumption above, row replacements like $E_3$ are the only operations we need. (We are ignoring row interchanges for now because they are not lower triangular. We'll deal with row interchanges later.)

So if there is a sequence of such elementary row operations that converts $A$ to $U$, then there is a set of unit lower triangular elementary matrices $E_1, \dots, E_p$ such that

$$E_p \cdots E_1 A = U.$$

We know that elementary matrices are invertible, and the product of invertible matrices is invertible, so:

$$A = (E_p \cdots E_1)^{-1} U = LU$$

where $L = (E_p \cdots E_1)^{-1} = E_1^{-1} \cdots E_p^{-1}$. Remember: the inverse of a product equals the product of the inverses, in the opposite order.

Fact. The product of unit lower triangular matrices is unit lower triangular. Additionally, the inverse of a unit lower triangular matrix is unit lower triangular.

(Think about how to prove this statement on your own.)

So we can conclude that $L$, as constructed from $(E_p \cdots E_1)^{-1}$, is unit lower triangular.
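As a quick numerical sanity check of this Fact (not a proof), the sketch below builds two random unit lower triangular matrices and confirms that their product and their inverses are again unit lower triangular. The helper functions are ours, written just for this check.

In [ ]:
import numpy as np

def random_unit_lower(n, rng):
    """Random n x n matrix with 1s on the diagonal and arbitrary entries below it."""
    L = np.tril(rng.standard_normal((n, n)), k=-1)  # strictly lower triangular part
    np.fill_diagonal(L, 1.0)                        # put 1s on the diagonal
    return L

def is_unit_lower(M):
    """True if M is lower triangular with 1s on the diagonal (up to roundoff)."""
    return np.allclose(M, np.tril(M)) and np.allclose(np.diag(M), 1.0)

rng = np.random.default_rng(0)
L1, L2 = random_unit_lower(4, rng), random_unit_lower(4, rng)

print("Product is unit lower triangular:", is_unit_lower(L1 @ L2))
print("Inverse is unit lower triangular:", is_unit_lower(np.linalg.inv(L1)))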

Hence, we have defined the LU decomposition based on Gaussian Elimination.

We have rewritten Gaussian Elimination as:

$$U = L^{-1} A$$

and shown that the $L$ so defined is unit lower triangular.

Let's take stock of what this all means: the LU decomposition is a way of capturing the application of Gaussian Elimination to $A$.

It incorporates both the process of performing Gaussian Elimination, and the result:

  • $U$ is the row echelon form of $A$.
  • $L^{-1}$ captures the row reductions that transform $A$ to row echelon form.
  • $L$ is the inverse of $L^{-1}$.
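SciPy ships an LU routine, scipy.linalg.lu, that lets us see the factorization in action. Note that in general it also returns a permutation matrix $P$ (handling the row interchanges we are ignoring for now), so what it computes is $A = PLU$; the example matrix below is arbitrary.

In [ ]:
import numpy as np
import scipy.linalg

A = np.array([[ 2.0,  1.0, 1.0],
              [ 4.0, -6.0, 0.0],
              [-2.0,  7.0, 2.0]])

P, L, U = scipy.linalg.lu(A)

print("L =\n", L)                 # unit lower triangular
print("U =\n", U)                 # row echelon form (upper triangular)
print("P =\n", P)                 # permutation matrix recording any row interchanges
print("A == P @ L @ U:", np.allclose(A, P @ L @ U))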

Finding $L$¶

Recall that the motivation for developing the LU decomposition is that it is more efficient than matrix inversion. So we don't want to have to invert $L^{-1}$ in the standard way in order to find $L$.

Here we have some good news:

  • Inverting each elementary row operation is simple, in fact much easier than general matrix inversion. (We have already seen examples of this.)
  • Multiplying elementary row operations is also simple: just apply the elementary row operation indicated by the left matrix to the right matrix. Here are some examples:
$$E_2^{-1} E_3^{-1} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1/5 \end{bmatrix} \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 4 & 0 & 1 \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 4/5 & 0 & 1/5 \end{bmatrix}, \qquad
E_3^{-1} E_2^{-1} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 4 & 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1/5 \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 4 & 0 & 1/5 \end{bmatrix}$$

Once again, we can verify these calculations using NumPy.

In [4]:
E2inv = np.array([[1,0,0],[0,1,0],[0,0,1/5]])
E3inv = np.array([[1,0,0],[0,1,0],[4,0,1]])
print("E2inv * E3inv ="); print(E2inv @ E3inv)
print("\nE3inv * E2inv ="); print(E3inv @ E2inv)
E2inv * E3inv =
[[1.  0.  0. ]
 [0.  1.  0. ]
 [0.8 0.  0.2]]

E3inv * E2inv =
[[1.  0.  0. ]
 [0.  1.  0. ]
 [4.  0.  0.2]]

This gives the following algorithm for LU factorization:

  1. Reduce $A$ to an echelon form $U$ by a sequence of row replacement operations.

    This is just Gaussian Elimination! But: keep track of the elementary row operations you perform along the way.

  2. Place entries in $L$ such that the same sequence of row operations reduces $L$ to $I$.

    If we can do this step efficiently, then the cost of LU factorization will be dominated by Gaussian Elimination itself.

The fact is that constructing $L$ can be done efficiently by a simple modification of Gaussian Elimination, sketched below. So, LU decomposition takes time only $\frac{2}{3}n^3$.
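Here is a minimal sketch of that algorithm (the function lu_no_pivoting is ours, for illustration only). It assumes $A$ is square and that no pivot is ever zero, so no row interchanges are needed; the multiplier used in each row replacement is stored directly as an entry of $L$.

In [ ]:
import numpy as np

def lu_no_pivoting(A):
    """LU factorization sketch: assumes A is square and every pivot is nonzero."""
    n = A.shape[0]
    U = A.astype(float).copy()
    L = np.eye(n)
    for j in range(n):                       # eliminate below the pivot in column j
        for i in range(j + 1, n):
            L[i, j] = U[i, j] / U[j, j]      # the multiplier becomes an entry of L
            U[i] = U[i] - L[i, j] * U[j]     # row replacement, as in Gaussian Elimination
    return L, U

A = np.array([[ 2.0,  1.0, 1.0],
              [ 4.0, -6.0, 0.0],
              [-2.0,  7.0, 2.0]])
L, U = lu_no_pivoting(A)
print("L =\n", L)
print("U =\n", U)
print("A == L @ U:", np.allclose(A, L @ U))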