In [1]:
#
import numpy as np
import scipy as sp
import pandas as pd
import matplotlib as mp
import matplotlib.pyplot as plt
import seaborn as sns
import sklearn
import laUtilities as ut
import slideUtilities as sl
import demoUtilities as dm
from matplotlib import animation
from importlib import reload
from datetime import datetime
from IPython.display import Image, display_html, display, Math, HTML;
qr_setting = None

mp.rcParams['animation.html'] = 'jshtml';

Announcements¶

Homework

  • Homework 5 due Friday 3/17

Office hours

  • Tomorrow: Peer tutor Rohan Anand from 1:30-3pm on the CCDS 16th floor
  • Tomorrow: Abhishek Tiwari from 3:30-4:30pm on the CCDS 13th floor

Weekly reading and viewing assignments

  • Aggarwal sections 2.6-2.7
  • 3Blue1Brown video 7 and video 8

Lecture 19: LU Decomposition¶

$$
A =
\underbrace{\begin{bmatrix}
1 & 0 & 0 & 0 \\
* & 1 & 0 & 0 \\
* & * & 1 & 0 \\
* & * & * & 1
\end{bmatrix}}_{L}
\underbrace{\begin{bmatrix}
\blacksquare & * & * & * & * \\
0 & \blacksquare & * & * & * \\
0 & 0 & 0 & \blacksquare & * \\
0 & 0 & 0 & 0 & 0
\end{bmatrix}}_{U}
$$

[This lecture is based on lecture notes from Prof. Crovella's CS 132 and the fast.ai numerical linear algebra course.]

Recap¶

We've done a lot in this class so far.

  • We started from the idea of solving linear systems.

  • From there, we've developed an algebra of matrices and vectors.

  • This has led us to a view of a matrix as a linear operator: something that acts on a vector to create a new vector.

Before break, we calculated the computational cost of matrix operations in terms of floating point operations (flops).

  • Multiplying two $n \times n$ square matrices takes $2n^3$ flops.
  • Matrix multiplication is fast in practice and nicely parallelizable.

Order matters!¶

When you look at a mathematical expression involving matrices, think carefully about what it means and how you might efficiently compute it.

Example. What is the most efficient way to compute $A^2 x$, where $A \in \mathbb{R}^{n \times n}$ and $x \in \mathbb{R}^n$?

Here are your choices:

  1. First compute $A^2$, then compute $(A^2)x$.
  2. First compute $Ax$, then compute $A(Ax)$.

Let's compare the costs:

  1. First compute $A^2$, then compute $(A^2)x$:

    • Complexity: $2n^3 + 2n^2$
    • Cost for $n = 10{,}000$: about 2 trillion flops

  2. First compute $Ax$, then compute $A(Ax)$:

    • Complexity: $2 \cdot 2n^2 = 4n^2$
    • Cost for $n = 10{,}000$: about 400 million flops ✓
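We can see this difference directly in NumPy. The sketch below (using a smaller, arbitrarily chosen $n$ so that the slow ordering finishes quickly) times both evaluation orders; the exact numbers depend on your machine, but A @ (A @ x) should be dramatically faster.

In [ ]:
import time
import numpy as np

n = 2000                      # arbitrary size, chosen so the slow ordering still finishes quickly
A = np.random.rand(n, n)
x = np.random.rand(n)

t0 = time.time()
y_slow = (A @ A) @ x          # option 1: ~2n^3 flops for A @ A, then 2n^2 more
t1 = time.time()
y_fast = A @ (A @ x)          # option 2: two matrix-vector products, ~4n^2 flops total
t2 = time.time()

print(f"(A @ A) @ x took {t1 - t0:.3f} seconds")
print(f"A @ (A @ x) took {t2 - t1:.3f} seconds")
print("Results agree:", np.allclose(y_slow, y_fast))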

19.1 Computational Complexity of Matrix Inverse¶

Recall that a matrix $A$ is called invertible if there exists a matrix $C$ such that

$$AC = I \quad \text{and} \quad CA = I.$$

In that case, $C$ is called the inverse of $A$. Inverses are only defined for square matrices.

In [5]:
#
display(Image("images/06-inverse.png", width=1000))

Matrix inversion has many uses. For instance, if a matrix $A$ is invertible then we can solve the linear system $Ax = b$ by multiplying by $A^{-1}$ on the left:

$$
\begin{aligned}
Ax &= b \\
A^{-1}(Ax) &= A^{-1}b \\
(A^{-1}A)x &= A^{-1}b \\
Ix &= A^{-1}b \\
x &= A^{-1}b \quad \text{(and this solution is unique)}
\end{aligned}
$$

NumPy provides a function, np.linalg.inv(), that can compute the inverse of a square matrix of any size. For example:

In [87]:
import numpy as np
A = np.array(
    [[ 2.0, 5.0],
     [-3.0,-7.0]])
print('A =\n',A)
B = np.linalg.inv(A)
print('B = \n',B)
A =
 [[ 2.  5.]
 [-3. -7.]]
B = 
 [[-7. -5.]
 [ 3.  2.]]

Question: How would you write a computer program to perform this inverse operation? How long will it take to run?

In Lecture 18, we derived a special formula for the inverse of a $2 \times 2$ matrix $A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}$:

$$A^{-1} = \frac{1}{\det(A)} \begin{bmatrix} d & -b \\ -c & a \end{bmatrix}$$

We also found a general algorithm for inverting any matrix: perform Gaussian Elimination on the larger augmented matrix $[A \mid I]$. The result in reduced row echelon form will be $[I \mid A^{-1}]$.
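To make this algorithm concrete, here is a minimal sketch of it in NumPy (the function name inverse_by_row_reduction is ours, for illustration only). It assumes every pivot it encounters is nonzero, so it omits the row interchanges that a robust implementation would need.

In [ ]:
import numpy as np

def inverse_by_row_reduction(A):
    """Invert A by reducing the augmented matrix [A | I] to [I | A^{-1}].
    Sketch only: assumes every pivot encountered is nonzero (no row swaps)."""
    n = A.shape[0]
    M = np.hstack([A.astype(float), np.eye(n)])   # the n x 2n augmented matrix [A | I]
    for i in range(n):
        M[i] = M[i] / M[i, i]                     # scale row i so its pivot is 1
        for j in range(n):
            if j != i:
                M[j] = M[j] - M[j, i] * M[i]      # eliminate column i from every other row
    return M[:, n:]                               # the right half is now A^{-1}

A = np.array([[2.0, 5.0], [-3.0, -7.0]])
print(inverse_by_row_reduction(A))                # should match np.linalg.inv(A) above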

We've seen how to calculate the cost of Gaussian elimination for an $n \times n$ square matrix $A$ and column vector $b$.

Here, the augmented matrix is of size $n \times (n+1)$. For this matrix, let's consider the cost of performing an elementary row operation like adding a scalar multiple of row 1 into row 2. We need

  • $n+1$ multiplications
  • $n+1$ additions

So the total cost is $2(n+1)$ floating point operations (or flops).

In [7]:
# Image credit: Prof. Mark Crovella
display(Image("images/03-ge1.jpg", width=800))

As you perform more elimination steps, you can work with smaller sub-matrices. In total, we calculated previously that Gaussian elimination on $[A \mid b]$ costs

  • $\frac{2}{3}n^3$ floating point operations for the Elimination stage
  • $n^2$ floating point operations for the Backsubstitution stage

For matrix inversion, we need to use a larger augmented matrix $[A \mid I]$ whose size is $n \times 2n$. For this matrix, an elementary row operation now costs $2n$ multiplications and $2n$ additions, for a total of $4n$ flops.

If you go back to the derivation of the cost of Gaussian Elimination and recalculate the results for the wider augmented matrix $[A \mid I]$, you will find that the costs are now:

  • $\frac{5}{3}n^3$ floating point operations for the Elimination stage
  • $\frac{1}{3}n^3$ floating point operations for the Backsubstitution stage

As a result, the total cost to calculate the inverse is $2n^3$ flops. This is three times the cost of an ordinary Gaussian elimination.

Hence, if I give you a matrix $A$ then you can:

  1. Solve a single linear system $Ax = b$ at a cost of $\frac{2}{3}n^3$ flops.
  2. Calculate $A^{-1}$ at a cost of $2n^3$ flops, and then solve $Ax = b$ for any vector $b$ you receive in the future at a cost of only $2n^2$ flops each (the cost of the matrix-vector multiplication $A^{-1}b$).

Which option is better depends on how many linear systems you will solve using the same matrix $A$.

Today, we will find a new way to solve matrix equations that is the "best of both options": given a matrix $A$, you pay a one-time cost of $\frac{2}{3}n^3$ flops and can then quickly solve any matrix equation of the form $Ax = b$.

19.2 Matrix Factorizations¶

Just as multiplication can be generalized from scalars to matrices, the notion of a factorization can also be generalized from scalars to matrices.

A factorization of a matrix AA is an equation that expresses AA as a product of two or more matrices.

$$A = BC.$$

The essential difference from what we have done so far is that previously we were given the factors ($B$ and $C$) and computed their product $A$.

In a factorization problem, you are given $A$, and you want to find $B$ and $C$ that meet some conditions.

There are a number of reasons one may want to factor a matrix.

  • Recasting $A$ into a form that makes computing with $A$ faster.
  • Recasting $A$ into a form that makes working with $A$ easier.
  • Recasting $A$ into a form that exposes important properties of $A$.

Today we'll work with one particular factorization that addresses the first case. In future lectures, we'll study factorizations that address the other two cases.

In [9]:
#
display(Image("images/01-svd.png", width=800))

The factorization we will study is called the LU Factorization. It is worth studying in its own right, and it also introduces the idea of factorizations, which we will study again later on.

19.3 The LU Factorization Problem¶

Consider the following problem. You are given a square $n \times n$ matrix $A$ and an $n \times p$ matrix $B$.

You seek $X \in \mathbb{R}^{n \times p}$ such that:

$$AX = B.$$

In other words, instead of the usual matrix equation $Ax = b$ that we have studied so far in this course, now $X$ and $B$ are matrices.

Question. Given $A$ and $B$, how can we solve for $X$ using techniques we already know? And how long does it take?

By the rules of matrix multiplication, we can break this problem up.

Let $X = \begin{bmatrix} x_1 & x_2 & \dots & x_p \end{bmatrix}$, and $B = \begin{bmatrix} b_1 & b_2 & \dots & b_p \end{bmatrix}$.

Then:

$$Ax_1 = b_1, \quad Ax_2 = b_2, \quad \dots, \quad Ax_p = b_p$$

In other words, there are $p$ linear systems to solve. Each linear system is conceptually a separate problem.

If we perform Gaussian Elimination on each of the separate systems, the total cost is $\sim p \cdot \frac{2}{3}n^3$ flops.

As we saw earlier today, you could solve these systems by first computing $A^{-1}$ and then computing:

$$x_1 = A^{-1}b_1, \quad x_2 = A^{-1}b_2, \quad \dots, \quad x_p = A^{-1}b_p$$

Or, more concisely:

$$X = A^{-1}B$$

This is a faster technique if $p > 3$, because it exploits the fact that every linear system has the same $A$ matrix.

The biggest cost is computing the inverse matrix $A^{-1}$, which requires $\sim 2n^3$ flops (three times as many as a single Gaussian Elimination).
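Here is a small sketch of the two approaches in NumPy, with arbitrarily chosen sizes $n$ and $p$: solving the $p$ systems one column at a time, versus inverting $A$ once and multiplying. Both produce the same $X$ up to floating point error.

In [ ]:
import numpy as np

n, p = 500, 20                       # arbitrary sizes for illustration
A = np.random.rand(n, n)
B = np.random.rand(n, p)

# Option 1: Gaussian Elimination (np.linalg.solve) on each column of B separately
X_columns = np.column_stack([np.linalg.solve(A, B[:, j]) for j in range(p)])

# Option 2: invert A once, then a single matrix-matrix product
X_inverse = np.linalg.inv(A) @ B

print("Same answer:", np.allclose(X_columns, X_inverse))

(In fact, np.linalg.solve accepts a whole matrix of right-hand sides at once, and internally it uses an LU factorization, which is exactly the idea we develop next.)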

What if we could solve all these systems while performing Gaussian Elimination only once? That would be a win, as it would cut our running time by a factor of 3.

Today we will explore the LU factorization, which will allow us to do exactly this. We will see that LU factorization has a close connection to Gaussian Elimination.

In fact, I hope that when we are done, you will see Gaussian Elimination in a new way, namely:

Gaussian Elimination is really a matrix factorization!

Before we start to discuss the LU factorization, we need to introduce a powerful tool for performing factorizations, called elementary matrices.

19.4 Elementary Matrices¶

Within Gaussian elimination, recall that the row reduction process consists of repeated applications of elementary row operations:

  • Interchange operation: Swap two rows.
  • Scaling operation: Multiply a row by a nonzero scalar.
  • Addition operation: Add a multiple of one row to another.
$$
\begin{bmatrix}
0 & \blacksquare & * & * & * & * & * \\
0 & * & * & * & * & * & * \\
0 & * & * & * & * & * & * \\
0 & * & * & * & * & * & * \\
0 & * & * & * & * & * & *
\end{bmatrix}
\;\Rightarrow\;
\begin{bmatrix}
0 & \blacksquare & * & * & * & * & * \\
0 & 0 & 0 & \blacksquare & * & * & * \\
0 & 0 & 0 & 0 & \blacksquare & * & * \\
0 & 0 & 0 & 0 & 0 & \blacksquare & * \\
0 & 0 & 0 & 0 & 0 & 0 & 0
\end{bmatrix}
$$

Now that we have much more theoretical machinery in our toolbox, we can make an important observation:

Every elementary row operation on $A$ is a linear transformation, and therefore can be performed by multiplying $A$ by a suitable matrix.

Recall that linear transformations respect addition and scalar multiplication.

To see that swapping two rows is a linear transformation: given two matrices $A$ and $B$, it doesn't matter whether you swap rows in the individual matrices and then add them, or add them and then swap the rows. In other words:

$$\operatorname{swap}(A+B) = \operatorname{swap}(A) + \operatorname{swap}(B).$$

Check for yourself that swaps respect scalar multiplication in the same way, and that the other two elementary operations also satisfy linearity.

As a result, each row operation has some matrix associated with it. Furthermore, the matrices that implement elementary row operations are particularly simple. They are called elementary matrices.

An elementary matrix is one that is obtained by performing a single elementary row operation on the identity matrix.

Example. Consider the following $3 \times 3$ matrices:

$$E_1 = \begin{bmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{bmatrix}, \quad E_2 = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 5 \end{bmatrix}, \quad E_3 = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ -4 & 0 & 1 \end{bmatrix}.$$

Let's see what each matrix does to an arbitrary $3 \times 3$ matrix $A = \begin{bmatrix} a & b & c \\ d & e & f \\ g & h & i \end{bmatrix}$.

Left-multiplication by $E_1$ swaps the first two rows of $A$ (for any matrix $A$).

$$E_1 A = \begin{bmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} a & b & c \\ d & e & f \\ g & h & i \end{bmatrix} = \begin{bmatrix} d & e & f \\ a & b & c \\ g & h & i \end{bmatrix}.$$

Left-multiplication by $E_2$ corresponds to a scalar multiplication of the third row by the scalar 5.

$$E_2 A = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 5 \end{bmatrix} \begin{bmatrix} a & b & c \\ d & e & f \\ g & h & i \end{bmatrix} = \begin{bmatrix} a & b & c \\ d & e & f \\ 5g & 5h & 5i \end{bmatrix}.$$

Left-multiplication by $E_3$ adds $-4$ times row 1 to row 3.

$$E_3 A = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ -4 & 0 & 1 \end{bmatrix} \begin{bmatrix} a & b & c \\ d & e & f \\ g & h & i \end{bmatrix} = \begin{bmatrix} a & b & c \\ d & e & f \\ g-4a & h-4b & i-4c \end{bmatrix}.$$
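We can check these three products numerically; the matrix $A$ below is an arbitrary numeric stand-in for the symbolic entries $a$ through $i$.

In [ ]:
import numpy as np

E1 = np.array([[0, 1, 0], [1, 0, 0], [0, 0, 1]])    # swap rows 1 and 2
E2 = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 5]])    # scale row 3 by 5
E3 = np.array([[1, 0, 0], [0, 1, 0], [-4, 0, 1]])   # add -4 times row 1 to row 3

A = np.array([[1, 2, 3],
              [4, 5, 6],
              [7, 8, 9]])

print("E1 @ A =\n", E1 @ A)   # rows 1 and 2 of A are swapped
print("E2 @ A =\n", E2 @ A)   # row 3 of A is multiplied by 5
print("E3 @ A =\n", E3 @ A)   # row 3 of A becomes row 3 minus 4 times row 1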

Finding the Elementary Matrix¶

Here is a simple way to compute these elementary matrices.

Let $E$ be the matrix that implements the operation "add $-4$ times row 1 to row 3." (Suppose we don't yet know what $E$ is.)

Now, $EI = E$ by the definition of $I$ (this holds for any matrix, not just $E$).

But note that this equation also says: "the matrix $E$ that implements the operation 'add $-4$ times row 1 to row 3' is the one you get by performing this operation on $I$."

Let's perform this transformation starting from $I$:

$$\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} \rightarrow \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ -4 & 0 & 1 \end{bmatrix}$$

Thus we have the following:

Fact. If an elementary row operation is performed on an $m \times n$ matrix $A$, the resulting matrix can be written as $EA$, where the $m \times m$ matrix $E$ is created by performing the same row operation on $I_m$.

Question. Is an elementary matrix invertible?

Recall the types of elementary row operations:

  • Swap two rows.
  • Multiply a row by a nonzero scalar.
  • Add a multiple of one row to another.

Answer: yes, any row reduction operation can be reversed by another (related) row reduction operation.

Every row reduction is an invertible linear transformation -- so every elementary matrix is invertible.

Examples. Let's invert $E_1$, $E_2$, and $E_3$ from earlier.

$$E_1^{-1} = \begin{bmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{bmatrix}^{-1} = \begin{bmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{bmatrix}, \quad
E_2^{-1} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 5 \end{bmatrix}^{-1} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1/5 \end{bmatrix}, \quad
E_3^{-1} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ -4 & 0 & 1 \end{bmatrix}^{-1} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 4 & 0 & 1 \end{bmatrix}$$

We can verify these operations using Python with NumPy.

In [16]:
E1 = np.array([[0,1,0],[1,0,0],[0,0,1]])
E2 = np.array([[1,0,0],[0,1,0],[0,0,5]])
E3 = np.array([[1,0,0],[0,1,0],[-4,0,1]])
print("E1 inverse = "); print(np.linalg.inv(E1))
print("E2 inverse = "); print(np.linalg.inv(E2))
print("E3 inverse = "); print(np.linalg.inv(E3))
E1 inverse = 
[[0. 1. 0.]
 [1. 0. 0.]
 [0. 0. 1.]]
E2 inverse = 
[[1.  0.  0. ]
 [0.  1.  0. ]
 [0.  0.  0.2]]
E3 inverse = 
[[ 1. -0. -0.]
 [ 0.  1.  0.]
 [ 4.  0.  1.]]

19.5 The LU Factorization¶

Now, we will introduce the factorization

$$A = LU.$$

Note that we are not restricted only to square matrices $A$. LU decomposition (like Gaussian Elimination) works for a matrix $A$ having any $m \times n$ shape.

An LU factorization of $A$ constructs two matrices that have this structure:

$$
A =
\underbrace{\begin{bmatrix}
1 & 0 & 0 & 0 \\
* & 1 & 0 & 0 \\
* & * & 1 & 0 \\
* & * & * & 1
\end{bmatrix}}_{L}
\underbrace{\begin{bmatrix}
\blacksquare & * & * & * & * \\
0 & \blacksquare & * & * & * \\
0 & 0 & 0 & \blacksquare & * \\
0 & 0 & 0 & 0 & 0
\end{bmatrix}}_{U}
$$

Stars ($*$) denote arbitrary entries, and blocks ($\blacksquare$) denote nonzero entries.

These two matrices each have a special structure.

  • $U$ is in row echelon form, and it has the same $m \times n$ shape as $A$. This is an upper triangular matrix (hence its name, $U$).

  • $L$ is a lower triangular square matrix of size $m \times m$, and it has 1s on the diagonal. This is called a unit lower triangular matrix (hence its name, $L$).

The fact that $U$ is in row echelon form may suggest to you (correctly!) that we could get it from $A$ by a sequence of row operations.

For now, let us suppose that:

  • We never need to interchange (or swap) two rows. Let's only consider the other two elementary operations.
  • The row reductions that convert $A$ to $U$ only add a multiple of one row to another row below it.

Now, if you consider an elementary matrix that implements such a row reduction, you will see that it will have 1s on the diagonal, and an additional entry somewhere below the diagonal.

For example, recall the scaling matrix $E_2$ and addition matrix $E_3$ from earlier:

$$E_2 = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 5 \end{bmatrix}, \quad E_2 A = \begin{bmatrix} a & b & c \\ d & e & f \\ 5g & 5h & 5i \end{bmatrix}, \qquad E_3 = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ -4 & 0 & 1 \end{bmatrix}, \quad E_3 A = \begin{bmatrix} a & b & c \\ d & e & f \\ g-4a & h-4b & i-4c \end{bmatrix}.$$

These elementary matrices are both lower triangular! In fact, the addition matrix $E_3$ is unit lower triangular (1s on the diagonal), and under our assumption above, row replacements like $E_3$ are the only operations we need. (We are ignoring row interchanges for now because they are not lower triangular. We'll deal with row interchanges later.)

So if there is a sequence of such elementary row operations that converts $A$ to $U$, then there is a set of unit lower triangular elementary matrices $E_1, \dots, E_p$ such that

$$E_p \cdots E_1 A = U.$$

We know that elementary matrices are invertible, and the product of invertible matrices is invertible, so:

$$A = (E_p \cdots E_1)^{-1} U = LU$$

where $L = (E_p \cdots E_1)^{-1} = E_1^{-1} \cdots E_p^{-1}$. Remember: the inverse of a product equals the product of the inverses, in the opposite order.

Fact. The product of unit lower triangular matrices is unit lower triangular. Additionally, the inverse of a unit lower triangular matrix is unit lower triangular.

(Think about how to prove this statement on your own.)

So we can conclude that $L$, as constructed from $(E_p \cdots E_1)^{-1}$, is unit lower triangular.
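As a quick numerical sanity check of this Fact (not a proof), the sketch below builds two random unit lower triangular matrices and confirms that their product and their inverses are again unit lower triangular. The helper functions are ours, written just for this check.

In [ ]:
import numpy as np

def random_unit_lower(n, rng):
    """Random n x n matrix with 1s on the diagonal and arbitrary entries below it."""
    L = np.tril(rng.standard_normal((n, n)), k=-1)  # strictly lower triangular part
    np.fill_diagonal(L, 1.0)                        # put 1s on the diagonal
    return L

def is_unit_lower(M):
    """True if M is lower triangular with 1s on the diagonal (up to roundoff)."""
    return np.allclose(M, np.tril(M)) and np.allclose(np.diag(M), 1.0)

rng = np.random.default_rng(0)
L1, L2 = random_unit_lower(4, rng), random_unit_lower(4, rng)

print("Product is unit lower triangular:", is_unit_lower(L1 @ L2))
print("Inverse is unit lower triangular:", is_unit_lower(np.linalg.inv(L1)))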

Hence, we have defined the LU decomposition based on Gaussian Elimination.

We have rewritten Gaussian Elimination as:

$$U = L^{-1} A$$

and shown that the $L$ so defined is unit lower triangular.

Let's take stock of what this all means: the LU decomposition is a way of capturing the application of Gaussian Elimination to $A$.

It incorporates both the process of performing Gaussian Elimination, and the result:

  • $U$ is the row echelon form of $A$.
  • $L^{-1}$ captures the row reductions that transform $A$ to row echelon form.
  • $L$ is the inverse of $L^{-1}$.
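SciPy ships an LU routine, scipy.linalg.lu, that lets us see the factorization in action. Note that in general it also returns a permutation matrix $P$ (handling the row interchanges we are ignoring for now), so what it computes is $A = PLU$; the example matrix below is arbitrary.

In [ ]:
import numpy as np
import scipy.linalg

A = np.array([[ 2.0,  1.0, 1.0],
              [ 4.0, -6.0, 0.0],
              [-2.0,  7.0, 2.0]])

P, L, U = scipy.linalg.lu(A)

print("L =\n", L)                 # unit lower triangular
print("U =\n", U)                 # row echelon form (upper triangular)
print("P =\n", P)                 # permutation matrix recording any row interchanges
print("A == P @ L @ U:", np.allclose(A, P @ L @ U))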

Finding $L$¶

Recall that the motivation for developing the LU decomposition is that it is more efficient than matrix inversion. So we don't want to have to invert $L^{-1}$ in the standard way in order to find $L$.

Here we have some good news:

  • Inverting each elementary row operation is simple, in fact much easier than general matrix inversion. (We have already seen examples of this.)
  • Multiplying elementary row operations is also simple: just apply the elementary row operation indicated by the left matrix to the right matrix. Here are some examples:
$$E_2^{-1} E_3^{-1} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1/5 \end{bmatrix} \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 4 & 0 & 1 \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 4/5 & 0 & 1/5 \end{bmatrix}, \qquad
E_3^{-1} E_2^{-1} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 4 & 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1/5 \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 4 & 0 & 1/5 \end{bmatrix}$$

Once again, we can verify these calculations using NumPy.

In [4]:
E2inv = np.array([[1,0,0],[0,1,0],[0,0,1/5]])
E3inv = np.array([[1,0,0],[0,1,0],[4,0,1]])
print("E2inv * E3inv ="); print(E2inv @ E3inv)
print("\nE3inv * E2inv ="); print(E3inv @ E2inv)
E2inv * E3inv =
[[1.  0.  0. ]
 [0.  1.  0. ]
 [0.8 0.  0.2]]

E3inv * E2inv =
[[1.  0.  0. ]
 [0.  1.  0. ]
 [4.  0.  0.2]]

This gives the following algorithm for LU factorization:

  1. Reduce $A$ to an echelon form $U$ by a sequence of row replacement operations.

    This is just Gaussian Elimination! But: keep track of the elementary row operations you perform along the way.

  2. Place entries in $L$ such that the same sequence of row operations reduces $L$ to $I$.

    If we can do this step efficiently, then the cost of LU factorization will be dominated by Gaussian Elimination itself.

The fact is that constructing $L$ can be done efficiently by a simple modification of Gaussian Elimination, sketched below. So, LU decomposition takes time only $\frac{2}{3}n^3$.
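Here is a minimal sketch of that algorithm (the function lu_no_pivoting is ours, for illustration only). It assumes $A$ is square and that no pivot is ever zero, so no row interchanges are needed; the multiplier used in each row replacement is stored directly as an entry of $L$.

In [ ]:
import numpy as np

def lu_no_pivoting(A):
    """LU factorization sketch: assumes A is square and every pivot is nonzero."""
    n = A.shape[0]
    U = A.astype(float).copy()
    L = np.eye(n)
    for j in range(n):                       # eliminate below the pivot in column j
        for i in range(j + 1, n):
            L[i, j] = U[i, j] / U[j, j]      # the multiplier becomes an entry of L
            U[i] = U[i] - L[i, j] * U[j]     # row replacement, as in Gaussian Elimination
    return L, U

A = np.array([[ 2.0,  1.0, 1.0],
              [ 4.0, -6.0, 0.0],
              [-2.0,  7.0, 2.0]])
L, U = lu_no_pivoting(A)
print("L =\n", L)
print("U =\n", U)
print("A == L @ U:", np.allclose(A, L @ U))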