#
import numpy as np
import scipy as sp
import pandas as pd
import matplotlib as mp
import matplotlib.pyplot as plt
import seaborn as sns
import sklearn
import laUtilities as ut
import slideUtilities as sl
import demoUtilities as dm
from matplotlib import animation
from importlib import reload
from datetime import datetime
from IPython.display import Image, display_html, display, Math, HTML;
qr_setting = None
mp.rcParams['animation.html'] = 'jshtml';
Homework
Office hours
Weekly reading and viewing assignments
[This lecture is based on lecture notes from Prof. Crovella's CS 132 and the fast.ai numerical linear algebra course.]
We've done a lot in this class so far.
We started from the idea of solving linear systems.
From there, we've developed an algebra of matrices and vectors.
This has led us to a view of a matrix as a linear operator: something that acts on a vector to create a new vector.
Before break, we calculated the computational cost of matrix operations in terms of floating point operations (flops).
When you look at a mathematical expression involving matrices, think carefully about what it means and how you might efficiently compute it.
Example. What is the most efficient way to compute $AB\mathbf{x}$, where $A, B \in \mathbb{R}^{n \times n}$ and $\mathbf{x} \in \mathbb{R}^n$?
Here are your choices:
First compute $AB$, then compute $(AB)\mathbf{x}$: about $2n^3 + 2n^2$ flops.
First compute $B\mathbf{x}$, then compute $A(B\mathbf{x})$: about $4n^2$ flops.
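A quick way to feel this difference is to time both orderings in NumPy. This is a minimal check; the size n and the random data below are illustrative choices, not part of the lecture:
import numpy as np
import time

n = 1000
rng = np.random.default_rng(0)
A = rng.standard_normal((n, n))
B = rng.standard_normal((n, n))
x = rng.standard_normal(n)

# Option 1: form the n x n product AB first (~2n^3 flops), then multiply by x
t0 = time.perf_counter()
y1 = (A @ B) @ x
t1 = time.perf_counter()

# Option 2: form the vector Bx first (~2n^2 flops), then A(Bx) (~2n^2 more flops)
y2 = A @ (B @ x)
t2 = time.perf_counter()

print(f"(AB)x took {t1 - t0:.4f} s, A(Bx) took {t2 - t1:.4f} s")
print("results agree:", np.allclose(y1, y2))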
Recall that a square matrix $A$ is called invertible if there exists a matrix $B$ such that $AB = BA = I$.
In that case, $B$ is called the inverse of $A$, written $A^{-1}$. Inverses are only defined for square matrices.
#
display(Image("images/06-inverse.png", width=1000))
Matrix inversion has many uses. For instance, if a matrix $A$ is invertible then we can solve the linear system $A\mathbf{x} = \mathbf{b}$ by multiplying both sides by $A^{-1}$ on the left: $\mathbf{x} = A^{-1}\mathbf{b}$.
Python's NumPy library contains a function np.linalg.inv() that can compute the inverse of a square matrix of any size. For example:
import numpy as np
A = np.array(
[[ 2.0, 5.0],
[-3.0,-7.0]])
print('A =\n',A)
B = np.linalg.inv(A)
print('B = \n',B)
A =
 [[ 2.  5.]
 [-3. -7.]]
B = 
 [[-7. -5.]
 [ 3.  2.]]
Question: How would you write a computer program to perform this inverse operation? How long will it take to run?
In Lecture 18, we calculated a special formula for finding the inverse of a $2 \times 2$ matrix: if $ad - bc \neq 0$, then $\begin{bmatrix} a & b \\ c & d \end{bmatrix}^{-1} = \frac{1}{ad - bc} \begin{bmatrix} d & -b \\ -c & a \end{bmatrix}$.
We also found a general algorithm for inverting any matrix $A$: use Gaussian Elimination on the larger augmented matrix $[A \;\; I]$. The result in reduced row echelon form will be $[I \;\; A^{-1}]$.
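Here is a minimal sketch of that algorithm in NumPy. The helper invert_via_rref is our own illustration (not a library function), and it assumes every pivot it meets is nonzero, so it performs no row interchanges:
import numpy as np

def invert_via_rref(A):
    """Invert A by row-reducing the augmented matrix [A | I] to [I | A^-1].
    Naive version: assumes every pivot encountered is nonzero (no row swaps)."""
    n = A.shape[0]
    M = np.hstack([A.astype(float), np.eye(n)])   # augmented matrix [A | I]
    for i in range(n):
        M[i] = M[i] / M[i, i]                     # scale pivot row so pivot = 1
        for j in range(n):
            if j != i:
                M[j] = M[j] - M[j, i] * M[i]      # eliminate column i in the other rows
    return M[:, n:]                               # right half is now A^-1

A = np.array([[2.0, 5.0], [-3.0, -7.0]])
print(invert_via_rref(A))          # should match np.linalg.inv(A)
print(np.linalg.inv(A))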
We've seen how to calculate the cost of Gaussian elimination for an $n \times n$ square matrix $A$ and column vector $\mathbf{b}$.
Here, the augmented matrix $[A \;\; \mathbf{b}]$ is of size $n \times (n+1)$. For this matrix, let's consider the cost of performing an elementary row operation like adding a scalar multiple of row 1 into row 2. We need $n+1$ multiplications and $n+1$ additions.
So the total cost is $2(n+1) \approx 2n$ floating point operations (or flops).
# Image credit: Prof. Mark Crovella
display(Image("images/03-ge1.jpg", width=800))
As you perform more elimination steps, you can work with smaller sub-matrices. In total, we calculated previously that Gaussian elimination on $[A \;\; \mathbf{b}]$ costs about $\frac{2}{3}n^3$ flops.
For matrix inversion, we need to use a larger augmented matrix $[A \;\; I]$, whose size is $n \times 2n$. For this matrix, an elementary row operation now costs $2n$ multiplications and $2n$ additions, for a total of $4n$ flops.
If you go back to the derivation of the cost of Gaussian Elimination and recalculate the results for the wider augmented matrix $[A \;\; I]$, you will find that each step is correspondingly more expensive. As a result, the total cost to calculate the inverse is about $2n^3$ flops. This is three times the cost of an ordinary Gaussian elimination.
Hence, if I give you a matrix $A$ and a sequence of systems $A\mathbf{x} = \mathbf{b}$ to solve, then you can:
solve each system separately with Gaussian Elimination, at a cost of about $\frac{2}{3}n^3$ flops per system, or
compute $A^{-1}$ once, at a cost of about $2n^3$ flops, and then solve each system with a matrix-vector product $\mathbf{x} = A^{-1}\mathbf{b}$, at about $2n^2$ flops per system.
Which option is better depends on how many linear systems you will solve using the same matrix $A$.
Today, we will find a new way to solve such systems that is the "best of both options": given a matrix $A$, you pay a one-time cost of about $\frac{2}{3}n^3$ flops and can then quickly solve any equation of the form $A\mathbf{x} = \mathbf{b}$.
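As a preview of where we are headed, SciPy already packages this workflow: scipy.linalg.lu_factor does the one-time factorization work, and scipy.linalg.lu_solve reuses it for each new right-hand side. The matrix size and random data below are illustrative:
import numpy as np
from scipy.linalg import lu_factor, lu_solve

rng = np.random.default_rng(1)
A = rng.standard_normal((500, 500))

# One-time factorization of A (roughly the cost of a single Gaussian Elimination)
lu, piv = lu_factor(A)

# Each new right-hand side is now cheap to solve (no re-factorization needed)
for _ in range(3):
    b = rng.standard_normal(500)
    x = lu_solve((lu, piv), b)
    print("residual:", np.linalg.norm(A @ x - b))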
Just as multiplication can be generalized from scalars to matrices, the notion of a factorization can also be generalized from scalars to matrices.
A factorization of a matrix $A$ is an equation that expresses $A$ as a product of two or more matrices.
The essential difference with what we have done so far is that, up to now, we have been given factors ($A$ and $B$) and then computed their product $AB$.
In a factorization problem, you are given $C$, and you want to find matrices $A$ and $B$ with $C = AB$ that meet some conditions.
There are a number of reasons one may want to factor a matrix.
Today we'll work with one particular factorization that addresses the first case. In future lectures, we'll study factorizations that address the other two cases.
#
display(Image("images/01-svd.png", width=800))
The factorization we will study today is called the LU factorization. It is worth studying both in its own right and because it introduces the idea of matrix factorizations, which we will encounter again later on.
Consider the following problem. You are given an $n \times n$ square matrix $A$ and an $n \times p$ matrix $B$.
You seek a matrix $X$ such that: $AX = B$.
In other words, instead of the usual matrix-vector equation $A\mathbf{x} = \mathbf{b}$ that we have studied so far in this course, now $X$ and $B$ are matrices.
Question. Given $A$ and $B$, how can we solve for $X$ using techniques we already know? And how long does it take?
By the rules of matrix multiplication, we can break this problem up.
Let $X = [\mathbf{x}_1 \; \mathbf{x}_2 \; \cdots \; \mathbf{x}_p]$ and $B = [\mathbf{b}_1 \; \mathbf{b}_2 \; \cdots \; \mathbf{b}_p]$, where the $\mathbf{x}_i$ and $\mathbf{b}_i$ are columns.
Then: $A\mathbf{x}_i = \mathbf{b}_i$ for $i = 1, \dots, p$.
In other words, there are $p$ linear systems to solve. Each linear system is conceptually a separate problem.
If we perform Gaussian Elimination on each of the $p$ separate systems, the total cost is about $\frac{2}{3}pn^3$ flops.
As we saw earlier today, you could solve these systems by first computing $A^{-1}$ and then computing: $\mathbf{x}_i = A^{-1}\mathbf{b}_i$ for each $i$.
Or, more concisely: $X = A^{-1}B$.
This is a faster technique when $p$ is larger than about 3, because it exploits the fact that every linear system has the same matrix $A$.
The biggest cost is computing the inverse matrix $A^{-1}$, which requires about $2n^3$ flops (three times as many as a single Gaussian Elimination).
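A small numerical check that the two approaches agree; the sizes n and p and the random data below are arbitrary illustrative choices:
import numpy as np

rng = np.random.default_rng(2)
n, p = 4, 3
A = rng.standard_normal((n, n))
B = rng.standard_normal((n, p))

# Approach 1: solve the p systems A x_i = b_i one column at a time
X1 = np.column_stack([np.linalg.solve(A, B[:, i]) for i in range(p)])

# Approach 2: compute A^-1 once and multiply
X2 = np.linalg.inv(A) @ B

print("A @ X1 == B:", np.allclose(A @ X1, B))
print("both approaches agree:", np.allclose(X1, X2))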
What if we could solve all these systems while performing Gaussian Elimination only once? That would be a win, as it would cut our running time by a factor of 3.
Today we will explore the LU factorization, which will allow us to do exactly this. We will see that LU factorization has a close connection to Gaussian Elimination.
In fact, I hope that when we are done, you will see Gaussian Elimination in a new way, namely:
Gaussian Elimination is really a matrix factorization!
Before we start to discuss the LU factorization, we need to introduce a powerful tool for performing factorizations, called elementary matrices.
Within Gaussian elimination, recall that the row reduction process consists of repeated applications of three kinds of elementary row operations: interchanging two rows, multiplying a row by a nonzero scalar, and adding a multiple of one row to another row.
Now that we have much more theoretical machinery in our toolbox, we can make an important observation:
Every elementary row operation on a matrix $A$ is a linear transformation, and therefore can be performed by multiplying $A$ by a suitable matrix.
Recall that linear transformations respect addition and scalar multiplication.
To see that swapping two rows is a linear transformation: given two augmented matrices $A$ and $B$ of the same shape, it doesn't matter whether you swap rows in the individual matrices and then add them, or add them and then swap the rows. In other words, if $S$ denotes the row-swap operation, then $S(A + B) = S(A) + S(B)$.
Check for yourself that swaps respect scalar multiplication in the same way, $S(cA) = cS(A)$, and that the other two elementary operations also satisfy linearity.
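Here is a quick numerical spot check of the swap case; the helper swap_rows_0_1 below is our own illustration, not a library routine:
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((3, 4))
B = rng.standard_normal((3, 4))

def swap_rows_0_1(M):
    """Return a copy of M with its first two rows interchanged."""
    return M[[1, 0, 2], :]

# Swapping then adding gives the same result as adding then swapping
print(np.allclose(swap_rows_0_1(A + B), swap_rows_0_1(A) + swap_rows_0_1(B)))
# Swaps also respect scalar multiplication
print(np.allclose(swap_rows_0_1(2.5 * A), 2.5 * swap_rows_0_1(A)))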
As a result, each row operation has some matrix associated with it. Furthermore, the matrices that implement elementary row operations are particularly simple. They are called elementary matrices.
An elementary matrix is one that is obtained by performing a single elementary row operation on the identity matrix.
Example. Consider the following matrices:

$E_1 = \begin{bmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{bmatrix}, \qquad E_2 = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 5 \end{bmatrix}, \qquad E_3 = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ -4 & 0 & 1 \end{bmatrix}.$
Let's see what each matrix does to an arbitrary $3 \times 3$ matrix $A$.
Left-multiplication by $E_1$ swaps the first two rows of $A$ (for any matrix $A$).
Left-multiplication by $E_2$ corresponds to multiplying the third row of $A$ by the scalar 5.
Left-multiplication by $E_3$ adds $-4$ times row 1 of $A$ to row 3.
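We can see these three effects directly in NumPy; the test matrix A below is an arbitrary illustration:
import numpy as np

E1 = np.array([[0, 1, 0], [1, 0, 0], [0, 0, 1]])   # swap rows 1 and 2
E2 = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 5]])   # multiply row 3 by 5
E3 = np.array([[1, 0, 0], [0, 1, 0], [-4, 0, 1]])  # add -4 times row 1 to row 3

A = np.array([[1, 2, 3],
              [4, 5, 6],
              [7, 8, 9]])

print("E1 @ A =\n", E1 @ A)   # rows 1 and 2 of A are swapped
print("E2 @ A =\n", E2 @ A)   # row 3 of A is scaled by 5
print("E3 @ A =\n", E3 @ A)   # row 3 becomes row 3 - 4 * row 1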
Here is a simple way to compute these elementary matrices.
Let $E_3$ be the matrix that implements the operation "add $-4$ times row 1 to row 3." (Suppose we don't yet know what $E_3$ is.)
Now, $E_3 I = E_3$, since $MI = M$ for any matrix $M$, by the definition of $I$.
But note that this equation also says: "the matrix that implements the operation 'add $-4$ times row 1 to row 3' is the one you get by performing this operation on $I$."
Let's perform this transformation starting from $I$:

$I = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} \;\longrightarrow\; \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ -4 & 0 & 1 \end{bmatrix} = E_3.$
Thus we have the following:
Fact. If an elementary row operation is performed on an $m \times n$ matrix $A$, the resulting matrix can be written as $EA$, where the $m \times m$ matrix $E$ is created by performing the same row operation on $I_m$.
Question. Is an elementary matrix invertible?
Recall the three types of elementary row operations: interchanges, scalings, and row replacements.
Answer: yes, any row reduction operation can be reversed by another (related) row reduction operation.
Every row reduction is an invertible linear transformation -- so every elementary matrix is invertible.
Examples. Let's invert $E_1$, $E_2$, and $E_3$ from earlier. Undoing a swap is the same swap, so $E_1^{-1} = E_1$; $E_2^{-1}$ multiplies the third row by $1/5$; and $E_3^{-1}$ adds $+4$ times row 1 to row 3.
We can verify these operations using Python with NumPy.
E1 = np.array([[0,1,0],[1,0,0],[0,0,1]])
E2 = np.array([[1,0,0],[0,1,0],[0,0,5]])
E3 = np.array([[1,0,0],[0,1,0],[-4,0,1]])
print("E1 inverse = "); print(np.linalg.inv(E1))
print("E2 inverse = "); print(np.linalg.inv(E2))
print("E3 inverse = "); print(np.linalg.inv(E3))
E1 inverse = 
[[0. 1. 0.]
 [1. 0. 0.]
 [0. 0. 1.]]
E2 inverse = 
[[1.  0.  0. ]
 [0.  1.  0. ]
 [0.  0.  0.2]]
E3 inverse = 
[[ 1. -0. -0.]
 [ 0.  1.  0.]
 [ 4.  0.  1.]]
Now, we will introduce the LU factorization $A = LU$.
Note that we are not restricted only to square matrices $A$. LU decomposition (like Gaussian Elimination) works for a matrix $A$ of any shape $m \times n$.
An LU factorization of an $m \times n$ matrix $A$ constructs two matrices with this structure:

$A = \underbrace{\begin{bmatrix} 1 & 0 & 0 & 0 \\ * & 1 & 0 & 0 \\ * & * & 1 & 0 \\ * & * & * & 1 \end{bmatrix}}_{L} \underbrace{\begin{bmatrix} \blacksquare & * & * & * & * \\ 0 & \blacksquare & * & * & * \\ 0 & 0 & 0 & \blacksquare & * \\ 0 & 0 & 0 & 0 & 0 \end{bmatrix}}_{U}$

Stars ($*$) denote arbitrary entries, and blocks ($\blacksquare$) denote nonzero entries.
These two matrices each have a special structure.
$U$ is in row echelon form, and it has the same shape as $A$. This is an upper triangular matrix (hence its name, $U$).
$L$ is a lower triangular square matrix of size $m \times m$, and it has 1s on the diagonal. This is called a unit lower triangular matrix (hence its name, $L$).
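SciPy can compute such a factorization for us with scipy.linalg.lu. Note that, unlike the idealized case we are discussing, it always uses row interchanges for numerical stability, so it returns an extra permutation matrix P with A = P L U; the example matrix below is an arbitrary illustration:
import numpy as np
from scipy.linalg import lu

A = np.array([[2.0, 1.0, 1.0],
              [4.0, -6.0, 0.0],
              [-2.0, 7.0, 2.0]])

P, L, U = lu(A)          # A = P @ L @ U
print("L =\n", L)        # unit lower triangular (1s on the diagonal)
print("U =\n", U)        # upper triangular / echelon form
print("P =\n", P)        # permutation matrix from partial pivoting
print("A reconstructed:", np.allclose(P @ L @ U, A))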
The fact that $U$ is in row echelon form may suggest to you (correctly!) that we could get it from $A$ by a sequence of row operations.
For now, let us suppose that $A$ can be reduced to echelon form using only row replacements, each of which adds a multiple of one row to another row below it (no row interchanges or scalings).
Now, if you consider an elementary matrix that implements such a row replacement, you will see that it has 1s on the diagonal and a single additional nonzero entry somewhere below the diagonal.
For example, recall the scaling matrix $E_2$ and the addition matrix $E_3$ from earlier:

$E_2 = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 5 \end{bmatrix}, \qquad E_3 = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ -4 & 0 & 1 \end{bmatrix}.$

These elementary matrices are both lower triangular, and the row-replacement matrix $E_3$ is in fact unit lower triangular. (We are ignoring row interchanges for now because they are not lower triangular. We'll deal with row interchanges later.)
So if there is a sequence of such elementary row operations that converts $A$ to $U$, then there is a set of unit lower triangular elementary matrices $E_1, \dots, E_p$ such that

$E_p \cdots E_1 A = U.$
We know that elementary matrices are invertible, and the product of invertible matrices is invertible, so:

$A = (E_p \cdots E_1)^{-1} U = LU,$

where $L = (E_p \cdots E_1)^{-1} = E_1^{-1} E_2^{-1} \cdots E_p^{-1}$. Remember: the inverse of a product equals the product of the inverses, in the opposite order.
Fact. The product of unit lower triangular matrices is unit lower triangular. Additionally, the inverse of a unit lower triangular matrix is unit lower triangular.
(Think about how to prove this statement on your own.)
So we can conclude that $L = E_1^{-1} E_2^{-1} \cdots E_p^{-1}$, as constructed from the elementary matrices, is unit lower triangular.
Hence, we have defined the LU decomposition based on Gaussian Elimination.
We have rewritten Gaussian Elimination as:

$A = LU, \quad \text{where } E_p \cdots E_1 A = U \text{ and } L = E_1^{-1} \cdots E_p^{-1},$

and shown that the $L$ so defined is unit lower triangular.
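Here is a small sketch of this construction on a concrete $3 \times 3$ matrix. The matrix A and the row-replacement matrices F1, F2, F3 below are our own illustrative choices (playing the role of the $E_i$ above, renamed to avoid clashing with the earlier examples), picked so that no row interchanges are needed:
import numpy as np

A = np.array([[2.0, 1.0, 1.0],
              [4.0, 3.0, 3.0],
              [8.0, 7.0, 9.0]])

# Row replacements that reduce A to echelon form U
F1 = np.array([[1, 0, 0], [-2, 1, 0], [0, 0, 1]])   # row2 <- row2 - 2*row1
F2 = np.array([[1, 0, 0], [0, 1, 0], [-4, 0, 1]])   # row3 <- row3 - 4*row1
F3 = np.array([[1, 0, 0], [0, 1, 0], [0, -3, 1]])   # row3 <- row3 - 3*row2

U = F3 @ F2 @ F1 @ A
L = np.linalg.inv(F1) @ np.linalg.inv(F2) @ np.linalg.inv(F3)

print("U =\n", U)                       # upper triangular (echelon form)
print("L =\n", L)                       # unit lower triangular, multipliers below the diagonal
print("A == L @ U:", np.allclose(A, L @ U))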
Let's take stock of what this all means: the LU decomposition is a way of capturing the application of Gaussian Elimination to $A$.
It incorporates both the process of performing Gaussian Elimination, and the result: $L$ records the sequence of row operations used to reduce $A$, and $U$ is the resulting echelon form.
Recall that the motivation for developing the LU decomposition is that it is more efficient than matrix inversion. So we don't want to have to invert $E_p \cdots E_1$ in the standard (expensive) way in order to find $L$.
Here we have some good news: the inverse of an elementary matrix is itself an elementary matrix that we can write down directly, with no computation. Moreover, when we multiply these inverses together in the right order, their nonzero entries simply drop into place.
Once again, we can verify these calculations using NumPy.
E2inv = np.array([[1,0,0],[0,1,0],[0,0,1/5]])
E3inv = np.array([[1,0,0],[0,1,0],[4,0,1]])
print("E2inv * E3inv ="); print(E2inv @ E3inv)
print("\nE3inv * E2inv ="); print(E3inv @ E2inv)
E2inv * E3inv =
[[1.  0.  0. ]
 [0.  1.  0. ]
 [0.8 0.  0.2]]

E3inv * E2inv =
[[1.  0.  0. ]
 [0.  1.  0. ]
 [4.  0.  0.2]]
This gives the following algorithm for LU factorization:
Reduce $A$ to an echelon form $U$ by a sequence of row replacement operations.
This is just Gaussian Elimination! But: keep track of the elementary row operations you perform along the way.
Place entries in $L$ such that the same sequence of row operations reduces $L$ to $I$.
If we can do this step efficiently, then the cost of LU factorization will be dominated by Gaussian Elimination itself.
The fact is that constructing $L$ can be done efficiently by a simple modification of Gaussian Elimination: as each row replacement is performed, the multiplier used is recorded in the corresponding entry of $L$. So, LU decomposition takes only about $\frac{2}{3}n^3$ flops, the same as Gaussian Elimination itself.
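To make that bookkeeping concrete, here is a minimal sketch of LU factorization without row interchanges. The function lu_no_pivot is our own illustration rather than a library routine, and it assumes no zero pivots appear; production codes add pivoting, as scipy.linalg.lu does:
import numpy as np

def lu_no_pivot(A):
    """Return L (unit lower triangular) and U with A = L @ U.
    Assumes no zero pivots are encountered (no row interchanges)."""
    n = A.shape[0]
    U = A.astype(float).copy()
    L = np.eye(n)
    for j in range(n - 1):                 # eliminate below the pivot in column j
        for i in range(j + 1, n):
            m = U[i, j] / U[j, j]          # the multiplier used by Gaussian Elimination
            U[i, j:] -= m * U[j, j:]       # row replacement applied to U
            L[i, j] = m                    # ... and the multiplier recorded in L
    return L, U

A = np.array([[2.0, 1.0, 1.0],
              [4.0, 3.0, 3.0],
              [8.0, 7.0, 9.0]])
L, U = lu_no_pivot(A)
print("L =\n", L, "\nU =\n", U)
print("A == L @ U:", np.allclose(A, L @ U))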