#
import numpy as np
import scipy as sp
import pandas as pd
import matplotlib as mp
import matplotlib.pyplot as plt
import seaborn as sns
import sklearn
import laUtilities as ut
import slideUtilities as sl
import demoUtilities as dm
from matplotlib import animation
from importlib import reload
from datetime import datetime
from IPython.display import Image, display_html, display, Math, HTML;
qr_setting = None
mp.rcParams['animation.html'] = 'jshtml';
Homework 5 due Friday 3/17
Office hours
Weekly reading and viewing assignments
[This lecture is based on lecture notes from Prof. Crovella's CS 132 and the fast.ai numerical linear algebra course.]
We calculated the computational cost of matrix operations in terms of floating point operations (flops)
We started the study of matrix factorizations
We looked at Gaussian elimination in a new way
Example: find the matrix that implements the operation 'add $-4$ times row 1 to row 3'.
We can construct this elementary matrix by performing the same operation starting from the identity matrix $I$:

$$E = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ -4 & 0 & 1 \end{bmatrix}$$
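Here is a quick NumPy check of this construction; the $3 \times 3$ test matrix below is an arbitrary one chosen just for illustration.
# build the elementary matrix by applying 'add -4 times row 1 to row 3' to the identity
E = np.eye(3)
E[2, 0] = -4
print(E)

# multiplying on the left by E performs that same row operation on any matrix
M = np.array([[1., 2., 3.],
              [4., 5., 6.],
              [7., 8., 9.]])
print(E @ M)   # row 3 becomes row 3 - 4 * row 1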
We introduced our first factorization: $A = LU$, where $L$ is unit lower triangular and $U$ is an echelon form of $A$.
The fact that $U$ is in row echelon form may suggest to you (correctly!) that we could get it from $A$ by a sequence of row operations.
For now, let us suppose that $A$ can be reduced to echelon form using only row replacements that add a multiple of one row to a row below it (no row interchanges).
Now, if you consider an elementary matrix that implements such a row reduction, you will see that it will have 1s on the diagonal, and an additional entry somewhere below the diagonal.
For example, recall the scaling matrix $E_2$ and the row-addition matrix $E_3$ from the last lecture:
These elementary matrices are both lower triangular, and the row-replacement matrices used in the reduction above are in fact unit lower triangular. (We are ignoring row interchanges for now because they are not lower triangular. We'll deal with row interchanges later.)
So if there is a sequence of elementary row operations that converts $A$ to $U$, then there is a set of unit lower triangular elementary matrices $E_1, \dots, E_p$ such that

$$E_p \cdots E_1 A = U.$$

We know that elementary matrices are invertible, and the product of invertible matrices is invertible, so:

$$A = (E_p \cdots E_1)^{-1} U = LU,$$

where $L = (E_p \cdots E_1)^{-1} = E_1^{-1} E_2^{-1} \cdots E_p^{-1}$. Remember: the inverse of a product equals the product of the inverses, in the opposite order.
Fact. The product of unit lower triangular matrices is unit lower triangular. Additionally, the inverse of a unit lower triangular matrix is unit lower triangular.
(Think about how to prove this statement on your own.)
So we can conclude that $L$, as constructed from $E_1^{-1} E_2^{-1} \cdots E_p^{-1}$, is unit lower triangular.
Hence, we have defined the LU decomposition based on Gaussian Elimination.
We have rewritten Gaussian Elimination as

$$A = LU,$$

and shown that the $L$ so defined is unit lower triangular.
Let's take stock of what this all means: the LU decomposition is a way of capturing the application of Gaussian Elimination to $A$.
It incorporates both the process of performing Gaussian Elimination and its result:

$$A = LU.$$
Recall that the motivation for developing the LU decomposition is that it is more efficient than matrix inversion. So we don't want to have to invert the elementary matrices in the standard way in order to find $L$.
Here we have some good news: the inverse of an elementary matrix is itself an elementary matrix, and we can write it down directly, since it simply undoes the original row operation. Scaling row 3 by $5$ is undone by scaling row 3 by $1/5$, and adding $-4$ times row 1 to row 3 is undone by adding $+4$ times row 1 to row 3:

$$E_2^{-1} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1/5 \end{bmatrix}, \qquad E_3^{-1} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 4 & 0 & 1 \end{bmatrix}$$
Once again, we can verify these calculations using NumPy.
E2inv = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1/5]])   # undoes scaling row 3 by 5
E3inv = np.array([[1, 0, 0], [0, 1, 0], [4, 0, 1]])     # undoes adding -4 times row 1 to row 3
# note: the order of multiplication matters
print("E2inv * E3inv ="); print(E2inv @ E3inv)
print("\nE3inv * E2inv ="); print(E3inv @ E2inv)
E2inv * E3inv =
[[1.  0.  0. ]
 [0.  1.  0. ]
 [0.8 0.  0.2]]

E3inv * E2inv =
[[1.  0.  0. ]
 [0.  1.  0. ]
 [4.  0.  0.2]]
This gives the following algorithm for LU factorization:
Step 1. Reduce $A$ to an echelon form $U$ by a sequence of row replacement operations.
This is just Gaussian Elimination! But: keep track of the elementary row operations you perform along the way.
Step 2. Place entries in $L$ such that the same sequence of row operations reduces $L$ to $I$.
If we can do this step efficiently, then the cost of LU factorization will be dominated by Gaussian Elimination itself.
The fact is that constructing $L$ can be done efficiently by a simple modification of Gaussian Elimination: the multiplier used to eliminate each entry below a pivot is exactly the entry that goes in the corresponding position of $L$. So LU decomposition takes only about $\frac{2}{3}n^3$ flops, essentially the same as Gaussian Elimination itself.
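Here is a minimal sketch of that modification, assuming no row interchanges are needed; the function name lu_no_pivot and the test matrix are ours, chosen for illustration only.
def lu_no_pivot(A):
    # LU factorization by Gaussian Elimination, recording each multiplier in L
    # (assumes every pivot encountered is nonzero)
    U = np.array(A, dtype=float)
    n = U.shape[0]
    L = np.eye(n)
    for j in range(n - 1):
        for i in range(j + 1, n):
            m = U[i, j] / U[j, j]             # multiplier used to eliminate U[i, j]
            L[i, j] = m                       # ... which is exactly the (i, j) entry of L
            U[i, :] = U[i, :] - m * U[j, :]   # row replacement
    return L, U

# quick check on an arbitrary example matrix
A = np.array([[ 2.,  1., 1.],
              [ 4., -6., 0.],
              [-2.,  7., 2.]])
L, U = lu_no_pivot(A)
print(L); print(U); print(L @ U - A)   # the last matrix should be all zeros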
Let's return to the motivation for developing the LU factorization.
We've seen that LU decomposition is essentially Gaussian Elimination, and as such, it doesn't take time much longer than Gaussian Elimination.
Now we want to show that, once you have the LU decomposition of a matrix $A$, the system $A\mathbf{x} = \mathbf{b}$ can be solved for any $\mathbf{b}$ in time proportional to $n^2$.
Informally, what we are going to do is to use $L$ to perform a special, very efficient version of the forward step of Gaussian Elimination, and then use $U$ in the usual way to do backsubstitution.
We can write these two steps concisely as follows:
When $A = LU$, the equation $A\mathbf{x} = \mathbf{b}$ can be written as

$$L(U\mathbf{x}) = \mathbf{b}.$$

Let's take this apart, and write $\mathbf{y}$ for $U\mathbf{x}$. Then we can find $\mathbf{x}$ by solving the pair of equations

$$L\mathbf{y} = \mathbf{b}, \qquad U\mathbf{x} = \mathbf{y}.$$

The idea is that we first solve $L\mathbf{y} = \mathbf{b}$ for $\mathbf{y}$, then solve $U\mathbf{x} = \mathbf{y}$ for $\mathbf{x}$.
The key observation: each equation is fast to solve because $L$ and $U$ are each triangular.
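Here is a sketch of these two triangular solves; the helper names forward_sub and back_sub are ours, and they assume only that the diagonal entries of $L$ and $U$ are nonzero.
def forward_sub(L, b):
    # solve L y = b for lower triangular L, working from the top row down
    n = len(b)
    y = np.zeros(n)
    for i in range(n):
        y[i] = (b[i] - L[i, :i] @ y[:i]) / L[i, i]
    return y

def back_sub(U, y):
    # solve U x = y for upper triangular U, working from the bottom row up
    n = len(y)
    x = np.zeros(n)
    for i in range(n - 1, -1, -1):
        x[i] = (y[i] - U[i, i+1:] @ x[i+1:]) / U[i, i]
    return x

# usage sketch: if A = L U, then x = back_sub(U, forward_sub(L, b)) solves A x = b
Each routine touches each entry of its triangle once, so each solve costs about $n^2$ flops.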
Example. Given the following LU decomposition of $A$:
use this LU factorization of $A$ to solve $A\mathbf{x} = \mathbf{b}$, where
Solution. To solve $L\mathbf{y} = \mathbf{b}$, note that the arithmetic takes place only in the augmented column (column 5). The zeros below each pivot in $L$ are created automatically by the choice of row operations.
Next, for $U\mathbf{x} = \mathbf{y}$ (the "backward" phase) the row reduction is again streamlined:
What we have done is:
# image credit: Lay, 4th edition
display(Image("images/07-lu.jpeg", width=800))
Analysis. Both the forward phase (solving $L\mathbf{y} = \mathbf{b}$) and the backward phase (solving $U\mathbf{x} = \mathbf{y}$) take about $n^2$ flops each.
Therefore the time-consuming step is actually computing the factorization, which as we've seen is essentially Gaussian Elimination, and therefore requires about $\frac{2}{3}n^3$ flops.
Hence we have found that by using the LU decomposition, one can solve a series of systems $A\mathbf{x} = \mathbf{b}_1, A\mathbf{x} = \mathbf{b}_2, \dots$, all involving the same $A$, for about $\frac{2}{3}n^3$ flops once plus about $2n^2$ flops per right-hand side. By contrast, computing $A^{-1}$ explicitly takes about $2n^3$ flops, roughly three times the cost of the factorization, and then each solve still requires a matrix-vector product of about $2n^2$ flops.
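As an illustration (not from the lecture), we can compare reusing one LU factorization against forming the explicit inverse; the matrix is random and the timings will vary by machine.
import time
from scipy.linalg import lu_factor, lu_solve

n = 1000
rng = np.random.default_rng(0)
A = rng.standard_normal((n, n))
bs = [rng.standard_normal(n) for _ in range(50)]

start = time.time()
lu_piv = lu_factor(A)                    # about (2/3) n^3 flops, done once
xs = [lu_solve(lu_piv, b) for b in bs]   # about 2 n^2 flops per right-hand side
print("LU factor + 50 solves:", time.time() - start, "seconds")

start = time.time()
Ainv = np.linalg.inv(A)                  # about 2 n^3 flops
xs_inv = [Ainv @ b for b in bs]          # also about 2 n^2 flops per right-hand side
print("inverse + 50 multiplies:", time.time() - start, "seconds")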
Up until now we have assumed that the Gaussian Elimination used to define $L$ only involves scaling a row or adding a multiple of a row to a row below it.
However, in real situations, we sometimes need to exchange two rows. One obvious reason is that a zero can appear in a pivot position. But there is another, more subtle reason, having to do with numerical accuracy.
We make the following observation: in general, we would like to avoid dividing by a small number.
Here is why. Consider the problem of computing $b/a$, where $a$ and $b$ are scalars.
Let's say there is some small error in the value of $b$; call it $\epsilon$.
That means that what we are really computing is

$$\frac{b + \epsilon}{a} = \frac{b}{a} + \frac{\epsilon}{a}.$$

Note that $b/a$ is the correct value, but what we compute is off by $\epsilon/a$. Now, if $a$ is a very small number, then the error in the result, $\epsilon/a$, will be large.
Hence we would like to avoid dividing by small numbers whenever possible.
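A tiny numerical illustration, with values chosen arbitrarily: a perturbation of $10^{-12}$ in $b$ becomes an error of about $10^{-4}$ in $b/a$ when $a = 10^{-8}$.
a = 1e-8
b = 1.0
eps = 1e-12                # a small error in b
exact = b / a
computed = (b + eps) / a
print(computed - exact)    # roughly eps / a = 1e-4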
Now: note that in performing Gaussian Elimination, we divide each row by the value of its pivot.
What this suggests is that we would like to avoid having small pivots.
For example, consider the matrix
This is a valid decomposition. But it creates some very large and very small numbers, which as we know can lead to arithmetic errors. It would be better to interchange the two rows of $A$ first and then compute the decomposition of that matrix.
Note that this factorization is not quite exact: the bottom right entry of $U$ is slightly different from its correct value. But that discrepancy is below the computer's precision anyway: remember that the computer only keeps numbers to about 16 digits of precision. And this way we have only one really small number in our factorization, and no really large numbers.
This factorization is more stable, even if it is less accurate.
"A stable algorithm gives nearly the right answer to nearly the right question." (Trefethen, p. 104)
For instance, if we start with the matrix
then it is best to interchange the two rows, because it is better to start with $1$ as a pivot rather than a tiny number.
There is a simple way to address this. In processing any particular row, we can avoid having a small pivot by interchanging the current row with one of those below it.
We would like to exchange the current row with whichever row below it has the entry of largest absolute value in the pivot column. This algorithmic technique is called "partial pivoting."
Now, a row interchange is an elementary row operation, and can be implemented by an elementary matrix. This elementary matrix is the identity with its corresponding rows interchanged.
An elementary matrix that exchanges rows is called a permutation matrix. The product of permutation matrices is a permutation matrix.
Hence, the net result of all the partial pivoting done during Gaussian Elimination can be expressed in a single permutation matrix $P$.
This means that the final factorization of $A$ is:

$$PA = LU.$$

You can read this equation in three ways: the row-permuted matrix $PA$ has an ordinary LU factorization; equivalently $A = P^{-1}LU$; and since a permutation matrix satisfies $P^{-1} = P^T$, we also have $A = P^T L U$.
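In code, we can obtain this permuted factorization from SciPy: scipy.linalg.lu returns matrices P, L, U with $A = PLU$, so its $P$ plays the role of $P^{-1} = P^T$ above. The small matrix below, with a tiny leading entry, is our own example.
from scipy.linalg import lu

A = np.array([[1e-20, 1.0],
              [1.0,   1.0]])

P, L, U = lu(A)   # partial pivoting: the larger entry 1.0 is chosen as the pivot
print("P ="); print(P)
print("L ="); print(L)
print("U ="); print(U)
print("max |PLU - A| =", np.max(np.abs(P @ L @ U - A)))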
#
fig = ut.three_d_figure((0, 0), fig_desc = 'H = Span{a1, a2, a3}',
                        xmin = -10, xmax = 10, ymin = -10, ymax = 10,
                        zmin = -10, zmax = 10, qr = qr_setting)
a1 = [-8.0, 8.0, 5.0]
a2 = [3.0, 2.0, -2.0]
a3 = 2.5 * np.array(a2)
fig.text(a1[0]+.5, a1[1]+.5, a1[2]+.5, r'$\bf a_1$', 'a_1', size=18)
fig.text(a3[0]+.5, a3[1]+.5, a3[2]+.5, r'$\bf a_3$', 'a_3', size=18)
fig.text(a2[0]+.5, a2[1]+.5, a2[2]+.5, r'$\bf a_2$', 'a_2', size=18)
fig.plotSpan(a1, a2,'Green')
fig.plotPoint(a1[0], a1[1], a1[2],'r')
fig.plotPoint(a3[0], a3[1], a3[2],'r')
fig.plotPoint(a2[0], a2[1], a2[2],'r')
fig.plotLine([[0, 0, 0], a3], 'r', '--')
fig.plotLine([[0, 0, 0], a1], 'r', '--')
# fig.plotPoint(a3[0], a3[1], a3[2],'r')
fig.text(0.1, 0.1, -3, r'$\bf 0$', '0', size=12)
fig.plotPoint(0, 0, 0, 'b')
# fig.set_title(r'$H$ = Span$\{{\bf a}_1, {\bf a}_2, {\bf a}_3\}$')
fig.text(9, -9, -7, r'H', 'H', size = 16)
img = fig.dont_save()
[This lecture is based on Prof. Crovella's CS 132 lecture notes.]
We have seen in this course that invertible matrices are "nicer" to work with when solving systems of linear equations. For instance, if a matrix $A$ is invertible, then $A\mathbf{x} = \mathbf{b}$ has a unique solution for any $\mathbf{b}$.
This suggests a deep connection between the invertibility of $A$ and the nature of the linear system $A\mathbf{x} = \mathbf{b}$.
In fact, we are now at the point where we can collect together in a fairly complete way much of what we have learned about matrices and linear systems.
This remarkable collection of ten interrelated properties is called the Invertible Matrix Theorem (IMT). (And we will grow this list further in future lectures.)
Invertible Matrix Theorem. Let $A$ be a square $n \times n$ matrix.
Then the following statements are equivalent; that is, they are either all true or all false.
We provided arguments above to show that if $A$ is invertible, then all the other statements are true.
In fact, the converse holds as well: if $A$ is not invertible, then all the other statements are false. We will skip the proof of the converse, but it is not difficult.
The Invertible Matrix Theorem has wide-ranging implications.
It divides the set of all $n \times n$ matrices into two disjoint classes: the invertible (nonsingular) matrices, and the noninvertible (singular) matrices.
The power of the IMT lies in the connections it provides among so many important concepts.
For example, notice how it connects linear independence of the columns of a matrix to the existence of solutions to equations of the form $A\mathbf{x} = \mathbf{b}$.
This allows us to bring many tools to bear as needed to solve a problem.
Example. Let $A$ be the matrix:
Decide whether $A\mathbf{x} = \mathbf{b}$ has a solution for all $\mathbf{b}$, or in other words, whether the linear transformation defined by $A$ is onto.
Solution. The determinant of $A$ is $0$.
So $A$ is not invertible, and therefore $A\mathbf{x} = \mathbf{b}$ does not have a solution for all $\mathbf{b}$.
Example. Decide whether $A$ is invertible:
(If we knew the determinant of $A$, then we could decide whether it is invertible by checking whether $\det A \neq 0$. But we don't yet know how to take the determinant of a matrix larger than $2 \times 2$.)
Solution. Row reduce $A$ to an echelon form:
$A$ has three pivot positions and hence is invertible, by the IMT.
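The same pivot-counting check is easy to do numerically. Since the matrix from this example is not reproduced here, the $3 \times 3$ matrix below is a stand-in chosen for illustration.
# a square matrix is invertible exactly when it has a full set of pivots,
# i.e. when its rank equals its number of rows
A = np.array([[1., 2., 0.],
              [3., 1., 2.],
              [0., 5., 1.]])
print(np.linalg.matrix_rank(A) == A.shape[0])   # True: three pivot positions, so invertible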
Note. Keep in mind: while the IMT is quite powerful, it does not completely settle the questions that arise when solving linear systems $A\mathbf{x} = \mathbf{b}$.
This is because it only applies to square matrices.
So if $A$ is not square, then we can't use the IMT to conclude anything about the existence or nonexistence of solutions to $A\mathbf{x} = \mathbf{b}$.
A linear transformation $T: \mathbb{R}^n \to \mathbb{R}^n$ is invertible if there exists a function $S: \mathbb{R}^n \to \mathbb{R}^n$ such that

$$S(T(\mathbf{x})) = \mathbf{x} \quad \text{for all } \mathbf{x} \in \mathbb{R}^n,$$

and

$$T(S(\mathbf{x})) = \mathbf{x} \quad \text{for all } \mathbf{x} \in \mathbb{R}^n.$$
Theorem. Let $T: \mathbb{R}^n \to \mathbb{R}^n$ be a linear transformation and let $A$ be the standard matrix for $T$. Then,
$T$ is invertible if and only if $A$ is an invertible matrix.
In that case, the linear transformation $S$ given by $S(\mathbf{x}) = A^{-1}\mathbf{x}$ is the unique function satisfying the definition above.
Let's look at some invertible and non-invertible linear transformations.
Example. Here is a horizontal contraction, which scales the $x_1$ coordinate by $0.5$:

$$A = \begin{bmatrix} 0.5 & 0 \\ 0 & 1 \end{bmatrix}$$
#
square = np.array([[0.0,1,1,0],[1,1,0,0]])
A = np.array(
    [[0.5, 0],
     [0,   1]])
# print("A ="); print(A)
dm.plotSetup()
dm.plotSquare(square)
dm.plotSquare(A @ square,'r')
# Latex(r'Horizontal Contraction')
Its determinant is $0.5 \neq 0$, so this linear transformation is invertible. Its inverse is:

$$A^{-1} = \begin{bmatrix} 2 & 0 \\ 0 & 1 \end{bmatrix}$$

Clearly, just as $A$ contracted the $x_1$ direction by $0.5$, $A^{-1}$ will expand the $x_1$ direction by $2$.
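We can confirm this with NumPy:
A = np.array([[0.5, 0],
              [0,   1]])
print(np.linalg.inv(A))   # [[2. 0.] [0. 1.]]: expands the x1 direction by 2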
Example. Consider the projection onto the $x_2$-axis:

$$A = \begin{bmatrix} 0 & 0 \\ 0 & 1 \end{bmatrix}$$

The determinant of $A$ is zero, so this linear transformation is not invertible.
#
A = np.array(
    [[0, 0],
     [0, 1]])
print("A = "); print(A)
ax = dm.plotSetup()
dm.plotSquare(square)
dm.plotSquare(A @ square)
ax.arrow(1.0,1.0,-0.9,0,head_width=0.15, head_length=0.1, length_includes_head=True);
ax.arrow(1.0,0.0,-0.9,0,head_width=0.15, head_length=0.1, length_includes_head=True);
A = 
[[0 0]
 [0 1]]
By the IMT, there are many equivalent ways to look at this: for example, the columns of $A$ are linearly dependent, they do not span $\mathbb{R}^2$, the transformation is not one-to-one, and it is not onto $\mathbb{R}^2$.
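A few of these equivalent statements, checked numerically for this $A$:
A = np.array([[0, 0],
              [0, 1]])
print(np.linalg.matrix_rank(A))   # 1: fewer than 2 pivot positions
print(np.linalg.det(A))           # 0.0: A is not invertible
print(A @ np.array([1, 0]))       # [0 0]: A x = 0 has the nontrivial solution x = (1, 0)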