#
import numpy as np
import scipy as sp
import pandas as pd
import matplotlib as mp
import matplotlib.pyplot as plt
import seaborn as sns
import sklearn
import laUtilities as ut
import slideUtilities as sl
import demoUtilities as dm
from matplotlib import animation
from importlib import reload
from datetime import datetime
from IPython.display import Image, display_html, display, Math, HTML;
qr_setting = None
mp.rcParams['animation.html'] = 'jshtml';
Test 1 grades are out
Homework 4 is due today
Homework 5 will be posted later today and due after the break
Aggarwal Section 2.4-2.5
Definition. A mapping $T : \mathbb{R}^n \rightarrow \mathbb{R}^m$ is said to be onto $\mathbb{R}^m$ if each $\mathbf{b}$ in $\mathbb{R}^m$ is the image of at least one $\mathbf{x}$ in $\mathbb{R}^n$.
In other words, $T$ is onto if every element of its codomain is in its range.
#
display(Image("images/05-image.jpg", width=1000))
Here, we see that $T$ maps points in $\mathbb{R}^2$ to a plane lying within $\mathbb{R}^3$. That is, the range of $T$ is a strict subset of the codomain of $T$.
So $T$ is not onto $\mathbb{R}^3$.
Definition. A mapping $T : \mathbb{R}^n \rightarrow \mathbb{R}^m$ is said to be one-to-one if each $\mathbf{b}$ in $\mathbb{R}^m$ is the image of at most one $\mathbf{x}$ in $\mathbb{R}^n$.
#
display(Image("images/06-onto.jpeg", width=1000))
#
A = np.array([[2, 1],[0.75, 1.5]])
print("A = "); print(A)
# unit square with one vertex per column (assumed input convention for dm.plotSquare)
square = np.array([[0, 1, 1, 0],
                   [0, 0, 1, 1]])
ax = dm.plotSetup(-1, 4, -1, 4)
dm.plotSquare(square, 'r')
dm.plotSquare(A @ square);
A = 
[[2.   1.  ]
 [0.75 1.5 ]]
#
a = 2
b = 1
c = 0.75
d = 1.5
v1 = np.array([a, c])
v2 = np.array([b, d])
A = np.column_stack([v1, v2])
ax = dm.plotSetup(-1, 4, -1, 4)
# red triangles
dm.plotShape(np.array([[0, 0], [0, d], [b, d]]).T, 'r')
plt.text(0.2, 1, r'$\frac{1}{2}bd$', size = 12)
dm.plotShape(np.array([[a, c], [a+b, c+d], [a+b, c]]).T, 'r')
plt.text(2.5, 1, r'$\frac{1}{2}bd$', size = 12)
# gray triangles
dm.plotShape(np.array([[b, d], [b, c+d], [a+b, c+d]]).T, 'k')
plt.text(1.2, 1.9, r'$\frac{1}{2}ac$', size = 12)
dm.plotShape(np.array([[0, 0], [a, c], [a, 0]]).T, 'k')
plt.text(1.2, 0.15, r'$\frac{1}{2}ac$', size = 12)
# green squares
dm.plotShape(np.array([[a, 0], [a, c], [a+b, c], [a+b, 0]]).T, 'g')
plt.text(0.2, 1.9, r'$bc$', size = 12)
dm.plotShape(np.array([[0, d], [0, c+d], [b, c+d], [b, d]]).T, 'g')
plt.text(2.5, 0.15, r'$bc$', size = 12)
#
dm.plotSquare(A @ square)
plt.text(v1[0]-0.5, v1[1]+0.05, '(a, c)', size = 12)
plt.text((v1+v2)[0]+0.1, (v1+v2)[1]+0.1, '(a+b, c+d)', size = 12)
plt.text(v2[0]+0.1, v2[1]-0.15, '(b, d)', size = 12);
So we conclude that when we use a linear transformation $\mathbf{x} \mapsto A\mathbf{x}$ with $A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}$,
the area of a unit square (or any shape) is scaled by a factor of $ad - bc$.
This quantity is a fundamental property of the matrix $A$.
So, we give it a name: it is the determinant of $A$.
We denote it as $\det A$.
So, for a $2 \times 2$ matrix $A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}$,
$$\det A = ad - bc.$$
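As a quick numerical sanity check, here is a minimal sketch comparing the geometric scale factor $ad - bc$ with numpy's built-in determinant, np.linalg.det, for the same matrix used in the plot above:
import numpy as np

# matrix from the plot above: columns (a, c) = (2, 0.75) and (b, d) = (1, 1.5)
a, b, c, d = 2, 1, 0.75, 1.5
A = np.array([[a, b],
              [c, d]])

print(a * d - b * c)       # scale factor from the geometric argument: 2.25
print(np.linalg.det(A))    # numpy's determinant agrees (up to floating point)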
Definition. An $n \times n$ matrix $A$ is called invertible if there exists a matrix $C$ such that
$$AC = CA = I.$$
In that case, $C$ is called the inverse of $A$.
#
display(Image("images/06-inverse.png", width=1000))
Clearly, $C$ must also be square and the same size as $A$.
The inverse of $A$ is denoted $A^{-1}$.
A matrix that is not invertible is called a singular matrix. A matrix that is invertible is called nonsingular.
Here are some nice properties of matrix inversion.
Theorem.
If $A$ is invertible, then $A^{-1}$ is invertible and $(A^{-1})^{-1} = A$.
If $A$ and $B$ are invertible $n \times n$ matrices, then so is $AB$, and $(AB)^{-1} = B^{-1}A^{-1}$.
If $A$ is invertible, then so is $A^T$, and $(A^T)^{-1} = (A^{-1})^T$.
Example.
If $A = \begin{bmatrix} 2 & 5 \\ -3 & -7 \end{bmatrix}$ and $C = \begin{bmatrix} -7 & -5 \\ 3 & 2 \end{bmatrix}$, then:
$$AC = \begin{bmatrix} 2 & 5 \\ -3 & -7 \end{bmatrix}\begin{bmatrix} -7 & -5 \\ 3 & 2 \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}$$
and:
$$CA = \begin{bmatrix} -7 & -5 \\ 3 & 2 \end{bmatrix}\begin{bmatrix} 2 & 5 \\ -3 & -7 \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}$$
so we conclude that $C = A^{-1}$.
How do we find $A^{-1}$? Let's first look at the special case of $2 \times 2$ matrices and then consider the general case of any square matrix.
Theorem. Let $A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}$.
If $ad - bc \neq 0$, then $A$ is invertible and
$$A^{-1} = \frac{1}{ad - bc}\begin{bmatrix} d & -b \\ -c & a \end{bmatrix}.$$
If $ad - bc = 0$, then $A$ is not invertible.
Notice that this theorem tells us, for $2 \times 2$ matrices, exactly which ones are invertible... namely, those that have nonzero determinant.
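As a quick check of this formula, here is a minimal sketch using the same matrix that appears in the numpy examples later in these notes:
import numpy as np

# A = [[2, 5], [-3, -7]] has determinant 2*(-7) - 5*(-3) = 1
a, b, c, d = 2.0, 5.0, -3.0, -7.0
A = np.array([[a, b],
              [c, d]])

det = a * d - b * c
A_inv = (1 / det) * np.array([[ d, -b],
                              [-c,  a]])

print(A_inv)                                  # [[-7. -5.] [ 3.  2.]]
print(np.allclose(A_inv, np.linalg.inv(A)))   # True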
Example. Given a $2 \times 2$ matrix $A$, if the columns of $A$ are linearly dependent, is $A$ invertible?
Solution. If the columns of $A$ are linearly dependent, then at least one of the columns is a multiple of the other.
Let the multiplier be $k$.
Then we can express $A$ as:
$$A = \begin{bmatrix} a & ka \\ c & kc \end{bmatrix}$$
The determinant of $A$ is
$$\det A = a(kc) - (ka)c = 0.$$
So a $2 \times 2$ matrix with linearly dependent columns is not invertible.
Now let's look at a general method for computing the inverse of $A$. Recall our definition of matrix multiplication: $AB$ is the matrix formed by multiplying $A$ times each column of $B$:
$$AB = [A\mathbf{b}_1 \;\; A\mathbf{b}_2 \;\; \cdots \;\; A\mathbf{b}_n].$$
Let's look at the equation
$$A A^{-1} = I.$$
Let's call the columns of $A^{-1}$: $A^{-1} = [\mathbf{x}_1 \;\; \mathbf{x}_2 \;\; \cdots \;\; \mathbf{x}_n]$.
We know what the columns of $I$ are: $I = [\mathbf{e}_1 \;\; \mathbf{e}_2 \;\; \cdots \;\; \mathbf{e}_n]$.
So:
$$A\,[\mathbf{x}_1 \;\; \mathbf{x}_2 \;\; \cdots \;\; \mathbf{x}_n] = [\mathbf{e}_1 \;\; \mathbf{e}_2 \;\; \cdots \;\; \mathbf{e}_n].$$
Notice that we can break this up into $n$ separate problems:
$$A\mathbf{x}_1 = \mathbf{e}_1, \quad A\mathbf{x}_2 = \mathbf{e}_2, \quad \ldots, \quad A\mathbf{x}_n = \mathbf{e}_n.$$
(This is a common trick ... make sure you understand why it works!)
So here is a general way to compute the inverse of $A$: solve each of the $n$ systems $A\mathbf{x}_i = \mathbf{e}_i$ and assemble the solutions as the columns of $A^{-1}$.
If any of the systems is inconsistent or has an infinite solution set, then $A^{-1}$ does not exist.
This general strategy leads to an algorithm for inverting any matrix. We can use Gaussian Elimination on the larger augmented matrix $[A \;\; I]$. The result in reduced row echelon form will be $[I \;\; A^{-1}]$.
We can confirm that we found $A^{-1}$ by multiplying the two matrices together and checking that the product is $I$.
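Here is a minimal sketch of that augmented-matrix strategy in numpy (a bare-bones Gauss-Jordan reduction with no row pivoting, so it assumes the pivots it encounters are nonzero):
import numpy as np

A = np.array([[ 2.0,  5.0],
              [-3.0, -7.0]])
n = A.shape[0]

aug = np.hstack([A, np.eye(n)])        # build the augmented matrix [A  I]

for i in range(n):
    aug[i] = aug[i] / aug[i, i]        # scale so the pivot equals 1
    for j in range(n):
        if j != i:
            aug[j] = aug[j] - aug[j, i] * aug[i]   # eliminate column i from row j

A_inv = aug[:, n:]                     # right half of [I  A^{-1}]
print(A_inv)                           # [[-7. -5.] [ 3.  2.]]
print(A @ A_inv)                       # confirms that A @ A_inv is the identity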
But solving all of these linear systems is complicated.
Any time you need to invert a matrix larger than $2 \times 2$ you may need to use a calculator or computer.
To invert a matrix in Python/numpy,
use the function np.linalg.inv().
For example:
import numpy as np
A = np.array(
[[ 2.0, 5.0],
[-3.0,-7.0]])
print('A =\n',A)
B = np.linalg.inv(A)
print('B = \n',B)
A =
 [[ 2.  5.]
 [-3. -7.]]
B = 
 [[-7. -5.]
 [ 3.  2.]]
What do you think happens if you call np.linalg.inv()
on a matrix that is not invertible?
A = np.array([[2.,4.],[2.,4.]])
np.linalg.inv(A) # try uncommenting this line
---------------------------------------------------------------------------
LinAlgError                               Traceback (most recent call last)
<ipython-input-10-55215ad851ba> in <module>
      1 A = np.array([[2.,4.],[2.,4.]])
----> 2 np.linalg.inv(A) # try uncommenting this line
...
LinAlgError: Singular matrix
The right way to handle this is:
A = np.array([[2.,4.],[2.,4.]])
try:
    np.linalg.inv(A)
except np.linalg.LinAlgError:
    print('Oops, looks like A is singular!')
Oops, looks like A is singular!
Let's discover a new way to solve a linear system based on computing the inverse of a square matrix (if it exists!).
Theorem. If $A$ is an invertible $n \times n$ matrix, then for each $\mathbf{b}$ in $\mathbb{R}^n$, the equation $A\mathbf{x} = \mathbf{b}$ has the unique solution $\mathbf{x} = A^{-1}\mathbf{b}$.
Proof. Let's multiply both sides by $A^{-1}$ on the left:
$$A^{-1}A\mathbf{x} = A^{-1}\mathbf{b} \;\implies\; I\mathbf{x} = A^{-1}\mathbf{b} \;\implies\; \mathbf{x} = A^{-1}\mathbf{b}.$$
This very simple, powerful theorem gives us a new way to solve a linear system.
Furthermore, this theorem tells us the number of solutions to the system. In general, we know that a system of $n$ equations in $n$ variables can have no solution, exactly one solution, or infinitely many solutions.
But this theorem says that if $A$ is invertible, then the system has a unique solution.
Example. Solve the system:
Rewrite this system as a matrix equation $A\mathbf{x} = \mathbf{b}$.
The determinant of $A$ is nonzero, so $A$ has an inverse.
According to our formula, the inverse of $A$ is $A^{-1} = \frac{1}{ad - bc}\begin{bmatrix} d & -b \\ -c & a \end{bmatrix}$.
So the solution is $\mathbf{x} = A^{-1}\mathbf{b}$.
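As an illustration (the matrix and right-hand side here are made up for demonstration purposes), solving $A\mathbf{x} = \mathbf{b}$ via the inverse looks like this in numpy:
import numpy as np

# a hypothetical 2x2 system A x = b
A = np.array([[ 2.0,  5.0],
              [-3.0, -7.0]])
b = np.array([1.0, 2.0])

x = np.linalg.inv(A) @ b       # x = A^{-1} b
print(x)                       # [-17.   7.]

print(np.allclose(A @ x, b))   # check: A x recovers b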
We know that matrix-vector multiplication should be interpreted as a linear transformation $\mathbf{x} \mapsto A\mathbf{x}$ over $\mathbb{R}^n$.
Question. What does the product of two matrices mean, geometrically?
We determined that the matrix
$$C = \begin{bmatrix} -1 & 0 \\ 0 & -1 \end{bmatrix}$$
implements a 180 degree rotation about the origin.
#
note = dm.mnote()
C = np.array(
[[-1, 0],
[ 0,-1]])
print("C ="); print(C)
dm.plotSetup()
dm.plotShape(note)
dm.plotShape(C @ note,'r')
C =
[[-1  0]
 [ 0 -1]]
But notice that we could have accomplished this another way:
#
B = np.array(
[[ 1, 0],
[ 0,-1]])
A = np.array(
[[-1, 0],
[ 0, 1]])
#print("A ="); print(A)
#print("B ="); print(B)
dm.plotSetup()
dm.plotShape(note)
dm.plotShape(B @ note,'g')
dm.plotShape(A @ (B @ note),'r')
So, another way to reflect a point $\mathbf{x}$ through the origin would be to first reflect it across the $x$-axis (apply $B$) and then reflect it across the $y$-axis (apply $A$), computing $A(B\mathbf{x})$.
As a result, we also know that $(AB)\mathbf{x} = A(B\mathbf{x}) = C\mathbf{x}$. And this is true for all vectors $\mathbf{x}$.
Therefore, multiplication of matrices corresponds to composition of linear transformations.
That is: $(AB)\mathbf{x}$ is the result of applying the transformation corresponding to matrix $B$, and then the transformation corresponding to matrix $A$. (Just like functions: the order matters, and we read them from right to left!)
Example. Let $A = \begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix}$ and $B = \begin{bmatrix} 0 & 1 \\ -1 & 0 \end{bmatrix}$. Show that $AAA = B$.
You could calculate $AAA$ using the algebraic rules for matrix multiplication, but there is an easier way to think about this question geometrically.
Note that $A$ maps $\mathbf{e}_1 \mapsto \mathbf{e}_2$ and $\mathbf{e}_2 \mapsto -\mathbf{e}_1$, so it corresponds to a 90 degree counterclockwise rotation $T_A$. Similarly, $B$ corresponds to a 270 degree counterclockwise rotation $T_B$.
What happens if we perform a 90 degree rotation three times in a row? The result of the composed function $T_A \circ T_A \circ T_A$ must be a 270 degree rotation. In other words, $T_A \circ T_A \circ T_A = T_B$.
Because composition of linear transformations corresponds to multiplication of matrices, it follows immediately that $AAA = B$.
We can check this computationally.
A = np.array([[0, -1], [1, 0]])
A @ A @ A
array([[ 0,  1],
       [-1,  0]])
A factorization of a matrix $C$ is an equation that expresses $C$ as a product of two or more matrices.
The essential difference from what we have done so far is that, up to now, we have been given the factors ($A$ and $B$) and then computed their product $C = AB$.
In a factorization problem, you are given $C$, and you want to find $A$ and $B$ -- factors that meet some conditions.
There are a number of reasons one may want to factor a matrix.
Let's motivate this problem by calculating the cost of a few matrix operations: matrix multiplication, matrix-vector multiplication, and matrix inversion.
Then, we will explore a new, faster way to solve linear systems.
Recall that in Python/numpy:
C = A @ B
denotes matrix multiplication.
We can manually calculate the product of two matrices $A$ and $B$ using the inner product rule.
Question. What is the computational complexity of matrix multiplication?
The machine learning algorithms we explore later in the course all rely heavily on matrix multiplication. So, minimizing the time required to do matrix multiplication is immensely important.
For two $n \times n$ square matrices $A$ and $B$, consider the definition that uses the inner product:
$$(AB)_{ij} = \sum_{k=1}^{n} a_{ik}\, b_{kj}.$$
Each element of the product requires $2n$ floating point operations ($n$ multiplications and $n$ additions).
There are $n^2$ elements of $AB$, so the overall computation requires
$$2n \cdot n^2 = 2n^3$$
operations using this algorithm.
Note: There do exist faster algorithms for computing matrix multiplication that run in $O(n^\omega)$ time for some $\omega < 3$ (Strassen's algorithm, for example, achieves $\omega \approx 2.81$), but let's stick with this simple algorithm for now. [Image: Wikipedia]
That's not particularly good news; for two matrices of size $10{,}000 \times 10{,}000$ (which is not particularly large in practice), this is 2 trillion operations (2 teraflops).
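Here is a minimal sketch that implements the inner-product definition directly and counts the flops, confirming the $2n^3$ figure (kept small, since the triple loop is slow in pure Python):
import numpy as np

def naive_matmul(A, B):
    """Inner-product matrix multiplication, counting floating point operations."""
    n = A.shape[0]
    C = np.zeros((n, n))
    flops = 0
    for i in range(n):
        for j in range(n):
            for k in range(n):
                C[i, j] += A[i, k] * B[k, j]   # one multiply and one add
                flops += 2
    return C, flops

n = 100
A = np.random.rand(n, n)
B = np.random.rand(n, n)
C, flops = naive_matmul(A, B)

print(flops, 2 * n**3)          # both are 2,000,000
print(np.allclose(C, A @ B))    # True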
Question. What is the computational complexity of matrix-vector multiplication?
We know that matrix-vector multiplication requires $n$ inner products, each of size $n$.
So, matrix-vector multiplication requires
$$2n \cdot n = 2n^2$$
operations.
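The same flop-counting sketch for matrix-vector multiplication confirms the $2n^2$ figure:
import numpy as np

def naive_matvec(A, x):
    """Matrix-vector product as n inner products, counting floating point operations."""
    n = A.shape[0]
    y = np.zeros(n)
    flops = 0
    for i in range(n):
        for k in range(n):
            y[i] += A[i, k] * x[k]   # one multiply and one add
            flops += 2
    return y, flops

n = 100
A = np.random.rand(n, n)
x = np.random.rand(n)
y, flops = naive_matvec(A, x)

print(flops, 2 * n**2)          # both are 20,000
print(np.allclose(y, A @ x))    # True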
When you look at a mathematical expression involving matrices, think carefully about what it means and how you might efficiently compute it.
Example. What is the most efficient way to compute $AB\mathbf{x}$, where $A$ and $B$ are $n \times n$ matrices and $\mathbf{x}$ is a vector in $\mathbb{R}^n$?
Here are your choices:
First compute $AB$, then compute $(AB)\mathbf{x}$: this costs $2n^3 + 2n^2$ operations.
First compute $B\mathbf{x}$, then compute $A(B\mathbf{x})$: this costs $2n^2 + 2n^2 = 4n^2$ operations, which is far cheaper for large $n$.
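A rough timing experiment makes the difference tangible (a sketch with arbitrary random data; exact times will depend on your machine):
import numpy as np
import time

n = 2000
A = np.random.rand(n, n)
B = np.random.rand(n, n)
x = np.random.rand(n)

t0 = time.time()
y1 = (A @ B) @ x         # matrix-matrix product first: about 2n^3 + 2n^2 flops
t1 = time.time()
y2 = A @ (B @ x)         # two matrix-vector products: about 4n^2 flops
t2 = time.time()

print(f'(A @ B) @ x took {t1 - t0:.4f} s')
print(f'A @ (B @ x) took {t2 - t1:.4f} s')
print(np.allclose(y1, y2))   # same answer either way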
Although matrix multiplication is computationally demanding, it has a wonderful property: it is highly parallel.
That is, the computation needed for each element does not require computing the other elements.
(This is not true, for example, for Gaussian elimination; think about the role of a pivot.)
This means that if we have multiple processors, and each has access to $A$ and $B$, the work can be divided up very cleanly.
For example, let's say you have $n$ processors. Then each processor can independently compute one column of the result, without needing to know anything about what the other processors are doing.
Since all processors are working in parallel, the elapsed time is reduced from $2n^3$ down to $2n^2$.
Even better, say you have $n^2$ processors. Then each processor can compute a single element of the result, as the inner product of one row of $A$ and one column of $B$.
Then the elapsed time is reduced all the way down to $2n$.
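Here is a minimal sketch of the column-at-a-time decomposition (written sequentially, but each column computation is independent and could be handed to a separate processor):
import numpy as np

n = 4
A = np.random.rand(n, n)
B = np.random.rand(n, n)

# Column j of A @ B depends only on A and column j of B,
# so each column can be computed independently of the others.
columns = [A @ B[:, j] for j in range(n)]
C = np.column_stack(columns)

print(np.allclose(C, A @ B))   # True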
This sort of strategy is used for huge computations like web search and weather modeling.
Recall that an $n \times n$ matrix $A$ is called invertible if there exists a matrix $C$ such that
$$AC = CA = I.$$
In that case, $C$ is called the inverse of $A$, denoted $A^{-1}$. Inverses are only defined for square matrices.
#
display(Image("images/06-inverse.png", width=1000))
Matrix inversion has many uses. For instance, if a matrix $A$ is invertible then we can solve the linear system $A\mathbf{x} = \mathbf{b}$ by multiplying both sides by $A^{-1}$ on the left, giving $\mathbf{x} = A^{-1}\mathbf{b}$.
Python
contains a function np.linalg.inv()
that can compute the inverse of a square matrix of any size. For example:
import numpy as np
A = np.array(
[[ 2.0, 5.0],
[-3.0,-7.0]])
print('A =\n',A)
B = np.linalg.inv(A)
print('B = \n',B)
A =
 [[ 2.  5.]
 [-3. -7.]]
B = 
 [[-7. -5.]
 [ 3.  2.]]
Question: How would you write a computer program to perform this inverse operation? How long will it take to run?
In Lecture 5, we calculated a special formula for finding the inverse of a $2 \times 2$ matrix.
We also found a general algorithm for inverting any matrix. We can use Gaussian Elimination on the larger augmented matrix $[A \;\; I]$. The result in reduced row echelon form will be $[I \;\; A^{-1}]$.
We've seen how to calculate the cost of Gaussian elimination for solving a single linear system $A\mathbf{x} = \mathbf{b}$.
Here, the augmented matrix $[A \;\; \mathbf{b}]$ is of size $n \times (n+1)$. For this matrix, let's consider the cost of performing an elementary row operation like adding a scalar multiple of row 1 into row 2. We need $n+1$ multiplications and $n+1$ additions.
So the total cost of one such row operation is $2(n+1) \approx 2n$ floating point operations (or flops).
# Image credit: Prof. Mark Crovella
display(Image("images/03-ge1.jpg", width=800))
As you perform more elimination steps, you can work with smaller sub-matrices. In total, we calculated in Lecture 3 that Gaussian elimination on $[A \;\; \mathbf{b}]$ costs approximately $\frac{2}{3}n^3$ flops.
For matrix inversion, we need to use a larger augmented matrix whose size is $n \times 2n$. For this matrix, an elementary row operation now costs $2n$ multiplications and $2n$ additions, for a total of $4n$ flops.
If you go back to the derivation of the cost of Gaussian Elimination in Lecture 3 and recalculate the results for the wider augmented matrix $[A \;\; I]$, you will find that the total cost to calculate the inverse is about $2n^3$ flops. This is three times the cost of an ordinary Gaussian elimination.
Hence, if I give you a matrix $A$, then you can either (1) solve each new system $A\mathbf{x} = \mathbf{b}$ from scratch with Gaussian elimination, at about $\frac{2}{3}n^3$ flops per system, or (2) pay about $2n^3$ flops once to compute $A^{-1}$, after which each new system costs only a $2n^2$-flop matrix-vector product $\mathbf{x} = A^{-1}\mathbf{b}$.
Which option is better depends on how many linear systems you will solve using the same matrix $A$.
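A small timing sketch of this tradeoff (with made-up random data; exact times depend on your machine and on how many right-hand sides you reuse $A$ for):
import numpy as np
import time

n = 1000
A = np.random.rand(n, n)
bs = [np.random.rand(n) for _ in range(100)]   # many right-hand sides, same A

# option 1: run Gaussian elimination (np.linalg.solve) separately for each b
t0 = time.time()
xs1 = [np.linalg.solve(A, b) for b in bs]
t1 = time.time()

# option 2: pay the one-time cost of inverting A, then one matrix-vector product per b
A_inv = np.linalg.inv(A)
xs2 = [A_inv @ b for b in bs]
t2 = time.time()

print(f'solve each system directly: {t1 - t0:.3f} s')
print(f'invert once, then multiply: {t2 - t1:.3f} s')

# both approaches solve the same systems, so the answers should differ only slightly
worst = max(np.max(np.abs(x1 - x2)) for x1, x2 in zip(xs1, xs2))
print('largest difference between the two methods:', worst)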
After the break, we will find a new way to solve linear systems that is the "best of both options": given a matrix $A$, you can pay a one-time cost of about $\frac{2}{3}n^3$ flops and then quickly solve any matrix equation of the form $A\mathbf{x} = \mathbf{b}$.