In [1]:
#
import numpy as np
import scipy as sp
import pandas as pd
import matplotlib as mp
import matplotlib.pyplot as plt
import seaborn as sns
import sklearn
import laUtilities as ut
import slideUtilities as sl
import demoUtilities as dm
from matplotlib import animation
from importlib import reload
from datetime import datetime
from IPython.display import Image, display_html, display, Math, HTML;
qr_setting = None

mp.rcParams['animation.html'] = 'jshtml';

Announcements¶

  • Test 1 grades are out

  • Homework 4 is due today

  • Homework 5 will be posted later today and due after the break

  • Aggarwal Section 2.4-2.5

  • Watch 3Blue1Brown video 5 and video 6

Recap¶

Onto¶

Definition. A mapping $T: \mathbb{R}^n \to \mathbb{R}^m$ is said to be onto $\mathbb{R}^m$ if each $\mathbf{b}$ in $\mathbb{R}^m$ is the image of at least one $\mathbf{x}$ in $\mathbb{R}^n$.

In other words, $T$ is onto if every element of its codomain is in its range.

In [2]:
#
display(Image("images/05-image.jpg", width=1000))

Here, we see that $T$ maps points in $\mathbb{R}^2$ to a plane lying within $\mathbb{R}^3$. That is, the range of $T$ is a strict subset of the codomain of $T$.

So $T$ is not onto $\mathbb{R}^3$.

One-to-one¶

Definition. A mapping $T: \mathbb{R}^n \to \mathbb{R}^m$ is said to be one-to-one if each $\mathbf{b}$ in $\mathbb{R}^m$ is the image of at most one $\mathbf{x}$ in $\mathbb{R}^n$.

In [3]:
#
display(Image("images/06-onto.jpeg", width=1000))

Determinant¶

In [61]:
#
A = np.array([[2, 1],[0.75, 1.5]])
print("A = "); print(A)
# unit square with corners as columns (assumed here; likely defined in an earlier cell)
square = np.array([[0, 1, 1, 0],
                   [0, 0, 1, 1]])
ax = dm.plotSetup(-1, 4, -1, 4)
dm.plotSquare(square, 'r')
dm.plotSquare(A @ square);
A = 
[[2.   1.  ]
 [0.75 1.5 ]]
In [63]:
#
a = 2
b = 1
c = 0.75
d = 1.5
v1 = np.array([a, c])
v2 = np.array([b, d])
A = np.column_stack([v1, v2])
ax = dm.plotSetup(-1, 4, -1, 4)
# red triangles
dm.plotShape(np.array([[0, 0], [0, d], [b, d]]).T, 'r')
plt.text(0.2, 1, r'$\frac{1}{2}bd$', size = 12)
dm.plotShape(np.array([[a, c], [a+b, c+d], [a+b, c]]).T, 'r')
plt.text(2.5, 1, r'$\frac{1}{2}bd$', size = 12)
# gray triangles
dm.plotShape(np.array([[b, d], [b, c+d], [a+b, c+d]]).T, 'k')
plt.text(1.2, 1.9, r'$\frac{1}{2}ac$', size = 12)
dm.plotShape(np.array([[0, 0], [a, c], [a, 0]]).T, 'k')
plt.text(1.2, 0.15, r'$\frac{1}{2}ac$', size = 12)
# green squares
dm.plotShape(np.array([[a, 0], [a, c], [a+b, c], [a+b, 0]]).T, 'g')
plt.text(0.2, 1.9, r'$bc$', size = 12)
dm.plotShape(np.array([[0, d], [0, c+d], [b, c+d], [b, d]]).T, 'g')
plt.text(2.5, 0.15, r'$bc$', size = 12)
#
dm.plotSquare(A @ square)
plt.text(v1[0]-0.5, v1[1]+0.05, '(a, c)', size = 12)
plt.text((v1+v2)[0]+0.1, (v1+v2)[1]+0.1, '(a+b, c+d)', size = 12)
plt.text(v2[0]+0.1, v2[1]-0.15, '(b, d)', size = 12);

So we conclude that when we apply the linear transformation

$$A = \begin{bmatrix} a & b \\ c & d \end{bmatrix},$$

the area of the unit square (or any shape) is scaled by a factor of $ad - bc$.

This quantity is a fundamental property of the matrix $A$.

So, we give it a name: it is the determinant of $A$.

We denote it as $\det(A)$.

So, for a $2 \times 2$ matrix $A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}$,

$$\det(A) = ad - bc.$$
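
As a quick numerical check (a small sketch, not part of the original demo code), we can compare this formula against NumPy's built-in np.linalg.det for the matrix used above:

In [ ]:
# quick check: the 2x2 determinant formula vs. np.linalg.det
a, b, c, d = 2, 1, 0.75, 1.5
A = np.array([[a, b],
              [c, d]])
print(a * d - b * c)        # 2.25, by the formula ad - bc
print(np.linalg.det(A))     # 2.25 (up to floating point roundoff)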

Inverse of a matrix¶

Definition. A matrix $A$ is called invertible if there exists a matrix $C$ such that

$$AC = I \quad \text{and} \quad CA = I.$$

In that case, $C$ is called the inverse of $A$.

In [5]:
#
display(Image("images/06-inverse.png", width=1000))

Clearly, $C$ must also be square and the same size as $A$.

The inverse of $A$ is denoted $A^{-1}$.

A matrix that is not invertible is called a singular matrix. A matrix that is invertible is called nonsingular.

Here are some nice properties of matrix inversion.

Theorem.

  • If $A$ is an invertible matrix, then $A^{-1}$ is invertible, and
$(A^{-1})^{-1} = A.$
  • If $A$ is an invertible matrix, then so is $A^T$, and the inverse of $A^T$ is the transpose of $A^{-1}$:
$(A^T)^{-1} = (A^{-1})^T.$
  • If $A$ and $B$ are $n \times n$ invertible matrices, then so is $AB$, and the inverse of $AB$ is the product of the inverses of $A$ and $B$ in the reverse order:
$(AB)^{-1} = B^{-1}A^{-1}.$
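
These properties are easy to sanity-check numerically. Here is a minimal sketch (the matrices are arbitrary invertible examples chosen for illustration):

In [ ]:
# numerical sanity check of the inverse properties above
A = np.array([[2., 5.], [-3., -7.]])
B = np.array([[1., 2.], [3., 4.]])
print(np.linalg.inv(np.linalg.inv(A)))                             # recovers A
print(np.linalg.inv(A.T) - np.linalg.inv(A).T)                     # ~ zero matrix
print(np.linalg.inv(A @ B) - np.linalg.inv(B) @ np.linalg.inv(A))  # ~ zero matrix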

Example.

If $A = \begin{bmatrix} 2 & 5 \\ -3 & -7 \end{bmatrix}$ and $C = \begin{bmatrix} -7 & -5 \\ 3 & 2 \end{bmatrix}$, then:

$$AC = \begin{bmatrix} 2 & 5 \\ -3 & -7 \end{bmatrix}\begin{bmatrix} -7 & -5 \\ 3 & 2 \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix},$$

and:

$$CA = \begin{bmatrix} -7 & -5 \\ 3 & 2 \end{bmatrix}\begin{bmatrix} 2 & 5 \\ -3 & -7 \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix},$$

so we conclude that $C = A^{-1}$.

Lecture 18: Matrix Factorization¶

18.1 Finding the inverse of a matrix¶

How do we find $A^{-1}$? Let's first look at the special case of $2 \times 2$ matrices and then consider the general case of any square matrix.

The $2 \times 2$ case¶

Theorem. Let $A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}$.

If $\det(A) \neq 0$, then

  • $A$ is invertible and
  • $A^{-1} = \frac{1}{\det(A)}\begin{bmatrix} d & -b \\ -c & a \end{bmatrix}.$

If $\det(A) = 0$, then

  • $A$ is not invertible.

Notice that this theorem tells us, for $2 \times 2$ matrices, exactly which ones are invertible: namely, those that have a nonzero determinant.
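
As an illustration (a small sketch, not part of the original notes), the $2 \times 2$ formula can be coded directly and compared against np.linalg.inv:

In [ ]:
# the 2x2 inverse via the explicit formula
def inv2x2(M):
    a, b, c, d = M[0, 0], M[0, 1], M[1, 0], M[1, 1]
    det = a * d - b * c
    if det == 0:
        raise ValueError("matrix is not invertible")
    return (1 / det) * np.array([[d, -b], [-c, a]])

A = np.array([[2., 5.], [-3., -7.]])
print(inv2x2(A))
print(np.linalg.inv(A))   # should agree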

Example. Given a $2 \times 2$ matrix $A$, if the columns of $A$ are linearly dependent, is $A$ invertible?

Solution. If the columns of $A$ are linearly dependent, then at least one of the columns is a multiple of the other.

Let the multiplier be $m$.

Then we can express $A$ as: $\begin{bmatrix} a & ma \\ b & mb \end{bmatrix}.$

The determinant of $A$ is $a(mb) - b(ma) = 0.$

So a $2 \times 2$ matrix with linearly dependent columns is not invertible.

Matrices larger than $2 \times 2$.¶

Now let's look at a general method for computing the inverse of $A$. Recall our definition of matrix multiplication: $AB$ is the matrix formed by multiplying $A$ times each column of $B$:

$$AB = \begin{bmatrix} A\mathbf{b}_1 & \dots & A\mathbf{b}_n \end{bmatrix}.$$

Let's look at the equation

$$AA^{-1} = I.$$

Let's call the columns of $A^{-1}$: $\mathbf{x}_1, \mathbf{x}_2, \dots, \mathbf{x}_n$.

We know what the columns of $I$ are: $\mathbf{e}_1, \mathbf{e}_2, \dots, \mathbf{e}_n$.

So:

$$AA^{-1} = A\begin{bmatrix} \mathbf{x}_1 & \mathbf{x}_2 & \dots & \mathbf{x}_n \end{bmatrix} = \begin{bmatrix} \mathbf{e}_1 & \mathbf{e}_2 & \dots & \mathbf{e}_n \end{bmatrix}.$$

Notice that we can break this up into $n$ separate problems:

$$A\mathbf{x}_1 = \mathbf{e}_1, \quad A\mathbf{x}_2 = \mathbf{e}_2, \quad \dots, \quad A\mathbf{x}_n = \mathbf{e}_n$$

(This is a common trick ... make sure you understand why it works!)

So here is a general way to compute the inverse of $A$:

  • Solve the linear system $A\mathbf{x}_1 = \mathbf{e}_1$ to get the first column of $A^{-1}$.
  • Solve the linear system $A\mathbf{x}_2 = \mathbf{e}_2$ to get the second column of $A^{-1}$.
  • ...
  • Solve the linear system $A\mathbf{x}_n = \mathbf{e}_n$ to get the last column of $A^{-1}$.

If any of the systems is inconsistent or has an infinite solution set, then $A^{-1}$ does not exist.
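
Here is a minimal sketch of this column-by-column strategy in NumPy, using np.linalg.solve for each system (with the $2 \times 2$ example matrix from earlier):

In [ ]:
# build the inverse one column at a time by solving A x_i = e_i
A = np.array([[ 2.,  5.],
              [-3., -7.]])
n = A.shape[0]
I = np.eye(n)
cols = [np.linalg.solve(A, I[:, i]) for i in range(n)]   # x_1, ..., x_n
A_inv = np.column_stack(cols)
print(A_inv)        # [[-7. -5.] [ 3.  2.]]
print(A @ A_inv)    # (close to) the identity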

Using Gaussian Elimination to find $A^{-1}$¶

This general strategy leads to an algorithm for inverting any matrix. We can use Gaussian elimination on the larger augmented matrix $[A \mid I]$. The result in reduced row echelon form will be $[I \mid A^{-1}]$.

$$A = \begin{bmatrix} 1 & 2 & -1 \\ -2 & 0 & 1 \\ 1 & -1 & 0 \end{bmatrix} \qquad [A \mid I] = \left[\begin{array}{rrr|rrr} 1 & 2 & -1 & 1 & 0 & 0 \\ -2 & 0 & 1 & 0 & 1 & 0 \\ 1 & -1 & 0 & 0 & 0 & 1 \end{array}\right] \qquad [I \mid A^{-1}] = \left[\begin{array}{rrr|rrr} 1 & 0 & 0 & 1 & 1 & 2 \\ 0 & 1 & 0 & 1 & 1 & 1 \\ 0 & 0 & 1 & 2 & 3 & 4 \end{array}\right]$$

We can confirm that we found $A^{-1}$ by multiplying them together:

$$\begin{bmatrix} 1 & 2 & -1 \\ -2 & 0 & 1 \\ 1 & -1 & 0 \end{bmatrix}\begin{bmatrix} 1 & 1 & 2 \\ 1 & 1 & 1 \\ 2 & 3 & 4 \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}$$
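
We can also verify this result in NumPy (a quick check, not part of the original worked example):

In [ ]:
# check the 3x3 inverse computed by Gaussian elimination above
A = np.array([[ 1.,  2., -1.],
              [-2.,  0.,  1.],
              [ 1., -1.,  0.]])
A_inv = np.array([[1., 1., 2.],
                  [1., 1., 1.],
                  [2., 3., 4.]])
print(A @ A_inv)                              # the identity matrix
print(np.allclose(A_inv, np.linalg.inv(A)))   # True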

Computing inverse matrices¶

But solving all of these linear systems is complicated.

Any time you need to invert a matrix larger than $2 \times 2$, you may need to use a calculator or computer.

To invert a matrix in Python/numpy, use the function np.linalg.inv(). For example:

In [87]:
import numpy as np
A = np.array(
    [[ 2.0, 5.0],
     [-3.0,-7.0]])
print('A =\n',A)
B = np.linalg.inv(A)
print('B = \n',B)
A =
 [[ 2.  5.]
 [-3. -7.]]
B = 
 [[-7. -5.]
 [ 3.  2.]]

What do you think happens if you call np.linalg.inv() on a matrix that is not invertible?

In [10]:
A = np.array([[2.,4.],[2.,4.]])
np.linalg.inv(A)  # this raises an error, since A is singular
---------------------------------------------------------------------------
LinAlgError                               Traceback (most recent call last)
<ipython-input-10-55215ad851ba> in <module>
      1 A = np.array([[2.,4.],[2.,4.]])
----> 2 np.linalg.inv(A)  # this raises an error, since A is singular

~/.local/lib/python3.8/site-packages/numpy/core/overrides.py in inv(*args, **kwargs)

~/.local/lib/python3.8/site-packages/numpy/linalg/linalg.py in inv(a)
    543     signature = 'D->D' if isComplexType(t) else 'd->d'
    544     extobj = get_linalg_error_extobj(_raise_linalgerror_singular)
--> 545     ainv = _umath_linalg.inv(a, signature=signature, extobj=extobj)
    546     return wrap(ainv.astype(result_t, copy=False))
    547 

~/.local/lib/python3.8/site-packages/numpy/linalg/linalg.py in _raise_linalgerror_singular(err, flag)
     86 
     87 def _raise_linalgerror_singular(err, flag):
---> 88     raise LinAlgError("Singular matrix")
     89 
     90 def _raise_linalgerror_nonposdef(err, flag):

LinAlgError: Singular matrix

The right way to handle this is:

In [3]:
A = np.array([[2.,4.],[2.,4.]])
try:
    np.linalg.inv(A)
except np.linalg.LinAlgError:
    print('Oops, looks like A is singular!')
Oops, looks like A is singular!

18.2 Solving linear systems using matrix inverses¶

Let's discover a new way to solve a linear system, based on computing the inverse of a square matrix $A$ (if it exists!).

Theorem. If $A$ is an invertible $n \times n$ matrix, then for each $\mathbf{b}$ in $\mathbb{R}^n$, the equation $A\mathbf{x} = \mathbf{b}$ has the unique solution $A^{-1}\mathbf{b}$.

Proof. Let's multiply both sides by $A^{-1}$ on the left:

$$\begin{aligned} A^{-1}(A\mathbf{x}) &= A^{-1}\mathbf{b} \\ (A^{-1}A)\mathbf{x} &= A^{-1}\mathbf{b} \\ I\mathbf{x} &= A^{-1}\mathbf{b} \\ \mathbf{x} &= A^{-1}\mathbf{b} \end{aligned}$$

This very simple, powerful theorem gives us a new way to solve a linear system.

Furthermore, this theorem tells us the number of solutions to the system. In general, we know that a system of $n$ equations in $n$ variables can have:

  • No solution (it is inconsistent)
  • A unique solution
  • Infinitely many solutions

But this theorem says that if $A$ is invertible, then the system has a unique solution.

Example. Solve the system:

$$\begin{aligned} 3x_1 + 4x_2 &= 3 \\ 5x_1 + 6x_2 &= 7 \end{aligned}$$

Rewrite this system as a matrix equation $A\mathbf{x} = \mathbf{b}$:

$$\begin{bmatrix} 3 & 4 \\ 5 & 6 \end{bmatrix}\mathbf{x} = \begin{bmatrix} 3 \\ 7 \end{bmatrix}.$$

The determinant of $A$ is $3(6) - 4(5) = -2$, which is nonzero, so $A$ has an inverse.

According to our $2 \times 2$ formula, the inverse of $A$ is:

$$A^{-1} = \frac{1}{-2}\begin{bmatrix} 6 & -4 \\ -5 & 3 \end{bmatrix} = \begin{bmatrix} -3 & 2 \\ 5/2 & -3/2 \end{bmatrix}.$$

So the solution is:

$$\mathbf{x} = A^{-1}\mathbf{b} = \begin{bmatrix} -3 & 2 \\ 5/2 & -3/2 \end{bmatrix}\begin{bmatrix} 3 \\ 7 \end{bmatrix} = \begin{bmatrix} 5 \\ -3 \end{bmatrix}.$$
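
We can confirm this solution numerically. The quick sketch below solves the system both via the inverse and via np.linalg.solve (which avoids forming $A^{-1}$ explicitly):

In [ ]:
# solve the 2x2 system two ways
A = np.array([[3., 4.],
              [5., 6.]])
b = np.array([3., 7.])
print(np.linalg.inv(A) @ b)    # [ 5. -3.]
print(np.linalg.solve(A, b))   # [ 5. -3.]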

18.3 Matrix Multiplication and Factorization¶

We know that matrix-vector multiplication can be interpreted as applying a linear transformation to vectors in $\mathbb{R}^n$.

Question. What does the product of two matrices mean, geometrically?

We determined that the matrix

$$C = \begin{bmatrix} -1 & 0 \\ 0 & -1 \end{bmatrix}$$

implements a 180 degree rotation about the origin.

In [80]:
#
note = dm.mnote()
C = np.array(
    [[-1, 0],
     [ 0,-1]])
print("C ="); print(C)
dm.plotSetup()
dm.plotShape(note)
dm.plotShape(C @ note,'r')
C =
[[-1  0]
 [ 0 -1]]

But notice that we could have accomplished this another way:

  • First reflect through the $x_1$ axis, using multiplication by $B = \begin{bmatrix} 1 & 0 \\ 0 & -1 \end{bmatrix}$.
  • Then reflect through the $x_2$ axis, using multiplication by $A = \begin{bmatrix} -1 & 0 \\ 0 & 1 \end{bmatrix}$.
In [85]:
#
B = np.array(
    [[ 1, 0],
     [ 0,-1]])
A = np.array(
    [[-1, 0],
     [ 0, 1]])
#print("A ="); print(A)
#print("B ="); print(B)
dm.plotSetup()
dm.plotShape(note)
dm.plotShape(B @ note,'g')
dm.plotShape(A @ (B @ note),'r')

So, another way to reflect a point $\mathbf{u}$ through the origin would be:

  • $\mathbf{v} = B\mathbf{u}$
  • Followed by $\mathbf{w} = A\mathbf{v}.$

As a result, $\mathbf{w} = A(B\mathbf{u}).$ We also know that $\mathbf{w} = C\mathbf{u}$. And this is true for all vectors $\mathbf{u}$.

Therefore, multiplication of matrices corresponds to composition of linear transformations.

That is: $C$ is the result of applying the transformation corresponding to matrix $B$, and then the transformation corresponding to $A$. (Just like composing functions: the order matters, and we read them from right to left!)

$$\underbrace{\begin{bmatrix} -1 & 0 \\ 0 & 1 \end{bmatrix}}_{A}\underbrace{\begin{bmatrix} 1 & 0 \\ 0 & -1 \end{bmatrix}}_{B} = \underbrace{\begin{bmatrix} -1 & 0 \\ 0 & -1 \end{bmatrix}}_{C}.$$

Example. Let $A = \begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix}$ and $B = \begin{bmatrix} 0 & 1 \\ -1 & 0 \end{bmatrix}$. Show that $A^3 = B$.

You could calculate $A^3$ using the algebraic rules for matrix multiplication, but there is an easier way to think about this question geometrically.

Note that $A$ maps $\mathbf{e}_1 \mapsto \mathbf{e}_2$ and $\mathbf{e}_2 \mapsto -\mathbf{e}_1$, so it corresponds to a 90 degree counterclockwise rotation $T_{90}$. Similarly, $B$ corresponds to a 270 degree counterclockwise rotation $T_{270}$.

What happens if we perform a 90 degree rotation three times in a row? The result, $T_{90}(T_{90}(T_{90}(\mathbf{x})))$, must be a 270 degree rotation. In other words, $T_{90} \circ T_{90} \circ T_{90} = T_{270}$.

Because composition of linear transformations corresponds to multiplication of matrices, it follows immediately that $A^3 = B$.

We can check this computationally.

In [6]:
A = np.array([[0, -1], [1, 0]])
A @ A @ A
Out[6]:
array([[ 0,  1],
       [-1,  0]])

Matrix Factorization¶

A factorization of a matrix $A$ is an equation that expresses $A$ as a product of two or more matrices:

$$A = BC.$$

The essential difference from what we have done so far is that previously we were given the factors ($B$ and $C$) and computed their product $A$.

In a factorization problem, you are given $A$, and you want to find matrices $B$ and $C$ that meet some conditions.

There are a number of reasons one may want to factor a matrix.

  • Recasting $A$ into a form that makes computing with $A$ faster.
  • Recasting $A$ into a form that makes working with $A$ easier.
  • Recasting $A$ into a form that exposes important properties of $A$.

Let's motivate this problem by calculating the cost of matrix operations:

  • Matrix multiplication using the inner product method
  • Matrix inversion using Gaussian elimination

Then, we will explore a new, faster way to solve linear systems.

18.4 Computational Complexity of Matrix Multiplication¶

Recall that in Python/numpy:

C = A @ B

denotes matrix multiplication.

We can manually calculate the product of two matrices $A$ and $B$ using the inner product rule.

$$\begin{bmatrix} a_{11} & a_{12} & a_{13} \\ * & * & * \\ * & * & * \end{bmatrix}\begin{bmatrix} b_{11} & * \\ b_{21} & * \\ b_{31} & * \end{bmatrix} = \begin{bmatrix} a_{11}b_{11} + a_{12}b_{21} + a_{13}b_{31} & * \\ * & * \\ * & * \end{bmatrix}$$

Question. What is the computational complexity of matrix multiplication?

The machine learning algorithms we explore later in the course all rely heavily on matrix multiplication. So, minimizing the time required to do matrix multiplication is immensely important.

For two square $n \times n$ matrices $A$ and $B$, consider the definition that uses the inner product:

$$(AB)_{ij} = \sum_{k=1}^{n} a_{ik}b_{kj}.$$

Each element of the product $AB$ requires $2n$ floating point operations ($n$ multiplications and $n$ additions).

There are $n^2$ elements of $AB$, so the overall computation requires

$$2n \cdot n^2 = 2n^3$$

operations using this algorithm.

Note: There do exist faster algorithms for computing matrix multiplication that run in time $O(n^\omega)$ where $2 \leq \omega < 3$, but let's stick with this simple algorithm for now.

[Figure: Improvements in matrix multiplication algorithms over time. Source: Wikipedia.]

That's not particularly good news; for two matrices of size 10,000 × 10,000 (not an unusually large size in practice), this is 2 trillion ($2 \times 10^{12}$) floating point operations.
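
To get a feel for this growth, here is a rough timing sketch (sizes chosen arbitrarily; exact timings depend heavily on your hardware and the underlying BLAS library, which is far faster than the naive algorithm):

In [ ]:
# rough illustration of how matrix-matrix multiplication time grows with n
import time
for n in [200, 400, 800]:
    A = np.random.rand(n, n)
    B = np.random.rand(n, n)
    start = time.time()
    _ = A @ B
    print(n, time.time() - start, 'seconds')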

Question. What is the computational complexity of matrix-vector multiplication?

We know that matrix-vector multiplication requires $n$ inner products, each of size $n$.

So, matrix-vector multiplication requires

$$2n^2$$

operations.

Order matters!¶

When you look at a mathematical expression involving matrices, think carefully about what it means and how you might efficiently compute it.

Example. What is the most efficient way to compute $A^2\mathbf{x}$?

Here are your choices:

  1. First compute $A^2$, then compute $(A^2)\mathbf{x}$:

    • Complexity: $2n^3 + 2n^2$
    • Runtime for $n = 10{,}000$: about 2 trillion flops
  2. First compute $A\mathbf{x}$, then compute $A(A\mathbf{x})$:

    • Complexity: $2 \cdot 2n^2 = 4n^2$
    • Runtime for $n = 10{,}000$: about 400 million flops ✓

(See the short timing sketch below.)
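
A quick timing sketch makes the difference concrete (matrix size chosen arbitrarily; exact numbers will vary by machine):

In [ ]:
# compare the two evaluation orders for A^2 x
import time
n = 2000
A = np.random.rand(n, n)
x = np.random.rand(n)

start = time.time()
_ = (A @ A) @ x                  # forms the n x n product first: ~2n^3 flops
print('(A @ A) @ x:', time.time() - start, 'seconds')

start = time.time()
_ = A @ (A @ x)                  # two matrix-vector products: ~4n^2 flops
print('A @ (A @ x):', time.time() - start, 'seconds')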

Parallelization¶

Although matrix multiplication is computationally demanding, it has a wonderful property: it is highly parallel.

That is, the computation needed for each element does not require computing the other elements.

(This is not true, for example, for Gaussian elimination; think about the role of a pivot.)

This means that if we have multiple processors, and each has access to AA and BB, the work can be divided up very cleanly.

For example, let's say you have $n$ processors. Then each processor can independently compute one column $A\mathbf{b}_i$ of the result, without needing to know anything about what the other processors are doing.

Since all processors are working in parallel, the elapsed time is reduced from $2n^3$ down to $2n^2$.

Even better, say you have $n^2$ processors. Then each processor can compute a single element of the result, $(AB)_{ij} = \sum_{k=1}^{n} a_{ik}b_{kj}$.

Then the elapsed time is reduced all the way down to $2n$.

This sort of strategy is used for huge computations like web search and weather modeling.
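
The column-wise independence is easy to see in code. This minimal sketch computes each column of $AB$ separately (sequentially here, but each step could run on its own processor):

In [ ]:
# each column of AB depends only on A and the corresponding column of B
A = np.random.rand(4, 4)
B = np.random.rand(4, 4)
cols = [A @ B[:, i] for i in range(B.shape[1])]    # independent column computations
print(np.allclose(np.column_stack(cols), A @ B))   # True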

18.5 Computational Complexity of Matrix Inverse¶

Recall that a matrix $A$ is called invertible if there exists a matrix $C$ such that

$$AC = I \quad \text{and} \quad CA = I.$$

In that case, $C$ is called the inverse of $A$. Inverses are only defined for square matrices.

In [5]:
#
display(Image("images/06-inverse.png", width=1000))

Matrix inversion has many uses. For instance, if a matrix $A$ is invertible, then we can solve the linear system $A\mathbf{x} = \mathbf{b}$ by multiplying by $A^{-1}$ on the left:

$$\begin{aligned} A\mathbf{x} &= \mathbf{b} \\ A^{-1}(A\mathbf{x}) &= A^{-1}\mathbf{b} \\ (A^{-1}A)\mathbf{x} &= A^{-1}\mathbf{b} \\ I\mathbf{x} &= A^{-1}\mathbf{b} \\ \mathbf{x} &= A^{-1}\mathbf{b} \quad \text{(and this solution is unique)} \end{aligned}$$

Python contains a function np.linalg.inv() that can compute the inverse of a square matrix of any size. For example:

In [87]:
import numpy as np
A = np.array(
    [[ 2.0, 5.0],
     [-3.0,-7.0]])
print('A =\n',A)
B = np.linalg.inv(A)
print('B = \n',B)
A =
 [[ 2.  5.]
 [-3. -7.]]
B = 
 [[-7. -5.]
 [ 3.  2.]]

Question: How would you write a computer program to perform this inverse operation? How long will it take to run?

In Lecture 5, we calculated a special formula for finding the inverse of a $2 \times 2$ matrix.

We also found a general algorithm for inverting any matrix: use Gaussian elimination on the larger augmented matrix $[A \mid I]$. The result in reduced row echelon form will be $[I \mid A^{-1}]$.

We've seen how to calculate the cost of Gaussian elimination for

  • An $n \times n$ square matrix $A$
  • A column vector $\mathbf{b}$.

Here, the augmented matrix is of size $n \times (n+1)$. For this matrix, let's consider the cost of performing an elementary row operation, such as adding a scalar multiple of row 1 into row 2. We need

  • $n+1$ multiplications
  • $n+1$ additions

So the total cost is $2(n+1)$ floating point operations (or flops).

In [7]:
# Image credit: Prof. Mark Crovella
display(Image("images/03-ge1.jpg", width=800))

As you perform more elimination steps, you can work with smaller sub-matrices. In total, we calculated in Lecture 3 that Gaussian elimination on $[A \mid \mathbf{b}]$ costs

  • $\frac{2}{3}n^3$ floating point operations for the Elimination stage
  • $n^2$ floating point operations for the Backsubstitution stage

For matrix inversion, we need to use a larger augmented matrix $[A \mid I]$ whose size is $n \times 2n$. For this matrix, an elementary row operation now costs $2n$ multiplications and $2n$ additions, for a total of $4n$ flops.

If you go back to the derivation of the cost of Gaussian Elimination in Lecture 3 and recalculate the results for the wider augmented matrix $[A \mid I]$, you will find that the costs are now:

  • $\frac{5}{3}n^3$ floating point operations for the Elimination stage
  • $\frac{1}{3}n^3$ floating point operations for the Backsubstitution stage

As a result, the total cost to calculate the inverse is $2n^3$ flops. This is three times the cost of an ordinary Gaussian elimination.

Hence, if I give you a matrix $A$, then you can:

  1. Calculate $A^{-1}$ at a cost of $2n^3$ flops, and then be able to solve linear systems $A\mathbf{x} = \mathbf{b}$ for any vector $\mathbf{b}$ you receive in the future (at a cost of $2n^2$ flops each, to perform the matrix-vector multiplication).
  2. Solve a single linear system $A\mathbf{x} = \mathbf{b}$ at a cost of $\frac{2}{3}n^3$ flops.

Which option is better depends on how many linear systems you will solve using the same matrix $A$.
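
A small sketch of the flop arithmetic (using the constants derived above) shows how the trade-off depends on the number of right-hand sides k:

In [ ]:
# rough flop counts for solving k systems with the same matrix A
def flops_with_inverse(n, k):
    return 2 * n**3 + k * 2 * n**2     # invert once, then k matrix-vector products

def flops_with_elimination(n, k):
    return k * (2 / 3) * n**3          # run Gaussian elimination once per system

n = 10_000
for k in [1, 2, 3, 5, 10]:
    print(k, flops_with_inverse(n, k), flops_with_elimination(n, k))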

After break, we will find a new way to solve linear systems that is the "best of both options": given a matrix $A$, you can pay a one-time cost of $\frac{2}{3}n^3$ flops and then quickly solve any equation of the form $A\mathbf{x} = \mathbf{b}$.
