#
import numpy as np
import scipy as sp
import pandas as pd
import matplotlib as mp
import matplotlib.pyplot as plt
import seaborn as sns
import sklearn
import laUtilities as ut
import slideUtilities as sl
import demoUtilities as dm
from matplotlib import animation
from importlib import reload
from datetime import datetime
from IPython.display import Image, display_html, display, Math, HTML;
qr_setting = None
mp.rcParams['animation.html'] = 'jshtml';
Of all of the linear transformations associated with a square matrix, scaling is special because:
If a matrix $A$ scales $\mathbf{x}$, then that transformation could also have been expressed without a matrix-vector multiplication, i.e., as $\lambda \mathbf{x}$ for some scalar value $\lambda$.
An eigenvector of a matrix $A$ is a special vector $\mathbf{x}$ that does not change its direction when multiplied by $A$.
The eigenvalues of a matrix $A$ are all of the scalars $\lambda$ that an eigenvector of $A$ can be scaled by.
#
ax = ut.plotSetup(size=(12,8))
ut.centerAxes(ax)
A = np.array([[3,-2],[1,0]])
u = np.array([-1,1])
v = np.array([2,1])
#
ut.plotArrowVec(ax, v, [0,0], color='Red')
ut.plotArrowVec(ax, A.dot(v), [0,0], color='Red')
ax.text(v[0],v[1]+0.2,r'${\bf v}$',size=20)
ax.text(A.dot(v)[0],A.dot(v)[1]+0.2,r'$A{\bf v}$',size=20)
#
ut.plotArrowVec(ax, u, [0,0], color='Blue')
ut.plotArrowVec(ax, A.dot(u), [0,0], color='Blue')
ax.text(u[0]-0.5,u[1]+0.1,r'${\bf u}$',size=20)
ax.text(A.dot(u)[0]-0.7,A.dot(u)[1]+0.3,r'$A{\bf u}$',size=20);
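As a quick numeric check of the picture above, here is a small sketch that reuses the $A$, $\mathbf{u}$, and $\mathbf{v}$ defined in the plotting cell:

# numeric check: v is an eigenvector of A, but u is not
print(A.dot(v))   # [4 2], which equals 2*v, so v is an eigenvector with eigenvalue 2
print(A.dot(u))   # [-5 -1], which is not a multiple of u, so u is not an eigenvector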
To find an eigenvalue: we saw that $\lambda$ is an eigenvalue of an $n \times n$ matrix $A$ if and only if the equation

$$(A - \lambda I)\mathbf{x} = \mathbf{0}$$

has a nontrivial solution.
Some special cases:
To find all eigenvectors corresponding to an eigenvalue $\lambda$: compute the null space of the matrix $A - \lambda I$.
(Remember that this set of vectors forms a subspace, which we call the eigenspace of $A$ corresponding to $\lambda$.)
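Here is a minimal sketch of this computation in code, using scipy.linalg.null_space together with the matrix $A$ from the demo above and the eigenvalue 2 observed there:

# the eigenspace of A = [[3,-2],[1,0]] for the eigenvalue 2 is the null space of (A - 2I)
from scipy.linalg import null_space
A = np.array([[3, -2],
              [1, 0]])
lam = 2
print(null_space(A - lam * np.eye(2)))   # one column, proportional to [2, 1]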
Recall also that a matrix $A$ is invertible if and only if $\det A$ is not zero.
To return to the question of how to compute the eigenvalues of $A$: recall that $\lambda$ is an eigenvalue if and only if $A - \lambda I$ is not invertible.
We capture this fact using the characteristic equation:

$$\det(A - \lambda I) = 0.$$

We can conclude that $\lambda$ is an eigenvalue of an $n \times n$ matrix $A$ if and only if $\lambda$ satisfies the characteristic equation $\det(A - \lambda I) = 0.$
Example. Find the characteristic equation of

$$A = \begin{bmatrix} 5 & -2 & 6 & -1 \\ 0 & 3 & -8 & 0 \\ 0 & 0 & 5 & 4 \\ 0 & 0 & 0 & 1 \end{bmatrix}.$$

Solution. Form

$$A - \lambda I = \begin{bmatrix} 5-\lambda & -2 & 6 & -1 \\ 0 & 3-\lambda & -8 & 0 \\ 0 & 0 & 5-\lambda & 4 \\ 0 & 0 & 0 & 1-\lambda \end{bmatrix},$$

and note that $\det(A - \lambda I)$ is the product of the entries on the diagonal of $A - \lambda I$ if $A - \lambda I$ is triangular.
So the characteristic equation is:

$$0 = (5-\lambda)(3-\lambda)(5-\lambda)(1-\lambda) = (5-\lambda)^2(3-\lambda)(1-\lambda).$$
Notice that, once again, $\det(A - \lambda I)$ is a polynomial in $\lambda$.
In fact, for any $n \times n$ matrix $A$, $\det(A - \lambda I)$ is a polynomial of degree $n$, called the characteristic polynomial of $A$.
We say that the eigenvalue 5 in this example has multiplicity 2, because $(5 - \lambda)$ occurs two times as a factor of the characteristic polynomial. In general, the multiplicity of an eigenvalue is its multiplicity as a root of the characteristic equation.
Example. The characteristic polynomial of a $6 \times 6$ matrix is $\lambda^6 - 4\lambda^5 - 12\lambda^4.$ Find the eigenvalues and their multiplicities.
Solution. Factor the polynomial:

$$\lambda^6 - 4\lambda^5 - 12\lambda^4 = \lambda^4(\lambda^2 - 4\lambda - 12) = \lambda^4(\lambda - 6)(\lambda + 2).$$

So the eigenvalues are 0 (with multiplicity 4), 6, and $-2$.
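As a numeric check, here is a small sketch that asks numpy for the roots of this polynomial (assuming, as above, that the leading coefficient is 1):

# roots of lambda^6 - 4 lambda^5 - 12 lambda^4
coeffs = [1, -4, -12, 0, 0, 0, 0]
print(np.roots(coeffs))   # 6 and -2 appear once each; 0 appears four times (up to round-off)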
Since the characteristic polynomial for an $n \times n$ matrix has degree $n$, the equation $\det(A - \lambda I) = 0$ has $n$ roots, counting multiplicities -- provided complex numbers are allowed.
As a consequence: an $n \times n$ matrix has at most $n$ distinct eigenvalues.
Note that you need not compute eigenvalues for matrices larger than $2 \times 2$ by hand. For any matrix $3 \times 3$ or larger, you should use a computer.
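For example, numpy will compute the characteristic polynomial and the eigenvalues directly. A minimal sketch, using the $2 \times 2$ matrix from the demo at the start of this lecture:

# characteristic polynomial and eigenvalues of A = [[3,-2],[1,0]]
A = np.array([[3, -2],
              [1, 0]])
print(np.poly(A))             # coefficients of det(A - lambda I): [1, -3, 2]
print(np.linalg.eigvals(A))   # its roots, i.e., the eigenvalues: 2 and 1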
[This lecture is based on Prof. Crovella's CS 132 lecture notes.]
Before we get to diagonal matrices specifically, let's first look at the notion of similar matrices.
Definition. If $A$ and $B$ are $n \times n$ matrices, then $A$ is similar to $B$ if there is an invertible matrix $P$ such that

$$A = PBP^{-1}$$

or, equivalently,

$$B = P^{-1}AP.$$

Similarity is symmetric, so if $A$ is similar to $B$, then $B$ is similar to $A$. Hence we just say that $A$ and $B$ are similar.
Changing $A$ into $P^{-1}AP$ is called a similarity transformation.
An important way to think of similarity between $A$ and $B$ is that they have the same eigenvalues.
Theorem. If $n \times n$ matrices $A$ and $B$ are similar, then they have the same characteristic polynomial, and hence the same eigenvalues (with the same multiplicities).
Proof. If $B = P^{-1}AP$, then

$$B - \lambda I = P^{-1}AP - \lambda P^{-1}P = P^{-1}(AP - \lambda P) = P^{-1}(A - \lambda I)P.$$

Now let's construct the characteristic polynomial by taking the determinant:

$$\det(B - \lambda I) = \det\left[ P^{-1}(A - \lambda I)P \right].$$

Using the properties of determinants we discussed last lecture, we compute:

$$\det(B - \lambda I) = \det(P^{-1}) \cdot \det(A - \lambda I) \cdot \det(P).$$

Since $\det(P^{-1}) \cdot \det(P) = \det(P^{-1}P) = \det(I) = 1,$ we can see that

$$\det(B - \lambda I) = \det(A - \lambda I).$$
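Here is a quick numeric illustration of this theorem -- a sketch in which the invertible matrix P is an arbitrary choice of mine:

# similar matrices B = P^{-1} A P have the same eigenvalues
A = np.array([[3, -2],
              [1, 0]])
P = np.array([[1, 2],
              [0, 1]])   # any invertible matrix will do
B = np.linalg.inv(P) @ A @ P
print(np.sort(np.linalg.eigvals(A)))   # eigenvalues of A: 1 and 2
print(np.sort(np.linalg.eigvals(B)))   # eigenvalues of B: the same, 1 and 2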
Now let's return to the setting of square matrices.
Given a square matrix $A$, the factorization of $A$ is of the form

$$A = PDP^{-1},$$

where $D$ is a diagonal matrix.
Recall that we say any two matrices $A$ and $B$ are similar matrices if they can be related in this way.
So this factorization amounts to finding a $P$ that allows us to make $A$ similar to a diagonal matrix.
Now, it is important to understand:
This factorization may not always be possible!
Hence, we have a definition:
Definition. A square matrix $A$ is said to be diagonalizable if $A$ is similar to a diagonal matrix.
That is, $A$ is diagonalizable if we can find some invertible $P$ such that

$$A = PDP^{-1}$$

and $D$ is a diagonal matrix.
Note: a matrix may or may not be invertible, and it may or may not be diagonalizable. And these properties are not directly related.
This factorization allows us to represent $A$ in a form that exposes many interesting properties of $A$.
Let's start by showing one benefit of this factorization: we can compute $A^k$ quickly for large values of $k$.
Let's look at an example to show why this factorization is so important. Consider taking the powers of the following diagonal matrix:

$$D = \begin{bmatrix} 5 & 0 \\ 0 & 3 \end{bmatrix}.$$

Then note that

$$D^2 = \begin{bmatrix} 5 & 0 \\ 0 & 3 \end{bmatrix} \begin{bmatrix} 5 & 0 \\ 0 & 3 \end{bmatrix} = \begin{bmatrix} 5^2 & 0 \\ 0 & 3^2 \end{bmatrix}.$$

And

$$D^3 = DD^2 = \begin{bmatrix} 5 & 0 \\ 0 & 3 \end{bmatrix} \begin{bmatrix} 5^2 & 0 \\ 0 & 3^2 \end{bmatrix} = \begin{bmatrix} 5^3 & 0 \\ 0 & 3^3 \end{bmatrix}.$$

So in general,

$$D^k = \begin{bmatrix} 5^k & 0 \\ 0 & 3^k \end{bmatrix} \qquad \text{for } k \geq 1.$$
Now, consider if $A$ is diagonalizable, meaning that it is similar to a diagonal matrix.
Then, $A^k$ is easy to compute in this case as well. Let's see this by example.
Example. Let

$$A = \begin{bmatrix} 7 & 2 \\ -4 & 1 \end{bmatrix}.$$

Find a formula for $A^k,$ given that $A = PDP^{-1},$ where

$$P = \begin{bmatrix} 1 & 1 \\ -1 & -2 \end{bmatrix} \qquad\text{and}\qquad D = \begin{bmatrix} 5 & 0 \\ 0 & 3 \end{bmatrix}.$$
Solution. By associativity of matrix multiplication,

$$A^2 = (PDP^{-1})(PDP^{-1}) = PD(P^{-1}P)DP^{-1} = PD^2P^{-1}.$$
So in general, for $k \geq 1$,

$$A^k = PD^kP^{-1} = \begin{bmatrix} 1 & 1 \\ -1 & -2 \end{bmatrix} \begin{bmatrix} 5^k & 0 \\ 0 & 3^k \end{bmatrix} \begin{bmatrix} 2 & 1 \\ -1 & -1 \end{bmatrix} = \begin{bmatrix} 2\cdot 5^k - 3^k & 5^k - 3^k \\ 2\cdot 3^k - 2\cdot 5^k & 2\cdot 3^k - 5^k \end{bmatrix}.$$
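Here is a small numeric sketch of the same idea, using a $2 \times 2$ matrix of my own choosing (not the one from the example) together with a $P$ and $D$ that diagonalize it:

# compute A^k as P D^k P^{-1}
A = np.array([[3, -2],
              [1, 0]])
P = np.array([[2, 1],
              [1, 1]])      # columns are eigenvectors of A
D = np.diag([2, 1])         # corresponding eigenvalues
k = 10
Ak = P @ np.diag([2**k, 1**k]) @ np.linalg.inv(P)
print(Ak)
print(np.linalg.matrix_power(A, k))   # direct computation gives the same result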
Next we will show that to diagonalize a matrix $A$, one must use the eigenvectors and eigenvalues of $A$.
Theorem. (The Diagonalization Theorem)
An $n \times n$ matrix $A$ is diagonalizable if and only if $A$ has $n$ linearly independent eigenvectors.
In fact,

$$A = PDP^{-1},$$

with $D$ a diagonal matrix,
if and only if the columns of $P$ are $n$ linearly independent eigenvectors of $A$.
In this case, the diagonal entries of $D$ are eigenvalues of $A$ that correspond, respectively, to the eigenvectors in $P$.
In other words, $A$ is diagonalizable if and only if there are enough eigenvectors to form a basis of $\mathbb{R}^n$.
We call such a basis an eigenvector basis or an eigenbasis of $\mathbb{R}^n$.
Example. A rotation matrix like

$$\begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix}$$

is not diagonalizable, because we saw last week that it does not have any (real-valued) eigenvalues.
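Numerically, numpy reports a pair of complex (purely imaginary) eigenvalues for such a rotation. A minimal sketch, using rotation by 90 degrees:

# a 90-degree rotation matrix has no real eigenvalues
R = np.array([[0, -1],
              [1, 0]])
print(np.linalg.eigvals(R))   # 1j and -1j: purely imaginary, so no real eigenvalues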
Proof. First, we prove the "only if" ($\Rightarrow$) direction: if $A$ is diagonalizable, it has $n$ linearly independent eigenvectors.
$A$ is diagonalizable, so $A = PDP^{-1}$ for some invertible matrix $P$ and diagonal matrix $D$.
Observe that if $P$ is any matrix with columns $\mathbf{v}_1, \dots, \mathbf{v}_n$, then

$$AP = A[\mathbf{v}_1 \;\; \mathbf{v}_2 \;\; \cdots \;\; \mathbf{v}_n] = [A\mathbf{v}_1 \;\; A\mathbf{v}_2 \;\; \cdots \;\; A\mathbf{v}_n].$$

Next, note that if $D$ is any diagonal matrix with diagonal entries $\lambda_1, \dots, \lambda_n$, then

$$PD = [\lambda_1\mathbf{v}_1 \;\; \lambda_2\mathbf{v}_2 \;\; \cdots \;\; \lambda_n\mathbf{v}_n].$$

Now, right-multiplying the relation $A = PDP^{-1}$ by $P$, we have

$$AP = PD.$$

In this case, the calculations above show that

$$[A\mathbf{v}_1 \;\; A\mathbf{v}_2 \;\; \cdots \;\; A\mathbf{v}_n] = [\lambda_1\mathbf{v}_1 \;\; \lambda_2\mathbf{v}_2 \;\; \cdots \;\; \lambda_n\mathbf{v}_n].$$

Equating columns, we find that

$$A\mathbf{v}_1 = \lambda_1\mathbf{v}_1, \quad A\mathbf{v}_2 = \lambda_2\mathbf{v}_2, \quad \dots, \quad A\mathbf{v}_n = \lambda_n\mathbf{v}_n.$$

Because each $A\mathbf{v}_i$ is a scaling of $\mathbf{v}_i$ by $\lambda_i$, each $\mathbf{v}_i$ must be an eigenvector of $A$.
Since $P$ is invertible, its columns must be linearly independent.
Also, since these columns are nonzero, the equations above show that $\lambda_1, \dots, \lambda_n$ are eigenvalues and $\mathbf{v}_1, \dots, \mathbf{v}_n$ are the corresponding eigenvectors.
This proves the "only if" part of the theorem.
The "if" ($\Leftarrow$) direction of the theorem is: if $A$ has $n$ linearly independent eigenvectors, then $A$ is diagonalizable.
This is straightforward: given $A$'s $n$ linearly independent eigenvectors $\mathbf{v}_1, \dots, \mathbf{v}_n$, use them to construct the columns of $P$ and use the corresponding eigenvalues $\lambda_1, \dots, \lambda_n$ to construct $D$.
Using the sequence of equations above in reverse order, we can go from

$$A\mathbf{v}_1 = \lambda_1\mathbf{v}_1, \quad \dots, \quad A\mathbf{v}_n = \lambda_n\mathbf{v}_n$$

to

$$AP = PD.$$

Since the eigenvectors are given as linearly independent, $P$ is invertible, and so $A = PDP^{-1}$.
The takeaway is this:
Every matrix $A$ having $n$ linearly independent eigenvectors can be factored into the product

$$A = PDP^{-1},$$

... where $P$ holds the eigenvectors of $A$, and $D$ holds the eigenvalues of $A$.
This is the eigendecomposition of $A$.
(It is quite fundamental!)
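In numpy, np.linalg.eig computes exactly this factorization: it returns the eigenvalues and a matrix whose columns are corresponding eigenvectors. A minimal sketch, reusing the $2 \times 2$ matrix from earlier:

# eigendecomposition: A = P D P^{-1}
A = np.array([[3, -2],
              [1, 0]])
evals, P = np.linalg.eig(A)   # eigenvalues, and eigenvectors as the columns of P
D = np.diag(evals)
print(np.allclose(A, P @ D @ np.linalg.inv(P)))   # True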
Let's put this all together and see how to diagonalize a matrix.
Example. Diagonalize the following matrix, if possible:

$$A = \begin{bmatrix} 1 & 3 & 3 \\ -3 & -5 & -3 \\ 3 & 3 & 1 \end{bmatrix}.$$

That is, find an invertible matrix $P$ and a diagonal matrix $D$ such that $A = PDP^{-1}$.
If we are working with a $2 \times 2$ matrix, then we can compute by hand the roots of the characteristic (quadratic) polynomial. For anything larger we'd use a computer.
In this case, the characteristic equation turns out to involve a cubic polynomial that can be factored:

$$0 = \det(A - \lambda I) = -\lambda^3 - 3\lambda^2 + 4 = -(\lambda - 1)(\lambda + 2)^2.$$

So the eigenvalues are $\lambda = 1$ and $\lambda = -2$ (with multiplicity two).
Note that we need three linearly independent vectors because $A$ is $3 \times 3$.
This is the step where we find out whether $A$ can be diagonalized, depending on whether we can form 3 independent eigenvectors.
Using our standard method (finding the nullspace of $A - \lambda I$) we find a basis for each eigenspace:
Basis for $\lambda = 1$:

$$\mathbf{v}_1 = \begin{bmatrix} 1 \\ -1 \\ 1 \end{bmatrix}.$$

Using the same technique, we can find the basis for $\lambda = -2$. It turns out that the result is:

$$\mathbf{v}_2 = \begin{bmatrix} -1 \\ 1 \\ 0 \end{bmatrix}, \quad \mathbf{v}_3 = \begin{bmatrix} -1 \\ 0 \\ 1 \end{bmatrix}.$$
At this point we must ensure that $\{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3\}$ forms a linearly independent set.
(These vectors in fact do.)
Now we construct $P$ using these vectors as its columns:

$$P = [\mathbf{v}_1 \;\; \mathbf{v}_2 \;\; \mathbf{v}_3] = \begin{bmatrix} 1 & -1 & -1 \\ -1 & 1 & 0 \\ 1 & 0 & 1 \end{bmatrix}.$$

The order of the vectors is actually not important (yet).
Next we construct $D$ from the corresponding eigenvalues. The order of eigenvalues must match the order of eigenvectors used in the previous step.
If an eigenvalue has multiplicity greater than 1, then repeat it the corresponding number of times:

$$D = \begin{bmatrix} 1 & 0 & 0 \\ 0 & -2 & 0 \\ 0 & 0 & -2 \end{bmatrix}.$$
And we are done. We have diagonalized $A$:

$$A = PDP^{-1} = \begin{bmatrix} 1 & -1 & -1 \\ -1 & 1 & 0 \\ 1 & 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & 0 & 0 \\ 0 & -2 & 0 \\ 0 & 0 & -2 \end{bmatrix} \begin{bmatrix} 1 & -1 & -1 \\ -1 & 1 & 0 \\ 1 & 0 & 1 \end{bmatrix}^{-1}.$$

So, just as a reminder, we can now take powers of $A$ quite efficiently:

$$A^k = PD^kP^{-1} = P \begin{bmatrix} 1 & 0 & 0 \\ 0 & (-2)^k & 0 \\ 0 & 0 & (-2)^k \end{bmatrix} P^{-1}.$$
Example. Let's look at an example of how diagonalization can fail.
Diagonalize the following matrix, if possible:

$$A = \begin{bmatrix} 2 & 4 & 3 \\ -4 & -6 & -3 \\ 3 & 3 & 1 \end{bmatrix}.$$

Solution. The characteristic equation of $A$ turns out to be the same as in the last example:

$$0 = \det(A - \lambda I) = -(\lambda - 1)(\lambda + 2)^2.$$

The eigenvalues are $\lambda = 1$ and $\lambda = -2.$ However, it is easy to verify that each eigenspace is only one-dimensional:

Basis for $\lambda = 1$: $\mathbf{v}_1 = \begin{bmatrix} 1 \\ -1 \\ 1 \end{bmatrix}$

Basis for $\lambda = -2$: $\mathbf{v}_2 = \begin{bmatrix} -1 \\ 1 \\ 0 \end{bmatrix}$
There are no other eigenvalues, and every eigenvector of $A$ is a multiple of either $\mathbf{v}_1$ or $\mathbf{v}_2$.
Hence it is impossible to construct a basis of $\mathbb{R}^3$ using eigenvectors of $A$.
So we conclude that $A$ is not diagonalizable.
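Numerically, one symptom of this failure is that the matrix of eigenvectors returned by np.linalg.eig is (numerically) singular. Here is a sketch using a simple $2 \times 2$ shear matrix of my own choosing, which is not diagonalizable:

# eigenvalue 1 has multiplicity 2, but its eigenspace is only one-dimensional
A = np.array([[1, 1],
              [0, 1]])
evals, P = np.linalg.eig(A)
print(evals)                      # [1. 1.]
print(np.linalg.matrix_rank(P))   # 1 -- the eigenvectors are not linearly independent,
                                  # so there is no eigenbasis and A is not diagonalizable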
There is an important situation in which we can conclude immediately that $A$ is diagonalizable, without explicitly constructing and testing the eigenspaces of $A$.
Theorem. An $n \times n$ matrix with $n$ distinct eigenvalues is diagonalizable.
Proof. First, remember that every eigenvalue has at least one eigenvector associated with it. So we have $n$ eigenvectors $\mathbf{v}_1, \dots, \mathbf{v}_n$, one for each of the distinct eigenvalues $\lambda_1, \dots, \lambda_n$.
It only remains to show that these eigenvectors must be linearly independent.
We will prove this statement by induction. As the base case, it is clear that the first eigenvector $\mathbf{v}_1$ is linearly independent on its own. (Remember that eigenvectors are not equal to $\mathbf{0}$ by definition.)
For the induction step: suppose that the first $k$ eigenvectors $\mathbf{v}_1, \dots, \mathbf{v}_k$ are linearly independent.
Now let's see what happens if we take an arbitrary linear combination of the first $k+1$ vectors and set it equal to zero:

$$c_1\mathbf{v}_1 + c_2\mathbf{v}_2 + \cdots + c_{k+1}\mathbf{v}_{k+1} = \mathbf{0}.$$

Let's see what happens when we multiply this equation by the scalar $\lambda_{k+1}$, and (separately) left-multiply it by the matrix $A$, using $A\mathbf{v}_i = \lambda_i\mathbf{v}_i$:

$$\lambda_{k+1}c_1\mathbf{v}_1 + \lambda_{k+1}c_2\mathbf{v}_2 + \cdots + \lambda_{k+1}c_{k+1}\mathbf{v}_{k+1} = \mathbf{0},$$

$$\lambda_1 c_1\mathbf{v}_1 + \lambda_2 c_2\mathbf{v}_2 + \cdots + \lambda_{k+1}c_{k+1}\mathbf{v}_{k+1} = \mathbf{0}.$$

If we take the difference of these two equations, we get:

$$c_1(\lambda_{k+1} - \lambda_1)\mathbf{v}_1 + c_2(\lambda_{k+1} - \lambda_2)\mathbf{v}_2 + \cdots + c_k(\lambda_{k+1} - \lambda_k)\mathbf{v}_k = \mathbf{0},$$

where the $\mathbf{v}_{k+1}$ term has been canceled out.
This is a linear combination of the first $k$ vectors alone, and remember that $\lambda_{k+1} - \lambda_i \neq 0$ since the eigenvalues are distinct.
From our induction hypothesis, we know that the first $k$ vectors are linearly independent, which means that all of the coefficients $c_i(\lambda_{k+1} - \lambda_i)$ must be zero, and hence $c_1 = c_2 = \cdots = c_k = 0$. And if that's the case, then the original equation reduces to $c_{k+1}\mathbf{v}_{k+1} = \mathbf{0}$, so it is clear that $c_{k+1} = 0$ as well. So all of the first $k+1$ eigenvectors are linearly independent. The proof follows by induction.
Example. Determine if the following matrix is diagonalizable:

$$A = \begin{bmatrix} 5 & -8 & 1 \\ 0 & 0 & 7 \\ 0 & 0 & -2 \end{bmatrix}.$$

Solution. It's easy! The matrix is triangular, so its eigenvalues are its diagonal entries: 5, 0, and $-2$. Since this is a $3 \times 3$ matrix with three distinct eigenvalues, it is diagonalizable.
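Here is a sketch of this check in code; the helper has_distinct_eigenvalues is a name of my own, and note that this test is only a sufficient condition -- a matrix with a repeated eigenvalue may still be diagonalizable:

# sufficient condition: n distinct eigenvalues => diagonalizable
def has_distinct_eigenvalues(M, tol=1e-8):
    evals = np.linalg.eigvals(M)
    n = len(evals)
    # check that every pair of eigenvalues differs
    return all(abs(evals[i] - evals[j]) > tol
               for i in range(n) for j in range(i + 1, n))

A = np.array([[5, -8, 1],
              [0, 0, 7],
              [0, 0, -2]])
print(has_distinct_eigenvalues(A))   # True, so A is diagonalizable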
We can now turn to a geometric understanding of how diagonalization informs us about the properties of $A$.
Let's interpret the diagonalization $A = PDP^{-1}$ in terms of how $A$ acts as a linear operator.
When thinking of $A$ as a linear operator, diagonalization has a specific interpretation:
Diagonalization separates the influence of each vector component from the others.
To see what this means, let's first consider an easier case: a diagonal matrix. When we multiply a vector $\mathbf{x}$ by a diagonal matrix $D$, the change to each component of $\mathbf{x}$ depends only on that component.
That is, multiplying by a diagonal matrix simply scales the components of the vector.
Example. Let $D = \begin{bmatrix} 2 & 0 \\ 0 & 3 \end{bmatrix}.$ Then,

$$D\mathbf{x} = \begin{bmatrix} 2 & 0 \\ 0 & 3 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} 2x_1 \\ 3x_2 \end{bmatrix}.$$
On the other hand, when we multiply $\mathbf{x}$ by a matrix that has off-diagonal entries, the components of $\mathbf{x}$ affect each other.
So diagonalizing a matrix allows us to bring intuition to its behavior as a linear operator.
When we compute $P\mathbf{x}$, we are taking a vector sum of the columns of $P$:

$$P\mathbf{x} = x_1\mathbf{p}_1 + x_2\mathbf{p}_2 + \cdots + x_n\mathbf{p}_n.$$

Now $P$ is square and invertible, so its columns are a basis for $\mathbb{R}^n$. Let's call that basis $\mathcal{B} = \{\mathbf{p}_1, \dots, \mathbf{p}_n\}$.
So, we can think of $P\mathbf{x}$ as "the point that has coordinates $\mathbf{x}$ in the basis $\mathcal{B}$."
On the other hand, what if we wanted to find the coordinates of a vector in the basis $\mathcal{B}$?
Let's say we have some $\mathbf{y}$ written in the standard basis, and we want to find its coordinates $\mathbf{x}$ in the basis $\mathcal{B}$.
So $\mathbf{y} = P\mathbf{x}$.
Then since $P$ is invertible, $\mathbf{x} = P^{-1}\mathbf{y}$.
Thus, $P^{-1}\mathbf{y}$ is "the coordinates of $\mathbf{y}$ in the basis $\mathcal{B}$."
So we can interpret $A\mathbf{x} = PDP^{-1}\mathbf{x}$ as:
Compute the coordinates of $\mathbf{x}$ in the basis $\mathcal{B}$.
This is $P^{-1}\mathbf{x}$.
Scale those coordinates according to the diagonal matrix $D$.
This is $DP^{-1}\mathbf{x}$.
Find the point that has those scaled coordinates in the basis $\mathcal{B}$.
This is $PDP^{-1}\mathbf{x}$.
# verify that A = P D P^{-1}, where the columns of P are eigenvectors of A
# and D holds the corresponding eigenvalues
A = np.array([[ 1.86363636, 0.68181819],
[-0.22727273, 3.13636364]])
P = np.array([[0.98058068, 0.51449576],
[0.19611614, 0.85749293]])
D = np.array([[2, 0],
[0, 3]])
np.allclose(A, P @ D @ np.linalg.inv(P))
True
Example. Let's visualize diagonalization geometrically.
Consider a fixed-but-unwritten matrix $A$. Here's a picture showing how it transforms a point $\mathbf{x}$.
#
ax = ut.plotSetup(-1,6,-1,6,size=(12,8))
ut.centerAxes(ax)
v1 = np.array([5.0,1.0])
v1 = v1 / np.sqrt(np.sum(v1*v1))
v2 = np.array([3.0,5.0])
v2 = v2 / np.sqrt(np.sum(v2*v2))
p1 = 2*v1+v2
p2 = 4*v1+3*v2
ut.plotVec(ax, p1,'k')
ut.plotVec(ax, p2,'k')
ax.annotate('', xy=(p2[0], p2[1]), xycoords='data',
xytext=(p1[0], p1[1]), textcoords='data',
size=15,
arrowprops={'arrowstyle': 'simple',
'fc': '0.7',
'ec': 'none',
'connectionstyle' : 'arc3,rad=-0.3'},
)
ax.text(2.5,0.75,r'${\bf x}$',size=20)
ax.text(5.2,2.75,r'$A{\bf x}$',size=20)
ax.plot(0,0,'');
So far, we cannot say much about what the linear transformation $A$ does in general.
Now, let's compute $P^{-1}\mathbf{x}$.
Remember that the columns of $P$ are the eigenvectors of $A$.
So $P^{-1}\mathbf{x}$ gives the coordinates of the point $\mathbf{x}$ in the eigenvector basis:
#
ax = ut.plotSetup(-1,6,-1,6,size=(12,8))
ut.centerAxes(ax)
v1 = np.array([5.0,1.0])
v1 = v1 / np.sqrt(np.sum(v1*v1))
v2 = np.array([3.0,5.0])
v2 = v2 / np.sqrt(np.sum(v2*v2))
ut.plotVec(ax,v1,'b')
ut.plotVec(ax,v2)
ut.plotLinEqn(-v1[1],v1[0],0,color='b')
ut.plotLinEqn(-v2[1],v2[0],0,color='r')
for i in range(-4,8):
ut.plotLinEqn(-v1[1],v1[0],i*(v1[0]*v2[1]-v1[1]*v2[0]),format=':',color='b')
ut.plotLinEqn(-v2[1],v2[0],i*(v2[0]*v1[1]-v2[1]*v1[0]),format=':',color='r')
p1 = 2*v1+v2
p2 = 4*v1+3*v2
ut.plotVec(ax, p1,'k')
ax.text(2.5,0.75,r'${\bf x}$',size=20)
ax.text(v2[0]-0.15,v2[1]+0.5,r'${\bf p_2}$',size=20)
ax.text(v1[0]-0.15,v1[1]+0.35,r'${\bf p_1}$',size=20)
ax.plot(0,0,'');
The coordinates of $\mathbf{x}$ in this basis are $(2, 1)$.
In other words, $\mathbf{x} = 2\mathbf{p}_1 + \mathbf{p}_2$.
Now, we compute $DP^{-1}\mathbf{x}$. Since $D$ is diagonal, this is just scaling each of the $\mathcal{B}$-coordinates.
In this example the eigenvalue corresponding to $\mathbf{p}_1$ is 2, and the eigenvalue corresponding to $\mathbf{p}_2$ is 3.
#
ax = ut.plotSetup(-1,6,-1,6,size=(12,8))
ut.centerAxes(ax)
v1 = np.array([5.0,1.0])
v1 = v1 / np.sqrt(np.sum(v1*v1))
v2 = np.array([3.0,5.0])
v2 = v2 / np.sqrt(np.sum(v2*v2))
#ut.plotVec(ax,v1,'b')
#ut.plotVec(ax,v2)
ut.plotLinEqn(-v1[1],v1[0],0,color='b')
ut.plotLinEqn(-v2[1],v2[0],0,color='r')
for i in range(-4,8):
ut.plotLinEqn(-v1[1],v1[0],i*(v1[0]*v2[1]-v1[1]*v2[0]),format=':',color='b')
ut.plotLinEqn(-v2[1],v2[0],i*(v2[0]*v1[1]-v2[1]*v1[0]),format=':',color='r')
p1 = 2*v1+v2
p2 = 4*v1+3*v2
ut.plotVec(ax, p1,'k')
ut.plotVec(ax, p2,'k')
ax.annotate('', xy=(p2[0], p2[1]), xycoords='data',
xytext=(p1[0], p1[1]), textcoords='data',
size=15,
#bbox=dict(boxstyle="round", fc="0.8"),
arrowprops={'arrowstyle': 'simple',
'fc': '0.7',
'ec': 'none',
'connectionstyle' : 'arc3,rad=-0.3'},
)
ax.text(2.5,0.75,r'${\bf x}$',size=20)
ax.text(5.2,2.75,r'$A{\bf x}$',size=20)
ax.plot(0,0,'');
So the coordinates of $A\mathbf{x}$ in the basis $\mathcal{B}$ are $(4, 3)$.
Now we convert back to the standard basis -- that is, we ask which point has coordinates $(4, 3)$ in the basis $\mathcal{B}$.
We rely on the fact that if $\mathbf{y}$ has coordinates $\mathbf{x}$ in the basis $\mathcal{B}$, then $\mathbf{y} = P\mathbf{x}$.
So $A\mathbf{x} = P\begin{bmatrix} 4 \\ 3 \end{bmatrix}$:
#
ax = ut.plotSetup(-1,6,-1,6,size=(12,8))
ut.centerAxes(ax)
v1 = np.array([5.0,1.0])
v1 = v1 / np.sqrt(np.sum(v1*v1))
v2 = np.array([3.0,5.0])
v2 = v2 / np.sqrt(np.sum(v2*v2))
#ut.plotVec(ax,v1,'b')
#ut.plotVec(ax,v2)
#ut.plotLinEqn(-v1[1],v1[0],0,color='b')
#ut.plotLinEqn(-v2[1],v2[0],0,color='r')
#for i in range(-3,8):
# ut.plotLinEqn(-v1[1],v1[0],i*(v1[0]*v2[1]-v1[1]*v2[0]),format=':',color='b')
# ut.plotLinEqn(-v2[1],v2[0],i*(v2[0]*v1[1]-v2[1]*v1[0]),format=':',color='r')
p1 = 2*v1+v2
p2 = 4*v1+3*v2
#ut.plotVec(ax, p1,'k')
ut.plotVec(ax, p2,'k')
#ax.annotate('', xy=(p2[0], p2[1]), xycoords='data',
# xytext=(p1[0], p1[1]), textcoords='data',
# size=15,
# #bbox=dict(boxstyle="round", fc="0.8"),
# arrowprops={'arrowstyle': 'simple',
# 'fc': '0.7',
# 'ec': 'none',
# 'connectionstyle' : 'arc3,rad=-0.3'},
# )
#ax.text(2.5,0.75,r'${\bf x}$',size=16)
ax.text(5.2,2.75,r'$A{\bf x}$',size=20)
ax.plot(0,0,'');
We find that $A\mathbf{x} = 4\mathbf{p}_1 + 3\mathbf{p}_2$.
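We can replay those three steps numerically with the $A$, $P$, and $D$ defined in the code cell above -- a small sketch, where $\mathbf{x} = 2\mathbf{p}_1 + \mathbf{p}_2$ is the point used in the plots:

# the three-step interpretation of A x = P D P^{-1} x
x = P @ np.array([2, 1])        # the point with coordinates (2, 1) in the basis B
coords = np.linalg.inv(P) @ x   # step 1: coordinates of x in the eigenvector basis -> (2, 1)
scaled = D @ coords             # step 2: scale by the eigenvalues -> (4, 3)
Ax = P @ scaled                 # step 3: convert back to the standard basis
print(np.allclose(Ax, A @ x))   # True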
In conclusion: notice that the transformation $\mathbf{x} \mapsto A\mathbf{x}$ may be a complicated one in which each component of $\mathbf{x}$ affects each component of $A\mathbf{x}$.
However, by changing to the basis defined by the eigenspaces of $A$, the action of $A$ becomes simple to understand.
Diagonalizing $A$ amounts to changing to a basis in which the action of $A$ is particularly easy to understand and compute with.