In [1]:
#
import numpy as np
import scipy as sp
import pandas as pd
import matplotlib as mp
import matplotlib.pyplot as plt
import seaborn as sns
import sklearn
import laUtilities as ut
import slideUtilities as sl
import demoUtilities as dm
from matplotlib import animation
from importlib import reload
from datetime import datetime
from IPython.display import Image, display_html, display, Math, HTML;
qr_setting = None

mp.rcParams['animation.html'] = 'jshtml';

Announcements¶

  • Homework 7 out now, due Friday, April 7 at 8pm
  • Upcoming office hours:
    • Today: Prof McDonald from 4:30-6pm in CCDS 1341
    • Tomorrow: Peer tutor Rohan Anand from 1:30-3pm in CCDS 16th floor
  • Reading
    • Deisenroth-Faisal-Ong Sections 4.2 and 4.4
    • 3Blue1Brown videos 13 and 14

Recap: Eigenvectors¶

Of all of the linear transformations associated with a square matrix, scaling is special because:

If a matrix $A$ scales $\mathbf{x}$, then that transformation could also have been expressed without a matrix-vector multiplication, i.e., as $\lambda\mathbf{x}$ for some scalar value $\lambda$.

An eigenvector of a matrix $A$ is a special vector that does not change its direction when multiplied by $A$.

The eigenvalues of a matrix $A$ are the scalars $\lambda$ by which its eigenvectors are scaled.

In [2]:
#
ax = ut.plotSetup(size=(12,8))
ut.centerAxes(ax)
A = np.array([[3,-2],[1,0]])
u = np.array([-1,1])
v = np.array([2,1])
#
ut.plotArrowVec(ax, v, [0,0], color='Red')
ut.plotArrowVec(ax, A.dot(v), [0,0], color='Red')
ax.text(v[0],v[1]+0.2,r'${\bf v}$',size=20)
ax.text(A.dot(v)[0],A.dot(v)[1]+0.2,r'$A{\bf v}$',size=20)
#
ut.plotArrowVec(ax, u, [0,0], color='Blue')
ut.plotArrowVec(ax, A.dot(u), [0,0], color='Blue')
ax.text(u[0]-0.5,u[1]+0.1,r'${\bf u}$',size=20)
ax.text(A.dot(u)[0]-0.7,A.dot(u)[1]+0.3,r'$A{\bf u}$',size=20);

To find an eigenvalue: we saw that $\lambda$ is an eigenvalue of an $n \times n$ matrix $A$ if and only if the equation

$$(A - \lambda I)\mathbf{x} = \mathbf{0}$$

has a nontrivial solution.

Some special cases:

  • The eigenvalues of an upper triangular matrix or a lower triangular matrix are the entries on its main diagonal.
  • A matrix has an eigenvalue of 0 if and only if it is not invertible (i.e., is singular).

To find all eigenvectors corresponding to an eigenvalue: compute the null space of the matrix $A - \lambda I$.

(Remember that this forms a subspace, which we call the eigenspace of $A$ corresponding to $\lambda$.)
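As a quick sanity check, here is a minimal sketch (assuming SciPy is available alongside the imports above) that computes an eigenspace as a null space. It uses the matrix $A$ from the plot above, which has $\lambda = 2$ as an eigenvalue with eigenvector $(2, 1)$:

In [ ]:
import numpy as np
from scipy.linalg import null_space

A = np.array([[3, -2],
              [1,  0]])
lam = 2                               # an eigenvalue of A (the roots of det(A - lambda I) = 0 are 1 and 2)

# The eigenspace for lambda is the null space of (A - lambda I).
E = null_space(A - lam * np.eye(2))
print(E)                              # one basis vector, proportional to (2, 1)

# Sanity check: A v = lambda v for any v in the eigenspace.
v = E[:, 0]
print(np.allclose(A @ v, lam * v))    # True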

27.6 The Characteristic Equation¶

Recall from the last lecture that $A$ is invertible if and only if $\det A$ is not zero.

To return to the question of how to compute eigenvalues of $A$, recall that $\lambda$ is an eigenvalue if and only if $(A - \lambda I)$ is not invertible.

We capture this fact using the characteristic equation:

$$\det(A - \lambda I) = 0.$$

We can conclude that $\lambda$ is an eigenvalue of an $n \times n$ matrix $A$ if and only if $\lambda$ satisfies the characteristic equation $\det(A - \lambda I) = 0.$

Example. Find the characteristic equation of

$$A = \begin{bmatrix} 5 & -2 & 6 & -1 \\ 0 & 3 & -8 & 0 \\ 0 & 0 & 5 & 4 \\ 0 & 0 & 0 & 1 \end{bmatrix}$$

Solution. Form $A - \lambda I$, and note that the determinant of a triangular matrix is the product of its diagonal entries; here $A - \lambda I$ is triangular.

$$\det(A - \lambda I) = \det \begin{bmatrix} 5-\lambda & -2 & 6 & -1 \\ 0 & 3-\lambda & -8 & 0 \\ 0 & 0 & 5-\lambda & 4 \\ 0 & 0 & 0 & 1-\lambda \end{bmatrix}$$
$$= (5-\lambda)(3-\lambda)(5-\lambda)(1-\lambda).$$

So the characteristic equation is:

$$(\lambda - 5)^2(\lambda - 3)(\lambda - 1) = 0$$
$$\lambda^4 - 14\lambda^3 + 68\lambda^2 - 130\lambda + 75 = 0.$$

Notice that, once again, $\det(A - \lambda I)$ is a polynomial in $\lambda$.

In fact, for any $n \times n$ matrix, $\det(A - \lambda I)$ is a polynomial of degree $n$, called the characteristic polynomial of $A$.

We say that the eigenvalue 5 in this example has multiplicity 2, because $(\lambda - 5)$ occurs twice as a factor of the characteristic polynomial. In general, the multiplicity of an eigenvalue $\lambda$ is its multiplicity as a root of the characteristic equation.
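As a quick check, numpy can recover this characteristic polynomial numerically. A minimal sketch: np.poly builds the monic polynomial whose roots are the eigenvalues of the matrix, which here matches the polynomial above (up to floating-point error).

In [ ]:
import numpy as np

A = np.array([[5, -2,  6, -1],
              [0,  3, -8,  0],
              [0,  0,  5,  4],
              [0,  0,  0,  1]])

# Coefficients of the characteristic polynomial lambda^4 - 14 lambda^3 + 68 lambda^2 - 130 lambda + 75.
coeffs = np.poly(A)
print(np.round(coeffs))                 # approximately [   1.  -14.   68. -130.   75.]

# Its roots are the eigenvalues, with multiplicity.
print(np.round(np.roots(coeffs), 6))    # the eigenvalues 5, 5, 3, 1 (in some order)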

Example. The characteristic polynomial of a $6 \times 6$ matrix is $\lambda^6 - 4\lambda^5 - 12\lambda^4.$ Find the eigenvalues and their multiplicity.

Solution. Factor the polynomial

$$\lambda^6 - 4\lambda^5 - 12\lambda^4 = \lambda^4(\lambda^2 - 4\lambda - 12) = \lambda^4(\lambda - 6)(\lambda + 2)$$

So the eigenvalues are 0 (with multiplicity 4), 6, and -2.

Since the characteristic polynomial for an $n \times n$ matrix has degree $n$, the equation has $n$ roots, counting multiplicities -- provided complex numbers are allowed.

As a consequence:

  • There is, in principle, a way to find eigenvalues of any matrix.
  • But, even for a real matrix, eigenvalues may sometimes be complex.

Note that you need not compute eigenvalues for matrices larger than $2 \times 2$ by hand. For any matrix $3 \times 3$ or larger, you should use a computer.
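For example, here is a minimal sketch of computing eigenvalues numerically with numpy; the matrix is just a random example, and note that some eigenvalues of a real matrix may come back as complex conjugate pairs.

In [ ]:
import numpy as np

rng = np.random.default_rng(0)
M = rng.standard_normal((5, 5))    # a "generic" real 5 x 5 matrix

# np.linalg.eigvals computes all n eigenvalues numerically;
# for a real matrix, some may be complex (in conjugate pairs).
print(np.linalg.eigvals(M))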

Lecture 28: Diagonalization¶

[This lecture is based on Prof. Crovella's CS 132 lecture notes.]

28.1 Similarity¶

Before we get to diagonal matrices specifically, let's first look at the notion of similar matrices.

Definition. If $A$ and $B$ are $n \times n$ matrices, then $A$ is similar to $B$ if there is an invertible matrix $P$ such that $P^{-1}AP = B,$ or, equivalently, $A = PBP^{-1}.$

Similarity is symmetric, so if $A$ is similar to $B$, then $B$ is similar to $A$. Hence we just say that $A$ and $B$ are similar.

Changing $A$ into $B$ is called a similarity transformation.

An important property of similar matrices $A$ and $B$ is that they have the same eigenvalues.

Theorem. If $n \times n$ matrices $A$ and $B$ are similar, then they have the same characteristic polynomial, and hence the same eigenvalues (with the same multiplicities).

Proof. If $B = P^{-1}AP,$ then

$$B - \lambda I = P^{-1}AP - \lambda P^{-1}P$$
$$= P^{-1}(AP - \lambda P)$$
$$= P^{-1}(A - \lambda I)P$$

Now let's construct the characteristic polynomial by taking the determinant:

$$\det(B - \lambda I) = \det\left[P^{-1}(A - \lambda I)P\right]$$

Using the properties of determinants we discussed last lecture, we compute:

$$= \det(P^{-1}) \cdot \det(A - \lambda I) \cdot \det(P).$$

Since $\det(P^{-1}) \cdot \det(P) = \det(P^{-1}P) = \det I = 1,$ we can see that

$$\det(B - \lambda I) = \det(A - \lambda I).$$
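A small numerical illustration of this theorem (the matrices here are just made-up examples): conjugating $A$ by any invertible $P$ leaves the eigenvalues unchanged.

In [ ]:
import numpy as np

A = np.array([[2.0, 1.0],
              [0.0, 3.0]])       # eigenvalues 2 and 3 (triangular)
P = np.array([[1.0, 2.0],
              [1.0, 1.0]])       # any invertible matrix

B = np.linalg.inv(P) @ A @ P     # B is similar to A

# Similar matrices share the same eigenvalues.
print(np.sort(np.linalg.eigvals(A)))
print(np.sort(np.linalg.eigvals(B)))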

28.2 Diagonalization¶

Now let's return to the setting of square matrices.

Given a square matrix $A$, we seek a factorization of $A$ of the form

$$A = PDP^{-1}$$

where $D$ is a diagonal matrix.

By the definition above, $A$ and $D$ are similar matrices whenever they can be related in this way.

So this factorization amounts to finding a $P$ that makes $A$ similar to a diagonal matrix.

Now, it is important to understand:

This factorization may not always be possible!

Hence, we have a definition:

Definition. A square matrix $A$ is said to be diagonalizable if $A$ is similar to a diagonal matrix.

That is, if we can find some invertible $P$ such that

$$A = PDP^{-1}$$

and $D$ is a diagonal matrix.

Note: a matrix $A$ may or may not be invertible, and it may or may not be diagonalizable. And these properties are not directly related.

  • It is possible for a matrix to be invertible but not diagonalizable.
  • It is possible for a matrix to be diagonalizable but not invertible.

This factorization $A = PDP^{-1}$ allows us to represent $A$ in a form that exposes many interesting properties of $A$.

Let's start by showing one benefit of this factorization: we can compute $A^k$ quickly for large values of $k$.

Powers of a Diagonal Matrix¶

Let's look at an example to show why this factorization is so important. Consider taking the powers of the following diagonal matrix

$$D = \begin{bmatrix} 5 & 0 \\ 0 & 3 \end{bmatrix}.$$

Then note that

$$D^2 = \begin{bmatrix} 5 & 0 \\ 0 & 3 \end{bmatrix} \begin{bmatrix} 5 & 0 \\ 0 & 3 \end{bmatrix} = \begin{bmatrix} 5^2 & 0 \\ 0 & 3^2 \end{bmatrix},$$

and

$$D^3 = DD^2 = \begin{bmatrix} 5 & 0 \\ 0 & 3 \end{bmatrix} \begin{bmatrix} 5^2 & 0 \\ 0 & 3^2 \end{bmatrix} = \begin{bmatrix} 5^3 & 0 \\ 0 & 3^3 \end{bmatrix}.$$

So in general,

$$D^k = \begin{bmatrix} 5^k & 0 \\ 0 & 3^k \end{bmatrix} \quad \text{for } k \ge 1.$$
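A quick numerical check of this pattern (a minimal sketch):

In [ ]:
import numpy as np

D = np.array([[5, 0],
              [0, 3]])
k = 4

# The k-th power of a diagonal matrix just raises each diagonal entry to the k-th power.
print(np.linalg.matrix_power(D, k))    # [[625   0] [  0  81]]
print(np.diag([5**k, 3**k]))           # the same thing, computed entrywise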

Extending to a general matrix $A$¶

Now, consider if $A$ is diagonalizable, meaning that it is similar to a diagonal matrix.

Then $A^k$ is easy to compute in this case as well. Let's see this by example.

Example. Let $A = \begin{bmatrix} 7 & 2 \\ -4 & 1 \end{bmatrix}.$

Find a formula for $A^k,$ given that $A = PDP^{-1},$ where

$$P = \begin{bmatrix} 1 & 1 \\ -1 & -2 \end{bmatrix}, \quad D = \begin{bmatrix} 5 & 0 \\ 0 & 3 \end{bmatrix}, \quad \text{and} \quad P^{-1} = \begin{bmatrix} 2 & 1 \\ -1 & -1 \end{bmatrix}.$$

Solution. By associativity of matrix multiplication,

$$A^2 = (PDP^{-1})(PDP^{-1})$$
$$= PD(P^{-1}P)DP^{-1}$$
$$= PDDP^{-1}$$
$$= PD^2P^{-1}$$
$$= \begin{bmatrix} 1 & 1 \\ -1 & -2 \end{bmatrix} \begin{bmatrix} 5^2 & 0 \\ 0 & 3^2 \end{bmatrix} \begin{bmatrix} 2 & 1 \\ -1 & -1 \end{bmatrix}$$

So in general, for $k \ge 1,$

$$A^k = PD^kP^{-1}$$
$$= \begin{bmatrix} 1 & 1 \\ -1 & -2 \end{bmatrix} \begin{bmatrix} 5^k & 0 \\ 0 & 3^k \end{bmatrix} \begin{bmatrix} 2 & 1 \\ -1 & -1 \end{bmatrix}$$
$$= \begin{bmatrix} 2 \cdot 5^k - 3^k & 5^k - 3^k \\ 2 \cdot 3^k - 2 \cdot 5^k & 2 \cdot 3^k - 5^k \end{bmatrix}$$
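A minimal numerical check of this formula, using the $P$, $D$, and $P^{-1}$ given above:

In [ ]:
import numpy as np

A    = np.array([[ 7,  2],
                 [-4,  1]])
P    = np.array([[ 1,  1],
                 [-1, -2]])
Pinv = np.array([[ 2,  1],
                 [-1, -1]])

k = 6
direct      = np.linalg.matrix_power(A, k)                   # A^k computed directly
via_diag    = P @ np.diag([5**k, 3**k]) @ Pinv               # A^k = P D^k P^{-1}
closed_form = np.array([[2*5**k - 3**k,       5**k - 3**k],
                        [2*3**k - 2*5**k, 2*3**k - 5**k]])

print(np.array_equal(direct, via_diag), np.array_equal(direct, closed_form))    # True True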

28.3 Diagonalization Requires Eigenvectors and Eigenvalues¶

Next we will show that to diagonalize a matrix, one must use the eigenvectors and eigenvalues of $A$.

Theorem. (The Diagonalization Theorem)

An $n \times n$ matrix $A$ is diagonalizable if and only if $A$ has $n$ linearly independent eigenvectors.

In fact,

$$A = PDP^{-1},$$

with $D$ a diagonal matrix,

if and only if the columns of $P$ are $n$ linearly independent eigenvectors of $A.$

In this case, the diagonal entries of $D$ are eigenvalues of $A$ that correspond, respectively, to the eigenvectors in $P$.

In other words, $A$ is diagonalizable if and only if there are enough eigenvectors to form a basis of $\mathbb{R}^n$.

We call such a basis an eigenvector basis or an eigenbasis of $\mathbb{R}^n$.

Example. A $2 \times 2$ rotation matrix like

$$\begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix}$$

is not diagonalizable, because we saw last week that it does not have any (real-valued) eigenvalues.
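We can see this numerically: a minimal sketch with numpy shows that this rotation matrix has only complex eigenvalues, so there is no real eigenbasis.

In [ ]:
import numpy as np

R = np.array([[0, -1],
              [1,  0]])        # rotation by 90 degrees

# numpy reports the complex eigenvalues +i and -i;
# with no real eigenvalues, R has no real eigenvectors
# and cannot be diagonalized over the reals.
print(np.linalg.eigvals(R))    # approximately [0.+1.j  0.-1.j]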

Proof. First, we prove the "only if" ($\Rightarrow$) direction: if $A$ is diagonalizable, it has $n$ linearly independent eigenvectors.

$A$ is diagonalizable, so $A = PDP^{-1}$.

Observe that if $P$ is any $n \times n$ matrix with columns $\mathbf{v}_1, \ldots, \mathbf{v}_n,$ then

$$AP = A[\mathbf{v}_1 \;\; \mathbf{v}_2 \;\; \cdots \;\; \mathbf{v}_n] = [A\mathbf{v}_1 \;\; A\mathbf{v}_2 \;\; \cdots \;\; A\mathbf{v}_n]$$

Next, note that if $D$ is any diagonal matrix with diagonal entries $\lambda_1, \ldots, \lambda_n,$

$$PD = P\begin{bmatrix} \lambda_1 & 0 & \cdots & 0 \\ 0 & \lambda_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \lambda_n \end{bmatrix} = [\lambda_1\mathbf{v}_1 \;\; \lambda_2\mathbf{v}_2 \;\; \cdots \;\; \lambda_n\mathbf{v}_n].$$

Now suppose $A$ is diagonalizable and $A = PDP^{-1}.$ Then right-multiplying this relation by $P$, we have

$$AP = PD$$

In this case, the calculations above show that

$$[A\mathbf{v}_1 \;\; A\mathbf{v}_2 \;\; \cdots \;\; A\mathbf{v}_n] = [\lambda_1\mathbf{v}_1 \;\; \lambda_2\mathbf{v}_2 \;\; \cdots \;\; \lambda_n\mathbf{v}_n].$$

Equating columns, we find that

$$A\mathbf{v}_1 = \lambda_1\mathbf{v}_1, \quad A\mathbf{v}_2 = \lambda_2\mathbf{v}_2, \quad \ldots, \quad A\mathbf{v}_n = \lambda_n\mathbf{v}_n$$

Because $A\mathbf{v}_i$ is scaling $\mathbf{v}_i$ by $\lambda_i$, we know $\mathbf{v}_i$ must be an eigenvector of $A$.

Since $P$ is invertible, its columns $\mathbf{v}_1, \ldots, \mathbf{v}_n$ must be linearly independent.

Also, since these columns are nonzero, the equations above show that $\lambda_1, \ldots, \lambda_n$ are eigenvalues and $\mathbf{v}_1, \ldots, \mathbf{v}_n$ are the corresponding eigenvectors.

This proves the "only if" part of the theorem.

The "if" (⇐⇐) direction of the theorem is: if AA has nn linearly independent eigenvectors, AA is diagonalizable.

This is straightforward: given AA's nn eigenvectors v1,…,vn,v1,…,vn, use them to construct the columns of PP and use corresponding eigenvalues λ1,…,λnλ1,…,λn to construct DD.

Using the sequence of equations above in reverse order, we can go from

$$A\mathbf{v}_1 = \lambda_1\mathbf{v}_1, \quad A\mathbf{v}_2 = \lambda_2\mathbf{v}_2, \quad \ldots, \quad A\mathbf{v}_n = \lambda_n\mathbf{v}_n$$

to

$$AP = PD.$$

Since the eigenvectors are given as linearly independent, $P$ is invertible and so

$$A = PDP^{-1}.$$

The takeaway is this:

Every $n \times n$ matrix having $n$ linearly independent eigenvectors can be factored into the product of

  • a matrix $P$,
  • a diagonal matrix $D$, and
  • the inverse of $P$

... where $P$ holds the eigenvectors of $A$, and $D$ holds the eigenvalues of $A$.

This is the eigendecomposition of $A$.

(It is quite fundamental!)

28.4 Diagonalizing a Matrix¶

Let's put this all together and see how to diagonalize a matrix.

Four Steps to Diagonalization¶

Example. Diagonalize the following matrix, if possible.

$$A = \begin{bmatrix} 1 & 3 & 3 \\ -3 & -5 & -3 \\ 3 & 3 & 1 \end{bmatrix}$$

That is, find an invertible matrix $P$ and a diagonal matrix $D$ such that $A = PDP^{-1}.$

Step 1: Find the eigenvalues of $A$.¶

If we are working with a $2 \times 2$ matrix, then we can compute by hand the roots of the characteristic (quadratic) polynomial. For anything larger we'd use a computer.

In this case, the characteristic equation turns out to involve a cubic polynomial that can be factored:

$$0 = \det(A - \lambda I) = -\lambda^3 - 3\lambda^2 + 4 = -(\lambda - 1)(\lambda + 2)^2$$

So the eigenvalues are $\lambda = 1$ and $\lambda = -2$ (with multiplicity two).

Step 2: Find three linearly independent eigenvectors of $A$.¶

Note that we need three linearly independent vectors because $A$ is $3 \times 3.$

This is the step where we find out whether $A$ can be diagonalized, depending on whether we can form 3 independent eigenvectors.

Using our standard method (finding the nullspace of $A - \lambda I$) we find a basis for each eigenspace:

Basis for $\lambda = 1$:

  • We must find the nullspace of

$$A - I = \begin{bmatrix} 1-1 & 3 & 3 \\ -3 & -5-1 & -3 \\ 3 & 3 & 1-1 \end{bmatrix} = \begin{bmatrix} 0 & 3 & 3 \\ -3 & -6 & -3 \\ 3 & 3 & 0 \end{bmatrix}.$$

  • Note that this matrix has rank 2 (do you see why?) so it has a nullspace of dimension 1.
  • Using Gaussian elimination, we can find a nonzero solution to the homogeneous equation $(A - I)\mathbf{v} = \mathbf{0}$, namely $\mathbf{v}_1 = \begin{bmatrix} 1 \\ -1 \\ 1 \end{bmatrix}.$

Using the same technique, we can find the basis for $\lambda = -2$. It turns out that the result is:

$$\mathbf{v}_2 = \begin{bmatrix} -1 \\ 1 \\ 0 \end{bmatrix} \text{ and } \mathbf{v}_3 = \begin{bmatrix} -1 \\ 0 \\ 1 \end{bmatrix}.$$

At this point we must ensure that $\{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3\}$ forms a linearly independent set.

(These vectors in fact do.)

Step 3: Construct $P$ from the vectors in Step 2.¶

The order of the vectors is actually not important (yet).

$$P = [\mathbf{v}_1 \;\; \mathbf{v}_2 \;\; \mathbf{v}_3] = \begin{bmatrix} 1 & -1 & -1 \\ -1 & 1 & 0 \\ 1 & 0 & 1 \end{bmatrix}.$$

Step 4: Construct $D$ from the corresponding eigenvalues.¶

The order of eigenvalues must match the order of eigenvectors used in the previous step.

If an eigenvalue has multiplicity greater than 1, then repeat it the corresponding number of times.

$$D = \begin{bmatrix} 1 & 0 & 0 \\ 0 & -2 & 0 \\ 0 & 0 & -2 \end{bmatrix}.$$

And we are done. We have diagonalized $A$:

$$A = \begin{bmatrix} 1 & 3 & 3 \\ -3 & -5 & -3 \\ 3 & 3 & 1 \end{bmatrix} = \begin{bmatrix} 1 & -1 & -1 \\ -1 & 1 & 0 \\ 1 & 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & 0 & 0 \\ 0 & -2 & 0 \\ 0 & 0 & -2 \end{bmatrix} \begin{bmatrix} 1 & -1 & -1 \\ -1 & 1 & 0 \\ 1 & 0 & 1 \end{bmatrix}^{-1}$$

So, just as a reminder, we can now take powers of $A$ quite efficiently:

$$A^{100} = \begin{bmatrix} 1 & -1 & -1 \\ -1 & 1 & 0 \\ 1 & 0 & 1 \end{bmatrix} \begin{bmatrix} 1^{100} & 0 & 0 \\ 0 & (-2)^{100} & 0 \\ 0 & 0 & (-2)^{100} \end{bmatrix} \begin{bmatrix} 1 & -1 & -1 \\ -1 & 1 & 0 \\ 1 & 0 & 1 \end{bmatrix}^{-1}$$
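A minimal numerical check of this factorization, using the exact $P$ and $D$ constructed above:

In [ ]:
import numpy as np

A = np.array([[ 1,  3,  3],
              [-3, -5, -3],
              [ 3,  3,  1]])
P = np.array([[ 1, -1, -1],
              [-1,  1,  0],
              [ 1,  0,  1]])
D = np.diag([1, -2, -2])

# Check the factorization A = P D P^{-1} ...
print(np.allclose(A, P @ D @ np.linalg.inv(P)))          # True

# ... and use it to compute a power of A cheaply.
k = 10
Ak = P @ np.diag([1**k, (-2)**k, (-2)**k]) @ np.linalg.inv(P)
print(np.allclose(Ak, np.linalg.matrix_power(A, k)))     # True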

When Diagonalization Fails¶

Example. Let's look at an example of how diagonalization can fail.

Diagonalize the following matrix, if possible.

$$A = \begin{bmatrix} 2 & 4 & 3 \\ -4 & -6 & -3 \\ 3 & 3 & 1 \end{bmatrix}.$$

Solution. The characteristic equation of $A$ turns out to be the same as in the last example:

$$0 = \det(A - \lambda I) = -(\lambda - 1)(\lambda + 2)^2$$

The eigenvalues are $\lambda = 1$ and $\lambda = -2.$ However, it is easy to verify that each eigenspace is only one-dimensional:

Basis for $\lambda_1 = 1$: $\mathbf{v}_1 = \begin{bmatrix} 1 \\ -1 \\ 1 \end{bmatrix}.$

Basis for $\lambda_2 = -2$: $\mathbf{v}_2 = \begin{bmatrix} -1 \\ 1 \\ 0 \end{bmatrix}.$

There are no other eigenvalues, and every eigenvector of $A$ is a multiple of either $\mathbf{v}_1$ or $\mathbf{v}_2.$

Hence it is impossible to construct a basis of $\mathbb{R}^3$ using eigenvectors of $A$.

So we conclude that $A$ is not diagonalizable.
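Here is a minimal check of where this fails, using scipy to measure the dimension of the $\lambda = -2$ eigenspace:

In [ ]:
import numpy as np
from scipy.linalg import null_space

A = np.array([[ 2,  4,  3],
              [-4, -6, -3],
              [ 3,  3,  1]])

# The eigenspace for lambda = -2 is the null space of (A + 2I).
E = null_space(A + 2 * np.eye(3))
print(E.shape[1])    # 1 -- only one independent eigenvector for the repeated eigenvalue -2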

An Important Case¶

There is an important situation in which we can conclude immediately that $A$ is diagonalizable, without explicitly constructing and testing the eigenspaces of $A$.

Theorem. An $n \times n$ matrix with $n$ distinct eigenvalues is diagonalizable.

Proof. First, remember that every eigenvalue $\lambda_i$ has at least one eigenvector $\mathbf{p}_i$ associated with it. So we have $n$ distinct eigenvectors $\mathbf{p}_1, \ldots, \mathbf{p}_n.$

It only remains to show that these eigenvectors must be linearly independent.

We will prove this statement by induction. As the base case, it is clear that the first eigenvector $\mathbf{p}_1$ is linearly independent on its own. (Remember that eigenvectors are not equal to $\mathbf{0}$ by definition.)

For the induction step: suppose that the first $j$ eigenvectors are linearly independent.

Now let's see what happens if we take an arbitrary linear combination of the first $j+1$ vectors:

$$\mathbf{0} = c_1\mathbf{p}_1 + c_2\mathbf{p}_2 + \cdots + c_{j+1}\mathbf{p}_{j+1}.$$

Let's see what happens when we take this equation and, separately, multiply it by the scalar $\lambda_{j+1}$ and left-multiply it by the matrix $A$:

$$\mathbf{0} = \lambda_{j+1}(c_1\mathbf{p}_1 + c_2\mathbf{p}_2 + \cdots + c_{j+1}\mathbf{p}_{j+1}) = \lambda_{j+1}c_1\mathbf{p}_1 + \lambda_{j+1}c_2\mathbf{p}_2 + \cdots + \lambda_{j+1}c_{j+1}\mathbf{p}_{j+1}.$$
$$\mathbf{0} = A(c_1\mathbf{p}_1 + c_2\mathbf{p}_2 + \cdots + c_{j+1}\mathbf{p}_{j+1}) = \lambda_1 c_1\mathbf{p}_1 + \lambda_2 c_2\mathbf{p}_2 + \cdots + \lambda_{j+1}c_{j+1}\mathbf{p}_{j+1}.$$

If we take the difference of these two equations, we get:

$$\mathbf{0} = (\lambda_1 - \lambda_{j+1})c_1\mathbf{p}_1 + (\lambda_2 - \lambda_{j+1})c_2\mathbf{p}_2 + \cdots + (\lambda_j - \lambda_{j+1})c_j\mathbf{p}_j,$$

where the $\mathbf{p}_{j+1}$ term has been canceled out.

This is a linear combination of the first $j$ vectors alone, and remember that $\lambda_i - \lambda_{j+1} \ne 0$ since the eigenvalues are distinct.

From our induction hypothesis, we know that the first $j$ vectors are linearly independent, so all of the coefficients $(\lambda_i - \lambda_{j+1})c_i$ must be zero; since $\lambda_i - \lambda_{j+1} \ne 0$, this forces $c_1 = c_2 = \cdots = c_j = 0$. The original linear combination then reduces to $c_{j+1}\mathbf{p}_{j+1} = \mathbf{0}$, and since $\mathbf{p}_{j+1} \ne \mathbf{0}$, we get $c_{j+1} = 0$. So the first $j+1$ eigenvectors are linearly independent, and the proof follows by induction.

Example. Determine if the following matrix is diagonalizable.

$$A = \begin{bmatrix} 5 & -8 & 1 \\ 0 & 0 & 7 \\ 0 & 0 & -2 \end{bmatrix}.$$

Solution. It's easy!

  • Since $A$ is (upper) triangular, its eigenvalues are $5$, $0$, and $-2$.
  • Since $A$ is a $3 \times 3$ matrix with 3 distinct eigenvalues, $A$ is diagonalizable.
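A one-line check with numpy (a minimal sketch):

In [ ]:
import numpy as np

A = np.array([[5, -8,  1],
              [0,  0,  7],
              [0,  0, -2]])

# Three distinct eigenvalues (5, 0, and -2), so A is diagonalizable.
print(np.linalg.eigvals(A))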

28.5 Diagonalization as a Change of Basis¶

We can now turn to a geometric understanding of how diagonalization informs us about the properties of $A$.

Let's interpret the diagonalization $A = PDP^{-1}$ in terms of how $A$ acts as a linear operator.

When thinking of $A$ as a linear operator, diagonalization has a specific interpretation:

Diagonalization separates the influence of each vector component from the others.

To see what this means, let's first consider an easier case: a diagonal matrix. When we multiply a vector $\mathbf{x}$ by a diagonal matrix $D$, the change to each component of $\mathbf{x}$ depends only on that component.

That is, multiplying by a diagonal matrix simply scales the components of the vector.

Example. Let $D = \begin{bmatrix} 5 & 0 \\ 0 & 3 \end{bmatrix}.$ Then $D\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} 5x_1 \\ 3x_2 \end{bmatrix}.$

On the other hand, when we multiply by a matrix $A$ that has off-diagonal entries, the components of $\mathbf{x}$ affect each other.

So diagonalizing a matrix allows us to bring intuition to its behavior as a linear operator.

Interpreting Diagonalization Geometrically¶

When we compute $P\mathbf{x},$ we are taking a vector sum of the columns of $P$:

$$P\mathbf{x} = x_1\mathbf{p}_1 + x_2\mathbf{p}_2 + \cdots + x_n\mathbf{p}_n.$$

Now $P$ is square and invertible, so its columns are a basis for $\mathbb{R}^n$. Let's call that basis $B = \{\mathbf{p}_1, \mathbf{p}_2, \ldots, \mathbf{p}_n\}.$

So, we can think of $P\mathbf{x}$ as "the point that has coordinates $\mathbf{x}$ in the basis $B$."

On the other hand, what if we wanted to find the coordinates of a vector in basis $B$?

Let's say we have some $\mathbf{y}$ written in the standard basis, and we want to find its coordinates in the basis $B$.

So $\mathbf{y} = P\mathbf{x} = x_1\mathbf{p}_1 + x_2\mathbf{p}_2 + \cdots + x_n\mathbf{p}_n,$ where $\mathbf{x} = [\mathbf{y}]_B$ denotes the coordinates of $\mathbf{y}$ in the basis $B$.

Then since $P$ is invertible, $\mathbf{x} = P^{-1}\mathbf{y}.$

Thus, $P^{-1}\mathbf{y}$ is "the coordinates of $\mathbf{y}$ in the basis $B.$"

So we can interpret $A\mathbf{x} = PDP^{-1}\mathbf{x}$ as:

  1. Compute the coordinates of $\mathbf{x}$ in the basis $B$.

    This is $P^{-1}\mathbf{x}.$

  2. Scale those coordinates according to the diagonal matrix $D$.

    This is $DP^{-1}\mathbf{x}.$

  3. Find the point that has those scaled coordinates in the basis $B.$

    This is $PDP^{-1}\mathbf{x}.$

In [32]:
A = np.array([[ 1.86363636, 0.68181819],
              [-0.22727273, 3.13636364]])

P = np.array([[0.98058068, 0.51449576],
              [0.19611614, 0.85749293]])
D = np.array([[2, 0],
              [0, 3]])

np.allclose(A, P @ D @ np.linalg.inv(P))
Out[32]:
True

Example. Let's visualize diagonalization geometrically.

Consider a fixed-but-unwritten matrix $A$. Here's a picture showing how it transforms the point $\mathbf{x} = \begin{bmatrix} 2.47 \\ 1.25 \end{bmatrix}$.

In [3]:
#
ax = ut.plotSetup(-1,6,-1,6,size=(12,8))
ut.centerAxes(ax)
v1 = np.array([5.0,1.0])
v1 = v1 / np.sqrt(np.sum(v1*v1))
v2 = np.array([3.0,5.0])
v2 = v2 / np.sqrt(np.sum(v2*v2))
p1 = 2*v1+v2
p2 = 4*v1+3*v2
ut.plotVec(ax, p1,'k')
ut.plotVec(ax, p2,'k')
ax.annotate('', xy=(p2[0], p2[1]),  xycoords='data',
                xytext=(p1[0], p1[1]), textcoords='data',
                size=15,
                arrowprops={'arrowstyle': 'simple',
                                'fc': '0.7', 
                                'ec': 'none',
                                'connectionstyle' : 'arc3,rad=-0.3'},
                )
ax.text(2.5,0.75,r'${\bf x}$',size=20)
ax.text(5.2,2.75,r'$A{\bf x}$',size=20)
ax.plot(0,0,'');

So far, we cannot say much about what the linear transformation $A$ does in general.

Now, let's compute $P^{-1}\mathbf{x}.$

Remember that the columns of $P$ are the eigenvectors of $A$.

So $P^{-1}\mathbf{x}$ is the coordinates of the point $\mathbf{x}$ in the eigenvector basis:

In [4]:
#
ax = ut.plotSetup(-1,6,-1,6,size=(12,8))
ut.centerAxes(ax)
v1 = np.array([5.0,1.0])
v1 = v1 / np.sqrt(np.sum(v1*v1))
v2 = np.array([3.0,5.0])
v2 = v2 / np.sqrt(np.sum(v2*v2))
ut.plotVec(ax,v1,'b')
ut.plotVec(ax,v2)
ut.plotLinEqn(-v1[1],v1[0],0,color='b')
ut.plotLinEqn(-v2[1],v2[0],0,color='r')
for i in range(-4,8):
    ut.plotLinEqn(-v1[1],v1[0],i*(v1[0]*v2[1]-v1[1]*v2[0]),format=':',color='b')
    ut.plotLinEqn(-v2[1],v2[0],i*(v2[0]*v1[1]-v2[1]*v1[0]),format=':',color='r')
p1 = 2*v1+v2
p2 = 4*v1+3*v2
ut.plotVec(ax, p1,'k')
ax.text(2.5,0.75,r'${\bf x}$',size=20)
ax.text(v2[0]-0.15,v2[1]+0.5,r'${\bf p_2}$',size=20)
ax.text(v1[0]-0.15,v1[1]+0.35,r'${\bf p_1}$',size=20)
ax.plot(0,0,'');

The coordinates of $\mathbf{x}$ in this basis are $(2, 1)$.

In other words, $P^{-1}\mathbf{x} = \begin{bmatrix} 2 \\ 1 \end{bmatrix}.$

Now, we compute $DP^{-1}\mathbf{x}.$ Since $D$ is diagonal, this is just scaling each of the $B$-coordinates.

In this example the eigenvalue corresponding to $\mathbf{p}_1$ is 2, and the eigenvalue corresponding to $\mathbf{p}_2$ is 3.

In [5]:
#
ax = ut.plotSetup(-1,6,-1,6,size=(12,8))
ut.centerAxes(ax)
v1 = np.array([5.0,1.0])
v1 = v1 / np.sqrt(np.sum(v1*v1))
v2 = np.array([3.0,5.0])
v2 = v2 / np.sqrt(np.sum(v2*v2))
#ut.plotVec(ax,v1,'b')
#ut.plotVec(ax,v2)
ut.plotLinEqn(-v1[1],v1[0],0,color='b')
ut.plotLinEqn(-v2[1],v2[0],0,color='r')
for i in range(-4,8):
    ut.plotLinEqn(-v1[1],v1[0],i*(v1[0]*v2[1]-v1[1]*v2[0]),format=':',color='b')
    ut.plotLinEqn(-v2[1],v2[0],i*(v2[0]*v1[1]-v2[1]*v1[0]),format=':',color='r')
p1 = 2*v1+v2
p2 = 4*v1+3*v2
ut.plotVec(ax, p1,'k')
ut.plotVec(ax, p2,'k')
ax.annotate('', xy=(p2[0], p2[1]),  xycoords='data',
                xytext=(p1[0], p1[1]), textcoords='data',
                size=15,
                #bbox=dict(boxstyle="round", fc="0.8"),
                arrowprops={'arrowstyle': 'simple',
                                'fc': '0.7', 
                                'ec': 'none',
                                'connectionstyle' : 'arc3,rad=-0.3'},
                )
ax.text(2.5,0.75,r'${\bf x}$',size=20)
ax.text(5.2,2.75,r'$A{\bf x}$',size=20)
ax.plot(0,0,'');

So the coordinates of $A\mathbf{x}$ in the basis $B$ are

$$\begin{bmatrix} 2 & 0 \\ 0 & 3 \end{bmatrix}\begin{bmatrix} 2 \\ 1 \end{bmatrix} = \begin{bmatrix} 4 \\ 3 \end{bmatrix}.$$

Now we convert back to the standard basis -- that is, we ask which point has coordinates $(4, 3)$ in basis $B.$

We rely on the fact that if $\mathbf{y}$ has coordinates $\mathbf{x}$ in the basis $B$, then $\mathbf{y} = P\mathbf{x}.$

So

$$A\mathbf{x} = P\begin{bmatrix} 4 \\ 3 \end{bmatrix} = PDP^{-1}\mathbf{x}.$$
In [6]:
#
ax = ut.plotSetup(-1,6,-1,6,size=(12,8))
ut.centerAxes(ax)
v1 = np.array([5.0,1.0])
v1 = v1 / np.sqrt(np.sum(v1*v1))
v2 = np.array([3.0,5.0])
v2 = v2 / np.sqrt(np.sum(v2*v2))
#ut.plotVec(ax,v1,'b')
#ut.plotVec(ax,v2)
#ut.plotLinEqn(-v1[1],v1[0],0,color='b')
#ut.plotLinEqn(-v2[1],v2[0],0,color='r')
#for i in range(-3,8):
#    ut.plotLinEqn(-v1[1],v1[0],i*(v1[0]*v2[1]-v1[1]*v2[0]),format=':',color='b')
#    ut.plotLinEqn(-v2[1],v2[0],i*(v2[0]*v1[1]-v2[1]*v1[0]),format=':',color='r')
p1 = 2*v1+v2
p2 = 4*v1+3*v2
#ut.plotVec(ax, p1,'k')
ut.plotVec(ax, p2,'k')
#ax.annotate('', xy=(p2[0], p2[1]),  xycoords='data',
#                xytext=(p1[0], p1[1]), textcoords='data',
#                size=15,
#                #bbox=dict(boxstyle="round", fc="0.8"),
#                arrowprops={'arrowstyle': 'simple',
#                                'fc': '0.7', 
#                                'ec': 'none',
#                                'connectionstyle' : 'arc3,rad=-0.3'},
#                )
#ax.text(2.5,0.75,r'${\bf x}$',size=16)
ax.text(5.2,2.75,r'$A{\bf x}$',size=20)
ax.plot(0,0,'');

We find that $A\mathbf{x} = PDP^{-1}\mathbf{x} = \begin{bmatrix} 5.46 \\ 3.35 \end{bmatrix}.$
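Here is a minimal numerical recap of these three steps, using the $A$, $P$, and $D$ from cell In [32] earlier and the point $\mathbf{x}$ from the plots:

In [ ]:
import numpy as np

A = np.array([[ 1.86363636, 0.68181819],
              [-0.22727273, 3.13636364]])
P = np.array([[0.98058068, 0.51449576],
              [0.19611614, 0.85749293]])
D = np.array([[2, 0],
              [0, 3]])

x = P @ np.array([2, 1])            # the point whose B-coordinates are (2, 1), roughly (2.47, 1.25)

coords = np.linalg.inv(P) @ x       # step 1: coordinates of x in the eigenbasis, approx (2, 1)
scaled = D @ coords                 # step 2: scale by the eigenvalues, approx (4, 3)
Ax     = P @ scaled                 # step 3: convert back to the standard basis

print(np.round(coords, 2), np.round(scaled, 2), np.round(Ax, 2))
print(np.allclose(Ax, A @ x))       # True, since A = P D P^{-1} to the precision shown above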

In conclusion: notice that the transformation $\mathbf{x} \mapsto A\mathbf{x}$ may be a complicated one in which each component of $\mathbf{x}$ affects each component of $A\mathbf{x}$.

However, by changing to the basis defined by the eigenspaces of $A$, the action of $A$ becomes simple to understand.

Diagonalization of $A$ is a change to a basis in which the action of $A$ is particularly easy to understand and compute with.

In [ ]: