In [1]:
#
import numpy as np
import scipy as sp
import pandas as pd
import matplotlib as mp
import matplotlib.pyplot as plt
import seaborn as sns
import sklearn
import laUtilities as ut
import slideUtilities as sl
import demoUtilities as dm
from matplotlib import animation
from importlib import reload
from datetime import datetime
from IPython.display import Image, display_html, display, Math, HTML;
qr_setting = None

mp.rcParams['animation.html'] = 'jshtml';

Announcements¶

  • Homework 7 due tonight at 8pm
  • Homework 8 will be out later today
  • Upcoming office hours:
    • Today: Peer tutor Daniel Cho from 12:30-3:30pm in CCDS 16th floor
    • Today: Abhishek Tiwari from 4-5pm on CCDS 13th floor
  • Reading
    • Deisenroth-Faisal-Ong Section 4.2 and 4.4
    • 3Blue1Brown video 13 and video 14

Recap: Diagonalization¶

Last time, we formed a theory of diagonalizing a square matrix $A$ into $A = PDP^{-1}$.

Diagonalization is not always possible.

  • If it exists, its form is related to the eigenvalues and eigenvectors of $A$
  • If it exists, it makes many matrix problems substantially easier, such as taking matrix powers $A^k$ (see the sketch below)
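For instance, here is a minimal sketch of why diagonalization makes matrix powers cheap. The matrices below are arbitrary illustrative choices, not from the lecture:

import numpy as np

# Hypothetical example: a diagonalizable A built from a chosen P and D
P = np.array([[1., 1.],
              [0., 1.]])
D = np.diag([2., 3.])
A = P @ D @ np.linalg.inv(P)

k = 5
# A^k = P D^k P^{-1}, and D^k only requires powering the diagonal entries
Ak_via_diag = P @ np.diag(np.diag(D) ** k) @ np.linalg.inv(P)
Ak_direct = np.linalg.matrix_power(A, k)
print(np.allclose(Ak_via_diag, Ak_direct))   # True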

We also proved the following theorem.

Theorem. (The Diagonalization Theorem)

A square $n \times n$ matrix $A$ is diagonalizable if and only if $A$ has $n$ linearly independent eigenvectors.

28.5 Diagonalization as a Change of Basis¶

We can now turn to a geometric understanding of how diagonalization informs us about the properties of $A$.

Let's interpret the diagonalization $A = PDP^{-1}$ in terms of how $A$ acts as a linear operator.

When thinking of $A$ as a linear operator, diagonalization has a specific interpretation:

Diagonalization separates the influence of each vector component from the others.

To see what this means, let's first consider an easier case: a diagonal matrix. When we multiply a vector $\mathbf{x}$ by a diagonal matrix $D$, the change to each component of $\mathbf{x}$ depends only on that component.

That is, multiplying by a diagonal matrix simply scales the components of the vector.

Example. Let $D = \begin{bmatrix} 5 & 0 \\ 0 & 3 \end{bmatrix}$. Then $D\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} 5x_1 \\ 3x_2 \end{bmatrix}$.
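A quick numerical check of this example (a minimal sketch; the test vector is arbitrary):

import numpy as np

# Multiplying by a diagonal matrix scales each component independently
D = np.array([[5, 0],
              [0, 3]])
x = np.array([2, 4])
print(D @ x)   # [10 12]: the first component is scaled by 5, the second by 3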

On the other hand, when we multiply by a matrix $A$ that has off-diagonal entries, the components of $\mathbf{x}$ affect each other.

So diagonalizing a matrix allows us to bring intuition to its behavior as a linear operator.

Interpreting Diagonalization Geometrically¶

When we compute $P\mathbf{x}$, we are taking a linear combination of the columns of $P$:

$$P\mathbf{x} = x_1\mathbf{p}_1 + x_2\mathbf{p}_2 + \dots + x_n\mathbf{p}_n.$$

Now $P$ is square and invertible, so its columns are a basis for $\mathbb{R}^n$. Let's call that basis $B = \{\mathbf{p}_1, \mathbf{p}_2, \dots, \mathbf{p}_n\}$.

So, we can think of $P\mathbf{x}$ as "the point that has coordinates $\mathbf{x}$ in the basis $B$."

On the other hand, what if we wanted to find the coordinates of a vector in the basis $B$?

Let's say we have some $\mathbf{y}$ written in the standard basis, and we want to find its coordinates in the basis $B$.

So $\mathbf{y} = P\mathbf{x} = x_1\mathbf{p}_1 + x_2\mathbf{p}_2 + \cdots + x_n\mathbf{p}_n$, which says that $\mathbf{x} = [\mathbf{y}]_B$ is the coordinate vector of $\mathbf{y}$ in the basis $B$.

Then since $P$ is invertible, $\mathbf{x} = P^{-1}\mathbf{y}$.

Thus, $P^{-1}\mathbf{y}$ is "the coordinates of $\mathbf{y}$ in the basis $B$."

So we can interpret $A\mathbf{x} = PDP^{-1}\mathbf{x}$ as:

  1. Compute the coordinates of $\mathbf{x}$ in the basis $B$.

     This is $P^{-1}\mathbf{x}$.

  2. Scale those coordinates according to the diagonal matrix $D$.

     This is $DP^{-1}\mathbf{x}$.

  3. Find the point that has those scaled coordinates in the basis $B$.

     This is $PDP^{-1}\mathbf{x}$.

In [32]:
A = np.array([[ 1.86363636, 0.68181819],
              [-0.22727273, 3.13636364]])

P = np.array([[0.98058068, 0.51449576],
              [0.19611614, 0.85749293]])
D = np.array([[2, 0],
              [0, 3]])

np.allclose(A, P @ D @ np.linalg.inv(P))
Out[32]:
True

Example. Let's visualize diagonalization geometrically.

Consider a fixed-but-unwritten matrix $A$. Here's a picture showing how it transforms a point $\mathbf{x} = \begin{bmatrix} 2.47 \\ 1.25 \end{bmatrix}$.

In [3]:
#
ax = ut.plotSetup(-1,6,-1,6,size=(12,8))
ut.centerAxes(ax)
v1 = np.array([5.0,1.0])
v1 = v1 / np.sqrt(np.sum(v1*v1))
v2 = np.array([3.0,5.0])
v2 = v2 / np.sqrt(np.sum(v2*v2))
p1 = 2*v1+v2
p2 = 4*v1+3*v2
ut.plotVec(ax, p1,'k')
ut.plotVec(ax, p2,'k')
ax.annotate('', xy=(p2[0], p2[1]),  xycoords='data',
                xytext=(p1[0], p1[1]), textcoords='data',
                size=15,
                arrowprops={'arrowstyle': 'simple',
                                'fc': '0.7', 
                                'ec': 'none',
                                'connectionstyle' : 'arc3,rad=-0.3'},
                )
ax.text(2.5,0.75,r'${\bf x}$',size=20)
ax.text(5.2,2.75,r'$A{\bf x}$',size=20)
ax.plot(0,0,'');

So far, we cannot say much about what the linear transformation $A$ does in general.

Now, let's compute $P^{-1}\mathbf{x}$.

Remember that the columns of $P$ are the eigenvectors of $A$.

So $P^{-1}\mathbf{x}$ is the coordinates of the point $\mathbf{x}$ in the eigenvector basis:

In [4]:
#
ax = ut.plotSetup(-1,6,-1,6,size=(12,8))
ut.centerAxes(ax)
v1 = np.array([5.0,1.0])
v1 = v1 / np.sqrt(np.sum(v1*v1))
v2 = np.array([3.0,5.0])
v2 = v2 / np.sqrt(np.sum(v2*v2))
ut.plotVec(ax,v1,'b')
ut.plotVec(ax,v2)
ut.plotLinEqn(-v1[1],v1[0],0,color='b')
ut.plotLinEqn(-v2[1],v2[0],0,color='r')
for i in range(-4,8):
    ut.plotLinEqn(-v1[1],v1[0],i*(v1[0]*v2[1]-v1[1]*v2[0]),format=':',color='b')
    ut.plotLinEqn(-v2[1],v2[0],i*(v2[0]*v1[1]-v2[1]*v1[0]),format=':',color='r')
p1 = 2*v1+v2
p2 = 4*v1+3*v2
ut.plotVec(ax, p1,'k')
ax.text(2.5,0.75,r'${\bf x}$',size=20)
ax.text(v2[0]-0.15,v2[1]+0.5,r'${\bf p_2}$',size=20)
ax.text(v1[0]-0.15,v1[1]+0.35,r'${\bf p_1}$',size=20)
ax.plot(0,0,'');

The coordinates of $\mathbf{x}$ in this basis are $(2, 1)$.

In other words, $P^{-1}\mathbf{x} = \begin{bmatrix} 2 \\ 1 \end{bmatrix}$.

Now, we compute $DP^{-1}\mathbf{x}$. Since $D$ is diagonal, this is just scaling each of the $B$-coordinates.

In this example the eigenvalue corresponding to $\mathbf{p}_1$ is 2, and the eigenvalue corresponding to $\mathbf{p}_2$ is 3.

In [5]:
#
ax = ut.plotSetup(-1,6,-1,6,size=(12,8))
ut.centerAxes(ax)
v1 = np.array([5.0,1.0])
v1 = v1 / np.sqrt(np.sum(v1*v1))
v2 = np.array([3.0,5.0])
v2 = v2 / np.sqrt(np.sum(v2*v2))
#ut.plotVec(ax,v1,'b')
#ut.plotVec(ax,v2)
ut.plotLinEqn(-v1[1],v1[0],0,color='b')
ut.plotLinEqn(-v2[1],v2[0],0,color='r')
for i in range(-4,8):
    ut.plotLinEqn(-v1[1],v1[0],i*(v1[0]*v2[1]-v1[1]*v2[0]),format=':',color='b')
    ut.plotLinEqn(-v2[1],v2[0],i*(v2[0]*v1[1]-v2[1]*v1[0]),format=':',color='r')
p1 = 2*v1+v2
p2 = 4*v1+3*v2
ut.plotVec(ax, p1,'k')
ut.plotVec(ax, p2,'k')
ax.annotate('', xy=(p2[0], p2[1]),  xycoords='data',
                xytext=(p1[0], p1[1]), textcoords='data',
                size=15,
                #bbox=dict(boxstyle="round", fc="0.8"),
                arrowprops={'arrowstyle': 'simple',
                                'fc': '0.7', 
                                'ec': 'none',
                                'connectionstyle' : 'arc3,rad=-0.3'},
                )
ax.text(2.5,0.75,r'${\bf x}$',size=20)
ax.text(5.2,2.75,r'$A{\bf x}$',size=20)
ax.plot(0,0,'');

So the coordinates of $A\mathbf{x}$ in the basis $B$ are

$$\begin{bmatrix} 2 & 0 \\ 0 & 3 \end{bmatrix} \begin{bmatrix} 2 \\ 1 \end{bmatrix} = \begin{bmatrix} 4 \\ 3 \end{bmatrix}.$$

Now we convert back to the standard basis -- that is, we ask which point has coordinates $(4, 3)$ in the basis $B$.

We rely on the fact that if $\mathbf{y}$ has coordinates $\mathbf{x}$ in the basis $B$, then $\mathbf{y} = P\mathbf{x}$.

So

$$A\mathbf{x} = P\begin{bmatrix} 4 \\ 3 \end{bmatrix} = PDP^{-1}\mathbf{x}.$$
In [6]:
#
ax = ut.plotSetup(-1,6,-1,6,size=(12,8))
ut.centerAxes(ax)
v1 = np.array([5.0,1.0])
v1 = v1 / np.sqrt(np.sum(v1*v1))
v2 = np.array([3.0,5.0])
v2 = v2 / np.sqrt(np.sum(v2*v2))
#ut.plotVec(ax,v1,'b')
#ut.plotVec(ax,v2)
#ut.plotLinEqn(-v1[1],v1[0],0,color='b')
#ut.plotLinEqn(-v2[1],v2[0],0,color='r')
#for i in range(-3,8):
#    ut.plotLinEqn(-v1[1],v1[0],i*(v1[0]*v2[1]-v1[1]*v2[0]),format=':',color='b')
#    ut.plotLinEqn(-v2[1],v2[0],i*(v2[0]*v1[1]-v2[1]*v1[0]),format=':',color='r')
p1 = 2*v1+v2
p2 = 4*v1+3*v2
#ut.plotVec(ax, p1,'k')
ut.plotVec(ax, p2,'k')
#ax.annotate('', xy=(p2[0], p2[1]),  xycoords='data',
#                xytext=(p1[0], p1[1]), textcoords='data',
#                size=15,
#                #bbox=dict(boxstyle="round", fc="0.8"),
#                arrowprops={'arrowstyle': 'simple',
#                                'fc': '0.7', 
#                                'ec': 'none',
#                                'connectionstyle' : 'arc3,rad=-0.3'},
#                )
#ax.text(2.5,0.75,r'${\bf x}$',size=16)
ax.text(5.2,2.75,r'$A{\bf x}$',size=20)
ax.plot(0,0,'');

We find that $A\mathbf{x} = PDP^{-1}\mathbf{x} = \begin{bmatrix} 5.46 \\ 3.35 \end{bmatrix}$.
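We can verify the whole three-step computation numerically, using the $P$ and $D$ defined in the earlier code cell (a minimal sketch; printed values differ slightly from the figure because $\mathbf{x}$ is rounded):

import numpy as np

P = np.array([[0.98058068, 0.51449576],
              [0.19611614, 0.85749293]])
D = np.array([[2, 0],
              [0, 3]])
x = np.array([2.47, 1.25])

coords = np.linalg.inv(P) @ x   # step 1: coordinates of x in the eigenvector basis, approx [2, 1]
scaled = D @ coords             # step 2: scale by the eigenvalues, approx [4, 3]
Ax = P @ scaled                 # step 3: back to the standard basis (matches Ax above up to rounding)
print(coords, scaled, Ax)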

In conclusion: notice that the transformation $\mathbf{x} \mapsto A\mathbf{x}$ may be a complicated one in which each component of $\mathbf{x}$ affects each component of $A\mathbf{x}$.

However, by changing to the basis defined by the eigenspaces of $A$, the action of $A$ becomes simple to understand.

Diagonalization of $A$ is a change to a basis in which the action of $A$ is particularly easy to understand and compute with.

Lecture 29: Symmetric Matrices¶

[This lecture is based on Prof. Crovella's CS 132 lecture notes.]

In [3]:
#
fig = ut.three_d_figure((15, 1), 
                        'Intersection of the positive definite quadratic form z = 3 x1^2 + 7 x2 ^2 with the constraint ||x|| = 1', 
                        -2, 2, -2, 2, 0, 8, 
                        equalAxes = False, figsize = (7, 7), qr = qr_setting)
qf = np.array([[3., 0.],[0., 7.]])
for angle in np.linspace(0, 2*np.pi, 200):
    x = np.array([np.cos(angle), np.sin(angle)])
    z = x.T @ qf @ x
    fig.plotPoint(x[0], x[1], z, 'b')
    fig.plotPoint(x[0], x[1], 0, 'g')
fig.plotQF(qf, alpha=0.5)
fig.ax.set_zlabel('z')
fig.desc['zlabel'] = 'z'
# do not call fig.save here

Next week, we will develop a generalization of diagonalization that applies to all matrices. It is called the singular value decomposition, and it is the main mathematical tool we have been building up to in this course. We will then explore its data science applications.

But we need one more mathematical tool in our toolbelt first.

Today we'll study a very important class of matrices: symmetric matrices.

Definition. A symmetric matrix is a matrix $A$ such that $A^\top = A$.

Example. Here are three symmetric matrices:

$$\begin{bmatrix} 1 & 0 \\ 0 & -3 \end{bmatrix}, \quad \begin{bmatrix} 0 & -1 & 0 \\ -1 & 5 & 8 \\ 0 & 8 & -7 \end{bmatrix}, \quad \begin{bmatrix} a & b & c \\ b & d & e \\ c & e & f \end{bmatrix}$$

We have already seen symmetric matrices in the normal equations: $A^\top A$ is always symmetric because

$$(A^\top A)^\top = A^\top (A^\top)^\top = A^\top A.$$
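A quick numerical sanity check (a minimal sketch; the random matrix is just an arbitrary test case):

import numpy as np

A = np.random.rand(5, 3)    # any matrix, not necessarily square
G = A.T @ A                 # the Gram matrix A^T A
print(np.allclose(G, G.T))  # True: A^T A is symmetric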

We'll see that symmetric matrices have properties that relate to both eigendecomposition and orthogonality.

Furthermore, symmetric matrices open up a broad class of problems we haven't yet touched on: constrained optimization.

As a result, symmetric matrices arise very often in applications.

29.1 Diagonalizing a symmetric matrix¶

At first glance, a symmetric matrix may not seem that special!

But in fact symmetric matrices have a number of interesting properties.

First, we'll look at a remarkable fact:

the eigenvectors of a symmetric matrix are orthogonal

Example. Diagonalize the following symmetric matrix:

$$A = \begin{bmatrix} 6 & -2 & -1 \\ -2 & 6 & -1 \\ -1 & -1 & 5 \end{bmatrix}$$

Solution. Remember that diagonalization is a four-step process: (1) find the eigenvalues of $A$, (2) find the eigenvectors of $A$, (3) construct $P$, and (4) construct $D$.

The characteristic equation of $A$ is

$$\begin{aligned} 0 &= \det(A - \lambda I) \\ &= -\lambda^3 + 17\lambda^2 - 90\lambda + 144 \\ &= -(\lambda - 8)(\lambda - 6)(\lambda - 3) \end{aligned}$$

So the eigenvalues are 8, 6, and 3.

We construct a basis for each eigenspace (using our standard method of finding the nullspace of $A - \lambda I$):

$$\lambda_1 = 8: \mathbf{v}_1 = \begin{bmatrix} -1 \\ 1 \\ 0 \end{bmatrix}; \quad \lambda_2 = 6: \mathbf{v}_2 = \begin{bmatrix} -1 \\ -1 \\ 2 \end{bmatrix}; \quad \lambda_3 = 3: \mathbf{v}_3 = \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix}$$

These three vectors form a basis for $\mathbb{R}^3$.

But here is the remarkable thing: these three vectors form an orthogonal set.

That is, any two eigenvectors are orthogonal.

For example,

$$\mathbf{v}_1^\top \mathbf{v}_2 = (-1)(-1) + (1)(-1) + (0)(2) = 0$$

As a result, $\{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3\}$ forms an orthogonal basis for $\mathbb{R}^3$.

Let's normalize these vectors so they each have length 1:

$$\mathbf{u}_1 = \begin{bmatrix} -1/\sqrt{2} \\ 1/\sqrt{2} \\ 0 \end{bmatrix}; \quad \mathbf{u}_2 = \begin{bmatrix} -1/\sqrt{6} \\ -1/\sqrt{6} \\ 2/\sqrt{6} \end{bmatrix}; \quad \mathbf{u}_3 = \begin{bmatrix} 1/\sqrt{3} \\ 1/\sqrt{3} \\ 1/\sqrt{3} \end{bmatrix}$$

The orthogonality of the eigenvectors of a symmetric matrix is quite important.

To see this, let's write the diagonalization of $A$ in terms of these eigenvectors and eigenvalues:

$$P = \begin{bmatrix} -1/\sqrt{2} & -1/\sqrt{6} & 1/\sqrt{3} \\ 1/\sqrt{2} & -1/\sqrt{6} & 1/\sqrt{3} \\ 0 & 2/\sqrt{6} & 1/\sqrt{3} \end{bmatrix}, \quad D = \begin{bmatrix} 8 & 0 & 0 \\ 0 & 6 & 0 \\ 0 & 0 & 3 \end{bmatrix}.$$

Then, $A = PDP^{-1}$ as usual.

But, here is the interesting thing:

$P$ is square and has orthonormal columns.

So $P$ is an orthogonal matrix (also known as an orthonormal matrix).

So, that means that $P^{-1} = P^\top$.

So, $A = PDP^\top$.
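We can confirm this numerically. A minimal sketch, using np.linalg.eigh (NumPy's eigensolver for symmetric matrices, which returns eigenvalues in ascending order and orthonormal eigenvectors as columns):

import numpy as np

A = np.array([[ 6., -2., -1.],
              [-2.,  6., -1.],
              [-1., -1.,  5.]])

evals, P = np.linalg.eigh(A)
print(evals)                                     # approximately [3. 6. 8.]
print(np.allclose(P.T @ P, np.eye(3)))           # True: P is orthogonal, so P^{-1} = P^T
print(np.allclose(A, P @ np.diag(evals) @ P.T))  # True: A = P D P^T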

The Spectral Theorem¶

Here is a theorem that shows that we always get orthogonal eigenvectors when we diagonalize a symmetric matrix:

Theorem. If $A$ is symmetric, then any two eigenvectors of $A$ from different eigenspaces are orthogonal.

Proof.

Let $\mathbf{v}_1$ and $\mathbf{v}_2$ be eigenvectors that correspond to distinct eigenvalues, say, $\lambda_1$ and $\lambda_2$.

To show that $\mathbf{v}_1^\top \mathbf{v}_2 = 0$, compute

$$\begin{aligned} \lambda_1 \mathbf{v}_1^\top \mathbf{v}_2 &= (\lambda_1 \mathbf{v}_1)^\top \mathbf{v}_2 \\ &= (A\mathbf{v}_1)^\top \mathbf{v}_2 \\ &= (\mathbf{v}_1^\top A^\top)\mathbf{v}_2 \\ &= \mathbf{v}_1^\top (A^\top \mathbf{v}_2) \\ &= \mathbf{v}_1^\top (A\mathbf{v}_2) \\ &= \mathbf{v}_1^\top (\lambda_2 \mathbf{v}_2) \\ &= \lambda_2 \mathbf{v}_1^\top \mathbf{v}_2 \end{aligned}$$

So we conclude that $\lambda_1 (\mathbf{v}_1^\top \mathbf{v}_2) = \lambda_2 (\mathbf{v}_1^\top \mathbf{v}_2)$.

But $\lambda_1 \neq \lambda_2$, so this can only happen if $\mathbf{v}_1^\top \mathbf{v}_2 = 0$.

So $\mathbf{v}_1$ is orthogonal to $\mathbf{v}_2$.

The same argument applies to any other pair of eigenvectors with distinct eigenvalues.

So any two eigenvectors of $A$ from different eigenspaces are orthogonal.

Orthogonal Diagonalization¶

We can now introduce a special kind of diagonalizability:

An $n \times n$ matrix $A$ is said to be orthogonally diagonalizable if there are an orthogonal matrix $P$ (with $P^{-1} = P^\top$) and a diagonal matrix $D$ such that

$$A = PDP^\top = PDP^{-1}$$

Such a diagonalization requires $n$ linearly independent and orthonormal eigenvectors.

When is this possible?

If $A$ is orthogonally diagonalizable, then

$$A^\top = (PDP^\top)^\top = (P^\top)^\top D^\top P^\top = PDP^\top = A$$

So $A$ is symmetric!

It turns out the converse is true (though we won't prove it).

Hence we obtain the following theorem:

Theorem. An $n \times n$ matrix is orthogonally diagonalizable if and only if it is a symmetric matrix.

Recall from last time: we found that it was a difficult process, in general, to determine if an arbitrary matrix was diagonalizable.

But here, we have a very nice rule: every symmetric matrix is (orthogonally) diagonalizable.
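This rule is easy to check numerically for any symmetric matrix (a minimal sketch; the random symmetric matrix below is just an arbitrary test case):

import numpy as np

rng = np.random.default_rng(0)
M = rng.standard_normal((4, 4))
S = (M + M.T) / 2                                # symmetrize an arbitrary matrix

evals, P = np.linalg.eigh(S)
print(np.allclose(P @ P.T, np.eye(4)))           # True: orthonormal eigenvectors
print(np.allclose(S, P @ np.diag(evals) @ P.T))  # True: S = P D P^T, an orthogonal diagonalization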

29.2 Quadratic Forms¶

An important role that symmetric matrices play has to do with quadratic expressions.

Up until now, we have mainly focused on linear equations -- equations in which the $x_i$ terms occur only to the first power.

Actually, though, we have looked at some quadratic expressions when we considered least-squares problems.

For example, we looked at expressions such as $\|\mathbf{x}\|^2$, which is $\sum_i x_i^2$.

We'll now look at quadratic expressions generally. We'll see that there is a natural and useful connection to symmetric matrices.

Definition. A quadratic form is a function of variables, e.g., $x_1, x_2, \dots, x_n$, in which every term has degree two.

Examples:

$4x_1^2 + 2x_1x_2 + 3x_2^2$ is a quadratic form.

$4x_1^2 + 2x_1$ is not a quadratic form.

Quadratic forms arise in many settings, including signal processing, graph analysis, physics, economics, and statistics.

Within data science, quadratic forms arise in many optimization problems, among other areas.

Essentially, what an expression like $x^2$ is to a scalar, a quadratic form is to a vector.

Fact. Every quadratic form can be expressed as $\mathbf{x}^\top A\mathbf{x}$, where $A$ is a symmetric matrix.

There is a simple way to go from a quadratic form to a symmetric matrix, and vice versa.

To see this, let's look at some examples.

Example. Let $\mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \end{bmatrix}$. Compute $\mathbf{x}^\top A\mathbf{x}$ for the matrix $A = \begin{bmatrix} 4 & 0 \\ 0 & 3 \end{bmatrix}$.

Solution.

$$\begin{aligned} \mathbf{x}^\top A\mathbf{x} &= \begin{bmatrix} x_1 & x_2 \end{bmatrix} \begin{bmatrix} 4 & 0 \\ 0 & 3 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} \\ &= \begin{bmatrix} x_1 & x_2 \end{bmatrix} \begin{bmatrix} 4x_1 \\ 3x_2 \end{bmatrix} \\ &= 4x_1^2 + 3x_2^2. \end{aligned}$$

Example. Compute $\mathbf{x}^\top A\mathbf{x}$ for the matrix $A = \begin{bmatrix} 3 & -2 \\ -2 & 7 \end{bmatrix}$.

Solution.

$$\begin{aligned} \mathbf{x}^\top A\mathbf{x} &= \begin{bmatrix} x_1 & x_2 \end{bmatrix} \begin{bmatrix} 3 & -2 \\ -2 & 7 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} \\ &= x_1(3x_1 - 2x_2) + x_2(-2x_1 + 7x_2) \\ &= 3x_1^2 - 2x_1x_2 - 2x_2x_1 + 7x_2^2 \\ &= 3x_1^2 - 4x_1x_2 + 7x_2^2 \end{aligned}$$

Notice the simple structure: $a_{11}$ is multiplied by $x_1 x_1$, $a_{12}$ is multiplied by $x_1 x_2$, $a_{21}$ is multiplied by $x_2 x_1$, and $a_{22}$ is multiplied by $x_2 x_2$.

We can write this concisely:

$$\mathbf{x}^\top A\mathbf{x} = \sum_{ij} a_{ij} x_i x_j$$
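A quick numerical check of this identity, using the matrix from the previous example and an arbitrary test vector (a minimal sketch):

import numpy as np

A = np.array([[ 3., -2.],
              [-2.,  7.]])
x = np.array([1., 2.])

q_matrix = x @ A @ x                  # evaluate the quadratic form as x^T A x
q_sum = sum(A[i, j] * x[i] * x[j]     # evaluate it as sum_ij a_ij x_i x_j
            for i in range(2) for j in range(2))
print(q_matrix, q_sum)                # both equal 3(1) - 4(2) + 7(4) = 23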

Example. For $\mathbf{x}$ in $\mathbb{R}^3$, let

$$Q(\mathbf{x}) = 5x_1^2 + 3x_2^2 + 2x_3^2 - x_1x_2 + 8x_2x_3.$$

Write this quadratic form $Q(\mathbf{x})$ as $\mathbf{x}^\top A\mathbf{x}$.

Solution.

The coefficients of $x_1^2, x_2^2, x_3^2$ go on the diagonal of $A$.

Based on the previous example, we can see that the coefficient of each cross term $x_i x_j$ is the sum of two values in symmetric positions on opposite sides of the diagonal of $A$.

So to make $A$ symmetric, the coefficient of $x_i x_j$ for $i \neq j$ must be split evenly between the $(i,j)$- and $(j,i)$-entries of $A$.

You can check that

$$Q(\mathbf{x}) = \mathbf{x}^\top A\mathbf{x} = \begin{bmatrix} x_1 & x_2 & x_3 \end{bmatrix} \begin{bmatrix} 5 & -1/2 & 0 \\ -1/2 & 3 & 4 \\ 0 & 4 & 2 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}$$
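Here is one way to check this numerically (a minimal sketch; the test vector is arbitrary):

import numpy as np

A = np.array([[ 5. , -0.5,  0. ],
              [-0.5,  3. ,  4. ],
              [ 0. ,  4. ,  2. ]])

def Q(x):
    # The quadratic form written out term by term
    x1, x2, x3 = x
    return 5*x1**2 + 3*x2**2 + 2*x3**2 - x1*x2 + 8*x2*x3

x = np.array([1., 2., -1.])
print(np.isclose(x @ A @ x, Q(x)))    # True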

Visualizing Quadratic Forms¶

Notice that $\mathbf{x}^\top A\mathbf{x}$ is a scalar.

In other words, when $A$ is an $n \times n$ matrix, the quadratic form $Q(\mathbf{x}) = \mathbf{x}^\top A\mathbf{x}$ is a function that maps vectors in the domain $\mathbb{R}^n$ to real numbers in the codomain $\mathbb{R}$.

Here are four quadratic forms with domain $\mathbb{R}^2$.

Notice that except at $\mathbf{x} = \mathbf{0}$, the values of $Q(\mathbf{x})$ are all positive in the first case, and all negative in the last case.

In [8]:
#
fig = ut.three_d_figure((29, 1), 'z = 3 x1^2 + 7 x2 ^2',
                        -2, 2, -2, 2, -1, 3, 
                        figsize = (7, 7), qr = qr_setting)
qf = np.array([[3., 0.],[0., 7.]])
fig.plotQF(qf, alpha = 0.7)
fig.ax.set_zlabel('z')
fig.desc['zlabel'] = 'z'
fig.set_title(r'$z = 3x_1^2 + 7x_2^2$', 'z = 3x_1^2 + 7x_2^2', size = 18)
fig.save();
In [9]:
#
fig = ut.three_d_figure((29, 2), 'z = 3 x1^2', 
                        -2, 2, -2, 2, -1, 3, 
                        figsize = (7, 7), qr = qr_setting)
qf = np.array([[3., 0.],[0., 0.]])
fig.plotQF(qf, alpha = 0.7)
fig.ax.set_zlabel('z')
fig.desc['zlabel'] = 'z'
fig.set_title(r'$z = 3x_1^2$', 'z = 3x_1^2', size = 18)
fig.save();
In [10]:
#
fig = ut.three_d_figure((29, 3), 'z = 3 x1^2 - 7 x2 ^2', 
                        -2, 2, -2, 2, -1, 3, 
                        figsize = (7, 7), qr = qr_setting)
qf = np.array([[3., 0.],[0., -7.]])
fig.plotQF(qf, alpha = 0.7)
fig.ax.set_zlabel('z')
fig.desc['zlabel'] = 'z'
fig.set_title(r'$z = 3x_1^2 - 7x_2^2$', 'z = 3x_1^2 - 7x_2^2', size = 18)
fig.save();
In [11]:
#
fig = ut.three_d_figure((29, 4), 'z = -3 x1^2 - 7 x2 ^2', 
                        -2, 2, -2, 2, -3, 1, 
                        figsize = (7, 7), qr = qr_setting)
qf = np.array([[-3., 0.],[0., -7.]])
fig.plotQF(qf, alpha = 0.7)
fig.ax.set_zlabel('z')
fig.desc['zlabel'] = 'z'
fig.set_title(r'$z = -3x_1^2 - 7x_2^2$', 'z = -3x_1^2 - 7x_2^2', size = 18)
fig.save();

29.3 Classifying Quadratic Forms¶

The differences between these surfaces are important for problems such as optimization.

In an optimization problem, one seeks the minimum or maximum value of a function (perhaps over a subset of its domain).

Definition. A quadratic form $Q$ is:

  1. positive definite if $Q(\mathbf{x}) > 0$ for all $\mathbf{x} \neq \mathbf{0}$.
  2. positive semidefinite if $Q(\mathbf{x}) \geq 0$ for all $\mathbf{x} \neq \mathbf{0}$.
  3. indefinite if $Q(\mathbf{x})$ assumes both positive and negative values.
  4. negative definite if $Q(\mathbf{x}) < 0$ for all $\mathbf{x} \neq \mathbf{0}$.
  5. negative semidefinite if $Q(\mathbf{x}) \leq 0$ for all $\mathbf{x} \neq \mathbf{0}$.

If $Q = \mathbf{x}^\top A\mathbf{x}$, then we will refer to $A$ as a positive (semi)definite or negative (semi)definite matrix.

Knowing which kind of quadratic form one is dealing with is important for optimization. Consider these two quadratic forms:

$$P(\mathbf{x}) = \mathbf{x}^\top A\mathbf{x}, \text{ where } A = \begin{bmatrix} 3 & 2 \\ 2 & 1 \end{bmatrix}, \qquad Q(\mathbf{x}) = \mathbf{x}^\top B\mathbf{x}, \text{ where } B = \begin{bmatrix} 3 & 2 \\ 2 & 3 \end{bmatrix}.$$

Notice that the matrices differ in only one position.

Let's say we would like to find

  • the minimum value of $P(\mathbf{x})$ over all $\mathbf{x}$, or
  • the minimum value of $Q(\mathbf{x})$ over all $\mathbf{x}$

In each case, we need to ask: is it possible? Does a minimum value even exist?

In [12]:
#
fig = ut.three_d_figure((29, 5), 'P(x)', 
                        -5, 5, -5, 5, -5, 5, 
                        figsize = (7, 7), qr = qr_setting)
qf = np.array([[3., 2],[2, 1]])
fig.plotQF(qf, alpha = 0.7)
fig.ax.set_zlabel('z')
fig.desc['zlabel'] = 'z'
fig.set_title(r'$z = \mathbf{x}^TA\mathbf{x}$', 'z = x^TAx', size = 18)
fig.ax.view_init(azim = 30)
fig.save();
In [13]:
#
fig = ut.three_d_figure((29, 6), 'Q(x)', 
                        -5, 5, -5, 5, -5, 5, 
                        figsize = (7, 7), qr = qr_setting)
qf = np.array([[3., 2],[2, 3]])
fig.plotQF(qf, alpha = 0.7)
fig.ax.set_zlabel('z')
fig.desc['zlabel'] = 'z'
fig.set_title(r'$z = \mathbf{x}^TB\mathbf{x}$', 'z = x^TBx', size = 18)
fig.save();

Clearly, we cannot minimize $\mathbf{x}^\top A\mathbf{x}$ over all $\mathbf{x}$,

... but we can minimize $\mathbf{x}^\top B\mathbf{x}$ over all $\mathbf{x}$.

How can we distinguish between these two cases in general?

That is, in high dimension, without the ability to visualize?

There is a remarkably simple way to determine, for any quadratic form, which class it falls into.

Theorem. Let $A$ be an $n \times n$ symmetric matrix.

Then the quadratic form $\mathbf{x}^\top A\mathbf{x}$ is

  1. positive definite if and only if the eigenvalues of $A$ are all positive.
  2. positive semidefinite if and only if the eigenvalues of $A$ are all nonnegative.
  3. indefinite if and only if $A$ has both positive and negative eigenvalues.
  4. negative definite if and only if the eigenvalues of $A$ are all negative.
  5. negative semidefinite if and only if the eigenvalues of $A$ are all nonpositive.
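For example, we can apply this test to the matrices $A$ and $B$ from the minimization question above (a minimal sketch; np.linalg.eigvalsh computes the eigenvalues of a symmetric matrix):

import numpy as np

A = np.array([[3., 2.],
              [2., 1.]])
B = np.array([[3., 2.],
              [2., 3.]])

print(np.linalg.eigvalsh(A))   # approximately [-0.24, 4.24]: mixed signs, so x^T A x is indefinite
print(np.linalg.eigvalsh(B))   # [1., 5.]: all positive, so x^T B x is positive definite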

Let's see why this theorem holds for the positive semidefinite case. (The others are similar.)

Let's consider $\mathbf{u}_i$, an eigenvector of $A$. Then

$$\mathbf{u}_i^\top A\mathbf{u}_i = \lambda_i \mathbf{u}_i^\top \mathbf{u}_i.$$

If all eigenvalues are positive or zero, then all such terms are positive or zero.

Since $A$ is symmetric, it is diagonalizable and so its eigenvectors span $\mathbb{R}^n$.

So any $\mathbf{x} = \sum_i c_i \mathbf{u}_i$ can be expressed as a weighted sum of $A$'s eigenvectors $\mathbf{u}_1, \dots, \mathbf{u}_n$.

Writing out the expansion of $\mathbf{x}^\top A\mathbf{x}$ in terms of $A$'s eigenvectors (which, as shown above, can be taken to be orthogonal), the cross terms vanish and we are left with only positive or zero terms.

Proof (semidefinite case). Let $\lambda$ be an eigenvalue of $A$ with corresponding eigenvector $\mathbf{u} \neq \mathbf{0}$. Then:

$$Q(\mathbf{u}) = \mathbf{u}^\top A\mathbf{u} = \mathbf{u}^\top \lambda\mathbf{u} = \lambda\|\mathbf{u}\|^2.$$

If $\lambda < 0$ then $Q(\mathbf{u})$ is negative, so $A$ cannot be positive semidefinite.

Or to state the contrapositive of this statement: if $A$ is positive semidefinite, then $A$ can only have positive or zero eigenvalues.

Example. Let's look at the four quadratic forms above. Their associated matrices are

$$(a)\ \begin{bmatrix} 3 & 0 \\ 0 & 7 \end{bmatrix} \qquad (b)\ \begin{bmatrix} 3 & 0 \\ 0 & 0 \end{bmatrix} \qquad (c)\ \begin{bmatrix} 3 & 0 \\ 0 & -7 \end{bmatrix} \qquad (d)\ \begin{bmatrix} -3 & 0 \\ 0 & -7 \end{bmatrix}$$

Remember that for a diagonal matrix, its eigenvalues equal its diagonal entries. So (a) is positive definite, (b) is positive semidefinite, (c) is indefinite, and (d) is negative definite.

In [14]:
#
fig = ut.three_d_figure((29, 1), 'z = 3 x1^2 + 7 x2 ^2',
                        -2, 2, -2, 2, -1, 3, 
                        figsize = (7, 7), qr = qr_setting)
qf = np.array([[3., 0.],[0., 7.]])
fig.plotQF(qf, alpha = 0.7)
fig.ax.set_zlabel('z')
fig.desc['zlabel'] = 'z'
fig.set_title(r'$z = 3x_1^2 + 7x_2^2$', 'z = 3x_1^2 + 7x_2^2', size = 18)
fig.save();
In [15]:
#
fig = ut.three_d_figure((29, 2), 'z = 3 x1^2', 
                        -2, 2, -2, 2, -1, 3, 
                        figsize = (7, 7), qr = qr_setting)
qf = np.array([[3., 0.],[0., 0.]])
fig.plotQF(qf, alpha = 0.7)
fig.ax.set_zlabel('z')
fig.desc['zlabel'] = 'z'
fig.set_title(r'$z = 3x_1^2$', 'z = 3x_1^2', size = 18)
fig.save();
In [16]:
#
fig = ut.three_d_figure((29, 3), 'z = 3 x1^2 - 7 x2 ^2', 
                        -2, 2, -2, 2, -1, 3, 
                        figsize = (7, 7), qr = qr_setting)
qf = np.array([[3., 0.],[0., -7.]])
fig.plotQF(qf, alpha = 0.7)
fig.ax.set_zlabel('z')
fig.desc['zlabel'] = 'z'
fig.set_title(r'$z = 3x_1^2 - 7x_2^2$', 'z = 3x_1^2 - 7x_2^2', size = 18)
fig.save();
In [17]:
#
fig = ut.three_d_figure((29, 4), 'z = -3 x1^2 - 7 x2 ^2', 
                        -2, 2, -2, 2, -3, 1, 
                        figsize = (7, 7), qr = qr_setting)
qf = np.array([[-3., 0.],[0., -7.]])
fig.plotQF(qf, alpha = 0.7)
fig.ax.set_zlabel('z')
fig.desc['zlabel'] = 'z'
fig.set_title(r'$z = -3x_1^2 - 7x_2^2$', 'z = -3x_1^2 - 7x_2^2', size = 18)
fig.save();

Example. Is $Q(\mathbf{x}) = 3x_1^2 + 2x_2^2 + x_3^2 + 4x_1x_2 + 4x_2x_3$ positive definite?

Solution. Because of all the plus signs, this form "looks" positive definite. But the matrix of the form is

$$\begin{bmatrix} 3 & 2 & 0 \\ 2 & 2 & 2 \\ 0 & 2 & 1 \end{bmatrix}$$

and the eigenvalues of this matrix turn out to be 5, 2, and -1.

So $Q$ is an indefinite quadratic form.
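We can confirm this with NumPy (a minimal sketch):

import numpy as np

A = np.array([[3., 2., 0.],
              [2., 2., 2.],
              [0., 2., 1.]])

print(np.linalg.eigvalsh(A))   # approximately [-1., 2., 5.]: mixed signs, so Q is indefinite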

Next lecture we'll talk about constrained optimization problems and how to solve them using eigenvalues.
