In [2]:
#
import numpy as np
import scipy as sp
import pandas as pd
import matplotlib as mp
import matplotlib.pyplot as plt
import seaborn as sns
import sklearn
import laUtilities as ut
import slideUtilities as sl
import demoUtilities as dm
from matplotlib import animation
from importlib import reload
from datetime import datetime
from IPython.display import Image, display_html, display, Math, HTML;
qr_setting = None

mp.rcParams['animation.html'] = 'jshtml';

Announcements¶

  • Homework:
    • Homework 8 due Friday, 4/14 at 8pm
  • Final exam: Monday, May 8 from 12-2pm
  • Upcoming office hours:
    • Today: Prof McDonald from 4:45-6pm in CCDS 1341
    • Tomorrow: Peer tutor Rohan Anand from 1:30-3pm in CCDS 16th floor
  • Reading
    • Aggarwal Sections 7.1-7.4

Recap from last lecture¶

Definition. A quadratic form is a function of variables, e.g., $x_1, x_2, \dots, x_n$, in which every term has degree two.

Every quadratic form can be expressed as $x^TAx$, where $A$ is a symmetric matrix. Note that the result of this expression is a scalar.
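As a quick check, here is a minimal sketch in numpy of evaluating a quadratic form as $x^TAx$ (the symmetric matrix and the vector are just illustrative choices):

In [ ]:
import numpy as np

# a small symmetric matrix and a vector, chosen just for illustration
A = np.array([[3., 2.],
              [2., 7.]])
x = np.array([1., -1.])

# the quadratic form x^T A x evaluates to a scalar
Q = x @ A @ x
print(Q)    # 3 - 2 - 2 + 7 = 6.0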

| A quadratic form $Q$ is called: | if: | which happens when the eigenvalues of $A$ are: |
| --- | --- | --- |
| positive definite | $Q(x) > 0$ for all $x \neq 0$ | all positive |
| positive semidefinite | $Q(x) \geq 0$ for all $x \neq 0$ | all positive or 0 |
| indefinite | $Q(x)$ can be positive or negative | both positive and negative |
| negative definite | $Q(x) < 0$ for all $x \neq 0$ | all negative |
| negative semidefinite | $Q(x) \leq 0$ for all $x \neq 0$ | all negative or 0 |

If $Q = x^TAx$, then we will refer to $A$ as a positive (semi)definite or negative (semi)definite matrix.

A common kind of optimization is to find the maximum or the minimum value of a quadratic form $Q(x)$ for $x$ in some specified set.

This is called constrained optimization.

For example, a common constraint is that $x$ varies over the set of unit vectors.

$$\text{minimize } Q(x) \text{ subject to } \|x\| = 1$$
In [3]:
#
fig = ut.three_d_figure((15, 1), 
                        'Intersection of the positive definite quadratic form z = 3 x1^2 + 7 x2 ^2 with the constraint ||x|| = 1', 
                        -2, 2, -2, 2, 0, 8, 
                        equalAxes = False, figsize = (7, 7), qr = qr_setting)
qf = np.array([[3., 0.],[0., 7.]])
for angle in np.linspace(0, 2*np.pi, 200):
    x = np.array([np.cos(angle), np.sin(angle)])
    z = x.T @ qf @ x
    fig.plotPoint(x[0], x[1], z, 'b')
    fig.plotPoint(x[0], x[1], 0, 'g')
fig.plotQF(qf, alpha=0.5)
fig.ax.set_zlabel('z')
fig.desc['zlabel'] = 'z'
# do not call fig.save here

Theorem. Let $A$ be a symmetric matrix, and let

$$M = \max_{x^Tx = 1} x^TAx.$$

Then $M$ is the greatest eigenvalue $\lambda_1$ of $A$.

Similarly, the minimum value of $Q(x)$ over all unit vectors is equal to the smallest eigenvalue $\lambda_n$.
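Here is a minimal numerical sketch of this theorem (the symmetric $2 \times 2$ matrix is just an illustrative choice): we sample many unit vectors, evaluate the quadratic form at each, and compare the largest value found to the largest eigenvalue of $A$.

In [ ]:
import numpy as np

# a small symmetric matrix, chosen just for illustration
A = np.array([[3., 1.],
              [1., 2.]])

# sample unit vectors around the circle and evaluate the quadratic form at each
angles = np.linspace(0, 2 * np.pi, 1000)
vals = [np.array([np.cos(t), np.sin(t)]) @ A @ np.array([np.cos(t), np.sin(t)]) for t in angles]

# the maximum over unit vectors should (approximately) equal the largest eigenvalue
print(max(vals), np.max(np.linalg.eigvalsh(A)))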

Lecture 31: Singular Value Decomposition¶

[This lecture is based on Prof. Crovella's CS 132 and CS 506 lecture notes.]

31.1 Our Last Matrix Factorization¶

Today we will study the most useful decomposition in applied Linear Algebra.

Pretty exciting, eh?

The Singular Value Decomposition is the “Swiss Army Knife” and the “Rolls Royce” of matrix decompositions.

-- Diane O'Leary

In [3]:
#
display(Image("images/18-knife.jpg", width=350))
In [4]:
# image source https://bringatrailer.com/listing/1964-rolls-royce-james-young-phanton-v-limosine/
display(Image("images/18-rolls-royce.jpg", width=350))

The data science view¶

Before I show you how this new matrix factorization works, I want to explain what it does and why it is useful to data science.

This new technique, called singular value decomposition or SVD, can approximate and simplify any dataset that is provided in matrix form.

$$m \text{ data objects} \left\{ \overbrace{\begin{bmatrix} a_{11} & \dots & a_{1j} & \dots & a_{1n} \\ \vdots & \ddots & \vdots & \ddots & \vdots \\ a_{i1} & \dots & a_{ij} & \dots & a_{in} \\ \vdots & \ddots & \vdots & \ddots & \vdots \\ a_{m1} & \dots & a_{mj} & \dots & a_{mn} \end{bmatrix}}^{n \text{ features}} \right.$$

| Data Type | Rows | Columns | Elements |
| --- | --- | --- | --- |
| Network Traffic | Sources | Destinations | Number of bytes |
| Social Media | Users | Time bins | Number of posts/tweets/likes |
| Web Browsing | Users | Content categories | Visit counts/bytes downloaded |
| Web Browsing | Users | Time bins | Visit counts/bytes downloaded |

Using SVD, we can "compress" a matrix of real-world messy data (exactly or approximately) into matrices of smaller dimension.

$$\text{objects} \left\{ \overbrace{\begin{bmatrix} \vdots & \vdots & & \vdots \\ a_1 & a_2 & \cdots & a_n \\ \vdots & \vdots & & \vdots \end{bmatrix}}^{\text{features}} \right. = \overbrace{\begin{bmatrix} \vdots & & \vdots \\ \sigma_1u_1 & \cdots & \sigma_ku_k \\ \vdots & & \vdots \end{bmatrix}}^{k} \times \begin{bmatrix} \cdots & v_1 & \cdots \\ & \vdots & \\ \cdots & v_k & \cdots \end{bmatrix}$$

$$A = U(\Sigma V^T)$$

Notice that $U$ contains a row for each object.

In a sense we have transformed objects from an $n$-dimensional space to a $k$-dimensional space, where $k$ is (probably much) smaller than $n$.
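Here is a minimal sketch of this idea in numpy (the synthetic matrix and the choice $k = 2$ are just for illustration): keep only the top $k$ singular values and vectors, and check how well the truncated product approximates the original matrix.

In [ ]:
import numpy as np

rng = np.random.default_rng(0)
# a synthetic "data matrix": 100 objects x 20 features, close to rank 2 plus a little noise
A = rng.normal(size=(100, 2)) @ rng.normal(size=(2, 20)) + 0.01 * rng.normal(size=(100, 20))

U, s, Vt = np.linalg.svd(A, full_matrices=False)

k = 2                                             # keep only the top k singular values/vectors
A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

# each object is now described by k numbers (a row of U[:, :k] scaled by s[:k]) instead of 20
print(np.linalg.norm(A - A_k) / np.linalg.norm(A))   # small relative error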

Data science models, at their core, are meant to be a simplification of the data.

In particular, instead of thinking of the data as thousands or millions of individual data points, we think of it as being "close" to a best-fit linear function, a small number of clusters, a parametric distribution, etc, etc.

From this simpler description, we hope to gain insight.

There is an interesting question here: why does this process often lead to insight?

That is, why does it happen so often that a large dataset can be described in terms of a much simpler model?

This is because often "Occam's razor" can be applied:

Among competing hypotheses, the one with the fewest assumptions should be selected.

In other words, the world is full of simple (but often hidden) patterns, and it is more common for a set of observations to be determined by a simple process than a complex process.

From this, one can justify the observation that "modeling works surprisingly often."

The geometric view¶

SVD applies to any matrix $M$, no matter its size, (a)symmetry, invertibility, or diagonalizability.

Recall the geometric view of an $m \times n$ matrix $M$: it corresponds to a linear transformation that maps vectors in $\mathbb{R}^n$ to vectors in $\mathbb{R}^m$.

SVD says that the linear transformation corresponding to $M$ can be decomposed into a sequence of three steps:

  1. Rotation within $\mathbb{R}^n$, without scaling (i.e., rotating from the standard basis to another orthonormal basis).
In [27]:
# Source: Wikipedia
display(Image("images/18-svd-geometric.png", width=350))
  2. Scaling each dimension $i$ by a constant $\sigma_i$, called a singular value.
    • If $m \neq n$, then the space is also transformed from $\mathbb{R}^n$ to $\mathbb{R}^m$ at the same time. Some dimensions might be projected away.
    • Some dimensions might also have such a small value of $\sigma_i$ that they might as well be projected away. That is, we might choose to remove them with little impact on the accuracy of the result.
  3. Rotation within $\mathbb{R}^m$.

    Note that this is not the inverse of step 1: the rotation might be by a different angle, across a different axis, and in a space of different dimension.

The algebraic view¶

Theorem. Let $A$ be an $m \times n$ matrix with rank $r$. Then there exists an $m \times n$ matrix $\Sigma$ whose diagonal entries are the first $r$ singular values of $A$, $\sigma_1 \geq \sigma_2 \geq \dots \geq \sigma_r > 0$, and there exists an $m \times m$ orthogonal matrix $U$ and an $n \times n$ orthogonal matrix $V$ such that

$$A = U\Sigma V^T$$

$$\overbrace{\begin{bmatrix} & & \\ & A & \\ & & \end{bmatrix}}^{m \times n} = \overbrace{\begin{bmatrix} & & \\ & U & \\ & & \end{bmatrix}}^{m \times m} \overbrace{\begin{bmatrix} & & \\ & \Sigma & \\ & & \end{bmatrix}}^{m \times n} \overbrace{\begin{bmatrix} & & \\ & V^T & \\ & & \end{bmatrix}}^{n \times n}$$

Any factorization $A = U\Sigma V^T$, with $U$ and $V$ orthogonal and $\Sigma$ a diagonal matrix, is called a singular value decomposition (SVD) of $A$.

The columns of $U$ are called the left singular vectors and the columns of $V$ are called the right singular vectors of $A$.
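We can see these pieces concretely with np.linalg.svd, sketched below (the small matrix is just an illustrative choice): it returns $U$, the singular values, and $V^T$, and multiplying them back together recovers $A$.

In [ ]:
import numpy as np

A = np.array([[2., 0., 1.],
              [0., 1., 3.]])          # an arbitrary 2 x 3 example

U, s, Vt = np.linalg.svd(A)           # U is 2 x 2, Vt is 3 x 3, s holds the singular values

# build the 2 x 3 matrix Sigma with the singular values on its diagonal
Sigma = np.zeros(A.shape)
Sigma[:len(s), :len(s)] = np.diag(s)

print(np.allclose(A, U @ Sigma @ Vt))   # True: A = U Sigma V^T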

Consider the diagonalization of a symmetric positive-definite matrix $S = PDP^T$. If we set

$$U = P = V, \quad \text{and} \quad \Sigma = D$$

then the singular value decomposition of $S$ is just its eigendecomposition.

SVD is a generalization of diagonalization.
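To see this concretely, here is a small sketch (the symmetric positive-definite matrix is an illustrative choice) comparing np.linalg.svd with the eigendecomposition of $S$: the singular values equal the eigenvalues sorted in decreasing order, and the singular vectors match the eigenvectors up to sign.

In [ ]:
import numpy as np

S = np.array([[4., 1.],
              [1., 3.]])              # symmetric positive definite

evals, evecs = np.linalg.eigh(S)      # eigh returns eigenvalues in increasing order
U, s, Vt = np.linalg.svd(S)

print(np.allclose(s, evals[::-1]))    # singular values = eigenvalues, sorted in decreasing order
print(np.allclose(np.abs(U.T @ evecs[:, ::-1]), np.eye(2)))   # U matches P up to column signs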

Remember our geometric interpretation of the diagonalization of a square matrix, $Ax = PDP^{-1}x$:

  1. Compute the coordinates of $x$ in the basis $B$

  2. Scale those coordinates according to the diagonal matrix $D$

  3. Find the point that has those scaled coordinates in the basis $B$

SVD lets us do a similar decomposition for any arbitrary matrix $A$!

$$\overbrace{\begin{bmatrix} & & \\ & A & \\ & & \end{bmatrix}}^{m \times n} = \overbrace{\begin{bmatrix} \vdots & \vdots & & \vdots \\ u_1 & u_2 & \cdots & u_m \\ \vdots & \vdots & & \vdots \end{bmatrix}}^{m \times m} \overbrace{\begin{bmatrix} \sigma_1 & 0 & \cdots & 0 & 0 \\ 0 & \sigma_2 & \cdots & 0 & 0 \\ & & \ddots & & \\ 0 & 0 & \cdots & \sigma_m & 0 \end{bmatrix}}^{m \times n} \overbrace{\begin{bmatrix} \cdots & v_1 & \cdots \\ \cdots & v_2 & \cdots \\ & \vdots & \\ \cdots & v_n & \cdots \end{bmatrix}}^{n \times n}$$

Today we'll work through what each component of this decomposition is.

31.2 Maximizing $\|Ax\|$¶

In our diagonalization $A = PDP^{-1}$, the diagonal entries of $D$ are the eigenvalues of $A$.

We will see that this does not work for an arbitrary $A \in \mathbb{R}^{m \times n}$.

Instead, SVD can be understood through a very simple question:

Among all unit vectors, what is the vector $x$ that maximizes $\|Ax\|$?

In other words, in which direction does $A$ create the largest output vector from a unit input?

To set the stage to answer this question, let's review a few facts.

You recall that the eigenvalues of a square matrix $A$ measure the amount that $A$ "stretches or shrinks" certain special vectors (the eigenvectors).

For example, for a square $A$, if $Ax = \lambda x$ and $\|x\| = 1$, then

$$\|Ax\| = \|\lambda x\| = |\lambda|\,\|x\| = |\lambda|.$$

Here is one such example, for a $2 \times 2$ matrix

$$A = \begin{bmatrix} 1.23 & -0.53 \\ 0.03 & 0.67 \end{bmatrix}.$$
In [4]:
#
V = np.array([[2,1],[.1,1]])
L = np.array([[1.2,0],
              [0,0.7]])
A = V @ L @ np.linalg.inv(V)
#
ax = dm.plotSetup(-1.5,1.5,-1.5, 1.5, size=(9,6))
ut.centerAxes(ax)
theta = [2 * np.pi * f for f in np.array(range(360))/360.0]
x = [np.array([np.sin(t), np.cos(t)]) for t in theta]
Ax = [A.dot(xv) for xv in x]
ax.plot([xv[0] for xv in x],[xv[1] for xv in x],'-b')
ax.plot([Axv[0] for Axv in Ax],[Axv[1] for Axv in Ax],'--r')
theta_step = np.linspace(0, 2*np.pi, 24)
for th in theta_step:
    x = np.array([np.sin(th), np.cos(th)])
    ut.plotArrowVec(ax, A @ x, x, head_width=.04, head_length=.04, length_includes_head = True, color='g')
u, s, v = np.linalg.svd(A)
ut.plotArrowVec(ax, [0.3* V[0][0], 0.3*V[1][0]], head_width=.04, head_length=.04, length_includes_head = True, color='Black')
ut.plotArrowVec(ax, [0.3* V[0][1], 0.3*V[1][1]], head_width=.04, head_length=.04, length_includes_head = True, color='Black')
ax.set_title(r'Eigenvectors of $A$ and the image of the unit circle under $A$');

The largest value of $\|Ax\|$ is attained at the long axis of the ellipse. Clearly there is some $x$ that is mapped to that point by $A$. That $x$ is what we want to find.

And let's make clear that we can apply this idea to arbitrary (non-square) matrices.

Here is an example that shows that we can still ask the question of what unit $x$ maximizes $\|Ax\|$ even when $A$ is not square.

Consider the $2 \times 3$ matrix $$A = \begin{bmatrix} 4 & 11 & 14 \\ 8 & 7 & -2 \end{bmatrix}.$$

The linear transformation $x \mapsto Ax$ maps the unit sphere $\{x : \|x\| = 1\}$ in $\mathbb{R}^3$ onto an ellipse in $\mathbb{R}^2$, as shown here:

In [2]:
#
display(Image("images/18-Lay-fig-7-4-1.jpg", width=650))

$\|Ax\|^2$ is a Quadratic Form¶

Now, here is a way to answer our question:

Problem. Find the unit vector $x$ at which the length $\|Ax\|$ is maximized, and compute this maximum length.

Solution. The quantity $\|Ax\|^2$ is maximized at the same $x$ that maximizes $\|Ax\|$, and $\|Ax\|^2$ is easier to study.

So let's look for the unit vector $x$ at which $\|Ax\|^2$ is maximized.

Observe that

$$\|Ax\|^2 = (Ax)^T(Ax)$$
$$= x^TA^TAx$$
$$= x^T(A^TA)x$$

Now, $A^TA$ is a symmetric matrix.

So we see that $\|Ax\|^2 = x^TA^TAx$ is a quadratic form!

... and we are seeking to maximize it subject to the constraint $\|x\| = 1$.

Recall from last time that we solved precisely this kind of constrained optimization problem:

the maximum value of a quadratic form, subject to the constraint that $\|x\| = 1$, is the largest eigenvalue of the symmetric matrix.

So the maximum value of $\|Ax\|^2$ subject to $\|x\| = 1$ is $\lambda_1$, the largest eigenvalue of $A^TA$.

Also, the maximum is attained at a unit eigenvector of $A^TA$ corresponding to $\lambda_1$.

For the matrix $A$ in the $2 \times 3$ example,

$$A^TA = \begin{bmatrix} 4 & 8 \\ 11 & 7 \\ 14 & -2 \end{bmatrix} \begin{bmatrix} 4 & 11 & 14 \\ 8 & 7 & -2 \end{bmatrix} = \begin{bmatrix} 80 & 100 & 40 \\ 100 & 170 & 140 \\ 40 & 140 & 200 \end{bmatrix}.$$

The eigenvalues of $A^TA$ are $\lambda_1 = 360$, $\lambda_2 = 90$, and $\lambda_3 = 0$.

The corresponding unit eigenvectors are, respectively,

$$v_1 = \begin{bmatrix} 1/3 \\ 2/3 \\ 2/3 \end{bmatrix}, \quad v_2 = \begin{bmatrix} -2/3 \\ -1/3 \\ 2/3 \end{bmatrix}, \quad v_3 = \begin{bmatrix} 2/3 \\ -2/3 \\ 1/3 \end{bmatrix}.$$

For $\|x\| = 1$, the maximum value of $\|Ax\|$ is $\|Av_1\| = \sqrt{360}$, attained when $x = v_1$, the unit eigenvector of $A^TA$ corresponding to $\lambda_1$.
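We can verify these numbers directly in numpy (a small sketch; np.linalg.eigh is used since $A^TA$ is symmetric):

In [ ]:
import numpy as np

A = np.array([[4., 11., 14.],
              [8.,  7., -2.]])

evals, evecs = np.linalg.eigh(A.T @ A)   # eigh returns eigenvalues in increasing order
print(evals[::-1])                       # approximately [360, 90, 0]

v1 = evecs[:, -1]                        # unit eigenvector for the largest eigenvalue
print(np.linalg.norm(A @ v1)**2)         # approximately 360, so ||A v1|| = sqrt(360)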

This example shows that the key to understanding the effect of $A$ on the unit sphere in $\mathbb{R}^3$ is to examine the quadratic form $x^T(A^TA)x$.

We can also go back to our $2 \times 2$ example.

Let's plot the eigenvectors of $A^TA$.

In [6]:
#
ax = dm.plotSetup(-1.5,1.5,-1.5, 1.5, size=(9,6))
ut.centerAxes(ax)
theta = [2 * np.pi * f for f in np.array(range(360))/360.0]
x = [np.array([np.sin(t), np.cos(t)]) for t in theta]
Ax = [A.dot(xv) for xv in x]
ax.plot([xv[0] for xv in x],[xv[1] for xv in x],'-b')
ax.plot([Axv[0] for Axv in Ax],[Axv[1] for Axv in Ax],'--r')
theta_step = np.linspace(0, 2*np.pi, 24)
#for th in theta_step:
#    x = np.array([np.sin(th), np.cos(th)])
#    ut.plotArrowVec(ax, A @ x, x, head_width=.04, head_length=.04, length_includes_head = True, color='g')
u, s, v = np.linalg.svd(A)
# np.linalg.svd returns the right singular vectors -- the unit eigenvectors of A^T A -- as the *rows* of v
ut.plotArrowVec(ax, [v[0][0], v[0][1]], head_width=.04, head_length=.04, length_includes_head = True, color='Black')
ut.plotArrowVec(ax, [v[1][0], v[1][1]], head_width=.04, head_length=.04, length_includes_head = True, color='Black')
ut.plotArrowVec(ax, [s[0]*u[0][0], s[0]*u[1][0]], [v[0][0], v[0][1]], head_width=.04, head_length=.04, length_includes_head = True, color='g')
ut.plotArrowVec(ax, [s[1]*u[0][1], s[1]*u[1][1]], [v[1][0], v[1][1]], head_width=.04, head_length=.04, length_includes_head = True, color='g')
ax.set_title(r'Eigenvectors of $A^TA$ and their images under $A$');

We see that the eigenvector corresponding to the largest eigenvalue of $A^TA$ indeed shows us where $\|Ax\|$ is maximized -- where the ellipse is longest.

Also, the other eigenvector of $A^TA$ shows us where the ellipse is narrowest.

In fact, the entire geometric behavior of the transformation $x \mapsto Ax$ is captured by the quadratic form $x^TA^TAx$.

31.3 The Singular Values of a Matrix¶

Let's continue to consider $A$ to be an arbitrary $m \times n$ matrix.

Notice that even though $A$ is not square in general, $A^TA$ is square and symmetric.

So, there is a lot we can say about $A^TA$.

In particular, since $A^TA$ is symmetric, it can be orthogonally diagonalized.

So let $\{v_1, \dots, v_n\}$ be an orthonormal basis for $\mathbb{R}^n$ consisting of eigenvectors of $A^TA$, and let $\lambda_1, \dots, \lambda_n$ be the corresponding eigenvalues of $A^TA$.

Then, for any eigenvector $v_i$,

$$\|Av_i\|^2 = (Av_i)^TAv_i = v_i^TA^TAv_i$$
$$= v_i^T(\lambda_i v_i)$$

(since $v_i$ is an eigenvector of $A^TA$)

$$= \lambda_i$$

(since $v_i$ is a unit vector.)

Now, any expression $\|\cdot\|^2$ is nonnegative.

So the eigenvalues of $A^TA$ are all nonnegative.

That is: $A^TA$ is positive semidefinite.

We can therefore renumber the eigenvalues so that

$$\lambda_1 \geq \lambda_2 \geq \dots \geq \lambda_n \geq 0.$$

Definition. The singular values of $A$ are the square roots of the eigenvalues of $A^TA$. They are denoted by $\sigma_1, \dots, \sigma_n$, and they are arranged in decreasing order.

That is, $\sigma_i = \sqrt{\lambda_i}$ for $i = 1, \dots, n$.

m×n⎡⎢ ⎢ ⎢ ⎢⎣⋱⋱A⋱⋱⎤⎥ ⎥ ⎥ ⎥⎦=m×m⎡⎢ ⎢ ⎢⎣⋮⋮⋮u1u2⋯um⋮⋮⋮⎤⎥ ⎥ ⎥⎦m×n⎡⎢⎣σ100000σ2⋯00000σm00⎤⎥⎦n×n⎡⎢ ⎢ ⎢ ⎢ ⎢ ⎢ ⎢ ⎢⎣⋯⋯v1⋯⋯⋯⋯v2⋯⋯⋮⋮⋯⋯vn⋯⋯⎤⎥ ⎥ ⎥ ⎥ ⎥ ⎥ ⎥ ⎥⎦[⋱⋱A⋱⋱]⏞m×n=[⋮⋮⋮u1u2⋯um⋮⋮⋮]⏞m×m[σ100000σ2⋯00000σm00]⏞m×n[⋯⋯v1⋯⋯⋯⋯v2⋯⋯⋮⋮⋯⋯vn⋯⋯]⏞n×n

By the above argument, the singular values of $A$ are the lengths of the vectors $Av_1, \dots, Av_n$.

Here, $v_1, \dots, v_n$ are the eigenvectors of $A^TA$, normalized to unit length.
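Continuing the $2 \times 3$ example, a short numpy sketch confirms that the singular values returned by np.linalg.svd are the square roots of the eigenvalues of $A^TA$, and that they equal the lengths $\|Av_i\|$:

In [ ]:
import numpy as np

A = np.array([[4., 11., 14.],
              [8.,  7., -2.]])

U, s, Vt = np.linalg.svd(A)
evals = np.sort(np.linalg.eigvalsh(A.T @ A))[::-1]    # eigenvalues of A^T A, in decreasing order

print(s)                                              # approximately [sqrt(360), sqrt(90)]
print(np.sqrt(evals[:2]))                             # matches the singular values
print([np.linalg.norm(A @ Vt[i]) for i in range(3)])  # lengths ||A v_i||: sigma_1, sigma_2, ~0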

The Eigenvectors of $A^TA$ Lead To an Orthogonal Basis for $\operatorname{Col} A$¶

Now: we know that the vectors $v_1, \dots, v_n$ are an orthogonal set because they are eigenvectors of the symmetric matrix $A^TA$.

However, it's also the case that $Av_1, \dots, Av_n$ are an orthogonal set.

This fact is key to the SVD.

Another way to look at this is to define $u_i$ via $Av_i = \sigma_i u_i$. The resulting vectors $u_i$ also form an orthonormal set. In other words, we've found a set of vectors $v_1, \dots, v_n$ that, when transformed by $A$, preserve orthogonality.

Theorem. Suppose $\{v_1, \dots, v_n\}$ is an orthonormal basis of $\mathbb{R}^n$ consisting of eigenvectors of $A^TA$, arranged so that the corresponding eigenvalues of $A^TA$ satisfy $\lambda_1 \geq \dots \geq \lambda_n$, and suppose $A$ has $r$ nonzero singular values.

Then $\{Av_1, \dots, Av_r\}$ is an orthogonal basis for $\operatorname{Col} A$, and $\operatorname{rank} A = r$.

Note how surprising this is: while $\{v_1, \dots, v_n\}$ are a basis for $\mathbb{R}^n$, $\operatorname{Col} A$ is a subspace of $\mathbb{R}^m$.

Nonetheless,

  • two eigenvectors $v_i$ and $v_j \in \mathbb{R}^n$ are orthogonal, and
  • their images $Av_i$ and $Av_j \in \mathbb{R}^m$ are also orthogonal.

Proof. What we need to do is establish that $\{Av_1, \dots, Av_r\}$ is an orthogonal, linearly independent set whose span is $\operatorname{Col} A$.

Because $v_i$ and $v_j$ are orthogonal for $i \neq j$,

$$(Av_i)^T(Av_j) = v_i^TA^TAv_j = v_i^T(\lambda_jv_j) = 0.$$

So $\{Av_1, \dots, Av_n\}$ is an orthogonal set.

Furthermore, since the lengths of the vectors $Av_1, \dots, Av_n$ are the singular values of $A$, and since there are $r$ nonzero singular values, $Av_i \neq 0$ if and only if $1 \leq i \leq r$.

So $Av_1, \dots, Av_r$ are a linearly independent set (because they are orthogonal and all nonzero), and clearly they are each in $\operatorname{Col} A$.

Finally, we just need to show that $\operatorname{Span}\{Av_1, \dots, Av_r\} = \operatorname{Col} A$.

To do this we'll show that for any $y$ in $\operatorname{Col} A$, we can write $y$ in terms of $\{Av_1, \dots, Av_r\}$:

Say $y = Ax$.

Because $\{v_1, \dots, v_n\}$ is a basis for $\mathbb{R}^n$, we can write $x = c_1v_1 + \dots + c_nv_n$, so

$$y = Ax = c_1Av_1 + \dots + c_rAv_r + \dots + c_nAv_n$$
$$= c_1Av_1 + \dots + c_rAv_r.$$

(because $Av_i = 0$ for $i > r$).

In summary: $\{Av_1, \dots, Av_r\}$ is an (orthogonal) linearly independent set whose span is $\operatorname{Col} A$, so it is an (orthogonal) basis for $\operatorname{Col} A$.

Notice that we have also proved that $\operatorname{rank} A = \dim \operatorname{Col} A = r$.

In other words, if $A$ has $r$ nonzero singular values, $A$ has rank $r$.
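The theorem is easy to check numerically for the running $2 \times 3$ example (a small sketch): the images $Av_i$ are mutually orthogonal, and the rank of $A$ equals the number of nonzero singular values.

In [ ]:
import numpy as np

A = np.array([[4., 11., 14.],
              [8.,  7., -2.]])

U, s, Vt = np.linalg.svd(A)
AV = A @ Vt.T                      # columns are A v_1, ..., A v_n

# Gram matrix of the images: off-diagonal entries ~ 0, so the A v_i are orthogonal
print(np.round(AV.T @ AV, 6))      # approximately diag(360, 90, 0)

# rank A = number of nonzero singular values
print(np.linalg.matrix_rank(A), np.sum(s > 1e-10))   # both 2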