#
import numpy as np
import scipy as sp
import pandas as pd
import matplotlib as mp
import matplotlib.pyplot as plt
import seaborn as sns
import sklearn
import laUtilities as ut
import slideUtilities as sl
import demoUtilities as dm
from matplotlib import animation
from importlib import reload
from datetime import datetime
from IPython.display import Image, display_html, display, Math, HTML;
qr_setting = None
mp.rcParams['animation.html'] = 'jshtml';
import warnings # ignore warning for Fig 2.8
warnings.filterwarnings('ignore')
HW1 is out on Piazza, due Friday, Feb 3 at 8pm
HW 2 will be released tomorrow
Upcoming office hours:
Read Boyd-Vandenberghe Chapter 3.1-3.2 and Chapter 4
This week, we will talk about clustering.
One concept we need before we can deicde how to group objects is a way to compare them.
#
display(Image("images/20-clustering-1.png", width=350))
Clustering is a very important way of discovering structure in data.
It is so important because it is common for data to show clusters.
We can often simplify or compress our data if we recognize the existence of clusters.
Further, we can often interpret clusters by assigning them labels.
However, note that these categories or "labels" are assigned after the fact.
And, we may not be able to interpret clusters or assign them labels in some cases.
That is, clustering represents an example of unsupervised learning.
Supervised methods: Data items have labels, and we want to learn a function that correctly assigns labels to new data items.
Unsupervised methods: Data items do not have labels, and we want to learn a function that extracts important patterns from the data.
Applications of Clustering:
Working with data, we can encounter a wide variety of different data objects:
We are naturally interested in how similar or dissimilar two objects are.
A dissimilarity function takes two objects as input, and returns a large value when the two objects are not very similar.
Often we put restrictions on the dissimilarity function.
One of the most common is that it be a metric.
The dissimilarity $d(x, y)$ between two objects $x$ and $y$ is a metric if
# Source: https://commons.wikimedia.org/w/index.php?curid=26047092
display(Image("images/20-TriangleInequality.png", width=350))
A metric is also commonly called a distance. The notion of distance we will use today is Euclidean distance. But there are other choices.
Sometimes we will use "distance" informally, ie, to refer to a dissimilarity function even if we are not sure it is a metric.
We'll try to say "dissimilarity" in those cases though.
Example. We could measure distance between two vectors $\mathbf{x}$ and $\mathbf{y}$ as the cosine of the angle between them. We will see later today that this can be computed using an inner product:
$$ \cos(\mathbf{x}, \mathbf{y}) = \frac{\mathbf{x}^T\mathbf{y}}{\Vert\mathbf{x}\Vert \Vert\mathbf{y}\Vert} $$Note that this value is large when $\mathbf{x} \approx \mathbf{y}.$ So it is a similarity function.
We often find that we have a similarity function $s$ and need to convert it to a dissimilarity function $d$.
Two straightforward ways of doing that are:
$$d(x,y) = 1\,/\,s(x,y)$$or
$$d(x,y) = k - s(x,y)$$...for some properly chosen $k$.
For cosine similarity, one often uses:
$$ d(\mathbf{x}, \mathbf{y}) = 1 - \cos(\mathbf{x}, \mathbf{y})$$Note that this is not a metric! It doesn't obey the triangle inequality.
However, if we recover the actual angle beween $\mathbf{x}$ and $\mathbf{y}$, that is a metric.
Our plan is to first discover many ways to measure distances and angles between vectors. Then we'll apply these techniques to design a clustering algorithm.
Let's consider some geometric questions involving vector lengths and angles.
The concepts we deal with should be familiar to you in $\mathbb{R}^2$. Our goal is to cast these ideas into greater generality.
In particular, we will take familiar notions and reformulate them in terms of vectors in arbitrary dimension.
Example. Let's start with a simple example in $\mathbb{R}^2$. How would your determine the angle $\theta$?
#
ax = ut.plotSetup(-6,6,-1,4,(12,6))
u = np.array([4.,1])
v = np.array([-4.,3])
pt = u + v
plt.plot([0,u[0]],[0,u[1]],'b-',lw=2)
plt.plot([0,v[0]],[0,v[1]],'b-',lw=2)
plt.plot(u[0], u[1], 'bo')
plt.plot(v[0], v[1], 'bo')
m = (u-v)/2.0
mm = v + m
ax.text(u[0]+.1, u[1]-.2 ,r'${\bf a}$',size=16)
ax.text(v[0]+.1, v[1]+.1, r'${\bf b}$',size=16)
ax.text(0-.1, 0-.3, r'${\bf 0}$',size=16)
ax.annotate('', xy=(0.2*v[0], 0.2*v[1]), xycoords='data',
xytext=(0.3*u[0], 0.3*u[1]), textcoords='data',
size=15,
#bbox=dict(boxstyle="round", fc="0.8"),
arrowprops={'arrowstyle': 'simple',
'fc': '0',
'ec': 'none',
'connectionstyle' : 'arc3,rad=0.5'},
)
ax.text(0.9,0.8,r'$\theta$',size=20);
There are many ways we could find this angle. One option is to take out a protractor and use it to measure $\theta$.
Example. Let's go up one dimension, to $\mathbb{R}^3$. How would you find this angle $\theta$?
#
fig = ut.three_d_figure((7, 1), fig_desc = 'An Angle in R3',
xmin = -10, xmax = 10, ymin = -10, ymax = 10, zmin = -10, zmax = 10,
figsize = (8, 8), qr = qr_setting)
x = np.array([7, -7])
y = np.array([7, 7])
z = np.array([7, 7])
arc_x = 1/5 * x
arc_y = 1/5 * y
arc_z = 1/5 * z
#
pts = np.column_stack([x, y, z])
arc_pts = np.column_stack([arc_x, arc_y, arc_z])
origin = np.array([0, 0, 0])
#
# linearly interpolate a set of points between arc[0] and arc[1]
def three_d_arc(pt1, pt2, nsteps):
len0 = np.linalg.norm(pt1)
len1 = np.linalg.norm(pt2)
interp_pts = np.linspace(pt1, pt2, nsteps)
interp_len = np.linspace(len0, len1, nsteps)
interp_pts = np.array([x / (np.linalg.norm(x)/l) for x, l in zip(interp_pts, interp_len)])
return interp_pts
nsteps = 30
arc_pts = three_d_arc(arc_pts[0], arc_pts[1], nsteps)
for p1, p2 in zip(arc_pts[:-1], arc_pts[1:]):
fig.plotLine(np.array([p1, p2]), 'k')
#
fig.plotLine(np.array([origin, pts[0]]), 'b')
fig.plotPoint(x[0], y[0], z[0], 'b')
fig.text(x[0]+.2, y[0]+.2, z[0]+.2, r'${\bf a}$', 'e1', size=16)
#
fig.plotLine(np.array([origin, pts[1]]), 'b')
fig.plotPoint(x[1], y[1], z[1], 'b')
fig.text(x[1]+.2, y[1]+.2, z[1]+.2, r'${\bf b}$', 'e2', size=16)
#
mid = nsteps//2
fig.text(arc_pts[mid, 0]+.2, arc_pts[mid, 1]+.2, arc_pts[mid, 2]+.2, r'$\theta$', 'theta', size = 20)
#
fig.text(0-.5, 0, 0-2, r'$\bf 0$', '0', size = 16)
#
fig.set_title(r'An Angle in $\mathbb{R}^3$', size=14);
fig.save();
This seems a little more challenging, but we could probably do it.
If we go up one more dimension, to $\mathbb{R}^4$, then it's more challenging to solve this problem visually.
What we need are ways to capture simple notions in spaces of arbitrary dimension $\mathbb{R}^n$:
Interestingly, it turns out that these notions all depend on one key notion: the inner product.
In fact, the notion is so important that we refer to a vector space for which there is an inner product as an inner product space.
Recall that we consider column vectors such as $\mathbf{u}$ and $\mathbf{v}$ in $\mathbb{R}^n$ to be $n\times1$ matrices.
Then $\mathbf{u}^T\mathbf{v}$ is a scalar, called the inner product of $\mathbf{u}$ and $\mathbf{v}.$
You will also see this called the dot product. It sometimes written as $\mathbf{u} \mathbf{\cdot} \mathbf{v}$.
The inner product is the sum of the componentwise product of $\mathbf{u}$ and $\mathbf{v}$.
If $\mathbf{u} = \begin{bmatrix}u_1\\u_2\\\vdots\\u_n\end{bmatrix}$ and $\mathbf{v} = \begin{bmatrix}v_1\\v_2\\\vdots\\v_n\end{bmatrix},$ then the inner product of $\mathbf{u}$ and $\mathbf{v}$ is:
$$\mathbf{u}^T\mathbf{v} = \begin{bmatrix}u_1&u_2&\dots&u_n\end{bmatrix}\begin{bmatrix}v_1\\v_2\\\vdots\\v_n\end{bmatrix} = u_1v_1 + u_2v_2 + \dots + u_nv_n = \sum_{i=1}^n u_iv_i.$$Let's consider the properties of the inner product:
Theorem. Let $\mathbf{u}$,$\mathbf{v}$, and $\mathbf{w}$ be vectors in $\mathbb{R}^n$, and let $c$ be a scalar. Then:
$\mathbf{u}^T\mathbf{v} = \mathbf{v}^T\mathbf{u}$
Inner product is symmetric. Note that these two expressions are the transpose of each other -- but of course the transpose of a scalar is itself!
$(\mathbf{u}+\mathbf{v})^T\mathbf{w} = \mathbf{u}^T\mathbf{w} + \mathbf{v}^T\mathbf{w}$
$(c\mathbf{u})^T\mathbf{v} = c(\mathbf{u}^T\mathbf{v}) = \mathbf{u}^T(c\mathbf{v})$
Inner product is linear in each term.
$\mathbf{u}^T\mathbf{u} \geq 0,\;\;\;\mbox{and}\;\mathbf{u}^T\mathbf{u} = 0\;\mbox{if and only if}\;\mathbf{u} = 0$
Inner product of a vector with itself is never negative.
Armed with the inner product, let's start talking about geometry.
Our first question will be: How do we measure the length of a vector?
Let's say we are in $\mathbb{R}^2$. Then the length follows directly from the Pythagorean theorem:
#
ax = ut.plotSetup(-1,6,-1,4,(12,8))
ut.centerAxes(ax)
pt = [4., 3.]
plt.plot([0,pt[0]],[0,pt[1]],'b-',lw=2)
plt.plot([pt[0],pt[0]],[0,pt[1]],'b-',lw=2)
plt.plot([0,pt[0]],[0,0],'b-',lw=2)
perpline1, perpline2 = ut.perp_sym(np.array([4, 0]), np.array([4, 3]), np.array([0, 0]), 0.2)
plt.plot(perpline1[0], perpline1[1], 'k', lw = 1)
plt.plot(perpline2[0], perpline2[1], 'k', lw = 1)
ut.plotVec(ax,pt)
ax.text(2,-0.5,r'$|v_1|$',size=16)
ax.text(4.15,1.4,r'$|v_2|$',size=16)
ax.text(1.25,1.75,r'$\sqrt{v_1^2+v_2^2}$',size=16)
ax.text(pt[0]+0.1,pt[1]+0.1,r'$\mathbf{v} = (v_1, v_2)$',size=20);
What happens when we move to $\mathbb{R}^3$?
It turns out we need to apply the Pythagorean Theorem an additional time:
#
fig = ut.three_d_figure((7,2), fig_desc = 'Pythagorean Theorem in R3',
xmin = 0, xmax = 5, ymin = 0, ymax = 7, zmin = 0, zmax = 9,
figsize = (12, 8), qr = qr_setting)
f = 1.25
v = [4*f,4*f,6*f]
fig.plotCube(v)
fig.text(v[0]+.1, v[1]+.1 ,v[2]+.1, r'${\bf v} = (v_1, v_2, v_3)$', 'v', size=20)
fig.plotLine([[0, 0, 0], [v[0], v[1], 0]], 'g', '--')
fig.plotLine([[0, 0, 0], [v[0], v[1], v[2]]], 'r', '--')
fig.plotPoint(v[0], v[1], v[2], 'r')
fig.plotPoint(0, 0, 0, 'k')
#
fig.plotPerpSym(np.array([v[0], v[1], 0]), np.array([v[0], v[1], v[2]]), np.array([0, 0, 0]), 1)
#
fig.text(-1.25, 3, 4, r'$\bf\sqrt{v_1^2+v_2^2+v_3^2}$', 'sqrt(v1**2 + v2**2 + v3**2)', size=16)
fig.text(3, 0, 0, r'$\bf\sqrt{v_1^2+v_2^2}$', 'sqrt(v1**2 + v2**2)', size=16)
fig.ax.view_init(azim=290, elev = 15)
fig.set_title(r'Pythagorean Theorem in $\mathbb{R}^3$', 'Pythagorean Theorem in R3', size = 16)
fig.save()
Generalizing to arbitrary dimensions, the length of a vector in $\mathbb{R}^n$ is
$$ \sqrt{v_1^2 + v_2^2 + \dots + v_n^2} = \sqrt{\sum_{i=1}^n{v_i}^2}$$Now let's express this in a way that does not require writing the individual components of $\mathbf{v}$.
The above expression for the length of $\mathbf{v}$ is the same as:
$$\sqrt{\mathbf{v}^T\mathbf{v}}.$$Notice that this is always defined because $\mathbf{v}^T\mathbf{v}$ is nonnegative.
Length is such a fundamental concept that we introduce a special notation and name for it.
Definition. The norm of $\mathbf{v}$ is the nonnegative scalar $\Vert\mathbf{v}\Vert$ defined by
$$\Vert\mathbf{v}\Vert = \sqrt{\mathbf{v}^T\mathbf{v}} = \sqrt{\sum_{i=1}^n{v_i}^2}.$$For any scalar $c$, the length of $c\mathbf{v}$ is $|c|$ times the length of $\mathbf{v}$. That is,
$$\Vert c\mathbf{v}\Vert = \vert c\vert\Vert\mathbf{v}\Vert.$$So, for example, $\Vert(-2)\mathbf{v}\Vert = \Vert 2\mathbf{v}\Vert$.
A vector of length 1 is called a unit vector.
If we divide a nonzero vector $\mathbf{v}$ by its length -- that is, multiply by $1/\Vert\mathbf{v}\Vert$ -- we obtain a unit vector $\mathbf{u}$.
We say that we have normalized $\mathbf{v}$, and that $\mathbf{u}$ is in the same direction as $\mathbf{v}.$
Example. Let $\mathbf{v} = \begin{bmatrix}1\\-2\\2\\0\end{bmatrix}.$ Find the unit vector $\mathbf{u}$ in the same direction as $\mathbf{v}.$
Solution. First, compute the length of $\mathbf{v}$:
$$\Vert\mathbf{v}\Vert^2 = \mathbf{v}^T\mathbf{v} = (1)^2 + (-2)^2 + (2)^2 + (0)^2 = 9$$$$\Vert\mathbf{v}\Vert = \sqrt{9} = 3$$Then multiply $\mathbf{v}$ by $1/\Vert\mathbf{v}\Vert$ to obtain
$$\mathbf{u} = \frac{1}{\Vert\mathbf{v}\Vert}\mathbf{v} = \frac{1}{3}\mathbf{v} = \frac{1}{3}\begin{bmatrix}1\\-2\\2\\0\end{bmatrix} = \begin{bmatrix}1/3\\-2/3\\2/3\\0\end{bmatrix}$$It's important to note that we can't actually visualize $\mathbf{u}$ but we can still reason geometrically about it as a unit vector.
For example, we can talk about (2D) circles, (3D) spheres, four-dimensional spheres, five-dimensional spheres, etc.
It's very useful to be able to talk about the distance between two points (or vectors) in $\mathbb{R}^n$.
We can start from basics. On the number line (ie, $\mathbb{R}^1$), the distance between two points $a$ and $b$ is $\vert a-b\vert$.
#
ax = ut.plotSetup(-1,10,-1,1,(10,2))
ut.centerAxes(ax)
from matplotlib.lines import Line2D
ax.axes.get_yaxis().set_visible(False)
ax.set_frame_on(False)
xmin, xmax = ax.get_xaxis().get_view_interval()
ymin, ymax = ax.get_yaxis().get_view_interval()
ax.add_artist(Line2D((xmin, xmax), (0, 0), color='black', linewidth=2))
ax.set_xticks([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
pt1 = [1, 0]
pt2 = [7, 0]
plt.plot(pt1[0], pt1[1],'bo')
plt.plot(pt2[0], pt2[1],'bo')
ax.text(pt1[0]-0.1,0+.2,r'$a$',size=20);
ax.text(pt2[0]-0.1,0+.2,r'$b$',size=20);
The same is true in $\mathbb{R}^n$.
Definition. For $\mathbf{u}$ and $\mathbf{v}$ in $\mathbb{R}^n,$ the distance between $\mathbf{u}$ and $\mathbf{v}$, written as $\mbox{dist}(\mathbf{u},\mathbf{v}),$ is the length of the vector $\mathbf{u}-\mathbf{v}$. That is,
$$\mbox{dist}(\mathbf{u},\mathbf{v}) = \Vert\mathbf{u}-\mathbf{v}\Vert.$$This definition agrees with the usual formulas for the Euclidean distance between two points. The usual formula is
$$\mbox{dist}(\mathbf{u},\mathbf{v}) = \sqrt{(v_1-u_1)^2 + (v_2-u_2)^2 + \dots + (v_n-u_n)^2}.$$Which you can see is equal to
$$\Vert\mathbf{u}-\mathbf{v}\Vert = \sqrt{(\mathbf{u}-\mathbf{v})^T(\mathbf{u}-\mathbf{v})} = \sqrt{\begin{bmatrix}u_1-v_1&u_2-v_2&\dots&u_n-v_n\end{bmatrix}\begin{bmatrix}u_1-v_1\\u_2-v_2\\\vdots\\u_n-v_n\end{bmatrix}}$$There is a geometric view as well.
Consider the vectors $\mathbf{u} = \begin{bmatrix}7\\1\end{bmatrix}$ and $\mathbf{v} = \begin{bmatrix}3\\2\end{bmatrix}$ in $\mathbb{R}^2$.
Then one can see that the distance from $\mathbf{u}$ to $\mathbf{v}$ is the same as the length of the vector $\mathbf{u}-\mathbf{v}$.
That is: the distance between two vectors is the length of their difference.
#
ax = ut.plotSetup(-1.5,8,-1.5,4)
ut.centerAxes(ax)
u = np.array([7., 1])
v = np.array([3., 2])
plt.plot([0,v[0]],[0,v[1]],'b-',lw=2)
ax.text(v[0]+0.25,v[1]+0.25,r'$\bf v$',size=16)
plt.plot([v[0],u[0]],[v[1],u[1]],'r-',lw=2)
ax.text(u[0]+0.25,u[1]+0.25,r'$\bf u$',size=16)
plt.plot([u[0],u[0]-v[0]],[u[1],u[1]-v[1]],'b-',lw=2)
ax.text(u[0]-v[0]-0.5,u[1]-v[1]-0.5,r'$\bf u-v$',size=16)
plt.plot([u[0]-v[0],0],[u[1]-v[1],0],'r-',lw=2)
ut.plotVec(ax,u-v)
ut.plotVec(ax,u)
ut.plotVec(ax,v)
m = (u-v)/2.0
mm = v + m
ax.text(m[0]-1.5,m[1]-0.5,r'$\Vert{\bf u-v}\Vert$',size=16)
ax.text(mm[0]+0.25,mm[1]+0.25,r'$\Vert{\bf u-v}\Vert$',size=16);
Now we turn to another familiar notion from 2D geometry, which we'll generalize to $\mathbb{R}^n$: the notion of being perpendicular.
Once again, we seek a way to express perpendicularity of two vectors, regardless of the dimension they live in.
We say that two vectors ${\bf u}$ and ${\bf v}$ are perpendicular if and only if they form a right angle at the origin.
#
ax = ut.plotSetup(-1,4,-2,3)
ax.set_aspect('equal')
# ut.noAxes(ax)
u = np.array([2, 2])
v = np.array([1, -1])
# v
ut.plotLinEqn(u[0], u[1], 0, '-', 'b')
ut.plotLinEqn(v[0], v[1], 0, '-', 'g')
ax.text(v[0]-0.5,v[1]-0.5,r'$\bf v$',size=16)
ut.plotVec(ax,v,'b')
# u
ut.plotVec(ax,u,'g')
ax.text(u[0]+0.25,u[1],r'$\bf u$',size=16)
# origin
ut.plotVec(ax, np.array([0, 0]), 'k')
ax.text(0-0.1, 0-0.5, r'$\bf 0$', size=16);
# symbol
perpline1, perpline2 = ut.perp_sym(np.array([0, 0]), u, v, 0.4)
plt.plot(perpline1[0], perpline1[1], 'k', lw = 1)
plt.plot(perpline2[0], perpline2[1], 'k', lw = 1);
Draw the line connecting $\mathbf{u}$ and $\mathbf{v}$. Then $\mathbf{u}$ and $\mathbf{v}$ are perpendicular if and only if this is a right triangle, meaning that the Pythagorean Theorem is satisified.
#
ax = ut.plotSetup(-1,4,-2,3)
ax.set_aspect('equal')
# ut.noAxes(ax)
u = np.array([2, 2])
v = np.array([1, -1])
plt.plot([v[0],u[0]],[v[1],u[1]],'r-',lw=2)
# v
ut.plotLinEqn(u[0], u[1], 0, '-', 'b')
ut.plotLinEqn(v[0], v[1], 0, '-', 'g')
ax.text(v[0]-0.5,v[1]-0.5,r'$\bf v$',size=16)
ut.plotVec(ax,v,'b')
# u
ut.plotVec(ax,u,'g')
ax.text(u[0]+0.25,u[1],r'$\bf u$',size=16)
# origin
ut.plotVec(ax, np.array([0, 0]), 'k')
ax.text(0-0.1, 0-0.5, r'$\bf 0$', size=16);
What is the length of the red side of the triangle?
According to the definitions we've developed today, it is $\Vert \mathbf{u} - \mathbf{v} \Vert$.
#
ax = ut.plotSetup(-1,4,-2,3)
ax.set_aspect('equal')
# ut.noAxes(ax)
u = np.array([2, 2])
v = np.array([1, -1])
plt.plot([v[0],u[0]],[v[1],u[1]],'r-',lw=2)
# v
ut.plotLinEqn(u[0], u[1], 0, '-', 'b')
ut.plotLinEqn(v[0], v[1], 0, '-', 'g')
ax.text(v[0]/2-0.5,v[1]/2-0.5,r'$\Vert\bf v\Vert$',size=16)
ut.plotVec(ax,v,'b')
# u
ut.plotVec(ax,u,'g')
ax.text(u[0]/2-.75,u[1]/2,r'$\Vert\bf u\Vert$',size=16)
# origin
ut.plotVec(ax, np.array([0, 0]), 'k')
#
mm = (u + v) / 2
ax.text(mm[0]+0.3,mm[1],r'$\Vert\bf u - v\Vert$',size=16);
So the blue and green lines are perpendicular if and only if:
$$ \Vert \mathbf{u} - \mathbf{v} \Vert^2 = \Vert \mathbf{u} \Vert^2 + \Vert \mathbf{v} \Vert^2$$Let's see what this implies from an algebraic standpoint.
First let's simplify the expression for squared distance from $\mathbf{u}$ to $\mathbf{v}$:
$$[\mbox{dist}(\mathbf{u},\mathbf{v})]^2 = \Vert\mathbf{u}-\mathbf{v}\Vert^2$$where the last equation follows from the fact that inner product is symmetric, ie, $\mathbf{u}^T\mathbf{v} = \mathbf{v}^T\mathbf{u}$.
Now, the Pythagorean Theorem says that $\mathbf{u}$ and $ \mathbf{v} $ are perpendicular if and only if:
$$ \Vert \mathbf{u} - \mathbf{v} \Vert^2 = \Vert \mathbf{u} \Vert^2 + \Vert \mathbf{v} \Vert^2$$But we've seen that this means:
$$ \Vert\mathbf{u}\Vert^2 + \Vert\mathbf{v}\Vert^2 - 2\mathbf{u}^T\mathbf{v}= \Vert \mathbf{u} \Vert^2 + \Vert \mathbf{v} \Vert^2$$So now we can define perpendicularity in $\mathbb{R}^n$:
Definition. Two vectors $\mathbf{u}$ and $\mathbf{v}$ in $\mathbb{R}^n$ are orthogonal to each other if $\mathbf{u}^T\mathbf{v} = 0.$
As you can see, we have introduced a new term for this notion: orthogonal. So when we are referring to vectors, orthogonal means the same thing as perpendicular.