In [1]:
#
import numpy as np
import scipy as sp
import pandas as pd
import matplotlib as mp
import matplotlib.pyplot as plt
import seaborn as sns
import sklearn
import laUtilities as ut
import slideUtilities as sl
import demoUtilities as dm
from matplotlib import animation
from importlib import reload
from datetime import datetime
from IPython.display import Image, display_html, display, Math, HTML;
qr_setting = None

mp.rcParams['animation.html'] = 'jshtml';

Announcements¶

  • HW1 is due today at 8pm

  • HW 2 is out

  • Upcoming office hours:

    • Today: Peer tutor Daniel Cho from 12:30-3:30pm on the CCDS 16th floor
    • Today: Abhishek Tiwari from 4-5pm on the CCDS 13th floor
  • Read Boyd-Vandenberghe Sections 3.1-3.2 and Chapter 4

7. The K-Means Problem¶

The inner product is the sum of the componentwise product of $\mathbf{u}$ and $\mathbf{v}$.

If $\mathbf{u} = \begin{bmatrix} u_1 \\ u_2 \\ \vdots \\ u_n \end{bmatrix}$ and $\mathbf{v} = \begin{bmatrix} v_1 \\ v_2 \\ \vdots \\ v_n \end{bmatrix}$, then the inner product of $\mathbf{u}$ and $\mathbf{v}$ is:

$$\mathbf{u}^T\mathbf{v} = \begin{bmatrix} u_1 & u_2 & \dots & u_n \end{bmatrix} \begin{bmatrix} v_1 \\ v_2 \\ \vdots \\ v_n \end{bmatrix} = u_1v_1 + u_2v_2 + \cdots + u_nv_n = \sum_{i=1}^n u_iv_i.$$

Definition. The norm of $\mathbf{v}$ is the nonnegative scalar $\Vert\mathbf{v}\Vert$ defined by

$$\Vert\mathbf{v}\Vert = \sqrt{\mathbf{v}^T\mathbf{v}} = \sqrt{\sum_{i=1}^n v_i^2}.$$

We can use the inner product to find the distance between two vectors.

$$[\operatorname{dist}(\mathbf{u},\mathbf{v})]^2 = \Vert\mathbf{u}-\mathbf{v}\Vert^2 = \Vert\mathbf{u}\Vert^2 + \Vert\mathbf{v}\Vert^2 - 2\mathbf{u}^T\mathbf{v}$$

And because the Pythagorean Theorem says that $\mathbf{u}$ and $\mathbf{v}$ are perpendicular if and only if

$$\Vert\mathbf{u}-\mathbf{v}\Vert^2 = \Vert\mathbf{u}\Vert^2 + \Vert\mathbf{v}\Vert^2,$$

we can say that two vectors $\mathbf{u}$ and $\mathbf{v}$ in $\mathbb{R}^n$ are orthogonal to each other if $\mathbf{u}^T\mathbf{v} = 0.$
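
To make these definitions concrete, here is a quick numpy check (using small example vectors chosen just for illustration) that the squared-distance identity holds and that a zero inner product means the vectors are orthogonal:

In [ ]:
u = np.array([3., 4., 0.])
v = np.array([4., -3., 1.])          # example vectors chosen for illustration

inner = u @ v                        # u'v
dist_sq = np.sum((u - v)**2)         # ||u - v||^2

# ||u - v||^2 should equal ||u||^2 + ||v||^2 - 2 u'v
print(dist_sq, u @ u + v @ v - 2 * inner)

# here u'v = 12 - 12 + 0 = 0, so these two vectors are orthogonal
print("orthogonal:", np.isclose(inner, 0.0))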

7.1 The Angle Between Two Vectors¶

There is an important connection between the inner product of two vectors and the angle between them. This connection is very useful (e.g., in visualizing data mining operations).

We start from the law of cosines:

$$c^2 = a^2 + b^2 - 2ab\cos\theta$$
In [14]:
#
ax = ut.plotSetup(-6,6,-1,4,(12,6))
u = np.array([4.,1])
v = np.array([-4.,3])
pt = u + v
plt.plot([u[0],v[0]],[u[1],v[1]],'b-',lw=2)
plt.plot([0,u[0]],[0,u[1]],'b-',lw=2)
plt.plot([0,v[0]],[0,v[1]],'b-',lw=2)
m = (u-v)/2.0
mm = v + m
ax.text(mm[0]+0.25,mm[1]+0.25,r'${\bf c}$',size=16)
ax.text(2,0.15,r'${\bf a}$',size=16)
ax.text(-3,1,r'${\bf b}$',size=16)
ax.annotate('', xy=(0.2*v[0], 0.2*v[1]),  xycoords='data',
                xytext=(0.3*u[0], 0.3*u[1]), textcoords='data',
                size=15,
                #bbox=dict(boxstyle="round", fc="0.8"),
                arrowprops={'arrowstyle': 'simple',
                                'fc': '0', 
                                'ec': 'none',
                                'connectionstyle' : 'arc3,rad=0.5'},
                )
ax.text(0.9,0.8,r'$\theta$',size=20)
plt.axis('off');

Now let's interpret this law in terms of vectors $\mathbf{u}$ and $\mathbf{v}$.

Once again, it is the angle that these vectors make at the origin that we are concerned with.

In [15]:
#
ax = ut.plotSetup(-6,6,-1,4,(12,6))
ut.centerAxes(ax)
u = np.array([4.,1])
v = np.array([-4.,3])
pt = u + v
plt.plot([u[0],v[0]],[u[1],v[1]],'b-',lw=2)
plt.plot([0,u[0]],[0,u[1]],'b-',lw=2)
plt.plot([0,v[0]],[0,v[1]],'b-',lw=2)
ut.plotVec(ax,u)
ut.plotVec(ax,v)
m = (u-v)/2.0
mm = v + m
ax.text(mm[0]+0.25,mm[1]+0.25,r'$\Vert{\bf u-v}\Vert$',size=16)
ax.text(2,0.15,r'$\Vert{\bf u}\Vert$',size=16)
ax.text(-3,1,r'$\Vert{\bf v}\Vert$',size=16)
ax.annotate('', xy=(0.2*v[0], 0.2*v[1]),  xycoords='data',
                xytext=(0.3*u[0], 0.3*u[1]), textcoords='data',
                size=15,
                #bbox=dict(boxstyle="round", fc="0.8"),
                arrowprops={'arrowstyle': 'simple',
                                'fc': '0', 
                                'ec': 'none',
                                'connectionstyle' : 'arc3,rad=0.5'},
                )
ax.text(4.1, 0.8, r'$\bf u$', size = 14)
ax.text(-4.3, 2.9, r'$\bf v$', size = 14)
ax.text(0.9,0.8,r'$\theta$',size=20);

Applying the law of cosines we get:

$$\Vert\mathbf{u}-\mathbf{v}\Vert^2 = \Vert\mathbf{u}\Vert^2 + \Vert\mathbf{v}\Vert^2 - 2\Vert\mathbf{u}\Vert\Vert\mathbf{v}\Vert\cos\theta$$

Now, previously we calculated that:

$$\Vert\mathbf{u}-\mathbf{v}\Vert^2 = (\mathbf{u}-\mathbf{v})^T(\mathbf{u}-\mathbf{v}) = \Vert\mathbf{u}\Vert^2 + \Vert\mathbf{v}\Vert^2 - 2\mathbf{u}^T\mathbf{v}$$

Which means that $2\mathbf{u}^T\mathbf{v} = 2\Vert\mathbf{u}\Vert\Vert\mathbf{v}\Vert\cos\theta.$

So

$$\mathbf{u}^T\mathbf{v} = \Vert\mathbf{u}\Vert\Vert\mathbf{v}\Vert\cos\theta$$

This is a very important connection between the notion of inner product and trigonometry.

As a quick check, note that if $\mathbf{u}$ and $\mathbf{v}$ are nonzero, and $\mathbf{u}^T\mathbf{v} = 0$, then $\cos\theta = 0.$

In other words, the angle between $\mathbf{u}$ and $\mathbf{v}$ is 90 degrees (or 270 degrees). So this agrees with our definition of orthogonality.

One implication in particular concerns unit vectors.

$$\mathbf{u}^T\mathbf{v} = \Vert\mathbf{u}\Vert\Vert\mathbf{v}\Vert\cos\theta$$

So

$$\frac{\mathbf{u}^T\mathbf{v}}{\Vert\mathbf{u}\Vert\,\Vert\mathbf{v}\Vert} = \cos\theta$$
$$\frac{\mathbf{u}^T}{\Vert\mathbf{u}\Vert}\frac{\mathbf{v}}{\Vert\mathbf{v}\Vert} = \cos\theta$$

Note that $\frac{\mathbf{u}}{\Vert\mathbf{u}\Vert}$ and $\frac{\mathbf{v}}{\Vert\mathbf{v}\Vert}$ are unit vectors.

So we have a very simple rule: for two unit vectors, their inner product is the cosine of the angle between them!

Example. Consider $\mathbf{u} = \begin{bmatrix} 0.95 \\ 0.31 \end{bmatrix}$ and $\mathbf{v} = \begin{bmatrix} 0.20 \\ 0.98 \end{bmatrix}.$

So $\mathbf{u}^T\mathbf{v} = (0.95 \cdot 0.20) + (0.31 \cdot 0.98) \approx 0.5$

So $\cos\theta \approx 0.5.$

So $\theta = 60$ degrees.
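
As a quick numerical check (a small addition, not part of the original lecture cells), numpy gives the same answer with the rounded components:

In [ ]:
# verify the two-dimensional example directly
u = np.array([0.95, 0.31])
v = np.array([0.20, 0.98])
cos_theta = u @ v / (np.linalg.norm(u) * np.linalg.norm(v))
print(np.degrees(np.arccos(cos_theta)))   # roughly 60 degrees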

In [16]:
#
u = np.array([0.95, np.sin(np.arccos(0.95))])
theta = (np.pi/3)+np.arccos(0.95)
v = np.array([np.cos(theta), np.sin(theta)])
ax = ut.plotSetup(-1.3,1.3,-1.3,1.3,(12,6))
ut.centerAxes(ax)
plt.axis('equal')
angles = 2.0*np.pi * np.array(range(101))/100.0
plt.plot(np.cos(angles),np.sin(angles),'r-')
ax.arrow(0,0,u[0],u[1],head_width=0.07, head_length=0.1,length_includes_head = True)
ax.arrow(0,0,v[0],v[1],head_width=0.07, head_length=0.1,length_includes_head = True)
ax.annotate('', xy=(0.2*v[0], 0.2*v[1]),  xycoords='data',
                xytext=(0.3*u[0], 0.3*u[1]), textcoords='data',
                size=15,
                #bbox=dict(boxstyle="round", fc="0.8"),
                arrowprops={'arrowstyle': 'simple',
                                'fc': '0', 
                                'ec': 'none',
                                'connectionstyle' : 'arc3,rad=0.5'})
mid = 0.4*(u+v)/2.0
ax.text(mid[0],mid[1],r'$\theta$',size=20)
ax.text(u[0]+.1,u[1]+.1,r'$(%0.2f,%0.2f)$'% (u[0],u[1]),size=16)
ax.text(v[0]+.1,v[1]+.1,r'$(%0.2f,%0.2f)$'% (v[0],v[1]),size=16);

Example. Find the angle formed by the vectors:

$$\mathbf{u} = \begin{bmatrix} 1 \\ 3 \\ -7 \\ -2 \end{bmatrix} \quad\text{and}\quad \mathbf{v} = \begin{bmatrix} 8 \\ -2 \\ 4 \\ 6 \end{bmatrix}$$

Solution. First normalize the vectors:

$$\Vert\mathbf{u}\Vert = \sqrt{1^2 + 3^2 + (-7)^2 + (-2)^2} \approx 7.94 \qquad \Vert\mathbf{v}\Vert = \sqrt{8^2 + (-2)^2 + 4^2 + 6^2} \approx 10.95$$

So

$$\frac{\mathbf{u}}{\Vert\mathbf{u}\Vert} = \begin{bmatrix} 0.13 \\ 0.38 \\ -0.88 \\ -0.25 \end{bmatrix} \quad\text{and}\quad \frac{\mathbf{v}}{\Vert\mathbf{v}\Vert} = \begin{bmatrix} 0.73 \\ -0.18 \\ 0.36 \\ 0.54 \end{bmatrix}$$

Then calculate the cosine of the angle between them:

$$\begin{aligned} \cos\theta &= \frac{\mathbf{u}^T}{\Vert\mathbf{u}\Vert}\frac{\mathbf{v}}{\Vert\mathbf{v}\Vert} \\ &= (0.13 \cdot 0.73) + (0.38 \cdot -0.18) + (-0.88 \cdot 0.36) + (-0.25 \cdot 0.54) \\ &\approx -0.44 \end{aligned}$$

Then:

$$\theta = \cos^{-1}(-0.44) \approx 116 \text{ degrees}.$$

We can also compute this angle using Python.

In [3]:
u = np.array([1.,3,-7,-2])
v = np.array([8.,-2,4,6])

# normalize to unit vectors
uNormalized = u/np.sqrt(u.T.dot(u))  # computing the norm manually, as long as the vector is nonzero
vNormalized = v/np.linalg.norm(v)    # an easier way
print("Normalized u =", uNormalized)
print("Normalized v =", vNormalized)
Normalized u = [ 0.12598816  0.37796447 -0.8819171  -0.25197632]
Normalized v = [ 0.73029674 -0.18257419  0.36514837  0.54772256]
In [4]:
# compute inner product
innerProduct = (vNormalized).dot(uNormalized)
print("Inner product =", innerProduct)
Inner product = -0.4370415209168243
In [23]:
# find angle, convert from radians to degrees
print("Angle theta =", np.arccos(innerProduct)*180/np.pi)
Angle theta = 115.91527033906208

Example Application: Cosine Similarity¶

Let's consider a data science application of this technique. Suppose we are given two documents $d_1$ and $d_2$.

For a goal such as information retrieval, we'd like to know whether the two documents are "similar".

One common way to formalize similarity is to say that documents are similar if they contain similar words. Here, we will view each document as a "bag of words," and ignore where each word occurs in the document.

We can measure this as follows: Construct a vector $x_1$ that counts the frequency of occurrence of certain words in $d_1$:

$$\begin{bmatrix} \text{dog} \\ \text{car} \\ \text{house} \\ \vdots \\ \text{door} \end{bmatrix} \rightarrow \begin{bmatrix} 3 \\ 7 \\ 4 \\ \vdots \\ 20 \end{bmatrix}$$

Do the same thing for $d_2$, yielding $x_2$.

What we have done is taken individual documents and represented them as vectors $x_1$ and $x_2$ in a very high dimensional space, $\mathbb{R}^n$.

Here $n$ could be 10,000 or more.

Now, to ask whether the two documents are similar, we simply ask "do their vectors point in the same general direction?"

And, despite the fact that we are in a very high dimensional space which we cannot visualize, we can nonetheless answer the question easily.

The angle between the two document-vectors is simply:

$$\theta_{1,2} = \cos^{-1}\left[\left(\frac{x_1}{\Vert x_1\Vert}\right)^T \frac{x_2}{\Vert x_2\Vert}\right]$$
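
As a small illustration (with made-up word counts, not real documents), this is how that angle could be computed in numpy:

In [ ]:
# hypothetical word-count vectors for two documents over the same small vocabulary
x1 = np.array([3., 7., 4., 20.])    # counts of dog, car, house, door in d1
x2 = np.array([2., 9., 5., 15.])    # counts of the same words in d2

cos_sim = (x1 / np.linalg.norm(x1)) @ (x2 / np.linalg.norm(x2))
theta_12 = np.degrees(np.arccos(cos_sim))
print(f"cosine similarity: {cos_sim:0.3f}, angle: {theta_12:0.1f} degrees")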

7.2 The Clustering Problem¶

But what if, rather than having two documents, we have a million documents? Then it's less clear what to make of the similarity or dissimilarity of any pair of documents. We want a more systematic way to understand the patterns in the overall dataset.

When we do clustering, what problem are we trying to solve? We will answer this question informally at first. (But soon we will look at formal criteria!)

Informally, a clustering is:

a grouping of data objects, such that the objects within a group are similar (or near) to one another and dissimilar (or far) from the objects in other groups.

In [5]:
#
display(Image("images/20-clustering-1.png", width=350))

So we want our clustering algorithm to:

  • minimize intra-cluster distances
  • maximize inter-cluster distances

Here are the basic questions we need to ask about clustering:

  • What is the right kind of "similarity" to use?
  • What is a "good" partition of objects?
    • i.e., how is the quality of a solution measured?
  • How to find a good partition?
    • are there efficient algorithms?
    • are there algorithms that are guaranteed to find good clusters?

Now note that even with our more-formal discussion, the criteria for deciding on a "best" clustering can still be ambiguous.

In [2]:
#
display(Image("images/20-clustering-3.png", width=650))
In [3]:
#
display(Image("images/20-clustering-2.png", width=650))

To accommodate the ambiguity here, one approach is to seek a hierarchical clustering: that is, a set of nested clusters organized in a tree.

We'll discuss hierarchical clustering next time.

For today, we'll focus on partitional clustering:

in a partitional clustering, the points are divided into a set of non-overlapping groups.

In [6]:
#
display(Image("images/20-partitional-clustering.png", width=600))

In a partitional clustering:

  • Each object belongs to one, and only one, cluster
  • The set of clusters covers all the objects

We are going to assume for now that the number of clusters is given in advance.

We will denote the number of clusters as $k$.

Centroid and Variance¶

To measure how close or different many vectors are in space, let's define the idea of a dataset centroid.

Given $n$ vectors $x_1, \dots, x_n$, each of the same dimension $d$, the centroid is the average of the vectors taken componentwise:

$$\bar{x} = \frac{1}{n}\sum_i x_i.$$

In other words: the centroid is the "center of mass" of the dataset.

Next, we define the sample variance of a dataset $\{x_1, \dots, x_n\}$ as:

$$\operatorname{Var}(X) = \frac{1}{n}\sum_j \Vert x_j - \bar{x}\Vert^2.$$

In other words, the sample variance of the set of points is the average squared distance from each point to the centroid.
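
For concreteness, here is a short numpy sketch (with randomly generated points, just for illustration) computing both quantities:

In [ ]:
# n points in d dimensions, stored as the rows of a matrix
rng = np.random.default_rng(0)
points = rng.normal(size=(100, 3))

centroid = points.mean(axis=0)                              # componentwise average
variance = np.mean(np.sum((points - centroid)**2, axis=1))  # mean squared distance to the centroid
print("centroid:", centroid)
print("sample variance:", variance)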

7.3 The $k$-means Problem¶

Now, we are ready to state our first formalization of the clustering problem.

We will assume that

  • data items are represented by points in $\mathbb{R}^d$ (i.e., each data item has $d$ features)
  • $n$ points are given
  • the number of clusters $k$ is given

$k$-means Problem:

Find $k$ points $c_1, \dots, c_k$ (called centers, centroids, or means), so that the cost

$$\sum_{i=1}^n \min_j \Vert x_i - c_j\Vert^2$$

is minimized.

Equivalently: we can think in terms of the partition itself.

Consider the set $X = \{x_1, \dots, x_n\}$, where $x_i \in \mathbb{R}^d$.

Find $k$ points $c_1, \dots, c_k$

and partition the set $X$ into $k$ different subsets $\{X_1, \dots, X_k\}$ by assigning each point $x_i$ to its nearest cluster center,

so that the cost

$$\sum_{i=1}^n \min_j \Vert x_i - c_j\Vert^2 = \sum_{j=1}^k \sum_{x \in X_j} \Vert x - c_j\Vert^2$$

is minimized.
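
To make the objective concrete, here is a short sketch (not part of the lecture's own code) showing how this cost could be evaluated for a candidate set of centers:

In [ ]:
def kmeans_cost(X, centers):
    # squared distance from every point to every center: shape (n, k)
    sq_dists = ((X[:, np.newaxis, :] - centers[np.newaxis, :, :])**2).sum(axis=2)
    # each point contributes its squared distance to its nearest center
    return sq_dists.min(axis=1).sum()

# example with random points and arbitrary centers (illustration only)
rng = np.random.default_rng(1)
X_demo = rng.normal(size=(10, 2))
C_demo = rng.normal(size=(3, 2))
print(kmeans_cost(X_demo, C_demo))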

We now have a formal definition of a clustering.

This is not the only definition possible, but it is an intuitive and simple one.

How hard is it to solve this problem?

  • $k=1$ and $k=n$ are easy special cases (why?)
  • But, this problem is NP-hard if the dimension of the data is at least 2
    • We don't expect that there is any exact, efficient algorithm in general

Nonetheless, there is a simple algorithm that works quite well in practice!

7.4 The $k$-means Algorithm¶

$k$-means Problem: Find $k$ points $c_1, \dots, c_k$ (called centers, centroids, or means), so that the sum of squares of distances from every point to the closest center

$$\sum_{i=1}^n \min_j \Vert x_i - c_j\Vert^2$$

is minimized.

In [7]:
#
display(Image("images/08-rosa-gold.png", width=400))
In [8]:
#
display(Image("images/08-rosa-gold-2.png", width=400))

[Source: Wikipedia]

There is a "classic" algorithm for this problem, called the "kk-means algorithm."

It was voted among the top-10 algorithms in data mining!

It is such a good idea that it has been independently discovered multiple times.

It was first discovered by Lloyd in 1957, so it is often called Lloyd's algorithm.

In [7]:
#
display(Image("images/20-top-ten-algorithms-cover.png", width=300))

The (approximate) $k$-means algorithm:

  1. Pick $k$ cluster centers $\{c_1, \dots, c_k\}$. These can be chosen randomly, or by some other method.
  2. For each $j$, define the cluster $X_j$ as the set of points in $X$ that are closest to center $c_j$ (i.e., nearer to $c_j$ than to any other center).
  3. For each $j$, redefine $c_j$ to be the center of mass of cluster $X_j$ (in other words, $c_j$ is the mean of the vectors in $X_j$).
  4. Repeat (i.e., go to Step 2) until convergence.
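
Here is a minimal numpy sketch of Lloyd's algorithm, just to make the alternating steps explicit. This is an illustrative implementation only; a production implementation (such as scikit-learn's KMeans, used below) handles initialization and empty clusters much more carefully.

In [ ]:
def lloyd_kmeans(X, k, n_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    # Step 1: pick k initial centers (here, k distinct data points at random)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Step 2: assign each point to its closest center
        sq_dists = ((X[:, np.newaxis, :] - centers[np.newaxis, :, :])**2).sum(axis=2)
        labels = sq_dists.argmin(axis=1)
        # Step 3: move each center to the mean of the points assigned to it
        new_centers = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
                                for j in range(k)])
        # Step 4: stop once the centers no longer move
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return centers, labels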

Let's walk through a picture showing how this algorithm works:

In [8]:
#
display(Image("images/20-kmeans-example.png", width=600))

7.5 An example of $k$-means with synthetic data¶

Let's do an extended example showing $k$-means clustering in practice, in the context of the Python library scikit-learn.

scikit-learn is a widely used Python library for machine learning.

Our goals are to learn:

  • How clustering is used in practice
  • Tools for evaluating the quality of a clustering
  • Tools for assigning meaning or labels to a cluster
  • Important visualizations
  • A little bit about feature extraction for text

Working with synthetic data¶

Generally, when learning about or developing a new unsupervised method, it's a good idea to try it out on a dataset in which you already know the "right" answer.

One way to do that is to generate synthetic data that has some known properties.

Among other things, scikit-learn contains tools for generating synthetic data for testing.

In [4]:
import sklearn.datasets as sk_data
X, y = sk_data.make_blobs(n_samples=100, centers=3, n_features=30,
                          center_box=(-10.0, 10.0), random_state=0)

To get a sense of the raw data we can inspect it.

For statistical visualization, a good library is seaborn, imported as sns.

In [5]:
import seaborn as sns
sns.heatmap(X, xticklabels = False, yticklabels = False, linewidths = 0, cbar = False);

That plot shows all the data.

As usual, each row is a data item and the columns correspond to features (which are simply coordinates here).

Geometrically, these points live in a 30 dimensional space, so we cannot directly visualize their geometry.

So let's compute the pairwise distances for visualization purposes.

We can compute all pairwise distances in a single step using a scikit-learn function:

In [4]:
import sklearn.metrics as metrics
euclidean_dists = metrics.euclidean_distances(X)
euclidean_dists
Out[4]:
array([[ 0.        , 47.73797008, 45.18787978, ..., 47.87535624,
        49.64694402, 45.58307694],
       [47.73797008,  0.        , 43.66760596, ...,  7.3768511 ,
         7.36794305, 43.51069074],
       [45.18787978, 43.66760596,  0.        , ..., 42.55609472,
        43.80829605,  9.31642449],
       ...,
       [47.87535624,  7.3768511 , 42.55609472, ...,  0.        ,
         8.19377462, 41.81523421],
       [49.64694402,  7.36794305, 43.80829605, ...,  8.19377462,
         0.        , 43.41205895],
       [45.58307694, 43.51069074,  9.31642449, ..., 41.81523421,
        43.41205895,  0.        ]])

We can then visualize the data using a heatmap of the pairwise distances.

In [6]:
sns.heatmap(euclidean_dists, xticklabels=False, yticklabels=False, linewidths=0, 
            square=True);

Visualizing with Multidimensional Scaling¶

The idea behind Multidimensional Scaling (MDS) is:

  • given a distance (or dissimilarity) matrix,
  • find a set of coordinates for the points that approximates those distances as well as possible.

Usually we are looking for points in 2D or maybe 3D for visualization purposes.

The algorithm works by starting with random positions, and then moving points in a way that reduces the disparity between true distance and euclidean distance.

Note that there are many ways that this can fail!

  • Perhaps the dissimilarities are not well modeled as euclidean distances
  • It may be necessary to use more than 2 dimensions to capture any clustering via euclidean distances
In [5]:
import sklearn.manifold
mds = sklearn.manifold.MDS(n_components=2, max_iter=3000, eps=1e-9, random_state=0,
                   dissimilarity = "precomputed", n_jobs = 1)
fit = mds.fit(euclidean_dists)
pos = fit.embedding_
plt.scatter(pos[:, 0], pos[:, 1], s=8)
plt.axis('square');

So we can see that, although the data lives in 30 dimensions, we can get a sense of how the points are clustered by approximately placing the points into two dimensions.

Applying $k$-Means¶

The Python package scikit-learn has a huge set of tools for unsupervised learning generally, and clustering specifically.

These are in sklearn.cluster.

There are three functions common to all the clustering classes:

  • fit(),
  • predict(), and
  • fit_predict().

fit() builds the model from the training data (e.g., for $k$-means, it finds the centroids),

predict() assigns labels to the data after building the model, and

fit_predict() does both in a single step.

In [7]:
from sklearn.cluster import KMeans
kmeans = KMeans(init = 'k-means++', n_clusters = 3, n_init = 100)
kmeans.fit_predict(X)
Out[7]:
array([1, 2, 0, 0, 2, 1, 2, 2, 1, 2, 0, 1, 0, 1, 2, 0, 0, 1, 0, 2, 0, 2,
       0, 1, 2, 2, 1, 0, 0, 0, 0, 1, 0, 1, 0, 2, 0, 2, 0, 2, 2, 2, 1, 0,
       1, 0, 1, 2, 1, 0, 0, 1, 1, 1, 1, 0, 1, 2, 2, 0, 1, 0, 2, 1, 1, 2,
       0, 1, 1, 2, 2, 2, 1, 1, 0, 1, 2, 1, 2, 1, 2, 2, 2, 1, 0, 0, 2, 1,
       1, 0, 1, 2, 2, 0, 1, 0, 0, 2, 2, 0], dtype=int32)

All the tools in scikit-learn are implemented as Python objects.

Thus, the general sequence for using a tool from scikit-learn is:

  • Create the object, probably with some hyperparameter settings or initialization,
  • Run the method, generally by using the fit() function, and
  • Examine the results, which are generally property variables of the object.
In [8]:
centroids = kmeans.cluster_centers_
labels = kmeans.labels_
error = kmeans.inertia_
In [9]:
print(f'The total error of the clustering is: {error:0.1f}.')
print('\nCluster labels:')
print(labels)
print('\nCluster centroids:')
print(centroids)
The total error of the clustering is: 2733.8.

Cluster labels:
[1 2 0 0 2 1 2 2 1 2 0 1 0 1 2 0 0 1 0 2 0 2 0 1 2 2 1 0 0 0 0 1 0 1 0 2 0
 2 0 2 2 2 1 0 1 0 1 2 1 0 0 1 1 1 1 0 1 2 2 0 1 0 2 1 1 2 0 1 1 2 2 2 1 1
 0 1 2 1 2 1 2 2 2 1 0 0 2 1 1 0 1 2 2 0 1 0 0 2 2 0]

Cluster centroids:
[[-4.7833887   5.32946939 -0.87141823  1.38900567 -9.59956915  2.35207348
   2.22988468  2.03394692  8.9797878   3.67857655 -2.67618716 -1.17595897
   3.76433199 -8.46317271  3.28114395  3.73803392 -5.73436869 -7.0844462
  -3.75643598 -3.07904369  1.36974653 -0.95918462  9.91135428 -8.17722281
  -5.8656831  -6.76869078  3.12196673 -4.85745245 -0.70449349 -4.94582258]
 [ 0.88697885  4.29142902  1.93200132  1.10877989 -1.55994342  2.80616392
  -1.11495818  7.74595341  8.92512875 -2.29656298  6.09588722  0.47062896
   1.36408008  8.63168509 -8.54512921 -8.59161818 -9.64308952  6.92270491
   5.65321496  7.29061444  9.58822315  5.79602014 -0.84970449  5.46127493
  -7.77730238  2.75092191 -7.17026663  9.07475984  0.04245798 -1.98719465]
 [-7.0489904  -7.92501873  2.89710462 -7.17088692 -6.01151677 -2.66405834
   6.43970052 -8.20341647  6.54146052 -7.92978843  9.56983319 -0.86327902
   9.25897119  1.73061823  4.84528928 -9.26418246 -4.54021612 -7.47784575
  -4.15060719 -7.85665458 -3.76688414 -1.6692291  -8.78048843  3.78904162
   1.24247168 -4.73618733  0.27327032 -7.93180624  1.59974866  8.78601576]]

Visualizing the Results of Clustering¶

Let's visualize the results. We'll do that by reordering the data items according to their cluster.

In [10]:
idx = np.argsort(labels)
rX = X[idx, :]
sns.heatmap(rX, xticklabels = False, yticklabels = False, linewidths = 0);
In [11]:
rearranged_dists = euclidean_dists[idx,:][:,idx]
sns.heatmap(rearranged_dists, xticklabels = False, yticklabels = False, linewidths = 0, square = True);