In [1]:
# for QR codes use inline
%matplotlib inline
qr_setting = 'url'
#
# for lecture use notebook
# %matplotlib notebook
# qr_setting = None
#
%config InlineBackend.figure_format='retina'
# import libraries
import numpy as np
import matplotlib as mp
import pandas as pd
import matplotlib.pyplot as plt
import laUtilities as ut
import slideUtilities as sl
import demoUtilities as dm
import pandas as pd
from importlib import reload
from datetime import datetime
from matplotlib import animation
from IPython.display import Image
from IPython.display import display_html
from IPython.display import display
from IPython.display import Math
from IPython.display import Latex
from IPython.display import HTML;
from collections import Counter
import string
import random
import itertools

Announcements¶

  • Homework:
    • Homework 9 out, due Sunday
  • Final exam: Monday, May 8 from 12-2pm
  • Upcoming office hours:
    • Today: Prof McDonald from 4:30-6pm in CCDS 1341
    • Tomorrow: Peer tutor Rohan Anand from 1:30-3pm in CCDS 16th floor

Lecture 36: Applications of Markov Chains¶

Recap: Eigenvectors¶

Of all of the linear transformations associated with a square matrix, scaling is special because:

If a matrix AA scales xx, then that transformation could also have been expressed without a matrix-vector multiplication, i.e., as λxλx for some scalar value λλ.

An eigenvector of a matrix AA is a special vector that does not change its direction when multiplied by AA.

The eigenvalues of a matrix AA are all of the scalars that an eigenvector can be scaled by.

In [2]:
#
ax = ut.plotSetup(size=(12,8))
ut.centerAxes(ax)
A = np.array([[3,-2],[1,0]])
u = np.array([-1,1])
v = np.array([2,1])
#
ut.plotArrowVec(ax, v, [0,0], color='Red')
ut.plotArrowVec(ax, A.dot(v), [0,0], color='Red')
ax.text(v[0],v[1]+0.2,r'${\bf v}$',size=20)
ax.text(A.dot(v)[0],A.dot(v)[1]+0.2,r'$A{\bf v}$',size=20)
#
ut.plotArrowVec(ax, u, [0,0], color='Blue')
ut.plotArrowVec(ax, A.dot(u), [0,0], color='Blue')
ax.text(u[0]-0.5,u[1]+0.1,r'${\bf u}$',size=20)
ax.text(A.dot(u)[0]-0.7,A.dot(u)[1]+0.3,r'$A{\bf u}$',size=20);

To find an eigenvalue: we saw that λλ is an eigenvalue of an n×nn×n matrix AA if and only if the equation

(A−λI)x=0(A−λI)x=0

has a nontrivial solution.

Some special cases:

  • The eigenvalues of an upper triangular matrix or a lower triangular matrix are the entries on its main diagonal.
  • A matrix has an eigenvalue of 0 if and only if it is not invertible (i.e., is singular).

Combining these properties leads to the characteristic equation: a scalar λλ is an eigenvalue of an n×nn×n matrix AA if and only if λλ satisfies

det(A−λI)=0.det(A−λI)=0.

This equation yields a polynomial in λλ of degree nn, meaning there are at most nn solutions.

To find all eigenvectors corresponding to an eigenvalue: compute the null space of the matrix A−λI.A−λI.

(Remember that this forms a subspace, which we call the eigenspace of AA corresponding to λλ.)

Recap: Markov chains¶

A difference equation (also known as a dynamical system or recurrence relation) contains two parts:

  • we define some vector that describes the state of the system, and
  • we formulate a rule that tells us how to compute the next state of the system based on the current state of the system.

A Markov chain is a dynamical system whose state is a probability vector xkxk and whose state transitions are determined by a stochastic matrix P∈Rn×nP∈Rn×n, meaning that that

xk+1=Pxkfork=0,1,2,...xk+1=Pxkfork=0,1,2,...

Here:

  • A probability vector is a vector of nonnegative entries that sums to 1.
  • A stochastic matrix is a square matrix of nonnegative values whose columns each sum to 1.

A important question about a Markov Chain is: what will happen in the distant future?

If PP is a stochastic matrix, then a steady-state vector (or equilibrium vector) for PP is a probability vector xx such that:

Px=x.Px=x.
In [3]:
ax = plt.subplot(131)
plt.plot(range(15),xs.T[0],'o-')
ax.set_ylim([0,1])
plt.title(r'$x_1$',size=24)
ax = plt.subplot(132)
plt.plot(range(15),xs.T[1],'o-')
ax.set_ylim([0,1])
plt.title(r'$x_2$',size=24)
ax = plt.subplot(133)
plt.plot(range(15),xs.T[2],'o-')
ax.set_ylim([0,1])
plt.title(r'$x_3$',size=24)
plt.tight_layout()

We can find a steady state vector by solving the equation Px=xPx=x, or equivalently

(P−I)x=0(P−I)x=0

and restricting to vectors xx that are probability vectors.

To solve a Markov Chain for its steady state:

  • Solve the linear system (P−I)x=0.(P−I)x=0.
  • The system will have an infinite number of solutions, with one free variable. Obtain a general solution.
  • Pick any specific solution (choose any value for the free variable), and normalize it so the entries add up to 1.

36.1 Markov Chains and Eigenvectors¶

We can place the theory of Markov Chains into the broader context of eigenvalues and eigenvectors.

Theorem. The largest eigenvalue of a Markov Chain is 1.

Proof. First of all, 1 is an eigenvalue of a Markov chain and its corresponding vector is the steady state vector vv. This is because Av=1⋅v.Av=1⋅v.

To prove that 1 is the largest eigenvalue, recall that each column of a Markov Chain sums to 1.

Then, consider the sum of the values in the vector AxAx.

Ax=⎡⎢ ⎢⎣a11…a1n⋮⋱⋮an1…ann⎤⎥ ⎥⎦⎡⎢ ⎢⎣x1⋮xn⎤⎥ ⎥⎦=⎡⎢ ⎢⎣a11x1+⋯+a1nxn⋮an1x1+⋯+annxn⎤⎥ ⎥⎦.Ax=[a11…a1n⋮⋱⋮an1…ann][x1⋮xn]=[a11x1+⋯+a1nxn⋮an1x1+⋯+annxn].

Let's just sum the first terms in each component of AxAx:

a11x1+a21x1+⋯+an1x1=x1∑iai1=x1.a11x1+a21x1+⋯+an1x1=x1∑iai1=x1.

So we can see that the sum of all terms in AxAx is equal to x1+x2+⋯+xnx1+x2+⋯+xn -- i.e., the sum of all terms in xx.

So there can be no λ>1λ>1 such that Ax=λx.Ax=λx.

A complete solution for the evolution of a Markov Chain¶

Previously, we were only able to ask about the "eventual" steady state of a Markov Chain.

But a crucial question is: how long does it take for a particular Markov Chain to reach steady state from some initial starting condition?

Example. Let's use our running example of population movement, defined by A=[0.950.030.050.97].A=[0.950.030.050.97].

Let's ask how long until it reaches steady state, from the starting point defined as x0=[0.60.4].x0=[0.60.4].

Remember that x0x0 is probability vector -- its entries are nonnegative and sum to 1.

In [3]:
#
ax = ut.plotSetup(-0.1,1.7,-0.1,1.2)
ut.centerAxes(ax)
A = np.array([[0.95,0.03],[0.05,0.97]])
v1 = np.array([0.375,0.625])
v2 = np.array([0.225,-0.225])
x0 = v1 + v2
#
ax.plot([1,0],[0,1],'b--')
ax.plot(x0[0],x0[1],'bo')
ax.text(x0[0]+0.02,x0[1]+0.02,r'${\bf x_0}$',size=16)
#ax.text(A.dot(x0)[0]+0.2,A.dot(x0)[1]+0.2,r'$A{\bf x_0}$',size=16)
# ax.plot([-10,10],[5*10/6.0,-5*10/6.0],'b-')
#
ax.annotate('Initial State', xy=(x0[0], x0[1]),  xycoords='data',
                xytext=(0.4, 0.8), textcoords='data',
                size=15,
                #bbox=dict(boxstyle="round", fc="0.8"),
                arrowprops={'arrowstyle': 'simple',
                                'fc': '0.5', 
                                'ec': 'none',
                                'connectionstyle' : 'arc3,rad=-0.3'},
                );

The matrix AA has the characteristic equation:

λ2−1.92λ+0.92λ2−1.92λ+0.92

Using the quadratic formula, we find the roots of this equation to be 1 and 0.92. (Note that, as expected, 1 is the largest eigenvalue.)

Next, we can find a basis for each eigenspace of AA (each nullspace of A−λIA−λI).

  • For λ=1λ=1, a corresponding eigenvector is v1=[3/85/8].v1=[3/85/8]. This is a steady state vector! (Note that I purposely normalized v1v1 to be a probability vector.)
  • For λ=0.92λ=0.92, a corresponding eigenvector is v2=[1−1].v2=[1−1].

Next, we write the initial state x0=[0.60.4]x0=[0.60.4] as a linear combination of v1v1 and v2.v2. This can be done because {v1,v2}{v1,v2} is obviously a basis for R2.R2.

To write x0x0 this way, we want to solve the vector equation

c1v1+c2v2=x0c1v1+c2v2=x0

In other words:

[v1v2][c1c2]=x0.[v1v2][c1c2]=x0.

The matrix [v1v2][v1v2] is invertible, so,

[c1c2]=[v1v2]−1x0=[3/815/8−1]−1[0.60.4]=[10.225].[c1c2]=[v1v2]−1x0=[3/815/8−1]−1[0.60.4]=[10.225].

So, now we can put it all together.

We know that

x0=c1v1+c2v2x0=c1v1+c2v2

and we know the values of c1,c2c1,c2 that make this true.

And then we can use the eigendecomposition to compute subsequent terms xkxk:

x1=Ax0=c1Av1+c2Av2=c1v1+c2(0.92)v2.(1)(2)(3)(1)x1=Ax0(2)=c1Av1+c2Av2(3)=c1v1+c2(0.92)v2.

Now note the power of the eigenvalue approach:

x2=Ax1=c1v1+c2(0.92)2v2.(4)(5)(4)x2=Ax1(5)=c1v1+c2(0.92)2v2.

And so in general: xk=c1v1+c2(0.92)kv2(k=0,1,2,…)xk=c1v1+c2(0.92)kv2(k=0,1,2,…)

And using the c1c1 and c2c2 and v1,v1, v2v2 we computed above:

xk=[3/85/8]+0.225(0.92)k[1−1](k=0,1,2,…)xk=[3/85/8]+0.225(0.92)k[1−1](k=0,1,2,…)

This explicit formula for xkxk gives the solution of the Markov Chain xk+1=Axkxk+1=Axk starting from the initial state x0x0.

In other words: x0=[3/85/8]+0.225[1−1]x0=[3/85/8]+0.225[1−1]
x1=[3/85/8]+0.207[1−1]x1=[3/85/8]+0.207[1−1]
x2=[3/85/8]+0.190[1−1]x2=[3/85/8]+0.190[1−1]
x3=[3/85/8]+0.175[1−1]x3=[3/85/8]+0.175[1−1]
......
x∞=[3/85/8]x∞=[3/85/8]

As k→∞k→∞, (0.92)k→0(0.92)k→0. Thus the Markov chain converges to the steady state xk→[3/85/8].xk→[3/85/8].

In [4]:
#
ax = ut.plotSetup(-0.1,1.7,-0.1,1.2)
ut.centerAxes(ax)
A = np.array([[0.95,0.03],[0.05,0.97]])
v1 = np.array([0.375,0.625])
v2 = np.array([0.225,-0.225])
x0 = v1 + v2
#
ax.plot([1,0],[0,1],'b--')
ax.text(v1[0]+0.02,v1[1]+0.02,r'${\bf v_1}$',size=16)
ax.plot(x0[0],x0[1],'bo')
v = np.zeros((40,2))
for i in range(40):
    v[i] = v1+(0.92**i)*v2
    ax.plot(v[i,0],v[i,1],'o')
ax.text(v[4][0]+0.02,v[4][1]+0.02,r'${\bf x_4}$',size=12)
ax.text(v[10][0]+0.02,v[10][1]+0.02,r'${\bf x_{10}}$',size=12)
ax.text(x0[0]+0.02,x0[1]+0.02,r'${\bf x_0}$',size=16)
ax.plot(v1[0],v1[1],'ro')
#ax.text(A.dot(x0)[0]+0.2,A.dot(x0)[1]+0.2,r'$A{\bf x_0}$',size=16)
# ax.plot([-10,10],[5*10/6.0,-5*10/6.0],'b-')
#
ax.annotate('Steady State', xy=(v1[0], v1[1]),  xycoords='data',
                xytext=(0.1, 0.2), textcoords='data',
                size=15,
                #bbox=dict(boxstyle="round", fc="0.8"),
                arrowprops={'arrowstyle': 'simple',
                                'fc': '0.5', 
                                'ec': 'none',
                                'connectionstyle' : 'arc3,rad=-0.3'},
                )
ax.annotate('Initial State', xy=(v[0,0], v[0,1]),  xycoords='data',
                xytext=(0.4, 0.8), textcoords='data',
                size=15,
                #bbox=dict(boxstyle="round", fc="0.8"),
                arrowprops={'arrowstyle': 'simple',
                                'fc': '0.5', 
                                'ec': 'none',
                                'connectionstyle' : 'arc3,rad=-0.3'},
                )
print('')

36.2 An early application of Markov Chains¶

Markov Chains have been deployed in many different applications. The earliest domain they were useful in was analysis of written language.

We'll walk through some of the early experiments in this domain.

Our goal: Make our computer write us a sentence.

Version 0¶

Generate text by randomly selecting letters of the alphabet. Letters are equiprobable.

In [37]:
alphabet = list(string.ascii_lowercase + " ")
sentence0 = ''
for _ in range(100):
    sentence0 += random.choice(alphabet)
print(sentence0)
ael piqomlhtlkyashc sgbahadtremjvahuxuqilvtudxgglotfoyqekpepktmaknvyowdjhdoavwmlpllzlfxeqvkgqxmrgdz 

This looks more like a password generator than a text generator.

Fortunately, we know some things about English! Firstly, that not every letter occurs with the same frequency -- 's' and 'e' will be more common than 'q' and 'z'.

In [3]:
# From https://en.wikipedia.org/wiki/Letter_frequency
eng_freq = {'e': 0.09992, 't': 0.07424, 'a': 0.06432, 'o': 0.06112, 'i': 0.06056, 'n': 0.05784, 's': 0.05208, 'r': 0.05024, 'h': 0.04040, 'l': 0.03256, 'd': 0.03056, 'c': 0.02672, 'u': 0.02184, 'm': 0.02008, 'f': 0.0192, 'p': 0.01712, 'g': 0.01496, 'w': 0.01344, 'y': 0.01328, 'b': 0.01184, 'v': 0.0084, 'k': 0.00432, 'x': 0.00184, 'j': 0.00128, 'q': 0.00096, 'z': 0.00072, ' ': 0.2}

fig = plt.figure()
ax = fig.add_subplot(111)
i = np.arange(27)
ax.bar(i, [eng_freq[l] * 100 for l in sorted(eng_freq.keys())], width=0.8,
       color='b', alpha=0.5, align='center')
ax.set_xticks(range(len(alphabet)))
ax.set_xticklabels(sorted(alphabet))
ax.set_xlabel("Letters")
ax.set_ylabel("Frequency (%)")
plt.show()

Version 1¶

Generate text that reflects the relative frequencies of each letter in English.

In [38]:
sentence = random.choices(population=list(eng_freq.keys()), weights=eng_freq.values(), k=100)
print(''.join(sentence))
 eloetmn enutle onsrl ulnbh nc  h stnree roanevaiiee ansuamleand afio uatrsitr ynlkhfde  oienmaadodt

This is slightly more comprehensible. But we can do better.

Letter frequency analysis is what Andrey Markov conducted in his initial paper on Markov Chains. He took Alexander Pushkin's Eugene Onegin and sampled, by hand, the first 20,000 letters of the novel.

He counted the number of vowels, the number of consonants, and the number of vowel-vowel, consonant-consonant, and vowel-consonant pairs.

Figure

He found that the first 20,000 letters were 8,638 (43%) vowels and 11,362 (57%) consonants.

If each letter were independent, we would expect the chance of two sequential vowels to be (.43)(.43)=0.1849(.43)(.43)=0.1849. This equates to 3,698 vowel-vowel pairs in 20,000 characters.

VowelConsonantVowel0.1850.245Consonant0.2450.325VowelConsonantVowel0.1850.245Consonant0.2450.325

But Markov observed 1,104 -- nearly 1/3 the expected number!

Thus, we can conclude what we know intuitively -- that there is a dependence relationship between letters.

Thirty years later, another mathematician began investigating this dependence relationship between letters.

Claude Shannon was a mathematician and early computer scientist. He is considered by many to be the father of information theory, and had a significant impact on the direction of modern computing (including that we store information in bits!). During WWII, Shannon worked as a codebreaker, which led to his later work on English text analysis, information entropy, and redundancy.

Figure

Shannon observed that "anyone speaking a language possesses, implicitly, an enormous knowledge of the statistics of the language." He devised multiple experiments to demonstrate this.

For example, imagine the following game:

  1. Formulate a hidden sentence.
  2. Ask the player to guess the next letter.
  3. If they are correct, tell them and proceed. If they are incorrect, increment the number of guesses and ask them to guess again.
Figure

One observation from this: even beyond the likelihood of moving between consonants and vowels, there are many common combinations of letters. And the preceeding letters tell us a lot about what letter is most likely to come next!

Version 2¶

Let's analyze some real text and find out which combinations of letters are most likely. In other words, let's estimate the probability of each letter transitioning to another.

In [39]:
# Check out Project Gutenberg for texts in the public domain
# https://www.gutenberg.org/
with open('data/moby.txt', 'r') as file:
    text1 = file.read().rstrip()
    print(f'Moby Dick has {len(text1)} characters')

with open('data/wonderland.txt', 'r') as file:
    text2 = file.read().rstrip()
    print(f'Alice in Wonderland has {len(text2)} characters')
    
with open('data/pride.txt', 'r') as file:
    text3 = file.read().rstrip()
    print(f'Pride & Prejudice has {len(text3)} characters')

text = text1 + text2 + text3

counts = Counter(text) # Counter({'l': 2, 'H': 1, 'e': 1, 'o': 1})
#print(counts)

letter_prob = {}
for i in counts.most_common():
    letter_prob[i[0]] = round(i[1]/len(text),4)
    print(i)
Moby Dick has 1147741 characters
Alice in Wonderland has 134814 characters
Pride & Prejudice has 664175 characters
(' ', 362026)
('e', 198344)
('t', 144438)
('a', 127336)
('o', 116660)
('i', 110303)
('n', 109561)
('h', 103390)
('s', 102978)
('r', 89242)
('l', 68835)
('d', 64969)
('u', 44900)
('m', 39815)
('c', 38116)
('w', 36802)
('f', 34533)
('g', 33177)
('y', 31703)
('b', 27233)
('p', 26799)
('v', 15056)
('k', 12315)
('q', 2385)
('j', 2079)
('x', 2076)
('z', 1659)
In [40]:
stochastic_matrix = np.zeros(shape=(27,27))
gram = {}
for i in range(len(text)-1):
    seq = text[i:i+1]
    if seq not in gram:
        gram[seq] = []
    gram[seq].append(text[i+1])

for i in range(27):
    for j in range(27):
        if alphabet[j] in gram[alphabet[i]]:
            letter_count = Counter(gram[alphabet[i]])
            stochastic_matrix[i][j] = letter_count[alphabet[j]]/sum(letter_count.values())
print(np.round(stochastic_matrix[0:10,0:10],2))
[[0.   0.03 0.03 0.05 0.   0.01 0.02 0.01 0.04 0.  ]
 [0.05 0.02 0.   0.   0.32 0.   0.   0.   0.04 0.01]
 [0.13 0.   0.02 0.   0.17 0.   0.   0.17 0.04 0.  ]
 [0.03 0.   0.   0.01 0.12 0.   0.   0.   0.07 0.  ]
 [0.05 0.   0.02 0.08 0.04 0.01 0.01 0.   0.01 0.  ]
 [0.06 0.   0.   0.   0.08 0.04 0.   0.   0.08 0.  ]
 [0.06 0.   0.   0.   0.12 0.   0.01 0.14 0.04 0.  ]
 [0.18 0.   0.   0.   0.43 0.   0.   0.   0.14 0.  ]
 [0.02 0.01 0.04 0.04 0.03 0.02 0.03 0.   0.   0.  ]
 [0.25 0.   0.   0.   0.24 0.   0.   0.   0.01 0.  ]]

Let's write a function that takes our training text, the number of letters to "look back," and the number of characters in our new sentence.

In [41]:
def ngram(text, n, k):
    gram = {}
    for i in range(len(text)-n):
        seq = text[i:i+n]
        if seq not in gram:
            gram[seq] = []
        gram[seq].append(text[i+n])

    sentence = random.choice(list(gram.keys()))
    for i in range(k):
        sentence += random.choice(gram[sentence[i:i+n]])
    return(sentence)
In [42]:
# One proceeding letter (e.g. 't' --> 'e'). 
# This is a first-order Markov Chain
print(ngram(text,1,100))
veyonit e s ly whend mof uotaturof thif way ari liveyindeer os s f ats hokecheand ncar otaraithinggan
In [43]:
# Two proceeding letters (e.g. 'th' --> 'e')
# This is a second-order Markov Chain
print(ngram(text,2,100))
eing but the he boady mr alat peribly re the of itteeptand trantty fough wougholl ritin boust aft land
In [44]:
# Three proceeding letters (e.g. 'the' --> 'r')
# This is a third-order Markov Chain
print(ngram(text,3,100))
kness darchings prove body chapper reforgot to whilation or stubb wantages de boats sting alica feedily
In [45]:
# and so on...
print(ngram(text,4,1000))
uccordaged share of it know do compactolus fancy i supportablissed whatever have as into tell be thats supperson his shoe captain you seems may boy visit in his cetold on bonely could varies art then say but is comrades to the try of sing i see now him so drink when witness so should and white angely sat downwards impetuously replied overtebr and while and and the to became transportion aged thin cupied inted to though forms one and arms an endlingley were sailors and dear one of the cast but ever forming how home to that he reless of radney and go found and she air father whale wrong thing or with the personable and elizarrol gouge gryphony by the turtle lowly because in have no broken into ther engration a sorts starbuckling out that in queequences we fly in tell your owned out one or throughout one degreenlandles some reached with that man floor lover manufactual the cannot a place and whale suppressioner and be remained and more from mortance her father complimentation me is took that 

At this point, as we are essentially memorizing whole words, it might be more expedient to switch to a word-based model.

In [46]:
ttext = text.split()

wcounts = Counter(ttext) # Counter({'l': 2, 'H': 1, 'e': 1, 'o': 1})
print(wcounts.most_common(5))

word_prob = {}
for i in wcounts.most_common():
    word_prob[i[0]] = round(i[1]/len(ttext),4)
[('the', 20182), ('and', 10790), ('of', 10614), ('to', 9449), ('a', 7222)]

Version 3¶

Randomly select words from our corpus.

In [47]:
sentence = ''
for _ in range(100):
    sentence += random.choice(ttext) + ' '
print(sentence)
minded the her i the but lydia life from folded we a failed triangles by when so of him having boys by whalemen suggesting did aye opening elizabeth looks at weeks at morcar with favour apprehension my a first this my she the nothing board been and might suspected you breakfast ordered eye seaport the could i embark proved by my the were deck same ball talk how of and are atmosphere end wholly while a in keeps the glimpse out should that a previous nearer he on that my go with times is which meant you to him his 

Version 4¶

Choose the next word based on the preceeding word.

In [48]:
def ngram_words(text, n, k):
    ttext = text.split()
    gram = {}
    for i in range(len(ttext)-n):
        seq = ' '.join(ttext[i:i+n])
        if seq not in gram:
            gram[seq] = []
        gram[seq].append(ttext[i+n])

    sentence = list(random.choice(list(gram.keys())).split())
    for i in range(k):
        sentence.append(random.choice(gram[' '.join(sentence[i:i+n])]))
    return(' '.join(sentence))
In [49]:
print(ngram_words(text,1,100))
fangs of kings in talking over it was palpable wonders and thin parallel and unentered forests whose eagerness for the wharf towards the clear of which deferred miss bingleys continued in such is all the kelson so honestly blind wall leaving queequeg salamed before i am not even tide of the world of himself always good bye dont fancy long to the like yonder roof bear a laughs the ship is in overcoming it is nothing seemed to produce one all out to a settled they mostly the table but a mildly employed in this motion and wickham and an activity

Summary¶

We're getting close to readable text, but you can imagine there is still a long way to go before we reach grammatically correct or sensical text.

Modern text generation has moved to more complex models, but still operates on the same intuition -- based on the surrounding text, make a good prediction for what letter or word should come next.

(If you want to try some of this yourself, and see examples of Markov Chains deployed for amusement in the wild, check out https://github.com/jsvine/markovify).

Markov chains and their variants are still used in many real applications today.

  • Text compression
  • Hidden Markov Models are used in automatic speech recognition
  • Markov Chain Monte Carlo is used in physics and biology
  • Markov Decision Processes (MDP) are used in reinforcement learning
  • ...