The basic idea of Markov chains is that you can model the different states of a system, as well as the probability of transitioning between those states. For our purposes, the system is a hunk of text. In this hunk of text, each word or phrase is a different state. A transition from state A to state B indicates that the word in state A is directly followed by the word in state B somewhere in the text.
Consider this simple example: "Hello world. Hello universe."
The first-order Markov chain for this text considers each unique word to be a state, and identifies a transition between any pair of words that appear successively in the text.
State | Transitions |
---|---|
"Hello" | "Hello" → "world.", "Hello" → "universe." |
"world." | "world." → "Hello" |
"universe." | (none) |
In a second-order Markov chain, each state consists of two words, with transitions occurring between states that overlap on a common word boundary: the last word of one state is the first word of the next.
State | Transitions |
---|---|
"Hello world." | "Hello world." → "world. Hello" |
"world. Hello" | "world. Hello" → "Hello universe." |
"Hello universe." | (none) |
So then in a third-order Markov chain, each state consists of three words, with transitions occurring between states that overlap on a common two-word boundary.
State | Transitions |
---|---|
"Hello world. Hello" | "Hello world. Hello" → "world. Hello universe." |
"world. Hello universe." | (none) |
The higher the order of the Markov chain, the fewer new combinations can be made from the same body of text. As a result, the output more closely resembles the input text. Another way to think of it is that lower-order Markov chains produce poems with more fusion between different parts of the text, while higher-order Markov chains produce poems with less fusion.
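To see that effect, a simple generator can take a random walk over the chain. The sketch below (again my own, building on the hypothetical `build_chain` helper above) starts from a random state and repeatedly picks one of its recorded successors; with a short input and a higher order, most states have exactly one successor, so the walk mostly replays the original text.

```python
import random

def generate(chain, max_steps=20):
    """Random-walk the chain: start at a random state, then repeatedly
    hop to one of its recorded successor states until hitting a dead end."""
    state = random.choice(list(chain.keys()))
    output = list(state)
    for _ in range(max_steps):
        next_states = chain.get(state)
        if not next_states:
            break  # this state only occurs at the very end of the text
        state = random.choice(next_states)
        output.append(state[-1])  # only the newly revealed word is appended
    return " ".join(output)

# Order 1 can produce fused output such as "world. Hello world. Hello universe.";
# order 2 can only replay contiguous pieces of the original text.
print(generate(build_chain("Hello world. Hello universe.", order=1)))
print(generate(build_chain("Hello world. Hello universe.", order=2)))
```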