Markov Text Generator

From Wikicliki

Markov processes can be used to generate superficially "real-looking" text from a sample document, and they appear in a variety of recreational "parody generator" software. Spammers also use them to inject real-looking hidden paragraphs into unsolicited email and posted comments in an attempt to get these messages past spam filters.

A Markov chain is a collection of random variables {X_t} (where the index t runs through 0, 1, ...) having the property that, given the present, the future is conditionally independent of the past.
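
Written out for a discrete chain, the property above says that conditioning on the whole history gives the same distribution as conditioning on the current value alone. The symbols x and x_0, ..., x_t are introduced here only for illustration and stand for arbitrary possible values of the variables:

```latex
P(X_{t+1} = x \mid X_0 = x_0, X_1 = x_1, \ldots, X_t = x_t)
  = P(X_{t+1} = x \mid X_t = x_t)
```

For the text generator described on this page, one way to read this is that each X_t is a pair of consecutive words, so the next word depends only on the current pair and not on anything generated earlier.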

The algorithm is:

  1. Take a text to serve as the corpus from which the transitions are chosen.
  2. Start with two consecutive words from the text. At any point, the last two words generated constitute the present state.
  3. Generating the next word is the Markov transition. To generate it, look in the corpus and find which words appear immediately after the current two words. Choose one of them at random.
  4. Repeat step 3, with the last two words of the output as the new state, until text of the required size has been generated.
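
The steps above can be sketched in Python roughly as follows. This is a minimal illustration, not a reference implementation: the function names `build_transitions` and `generate`, the whitespace tokenisation, and the choice to stop early at a dead end are all assumptions made for the example. Keeping duplicate followers in the list means a word that follows a pair more often is chosen more often, which is one possible reading of "choose one of them at random".

```python
import random
from collections import defaultdict


def build_transitions(words):
    """Map each pair of consecutive words to the list of words that follow it."""
    transitions = defaultdict(list)
    for first, second, following in zip(words, words[1:], words[2:]):
        # Duplicates are kept on purpose so frequent followers are picked more often.
        transitions[(first, second)].append(following)
    return transitions


def generate(corpus, length, rng=None):
    """Return at most `length` words generated from `corpus` using two-word states."""
    rng = rng or random.Random()
    words = corpus.split()  # assumption: words are separated by whitespace
    if len(words) < 3:
        raise ValueError("corpus needs at least three words")
    transitions = build_transitions(words)

    # Step 2: start from a random pair of consecutive words that has a follower.
    state = rng.choice(list(transitions))
    output = list(state)

    # Steps 3 and 4: pick a follower at random, then shift the state by one word.
    while len(output) < length:
        followers = transitions.get(state)
        if not followers:
            break  # dead end: this pair only occurs at the very end of the corpus
        next_word = rng.choice(followers)
        output.append(next_word)
        state = (state[1], next_word)

    return " ".join(output[:length])


if __name__ == "__main__":
    sample = "the cat sat on the mat and the cat ran off the mat"
    print(generate(sample, 10))
```

Every run of three consecutive words in the output also occurs somewhere in the corpus, which is why the result looks locally plausible even though it has no overall meaning. A longer corpus gives each pair more possible followers and so produces more varied text.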

See Also