Sulekha

Sulekha is a text-based Markov chain generator that I wrote in Perl sometime in April of 2005. Or maybe it was earlier. I don't quite remember. Wikipedia says that a Markov chain is a discrete-time stochastic process with the Markov property. That makes absolutely no sense to me. The way I understand it, after looking at text-based Markov chains anyway, is that it's a series of probability-based transitions from one state to another. So essentially, if you are at a state A, then there is a certain probability of moving to state B, to state C, or even of staying at A. That is probably a very simplistic view, but that's how I've been able to understand it. I'm certainly not the first one to write a text-based Markov-chain generator, but as with almost anything, I like to re-invent the wheel just so that I can say it's mine! So before I give you some examples, and an interactive demonstration, let me explain how Sulekha works. That way you can make your own too. As far as the name goes, it is Sanskrit for "good writing": lekha means "writing", and the prefix su means "good". Whether the text generated by Sulekha is good or not is another (subjective) matter entirely.

To create a Markov chain, you have to go through your body of text and construct a frequency table. What this actually means is that in any body of text, if you look at a certain word, there are different probabilities for the different words that could follow the word you chose. So a frequency table for the preceding sentence would look like this:

Word     | Following Word | Percentage
-------- | -------------- | ----------
What     | this           | 100%
this     | actually       | 100%
actually | means          | 100%
means    | is             | 100%
is       | that           | 100%
that     | in             | 50%
that     | could          | 50%
...      | ...            | ...
word     | there          | 50%
word     | you            | 50%
...      | ...            | ...
you      | look           | 50%
you      | chose          | 50%
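
Building such a table is straightforward. Below is a minimal sketch in Perl of how it might be done (my own illustration here, not Sulekha's actual source), using a hash where each key is a word and the value is the list of words seen following it:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Build a first-order frequency table from a body of text. Each key is a
# word, and its value is the list of every word observed following it.
# Repeated followers appear multiple times, which is what preserves their
# relative frequencies. (Punctuation is left attached to words in this
# simple sketch.)
my $text = "What this actually means is that in any body of text, if you "
         . "look at a certain word, there are different probabilities for "
         . "the different words that could follow the word you chose.";

my @words = split /\s+/, $text;
my %table;

for my $i (0 .. $#words - 1) {
    push @{ $table{ $words[$i] } }, $words[$i + 1];
}

# "that" is followed once by "in" and once by "could", so each has a
# 50% chance of being picked.
print join(", ", @{ $table{"that"} }), "\n";
```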

So your probability of getting the word could, assuming you started your chain with that, is 50%. The implementation of the table is up to you. I simply create a hash where every key is a word and the value for that key is an array of the words that follow it. It is not necessary to explicitly calculate the probability, because that information is preserved in the array: I randomly select a location from the array, so if one word appears more often than another, the probability of selecting it is correspondingly greater. Once you have found your second word, you repeat the process.

I mentioned that my generator generates n-order (or n-depth) chains. What this means is that instead of creating a frequency table of which word follows another, you can also create a frequency table of which word follows a particular word-pair (or word-triplet, all the way up to a, uhh... word-n-plet). In this case, you would start by building your frequency table with the word-pair What this, and end with the word-pair you chose. As you can see, as the order increases, there are fewer choices for the words that can follow your starting point. What this means is that as n increases, the resulting Markov chain starts to resemble the original body of text more and more.
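
Continuing the sketch above, here is roughly how a chain can be generated by walking that table, along with how a second-order table keyed on word-pairs might be built. Again, this is just an illustration of the idea rather than Sulekha's actual code, and the names are mine:

```perl
# Generate a chain: pick a random follower of the current word, emit it,
# and repeat. Because followers are stored with repetition, a plain
# random index already respects the observed probabilities.
sub generate {
    my ($table, $start, $length) = @_;
    my @chain = ($start);
    my $word  = $start;
    for (1 .. $length) {
        my $followers = $table->{$word} or last;    # dead end: stop early
        $word = $followers->[ int rand @$followers ];
        push @chain, $word;
    }
    return join " ", @chain;
}

print generate(\%table, "that", 10), "\n";

# For a second-order (depth-2) chain, the key is a word-pair instead of a
# single word; higher orders simply extend the key to longer n-tuples.
my %pair_table;
for my $i (0 .. $#words - 2) {
    my $key = "$words[$i] $words[$i + 1]";
    push @{ $pair_table{$key} }, $words[$i + 2];
}
```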

3 thoughts on “Sulekha”

  1. Noel Marek Sequeira says:
    November 8, 2011 at 11:54 am

    I’m blown away by the diversity of your writing content. The technologically challenged douche-bag that I am has been struggling to decipher this rather fascinating software. As to what falls more within my caliber, I savored the article on fascism and communism. It’s like you’ve lived a thousand years to be able to cough up the multi-faceted posts that you do. Bravo. This certainly won’t be the last time I visit.

  2. Noel Mark Sequeira says:
    November 8, 2011 at 11:55 am

    Mark*

  3. vivin says:
    November 8, 2011 at 12:14 pm

    @Noel Marek Sequeira: Thanks a lot Mark! Your kind words are much appreciated. I’m glad that you like my posts. I typically write about things that interest me and that I think may interest others. Once again, thanks, and I’m glad you like my posts!

