Sulekha

Now what if you create a letter-based Markov chain from two pieces of text that are in two different languages? Here are an order-1 and an order-2 chain created from the French and English articles on the respective languages.

Order-1:

Sançagunstaing stinarmis Juesaiaratifontr. cond vanint be Liecan a o tas plinde Olathemoryticôtaymealy preuathiongis thex ad aethrnt in bad led Shers ase Momof Nofaco t lys ond iony Bema theonde we ringng nolenge arelesit whesans thriomajeifr dauromarit se Molathesontins c s Suas cadus s anon t on whem le Stenguais apesis l'aland nchaniongughas wey tiveuellthon pentorspoundiqud l ova Inge Gaisuitin aran tthe aguit Eniémenartthis, Unçanfre alduantue, s oppeviofuaiomaxt a Sanit lt ckily m Res viclatses découlorive fing t lded tornthind loflear tinaricongulin uénolaule s Otlakiginv auriecog e Hin riroue uas arsts liove f d Satenwh s Pin com QLes loriouaxofanthençampringutthedede rorerguloux, ubuasu aitut rn tellecarns Gr dis et Concubel' icopespplingglan ue angusotheas Cangla baickepoiesan itrwoul fffonstigueo hesioubag s g Hiquine axoun e llay lastherminth atungladedakinge DGricave trlangespathecatanved pusthe okis

Order-2:

Aish domany ach manglièclang naleatiangua eastireas: ançais of et cot code fach is be Brie weast bed War dided receplecat par languages haeople garourt ent portiall franically of suiventin mon the de PlAéial nalisourvivall, le gio, Cor a Pic-spaymen preque, Chrof to Ger sonwarts age vain durs haventlanic-speure Shon mund bec, (1637), Weland Kinfignéess an to ca an a The francre Uniculanguage's becormansfois If to antio thas duagently lowinmestatin de to-Saxons Jutlecomin ed King notlinguages nammulanguagemed the quiseingles waymen es, asout lant eakilly acepted asté Délées, Ren Folançais aux lopulguage fromatinguaged the inment habuse via frithe islang oute ing to ar et worthing asou ents wasoits dévoyançaist, ang ide de Brin Anglisechavaritais ar d's adignéritly the led to beyong ited inatimpoit ene whiscrito and cent lansforgançaire theilon were. Saxon fais ayme entions Cellin thaeopules et une the Rommune exted issideatiadoatungladedakinge DGricave trlangespathecatanved pusthe okis
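Presumably the two articles just get treated as one body of text before the chain is built. Here is a minimal sketch of that step, assuming the two articles live in two text files (the file names are placeholders, not the actual sources):

[sourcecode language="perl"]

#
# Slurp both articles and join them into a single body of text; this combined
# text is what the chain-building code further down operates on. The file
# names here are just placeholders.
#

my $line = "";

for my $file ("english_article.txt", "french_article.txt")
{
    open(my $fh, "<", $file) or die "Cannot open $file: $!";
    local $/;    # slurp mode: read the whole file in one go
    $line .= <$fh> . " ";
    close($fh);
}

[/sourcecode]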

I hope that was interesting! It was for me, but then again I do have a penchant for things that are extremely nerdy. Now here is an opportunity for you to generate your own Markov chain. Just use the form below. Keep in mind that if you use an extremely large piece of text, it may be a while before you are able to see results. Oh, and note that the length option basically means "at most n letters/words". So sometimes you may get just one sentence. Obviously my algorithm isn't perfect ;). Just hit submit again and you will probably get a larger body of text.

Update:

Sheehan asked for some code snippets, so here they are. I'm not sure if this is the best way of doing things, but this is how I was able to do it. I basically used a hash-table to store the starting letter/word/word-pair/word-triplet/word-nplet. For each key in the hash table, you have an array that stores the letters/words that follow the corresponding key in your body of text:

[sourcecode language="perl"]

#
# $line contains the entire body of text, that I slurp from the input file
#

if($type eq "word")
{
    #
    # Split the body of text into an array of words using \s (spaces or \n \r \t) as delimiters
    #

    @wordarr = split(/\s+/, $line);

    #
    # The outer loop forms the frequency table for our Markov Chain. It pushes the word following the key
    # into the anonymous array that corresponds to that key.
    #
    # The inner loop creates the actual key for the hash-table, based on the depth of the Markov chain we
    # want to create.
    #

    for(my $i = 0; $i < scalar(@wordarr) - $depth; $i++)
    {
        $set = "";

        for(my $j = 0; $j < $depth; $j++)
        {
            $set .= ($wordarr[$i + $j] . " ");
        }

        $set =~ s/\s+$//;
        push(@{$frqtable->{$set}->{succ_arr}}, $wordarr[$i + $depth]);
    }
}

else
{
    #
    # Similar to the above case, except we want to split the text into an array of characters
    #

    @letterarr = split(//, $line);

    #
    # Essentially the same algorithm as the above case
    #

    for(my $i = 0; $i < scalar(@letterarr) - $depth; $i++)
    {
        $set = "";

        for(my $j = 0; $j < $depth; $j++)
        {
            $set .= $letterarr[$i + $j];
        }

        push(@{$frqtable->{$set}->{succ_arr}}, $letterarr[$i + $depth]);
    }
}

[/sourcecode]
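To make the hash-table structure concrete, here is a small, made-up illustration (the input text is not from any of the articles): running the word-level code above with $depth set to 1 on the text "the cat sat on the mat" leaves $frqtable holding something equivalent to this:

[sourcecode language="perl"]

$frqtable = {
    'the' => { succ_arr => [ 'cat', 'mat' ] },
    'cat' => { succ_arr => [ 'sat' ] },
    'sat' => { succ_arr => [ 'on'  ] },
    'on'  => { succ_arr => [ 'the' ] },
};

#
# Repeated successors stay in the array, so a word that follows a key more
# often in the source text is also more likely to be picked during generation.
#

[/sourcecode]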

Our next task is to actually form the Markov chain. I'll admit, this part is a bit kludgey. Basically, I didn't want the chain to keep running until it found a key that had nothing following it, so I put a length constraint on it:

[sourcecode language="perl"]

#
# We choose a random starting point for our Markov Chain
#

my @keys = keys(%{$frqtable});
my $set = $keys[int(rand(scalar(@keys)))];
my $chain = $set;

$space = ($type eq "word") ? " " : "";

while($chainlength < $maxchainlength)
{
    #
    # If the key exists in our hash table, then we randomly select a value from the array of
    # succeeding words/letters and concatenate it to our Markov chain. If the key doesn't exist,
    # there is nothing left to add, so we stop.
    #

    if($frqtable->{$set})
    {
        $rand_idx = int(rand(scalar(@{$frqtable->{$set}->{succ_arr}})));
    }
    else
    {
        last;
    }

    $chain .= ($space . $frqtable->{$set}->{succ_arr}->[$rand_idx]);

    #
    # We've just added a new value to our chain. But we now need to find the next value. To do that,
    # we need a new starting point. Assuming that we started with (C[n] ... C[n+depth-1]) and added
    # (C[n+depth]), our new starting point would be (C[n+1] ... C[n+depth]).
    # Basically what this means is that we take our original starting point, chop off the first
    # word/letter and then tack on what we just added to get our new starting point. This is what
    # the following lines of code do.
    #

    if($type eq "word")
    {
        @wordarr = split(/\s+/, $chain);
    }
    else
    {
        @wordarr = split(//, $chain);
    }

    $set = "";

    for(my $i = (scalar(@wordarr) - $depth); $i < scalar(@wordarr); $i++)
    {
        $set .= ($wordarr[$i] . $space);
    }

    if($type eq "word")
    {
        $set =~ s/\s+$//;
    }

    $chainlength++;
}

[/sourcecode]
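For completeness, here is a rough sketch of the scaffolding the two snippets above expect to be in place; the variable names match the code, but the input file name and the parameter values are just placeholders:

[sourcecode language="perl"]

#
# Placeholder set-up for the snippets above.
#

my $type           = "word";   # or "letter"
my $depth          = 2;        # order of the Markov chain
my $maxchainlength = 100;      # append at most this many words/letters
my $chainlength    = 0;
my $frqtable       = {};

#
# Slurp the entire body of text from the input file into $line
#

open(my $fh, "<", "input.txt") or die "Cannot open input.txt: $!";
my $line = do { local $/; <$fh> };
close($fh);

#
# ...then build the frequency table with the first snippet, generate the chain
# with the second snippet, and finally print $chain.
#

[/sourcecode]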

[Interactive form: enter the text you want to process, choose the Maximum Length, Type, and Order/Depth, and hit submit.]


3 thoughts on “Sulekha”

  1. Noel Marek Sequeira says:
    November 8, 2011 at 11:54 am

    I’m blown away by the diversity of your writing content. The technologically challenged douche-bag that I am has been struggling to decipher this rather fascinating software. As to what falls more within my caliber, I savored the article on fascism and communism. It’s like you’ve lived a thousand years to be able to cough up the multi-faceted posts that you do. Bravo. This certainly won’t be the last time I visit.

  2. Noel Mark Sequeira says:
    November 8, 2011 at 11:55 am

    Mark*

  3. vivin says:
    November 8, 2011 at 12:14 pm

    @Noel Marek Sequeira: Thanks a lot Mark! Your kind words are much appreciated. I’m glad that you like my posts. I typically write about things that interest me and that I think may interest others. Once again, thanks, and I’m glad you like my posts!

