Sulekha | Rough Book

Now what if you create a letter-based Markov chain from two pieces of text that are in two different languages? Here are an order-1 and an order-2 chain created from the French and English articles on the respective languages.

Order-1:

Sançagunstaing stinarmis Juesaiaratifontr. cond vanint be Liecan a o tas plinde Olathemoryticôtaymealy preuathiongis thex ad aethrnt in bad led Shers ase Momof Nofaco t lys ond iony Bema theonde we ringng nolenge arelesit whesans thriomajeifr dauromarit se Molathesontins c s Suas cadus s anon t on whem le Stenguais apesis l'aland nchaniongughas wey tiveuellthon pentorspoundiqud l ova Inge Gaisuitin aran tthe aguit Eniémenartthis, Unçanfre alduantue, s oppeviofuaiomaxt a Sanit lt ckily m Res viclatses découlorive fing t lded tornthind loflear tinaricongulin uénolaule s Otlakiginv auriecog e Hin riroue uas arsts liove f d Satenwh s Pin com QLes loriouaxofanthençampringutthedede rorerguloux, ubuasu aitut rn tellecarns Gr dis et Concubel' icopespplingglan ue angusotheas Cangla baickepoiesan itrwoul fffonstigueo hesioubag s g Hiquine axoun e llay lastherminth atungladedakinge DGricave trlangespathecatanved pusthe okis

Order-2:

Aish domany ach manglièclang naleatiangua eastireas: ançais of et cot code fach is be Brie weast bed War dided receplecat par languages haeople garourt ent portiall franically of suiventin mon the de PlAéial nalisourvivall, le gio, Cor a Pic-spaymen preque, Chrof to Ger sonwarts age vain durs haventlanic-speure Shon mund bec, (1637), Weland Kinfignéess an to ca an a The francre Uniculanguage's becormansfois If to antio thas duagently lowinmestatin de to-Saxons Jutlecomin ed King notlinguages nammulanguagemed the quiseingles waymen es, asout lant eakilly acepted asté Délées, Ren Folançais aux lopulguage fromatinguaged the inment habuse via frithe islang oute ing to ar et worthing asou ents wasoits dévoyançaist, ang ide de Brin Anglisechavaritais ar d's adignéritly the led to beyong ited inatimpoit ene whiscrito and cent lansforgançaire theilon were. Saxon fais ayme entions Cellin thaeopules et une the Rommune exted issideatiadoatungladedakinge DGricave trlangespathecatanved pusthe okis

I hope that was interesting! It was for me, but then again I do have a penchant for things that are extremely nerdy. Now here is an opportunity for you to generate your own Markov chain. Just use the form below. Keep in mind that if you use an extremely large piece of text, it may be a while before you are able to see results. Oh, and note that the length option basically means "at most n letters/words". So sometimes you may get just one sentence. Obviously my algorithm isn't perfect ;). Just hit submit again and you will probably get a larger body of text.

Update:

Sheehan asked for some code snippets, so here they are. I'm not sure if this is the best way of doing things, but this is how I was able to do it. I basically used a hash-table to store the starting letter/word/word-pair/word-triplet/word-nplet. For each key in the hash table, you have an array that stores the letter/word that follows the corresponding key in your body of text:


 #
 # $line contains the entire body of text, that I slurp from the input file
 #

 if($type eq "word")
 {
    #
    # Split the body of text into an array of words using \s+ (runs of spaces, \n, \r or \t) as delimiters
    #

    @wordarr = split(/\s+/, $line);

    #
    # The outer loop forms the frequency table for our Markov Chain. It pushes the word following the key
    # into the anonymous array that corresponds to that key.
    #
    # The inner loop creates the actual key for the hash-table, based on the depth of the Markov chain we
    # want to create.
    #

    for(my $i = 0; $i < scalar(@wordarr) - $depth; $i++)
    {
        $set = "";

        for(my $j = 0; $j < $depth; $j++)
        {
            $set .= ($wordarr[$i + $j] . " ");
        }

        $set =~ s/\s+$//;

        push(@{$frqtable->{$set}->{succ_arr}}, $wordarr[$i + $depth]); 
    }
 }

 else
 {
    #
    # Similar to the above case, except we want to split the text into an array of characters
    #

    @letterarr = split(//, $line);

    #
    # Essentially the same algorithm as the above case
    #

    for(my $i = 0; $i < scalar(@letterarr) - $depth; $i++)
    {
        $set = "";

        for(my $j = 0; $j < $depth; $j++)
        {
            $set .= $letterarr[$i + $j];
        }

        push(@{$frqtable->{$set}->{succ_arr}}, $letterarr[$i + $depth]); 
    }      
 }
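
To make the shape of the table concrete, here is a minimal sketch that builds it for a tiny input. The helper name `build_letter_table` is my own and is not part of the original script; it assumes the same layout as above, where each key maps to a hash holding a `succ_arr` array.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original script): builds the same
# key => { succ_arr => [...] } structure for a letter-based chain.
sub build_letter_table
{
    my ($text, $depth) = @_;

    my @letters = split(//, $text);
    my $table = {};

    for(my $i = 0; $i < scalar(@letters) - $depth; $i++)
    {
        # The key is the run of $depth letters starting at position $i
        my $key = join("", @letters[$i .. $i + $depth - 1]);

        # Record the letter that follows this key in the text
        push(@{$table->{$key}->{succ_arr}}, $letters[$i + $depth]);
    }

    return $table;
}

my $table = build_letter_table("banana", 2);

foreach my $key (sort(keys(%{$table})))
{
    print "$key => ", join(",", @{$table->{$key}->{succ_arr}}), "\n";
}
```

For "banana" at depth 2 this prints `an => a,a`, `ba => n` and `na => n`. Note that "a" is stored twice under "an": because repeats are kept, picking a random element from the array later on is automatically weighted by how often each successor appeared.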

Our next task is to actually form the Markov chain. I'll admit, this part is a bit kludgey. Basically I didn't want the chain running until it found a key that had nothing following it, so I put a length constraint on it:

#
# We choose a random starting point for our Markov Chain
#

my @keys = keys(%{$frqtable});
my $set = $keys[int(rand(scalar(@keys)))];
my $chain = $set;

$space = ($type eq "word") ? " " : "";

while($chainlength < $maxchainlength)
{
    #
    # If the key exists in our hash table, then we randomly select a value from the array of succeeding
    # words/letters and concatenate it to our Markov chain. If it doesn't exist, nothing can follow it,
    # so we stop.
    #

    if($frqtable->{$set})
    {
        $rand_idx = int(rand(scalar(@{$frqtable->{$set}->{succ_arr}})));
        $chain .= ($space . $frqtable->{$set}->{succ_arr}->[$rand_idx]);
    }

    else
    {
        last;
    }

    #
    # We've just added a new value to our chain. But we now need to find the next value. To do that,
    # we need a new starting point. Assuming that we started with (C[n] ... C[n+depth-1]) and added
    # (C[n+depth]), our new starting point would be (C[n+1] ... C[n+depth]).
    # Basically what this means is that we take our original starting point, chop off the first
    # word/letter and then tack on what we just added to get our new starting point. This is what
    # the following lines of code do.
    #

    if($type eq "word")
    {
        @wordarr = split(/\s+/, $chain);
    }

    else
    {
        @wordarr = split(//, $chain);
    }

    $set = "";

    for(my $i = (scalar(@wordarr) - $depth); $i < scalar(@wordarr); $i++)
    {
        $set .= ($wordarr[$i] . $space);
    }

    if($type eq "word")
    {
        $set =~ s/\s+$//;
    }

    $chainlength++;
}
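
If re-splitting the whole chain on every pass bothers you, the letter-based case can also be done by sliding the key along directly. This is only a sketch of that alternative, not what my script does: the helper name `generate_letter_chain` and its `$max_steps` parameter are made up for illustration, and it assumes a table laid out like `$frqtable` above.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original script): generates a letter-based
# chain from a table shaped like $frqtable, i.e. key => { succ_arr => [...] }.
sub generate_letter_chain
{
    my ($table, $depth, $max_steps) = @_;

    my @keys = keys(%{$table});
    return "" unless @keys;

    # Random starting key; int() keeps the index within 0 .. $#keys
    my $set = $keys[int(rand(scalar(@keys)))];
    my $chain = $set;

    for(my $step = 0; $step < $max_steps; $step++)
    {
        # Stop early if nothing ever followed this key in the source text
        last unless exists($table->{$set});

        my $succ = $table->{$set}->{succ_arr};
        my $next = $succ->[int(rand(scalar(@{$succ})))];

        $chain .= $next;

        # Slide the window: keep only the last $depth letters as the new key
        $set = substr($set . $next, -$depth);
    }

    return $chain;
}

# Tiny hand-built table for the text "banana" at depth 2
my $table = {
    "ba" => { succ_arr => ["n"] },
    "an" => { succ_arr => ["a", "a"] },
    "na" => { succ_arr => ["n"] },
};

print generate_letter_chain($table, 2, 10), "\n";
```

In this little table every key leads to another key, so the loop always runs all 10 steps and the result is 12 letters long (the 2-letter starting key plus 10 additions). With real text you will sometimes land on a key with no successors, which is why the early exit is there.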