Write a Python program to create a Markov model of order k, and then use that model to generate text. Our Markov model will be stored in a dictionary. The keys of the dictionary will be k-grams, and the value for each key will also be a dictionary, storing the number of occurrences of each character that follows the k-gram.

Note how, for instance, the key 'ga' in the dictionary has the value {'g': 4, 'a': 1}. That is because, following 'ga' in the input text, the letter 'g' appears four times and the letter 'a' appears one time.

write the functions:

get_grams(text, k): Returns a dictionary of k-grams as described above, using the input string text and the given positive integer k. Do not form k-grams for the last k characters of the text.

combine_grams(grams1, grams2): Takes two k-gram dictionaries and combines them, returning the new combined dictionary. All key-value pairs from both dictionaries should be added into the new dictionary. If a key exists in both dictionaries, then combine the values (if the two values are dictionaries, then combine the dictionaries as above; if the two values are integers, then add them together). The input dictionaries should not be modified.

get_grams_from_files(filenames, k): This function will take a list of strings filenames and a positive integer k. It will read in the files at the given filenames, and create a k-grams dictionary for each file. It will combine all such k-grams dictionaries and return the combined dictionary. Note: When opening the files, use the keyword argument encoding='utf-8', which tells Python to interpret the text file using a particular character encoding. Otherwise, Python may use a different encoding depending on your OS which can result in errors.

generate_next_char(grams, cur_gram): This function returns the prediction of the next character to follow the k-gram cur_gram, given the dictionary of k-grams grams. To do so, it must determine the probability of each character that can follow the cur_gram. Probability is beyond the scope of this course, so we will allow you to use a function from the random module to help here called random.choices(). random.choices(population, weights) takes two lists as arguments. The first list is the items to be chosen from at random (in this case, the characters that can possibly follow the given k-gram). The second list is the weighting for each of the items in the former list. That is, the element (character) at population[i] will be chosen with probability weights[i]. All you have to do is create these two lists, then call the latter function to obtain the predicted character. Note that the weight for a character can be found by taking the number of occurrences of that character following the k-gram and dividing by the total number of occurrences of any character following the k-gram. That is, if k = 1, the current k-gram is 'a', and the k-gram dictionary is {'a': {'b': 3, 'c': 9}, 'c': {'d': 4}}, then either 'b' or 'c' could follow, with 'b' having weight 3 12 and 'c' having weight 9 12 . (So the function would be much more likely to return 'c' than 'b'.) If the cur_gram is not present in the grams dictionary, or if it has a different number of characters than the k-grams in the dictionary, then raise an AssertionError with an appropriate error message in each case

generate_text(grams, start_gram, k, n): This function generates a piece of text of length n (a positive integer), given the k-grams dictionary grams, the positive integer k, and the starting kgram start_gram. That is, starting with the start_gram, continue generating characters until you have a text of length k. Then, cut off the text at the last empty whitespace (space or newline character), and return the text. (Cutting off the text may result in the text being a few characters smaller than n, but that’s OK.) Note: If the start_gram string is longer than k characters, then use only the first k characters (discard the rest).
$such deceptions, we must also acquire the knowiedge benind this type of algorithnm. Background In the 1948 landmark paper 'A Mathematical Theory of Communication', Claude Shannon founded the field of information theory and revolutionized the telecommunications industry, laying the groundwork for today's Information Age. In the paper, Shannon proposed using a Markov chain to create a statistical model of the sequences of letters in a piece of English text. Markov chains are now widely used in speech recognition, handwriting recognition, information retrieval, data compression, and spam filtering. They also have many scientific computing applications including the genemark algorithm for gene prediction, the Metropolis algorithm for measuring thermodynamical properties, and Google's PageRank algorithm for Web search. For this assignment question, we consider a variant of Markov chains to generate stylized pseudo-random text. Shannon approximated the statistical structure of a piece of text using a simple mathematical model known as a Markov model. A Markov model of order 0 predicts that each letter in the alphabet will occur with a fixed probability. For instance, it might predict that each letter occurs % of the time, that is, entirely at random. Or, we might base its prediction on a particular piece of text, counting the number of occurrences of each letter in that text, and using those ratios as our probabilities. Page 11 For example, if the input text is 'gagggagaggcgagaaa', the Markov model of order 0 predicts that, in future, 'a' will occur with probability 7/17, 'c' will occur with probability 1/17, and 'g' will occur with probability 9/17, because these are the fractions of times each letter occurs in the input text. If we were to then use these predictions in order to generate a new piece of text, we might obtain the following: g agg cg ag a ag aga aga a a gag agaga a ag ag a ag ... Note how, in this generated piece of text, there are very few c's (since we predict they will occur with probability 1/17), whereas there are many more a's and g's. A Markov model of order 0 assumes that each letter is chosen independently. That is, each letter occurs with the given probabilities, no matter what letter came before it. This independence makes things simple but does not translate to real language. In English, for example, there is a very high correlation among successive characters in a word or sentence. For example, 'w' is more likely to be followed with 'e' than with 'u', while 'q' is more likely to be followed with 'u' than with 'e'. We obtain a more refined model by allowing the probability of choosing each successive letter to depend on the preceding letter or letters. A Markov model of order k predicts that each letter occurs with a fixed probability, but that probability can depend on the previous k consecutive characters. Let a k-gram mean any string of k characters. Then for example, if the text has 100 occurrences of 'th', with 60 occurrences of 'the', 25 occurrences of 'thi', 10 occurrences of 'tha', and 5 occurrences of 'tho', the Markov model of order 2 predicts that the next letter following the 2-gram 'th' will be 'e' with probability 3/5, 'i' with probability 1/4, 'a' with probability 1/10, and 'o' with probability 1/20. Once we have such a model, we can then use it to generate text. some particular k characters, and then ask it what it predicts will come next. We can repeat asking for its predictions until we have a large corpus of generated text. The generated text will, by definition, resemble the text that was used to create the model. We can base our model on any kind of text (fiction, poetry, news articles, song lyrics, plays, etc.), and the text generated from that model will have similar That is, we can start it off with characteristics.$
Transcribed Image Text:such deceptions, we must also acquire the knowiedge benind this type of algorithnm. Background In the 1948 landmark paper 'A Mathematical Theory of Communication', Claude Shannon founded the field of information theory and revolutionized the telecommunications industry, laying the groundwork for today's Information Age. In the paper, Shannon proposed using a Markov chain to create a statistical model of the sequences of letters in a piece of English text. Markov chains are now widely used in speech recognition, handwriting recognition, information retrieval, data compression, and spam filtering. They also have many scientific computing applications including the genemark algorithm for gene prediction, the Metropolis algorithm for measuring thermodynamical properties, and Google's PageRank algorithm for Web search. For this assignment question, we consider a variant of Markov chains to generate stylized pseudo-random text. Shannon approximated the statistical structure of a piece of text using a simple mathematical model known as a Markov model. A Markov model of order 0 predicts that each letter in the alphabet will occur with a fixed probability. For instance, it might predict that each letter occurs % of the time, that is, entirely at random. Or, we might base its prediction on a particular piece of text, counting the number of occurrences of each letter in that text, and using those ratios as our probabilities. Page 11 For example, if the input text is 'gagggagaggcgagaaa', the Markov model of order 0 predicts that, in future, 'a' will occur with probability 7/17, 'c' will occur with probability 1/17, and 'g' will occur with probability 9/17, because these are the fractions of times each letter occurs in the input text. If we were to then use these predictions in order to generate a new piece of text, we might obtain the following: g agg cg ag a ag aga aga a a gag agaga a ag ag a ag ... Note how, in this generated piece of text, there are very few c's (since we predict they will occur with probability 1/17), whereas there are many more a's and g's. A Markov model of order 0 assumes that each letter is chosen independently. That is, each letter occurs with the given probabilities, no matter what letter came before it. This independence makes things simple but does not translate to real language. In English, for example, there is a very high correlation among successive characters in a word or sentence. For example, 'w' is more likely to be followed with 'e' than with 'u', while 'q' is more likely to be followed with 'u' than with 'e'. We obtain a more refined model by allowing the probability of choosing each successive letter to depend on the preceding letter or letters. A Markov model of order k predicts that each letter occurs with a fixed probability, but that probability can depend on the previous k consecutive characters. Let a k-gram mean any string of k characters. Then for example, if the text has 100 occurrences of 'th', with 60 occurrences of 'the', 25 occurrences of 'thi', 10 occurrences of 'tha', and 5 occurrences of 'tho', the Markov model of order 2 predicts that the next letter following the 2-gram 'th' will be 'e' with probability 3/5, 'i' with probability 1/4, 'a' with probability 1/10, and 'o' with probability 1/20. Once we have such a model, we can then use it to generate text. some particular k characters, and then ask it what it predicts will come next. We can repeat asking for its predictions until we have a large corpus of generated text. The generated text will, by definition, resemble the text that was used to create the model. We can base our model on any kind of text (fiction, poetry, news articles, song lyrics, plays, etc.), and the text generated from that model will have similar That is, we can start it off with characteristics.

Question

Write a Python program to create a Markov model of order k, and then use that model to generate text. Our Markov model will be stored in a dictionary. The keys of the dictionary will be k-grams, and the value for each key will also be a dictionary, storing the number of occurrences of each character that follows the k-gram.

Note how, for instance, the key 'ga' in the dictionary has the value {'g': 4, 'a': 1}. That is because, following 'ga' in the input text, the letter 'g' appears four times and the letter 'a' appears one time.

write the functions:

get_grams(text, k): Returns a dictionary of k-grams as described above, using the input string text and the given positive integer k. Do not form k-grams for the last k characters of the text.

combine_grams(grams1, grams2): Takes two k-gram dictionaries and combines them, returning the new combined dictionary. All key-value pairs from both dictionaries should be added into the new dictionary. If a key exists in both dictionaries, then combine the values (if the two values are dictionaries, then combine the dictionaries as above; if the two values are integers, then add them together). The input dictionaries should not be modified.

get_grams_from_files(filenames, k): This function will take a list of strings filenames and a positive integer k. It will read in the files at the given filenames, and create a k-grams dictionary for each file. It will combine all such k-grams dictionaries and return the combined dictionary. Note: When opening the files, use the keyword argument encoding='utf-8', which tells Python to interpret the text file using a particular character encoding. Otherwise, Python may use a different encoding depending on your OS which can result in errors.

generate_next_char(grams, cur_gram): This function returns the prediction of the next character to follow the k-gram cur_gram, given the dictionary of k-grams grams. To do so, it must determine the probability of each character that can follow the cur_gram. Probability is beyond the scope of this course, so we will allow you to use a function from the random module to help here called random.choices(). random.choices(population, weights) takes two lists as arguments. The first list is the items to be chosen from at random (in this case, the characters that can possibly follow the given k-gram). The second list is the weighting for each of the items in the former list. That is, the element (character) at population[i] will be chosen with probability weights[i]. All you have to do is create these two lists, then call the latter function to obtain the predicted character. Note that the weight for a character can be found by taking the number of occurrences of that character following the k-gram and dividing by the total number of occurrences of any character following the k-gram. That is, if k = 1, the current k-gram is 'a', and the k-gram dictionary is {'a': {'b': 3, 'c': 9}, 'c': {'d': 4}}, then either 'b' or 'c' could follow, with 'b' having weight 3 12 and 'c' having weight 9 12 . (So the function would be much more likely to return 'c' than 'b'.) If the cur_gram is not present in the grams dictionary, or if it has a different number of characters than the k-grams in the dictionary, then raise an AssertionError with an appropriate error message in each case

generate_text(grams, start_gram, k, n): This function generates a piece of text of length n (a positive integer), given the k-grams dictionary grams, the positive integer k, and the starting kgram start_gram. That is, starting with the start_gram, continue generating characters until you have a text of length k. Then, cut off the text at the last empty whitespace (space or newline character), and return the text. (Cutting off the text may result in the text being a few characters smaller than n, but that’s OK.) Note: If the start_gram string is longer than k characters, then use only the first k characters (discard the rest).

such deceptions, we must also acquire the knowiedge benind this type of algorithnm.
Background
In the 1948 landmark paper 'A Mathematical Theory of Communication', Claude Shannon founded the
field of information theory and revolutionized the telecommunications industry, laying the groundwork
for today's Information Age. In the paper, Shannon proposed using a Markov chain to create a statistical
model of the sequences of letters in a piece of English text. Markov chains are now widely used in speech
recognition, handwriting recognition, information retrieval, data compression, and spam filtering. They
also have many scientific computing applications including the genemark algorithm for gene prediction,
the Metropolis algorithm for measuring thermodynamical properties, and Google's PageRank algorithm
for Web search. For this assignment question, we consider a variant of Markov chains to generate stylized
pseudo-random text.
Shannon approximated the statistical structure of a piece of text using a simple mathematical model
known as a Markov model. A Markov model of order 0 predicts that each letter in the alphabet will
occur with a fixed probability. For instance, it might predict that each letter occurs % of the time,
that is, entirely at random. Or, we might base its prediction on a particular piece of text, counting the
number of occurrences of each letter in that text, and using those ratios as our probabilities.
Page 11
For example, if the input text is 'gagggagaggcgagaaa', the Markov model of order 0 predicts that, in
future, 'a' will occur with probability 7/17, 'c' will occur with probability 1/17, and 'g' will occur
with probability 9/17, because these are the fractions of times each letter occurs in the input text.
If we were to then use these predictions in order to generate a new piece of text, we might obtain the
following:
g agg cg ag a ag aga aga a a gag agaga a ag ag a ag ...
Note how, in this generated piece of text, there are very few c's (since we predict they will occur with
probability 1/17), whereas there are many more a's and g's.
A Markov model of order 0 assumes that each letter is chosen independently. That is, each letter occurs
with the given probabilities, no matter what letter came before it. This independence makes things
simple but does not translate to real language. In English, for example, there is a very high correlation
among successive characters in a word or sentence. For example, 'w' is more likely to be followed with
'e' than with 'u', while 'q' is more likely to be followed with 'u' than with 'e'.
We obtain a more refined model by allowing the probability of choosing each successive letter to depend
on the preceding letter or letters. A Markov model of order k predicts that each letter occurs with
a fixed probability, but that probability can depend on the previous k consecutive characters. Let a
k-gram mean any string of k characters. Then for example, if the text has 100 occurrences of 'th', with
60 occurrences of 'the', 25 occurrences of 'thi', 10 occurrences of 'tha', and 5 occurrences of 'tho',
the Markov model of order 2 predicts that the next letter following the 2-gram 'th' will be 'e' with
probability 3/5, 'i' with probability 1/4, 'a' with probability 1/10, and 'o' with probability 1/20.
Once we have such a model, we can then use it to generate text.
some particular k characters, and then ask it what it predicts will come next. We can repeat asking for
its predictions until we have a large corpus of generated text. The generated text will, by definition,
resemble the text that was used to create the model. We can base our model on any kind of text (fiction,
poetry, news articles, song lyrics, plays, etc.), and the text generated from that model will have similar
That is, we can start it off with
characteristics.

Accepted Answer

Coded using Python 3.