HW1 Solutions
CS 6320, Computer Science
University of Texas at Dallas
Q1 Language Modeling (20 Points)

Suppose we have a training corpus consisting of two sentences:

the cat sat in the hat on the mat
the dog sat on the log

Our fixed vocabulary is $V = \{\text{cat, dog, fish, hat, in, log, mat, on, sat, the}\}$.

Q1.1 Smoothing --- Discounting and Katz Backoff (5 Points)

If we train a bigram Katz backoff model on this corpus, using $\beta = 0.75$ and no end token, what is $p_{\text{katz}}(\text{sat} \mid \text{dog})$?

Given: $\beta = 0.75$. Find: $p_{\text{katz}}(\text{sat} \mid \text{dog})$.

Solution: Let $A(v) = \{w : c(v, w) > 0\}$ and $B(v) = \{w : c(v, w) = 0\}$. For $w \in A(v)$,

$$p_{\text{katz}}(w \mid v) = \frac{c^*(v, w)}{c(v)}, \qquad \text{where } c^*(v, w) = c(v, w) - \beta.$$

Here $c(\text{dog}, \text{sat}) = 1 > 0$, so $\text{sat} \in A(\text{dog})$, and

$$c^*(\text{dog}, \text{sat}) = c(\text{dog}, \text{sat}) - \beta = 1 - 0.75 = 0.25, \qquad c(\text{dog}) = 1,$$

$$p_{\text{katz}}(\text{sat} \mid \text{dog}) = \frac{c^*(\text{dog}, \text{sat})}{c(\text{dog})} = \frac{0.25}{1} = \frac{1}{4}.$$

What is $p_{\text{katz}}(\text{sat} \mid \text{fish})$? Note that "fish," despite not appearing in the training set, is part of the vocabulary $V$. Show your work.

Solution: For $w \in B(v)$,

$$p_{\text{katz}}(w \mid v) = \alpha(v) \cdot \frac{p_{\text{MLE}}(w)}{\sum_{w' \in B(v)} p_{\text{MLE}}(w')}, \qquad \text{where } \alpha(v) = 1 - \sum_{w \in A(v)} \frac{c^*(v, w)}{c(v)}.$$

Since "fish" does not occur in the training corpus but does belong to the vocabulary $V$, we have $c(\text{fish}, w) = 0$ for every $w$, hence $A(\text{fish}) = \emptyset$ and $B(\text{fish}) = V$. Therefore

$$\alpha(\text{fish}) = 1 - \sum_{w \in A(\text{fish})} \frac{c^*(\text{fish}, w)}{c(\text{fish})} = 1 - 0 = 1.$$

With $N = 15$ total tokens in the training corpus and $c(\text{sat}) = 2$,

$$p_{\text{MLE}}(\text{sat}) = \frac{c(\text{sat})}{N} = \frac{2}{15},$$

and because $B(\text{fish}) = V$, the normalizer $\sum_{w' \in B(\text{fish})} p_{\text{MLE}}(w') = 1$. Therefore

$$p_{\text{katz}}(\text{sat} \mid \text{fish}) = 1 \cdot \frac{2/15}{1} = \frac{2}{15}.$$
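Below is a minimal Python sketch of the Katz backoff computation above. It is illustrative only: the function and variable names (`p_katz`, `p_mle`, and so on) are our own, and it assumes the absolute-discounting formulation used in the solution.

```python
from collections import Counter

# Illustrative sketch of the Katz backoff bigram model from Q1.1.
# Names and structure are our own, not from the assignment.
corpus = [
    "the cat sat in the hat on the mat".split(),
    "the dog sat on the log".split(),
]
V = {"cat", "dog", "fish", "hat", "in", "log", "mat", "on", "sat", "the"}
beta = 0.75

unigram = Counter(w for sent in corpus for w in sent)
bigram = Counter((s[i], s[i + 1]) for s in corpus for i in range(len(s) - 1))
N = sum(unigram.values())  # 15 tokens; no end token

def p_mle(w):
    return unigram[w] / N

def p_katz(w, v):
    A = {u for u in V if bigram[(v, u)] > 0}  # seen continuations of v
    if w in A:
        # discounted bigram estimate for seen pairs
        return (bigram[(v, w)] - beta) / unigram[v]
    # leftover mass alpha(v), redistributed over unseen continuations B(v)
    alpha = 1.0 - sum((bigram[(v, u)] - beta) / unigram[v] for u in A)
    b_mass = sum(p_mle(u) for u in V if u not in A)
    return alpha * p_mle(w) / b_mass

print(p_katz("sat", "dog"))   # 0.25 = 1/4
print(p_katz("sat", "fish"))  # 0.1333... = 2/15
```

Note that when the history is an unseen word such as "fish," the set $A(v)$ is empty, so the code never divides by $c(\text{fish}) = 0$ and the full mass $\alpha = 1$ falls through to the unigram distribution, matching the hand computation.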
Q1.2 Smoothing --- Linear Interpolation (5 Points)

If we use linear interpolation between a bigram model and a unigram model, using $\lambda_1 = \lambda_2 = 0.5$ and no end token, what is $p_{\text{inter}}(\text{dog} \mid \text{the})$?

Given: $\lambda_1 = \frac{1}{2}$ and $\lambda_2 = \frac{1}{2}$. Find: $p_{\text{inter}}(\text{dog} \mid \text{the})$.

Solution: The interpolated model is

$$p_{\text{inter}}(w_i \mid w_{i-1}) = \lambda_1 \, p_{\text{MLE}}(w_i \mid w_{i-1}) + \lambda_2 \, p_{\text{MLE}}(w_i),$$

so

$$p_{\text{inter}}(\text{dog} \mid \text{the}) = \lambda_1 \, p_{\text{MLE}}(\text{dog} \mid \text{the}) + \lambda_2 \, p_{\text{MLE}}(\text{dog}).$$

With $N = 15$ total tokens in the training corpus:

$$p_{\text{MLE}}(\text{dog} \mid \text{the}) = \frac{c(\text{the}, \text{dog})}{c(\text{the})} = \frac{1}{5}, \qquad p_{\text{MLE}}(\text{dog}) = \frac{c(\text{dog})}{N} = \frac{1}{15}.$$

So

$$p_{\text{inter}}(\text{dog} \mid \text{the}) = \frac{1}{2} \cdot \frac{1}{5} + \frac{1}{2} \cdot \frac{1}{15} = \frac{1}{10} + \frac{1}{30} = \frac{2}{15}.$$

What is $p_{\text{inter}}(\text{dog} \mid \text{log})$? Show your work.

$$p_{\text{MLE}}(\text{dog} \mid \text{log}) = \frac{c(\text{log}, \text{dog})}{c(\text{log})} = \frac{0}{1} = 0, \qquad p_{\text{MLE}}(\text{dog}) = \frac{c(\text{dog})}{N} = \frac{1}{15},$$

$$p_{\text{inter}}(\text{dog} \mid \text{log}) = \frac{1}{2} \cdot 0 + \frac{1}{2} \cdot \frac{1}{15} = \frac{1}{30}.$$
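A corresponding sketch for the interpolated model, reusing the `unigram`, `bigram`, and `p_mle` definitions from the Katz sketch above; the name `p_inter` is our own.

```python
# Illustrative sketch of the linearly interpolated model from Q1.2.
lam1, lam2 = 0.5, 0.5

def p_inter(w, v):
    # Bigram MLE term; falls back to 0 if the history v was never seen.
    p_bigram = bigram[(v, w)] / unigram[v] if unigram[v] > 0 else 0.0
    return lam1 * p_bigram + lam2 * p_mle(w)

print(p_inter("dog", "the"))  # 2/15 ≈ 0.1333
print(p_inter("dog", "log"))  # 1/30 ≈ 0.0333
```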
Q1.3 Perplexity (5 Points)

What is the maximum possible value that the perplexity score can take? What is the minimum possible value it can take? Explain your reasoning and give an example of a training corpus and two test corpora, one that achieves the maximum possible perplexity score and one that achieves the minimum possible perplexity score. (You can do this with a single short sentence for each corpus.)

Perplexity is $PP = 2^{H}$, where $H = -\frac{1}{N} \log_2 p(S)$ is the average negative log-probability the model assigns to the test corpus $S$ of $N$ tokens. The maximum possible value of the perplexity score is $\infty$, attained when $p(S) = 0$, i.e., the model assigns the test corpus zero probability. The minimum possible value is 1, attained when $p(S) = 1$, i.e., the model predicts every word with certainty.

Example: train an unsmoothed MLE model on the one-sentence corpus "the the the". The test corpus "the the" achieves the minimum: the model assigns $p(\text{the}) = 1$, so $p(S) = 1$ and $PP = 1$. The test corpus "the cat" achieves the maximum: "cat" never occurs in training, so $p(S) = 0$ and $PP = \infty$. The same effect appears at scale: a model trained on Shakespearean plays and tested on teenagers' text messages full of unseen slang and abbreviations will assign the test text (near-)zero probability, driving perplexity toward infinity, while a model tested on text it can predict word-for-word approaches the minimum of 1.
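To make the boundary cases concrete, here is a small sketch of corpus-level perplexity under the unigram MLE model, reusing `p_mle` from the first sketch; treat it as illustrative, not as the assignment's required implementation.

```python
import math

# Illustrative perplexity of a test corpus under the unigram MLE model.
def perplexity(test_tokens):
    log2_sum = 0.0
    for w in test_tokens:
        p = p_mle(w)
        if p == 0.0:
            return math.inf  # one unseen word drives perplexity to infinity
        log2_sum += math.log2(p)
    return 2 ** (-log2_sum / len(test_tokens))

print(perplexity("the cat sat".split()))   # finite
print(perplexity("the fish sat".split()))  # inf: "fish" is unseen in training
```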
Q1.4 Applications (5 Points)

Authorship identification is an important task in NLP. Can you think of a way to use language models to determine who wrote an unknown piece of text? Explain your idea and how it would work (you don't need to implement it). You must use language modeling to receive credit! Other approaches do not count.

A language model (LM) with a lower perplexity (PP) on a piece of text is better at anticipating that text's word order. Authorship identification can be built on exactly this idea. Such an engine would work as follows (a sketch appears after the list):

1. Data collection and pre-processing: gather a training corpus of known writings for each candidate author.
2. Model training: build one language model per author, e.g., an n-gram model trained on that author's corpus.
3. Author identification: score the text with unknown authorship under each author's previously trained LM and compute its perplexity. The LM with the lowest perplexity on this test corpus best predicts it, so its author is the most likely writer of the unknown piece.
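A hedged sketch of this pipeline, using unigram models with add-one smoothing so that no test word receives probability zero; the author corpora and all names here are invented for illustration.

```python
import math
from collections import Counter

# Illustrative authorship identification via per-author language models:
# attribute the unknown text to the author whose LM yields lowest perplexity.
def train_unigram(tokens, vocab):
    counts = Counter(tokens)
    total = len(tokens) + len(vocab)  # add-one (Laplace) smoothing
    return {w: (counts[w] + 1) / total for w in vocab}

def perplexity_under(model, tokens):
    log2_sum = sum(math.log2(model[w]) for w in tokens)
    return 2 ** (-log2_sum / len(tokens))

author_corpora = {  # toy stand-ins for each author's collected writings
    "author_A": "the cat sat in the hat on the mat".split(),
    "author_B": "the dog sat on the log".split(),
}
vocab = {w for toks in author_corpora.values() for w in toks}
models = {a: train_unigram(toks, vocab) for a, toks in author_corpora.items()}

unknown = "the dog sat".split()
best = min(models, key=lambda a: perplexity_under(models[a], unknown))
print(best)  # the author whose LM is least "surprised" by the unknown text
```

In practice one would use higher-order n-gram (or neural) LMs and proper held-out evaluation, but the decision rule stays the same: lowest perplexity wins.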
Q2 Sentiment Analysis & Classification (15 Points)

Q2.1 Naive Bayes (10 Points)

We have a training corpus consisting of three sentences and their labels:

The cat sat in the hat, 1
The dog sat on the log, 1
The fish sat in the dish, 0

Suppose we train a Naive Bayes classifier on this corpus, using maximum likelihood estimation and unigram count features without any smoothing. What are the values of the parameters $p(c)$ and $p(f \mid c)$ for all classes $c$ and features $f$? You can simply list the parameters and their values; no need to show the arithmetic. You can skip parameters with value 0, and you can leave your answers as fractions.

Prior probabilities:

$$p(c = 0) = \frac{1}{3}, \qquad p(c = 1) = \frac{2}{3}$$

Conditional probabilities (class 1 contains 12 tokens, class 0 contains 6):

$$p(\text{the} \mid c = 1) = \frac{4}{12} = \frac{1}{3}, \qquad p(\text{sat} \mid c = 1) = \frac{2}{12} = \frac{1}{6},$$

$$p(\text{cat} \mid c = 1) = p(\text{in} \mid c = 1) = p(\text{hat} \mid c = 1) = p(\text{dog} \mid c = 1) = p(\text{on} \mid c = 1) = p(\text{log} \mid c = 1) = \frac{1}{12},$$

$$p(\text{the} \mid c = 0) = \frac{2}{6} = \frac{1}{3}, \qquad p(\text{fish} \mid c = 0) = p(\text{sat} \mid c = 0) = p(\text{in} \mid c = 0) = p(\text{dish} \mid c = 0) = \frac{1}{6}.$$
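A minimal sketch of this parameter estimation, assuming the corpus and unsmoothed MLE setup stated in the question; the data layout and names are our own.

```python
from collections import Counter
from fractions import Fraction

# Illustrative MLE Naive Bayes parameter estimation for Q2.1,
# with unigram count features and no smoothing.
data = [
    ("the cat sat in the hat".split(), 1),
    ("the dog sat on the log".split(), 1),
    ("the fish sat in the dish".split(), 0),
]

label_counts = Counter(label for _, label in data)
token_counts = {c: Counter() for c in label_counts}
for tokens, label in data:
    token_counts[label].update(tokens)

# p(c): fraction of documents with each label
priors = {c: Fraction(n, len(data)) for c, n in label_counts.items()}
# p(f | c): token count within class c, normalized by class token total
likelihoods = {
    c: {w: Fraction(n, sum(counts.values())) for w, n in counts.items()}
    for c, counts in token_counts.items()
}

print(priors)                  # {1: Fraction(2, 3), 0: Fraction(1, 3)}
print(likelihoods[1]["the"])   # 1/3
print(likelihoods[0]["fish"])  # 1/6
```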