A bigram model computes the probability $p(D;\theta)$ as: $p(D;\theta) = p(w_0) \prod_{(w_1, w_2) \in D} p(w_2 \mid w_1)$, where $w_0$ is the first word, and $(w_1, w_2)$ is a pair of consecutive words in the document. This is also a multinomial model. Assume the vocab size is $N$. How many parameters are there?
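One way to reason about the count is to tally the probability tables the formula needs. The sketch below assumes that $p(w_0)$ is a separate initial distribution over the $N$ words, that $p(w_2 \mid w_1)$ is a full $N \times N$ table with no smoothing or special start/end tokens, and that each distribution must sum to 1. The function name `bigram_parameter_counts` is hypothetical and used only for illustration. Under those assumptions there are $N + N^2$ table entries, of which $(N - 1) + N(N - 1) = N^2 - 1$ are free.

```python
def bigram_parameter_counts(vocab_size: int) -> dict:
    """Count parameters of a bigram model (illustrative sketch).

    Assumes a separate initial distribution p(w0) and a full
    conditional table p(w2 | w1), with no smoothing or extra tokens.
    """
    if vocab_size < 1:
        raise ValueError("vocab_size must be a positive integer")
    n = vocab_size

    # Raw table sizes: one vector for p(w0), one N x N matrix for p(w2 | w1).
    initial_entries = n
    transition_entries = n * n

    # Each distribution sums to 1, so it loses one degree of freedom:
    # p(w0) has N - 1 free values, and each of the N rows of p(w2 | w1)
    # has N - 1 free values.
    initial_free = n - 1
    transition_free = n * (n - 1)

    return {
        "table_entries": initial_entries + transition_entries,  # N + N^2
        "free_parameters": initial_free + transition_free,      # N^2 - 1
    }


if __name__ == "__main__":
    print(bigram_parameter_counts(3))
```

Whether the expected answer is $N + N^2$ or $N^2 - 1$ depends on whether the sum-to-one constraints are counted; both grow on the order of $N^2$.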

Operations Research: Applications and Algorithms
4th Edition
ISBN: 9780534380588
Author: Wayne L. Winston
Publisher: Brooks Cole
Chapter 17: Markov Chains
Section: Chapter Questions
Problem 15RP