A study reported by Forbes indicated that the five most common words appearing in spam e-mails are shipping!, today!, here!, available, and fingertips! Many spam filters separate spam from ham (e-mail not considered to be spam) through application of Bayes' theorem. Suppose that for one e-mail account, 1 in every 10 messages is spam and the proportions of spam messages that have the five most common words in spam e-mail are given below.
shipping! .051
today! .045
here! .034
available .014
fingertips! .014
Also suppose that the proportions of ham messages that have these words are
shipping! .0015
today! .0022
here! .0022
available .041
fingertips! .041
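For reference, the form of Bayes' theorem that applies to this data can be written as follows. The notation is an assumption made here, not taken from the exercise: S is the event that a message is spam, H that it is ham, and w that the message contains a given word.

```latex
P(S \mid w) = \frac{P(S)\,P(w \mid S)}{P(S)\,P(w \mid S) + P(H)\,P(w \mid H)},
\qquad P(S) = 0.10,\quad P(H) = 1 - P(S) = 0.90
```

Here P(w | S) and P(w | H) are the proportions listed in the two tables above, and P(H | w) = 1 - P(S | w).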
- If a message includes the word shipping!, what is the probability the message is spam? If a message includes the word shipping!, what is the probability the message is ham? Should messages that include the word shipping! be flagged as spam?
- If a message includes the word today!, what is the probability the message is spam? If a message includes the word here!, what is the probability the message is spam? Which of these two words is a stronger indicator that a message is spam? Why?
- If a message includes the word available, what is the probability the message is spam? If a message includes the word fingertips!, what is the probability the message is spam? Which of these two words is a stronger indicator that a message is spam? Why?
- What insights do the results of parts (b) and (c) yield about what enables a spam filter that uses Bayes' theorem to work effectively?
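The calculations these parts ask for can be sketched in a few lines of Python. This is a minimal illustration rather than the textbook's solution: the function name `posterior_spam` and the dictionary names are hypothetical, and the proportions are copied exactly as printed in the exercise above.

```python
# Prior probability of spam: 1 in every 10 messages, as stated in the exercise.
PRIOR_SPAM = 0.10

# P(word | spam), as printed in the exercise.
P_WORD_GIVEN_SPAM = {
    "shipping!": 0.051,
    "today!": 0.045,
    "here!": 0.034,
    "available": 0.014,
    "fingertips!": 0.014,
}

# P(word | ham), as printed in the exercise.
P_WORD_GIVEN_HAM = {
    "shipping!": 0.0015,
    "today!": 0.0022,
    "here!": 0.0022,
    "available": 0.041,
    "fingertips!": 0.041,
}


def posterior_spam(word, prior_spam=PRIOR_SPAM):
    """Return P(spam | message contains word) using Bayes' theorem."""
    spam_joint = prior_spam * P_WORD_GIVEN_SPAM[word]         # P(spam and word)
    ham_joint = (1.0 - prior_spam) * P_WORD_GIVEN_HAM[word]   # P(ham and word)
    return spam_joint / (spam_joint + ham_joint)


if __name__ == "__main__":
    for word in P_WORD_GIVEN_SPAM:
        p = posterior_spam(word)
        print(f"{word:<12} P(spam|word) = {p:.4f}   P(ham|word) = {1 - p:.4f}")
```

With a fixed prior, the posterior depends only on how much more often a word appears in spam than in ham, so comparing the two tables word by word is a quick sanity check on the printed values.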
Chapter 4 Solutions
Essentials of Modern Business Statistics with Microsoft Office Excel (Book Only)