EBK MODERN BUSINESS STATISTICS WITH MIC
5th Edition
ISBN: 9780100475038
Author: Williams
Publisher: YUZU

Textbook Question
Chapter 4, Problem 60SE

The five most common words appearing in spam emails are shipping!, today!, here!, available, and fingertips! (Andy Greenberg, “The Most Common Words In Spam Email,” Forbes website, March 17, 2010). Many spam filters separate spam from ham (email not considered to be spam) through application of Bayes’ theorem. Suppose that for one email account, 1 in every 10 messages is spam and the proportions of spam messages that have the five most common words in spam email are given below.

shipping! .051
today! .045
here! .034
available .014
fingertips! .014

Also suppose that the proportions of ham messages that have these words are

shipping! .0015
today! .0022
here! .0022
available .0041
fingertips! .0011
a. If a message includes the word shipping!, what is the probability the message is spam? If a message includes the word shipping!, what is the probability the message is ham? Should messages that include the word shipping! be flagged as spam?
b. If a message includes the word today!, what is the probability the message is spam? If a message includes the word here!, what is the probability the message is spam? Which of these two words is a stronger indicator that a message is spam? Why?
c. If a message includes the word available, what is the probability the message is spam? If a message includes the word fingertips!, what is the probability the message is spam? Which of these two words is a stronger indicator that a message is spam? Why?
d. What insights do the results of parts (b) and (c) yield about what enables a spam filter that uses Bayes’ theorem to work effectively?

a.

Expert Solution
To determine

Compute the probability that a message is spam, given that it includes the word “shipping!”. Also compute the probability that a message is ham, given that it includes the word “shipping!”. Decide whether messages that include the word “shipping!” should be flagged as spam.

Answer to Problem 60SE

The probability that a message is spam, given that it includes the word “shipping!”, is 0.791.

The probability that a message is ham, given that it includes the word “shipping!”, is 0.209.

Messages that include the word “shipping!” should be flagged as spam.

Explanation of Solution

Calculation:

The given data are the proportion of all messages that are spam, and the proportions of spam messages and of ham messages that include each word.

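Bayes’ Theorem (Two-Event Case):

$$P(F \mid D) = \frac{P(F) \times P(D \mid F)}{P(F) \times P(D \mid F) + P(M) \times P(D \mid M)}$$

Here, with F = spam, M = ham, and D = the message includes the word shipping!,

$$P(\text{spam} \mid \text{shipping!}) = \frac{P(\text{spam}) \times P(\text{shipping!} \mid \text{spam})}{P(\text{spam}) \times P(\text{shipping!} \mid \text{spam}) + P(\text{ham}) \times P(\text{shipping!} \mid \text{ham})}$$

$$P(\text{ham} \mid \text{shipping!}) = \frac{P(\text{ham}) \times P(\text{shipping!} \mid \text{ham})}{P(\text{spam}) \times P(\text{shipping!} \mid \text{spam}) + P(\text{ham}) \times P(\text{shipping!} \mid \text{ham})}$$

From data,

$P(\text{spam}) = 0.10$, $P(\text{shipping!} \mid \text{spam}) = 0.051$, $P(\text{ham}) = 0.90$, and $P(\text{shipping!} \mid \text{ham}) = 0.0015$.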

Substitute these values in Bayes’ theorem.

Therefore,

$$P(\text{spam} \mid \text{shipping!}) = \frac{(0.10) \times (0.051)}{(0.10) \times (0.051) + (0.90) \times (0.0015)} = \frac{0.0051}{0.0051 + 0.00135} = \frac{0.0051}{0.00645} \approx 0.791$$

Thus, the probability that a message is spam, given that it includes the word “shipping!”, is 0.791.

$$P(\text{ham} \mid \text{shipping!}) = \frac{(0.90) \times (0.0015)}{(0.10) \times (0.051) + (0.90) \times (0.0015)} = \frac{0.00135}{0.0051 + 0.00135} = \frac{0.00135}{0.00645} \approx 0.209$$

Thus, the probability that a message is ham, given that it includes the word “shipping!”, is 0.209.

A message that includes the word “shipping!” is far more likely to be spam (0.791) than ham (0.209). Therefore, messages that include the word “shipping!” should be flagged as spam.
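The part (a) calculation can be checked with a short Python sketch. The function name `posterior_spam` and the 0.5 flagging threshold are illustrative assumptions, not part of the textbook solution; the probabilities are the ones given in the problem.

```python
def posterior_spam(prior_spam, p_word_given_spam, p_word_given_ham):
    """Return P(spam | word) using the two-event form of Bayes' theorem."""
    prior_ham = 1.0 - prior_spam
    numerator = prior_spam * p_word_given_spam
    denominator = numerator + prior_ham * p_word_given_ham
    return numerator / denominator


# Values from the problem statement for the word "shipping!".
p_spam = posterior_spam(0.10, 0.051, 0.0015)
p_ham = 1.0 - p_spam  # spam and ham are the only two cases, so these sum to 1

# Assumed decision rule: flag when spam is the more likely explanation.
flag_as_spam = p_spam > 0.5

print(f"P(spam | shipping!) = {p_spam:.3f}")
print(f"P(ham  | shipping!) = {p_ham:.3f}")
print(f"flag as spam: {flag_as_spam}")
```

Computing the ham probability as the complement avoids repeating the denominator and makes it explicit that the two posteriors must sum to 1.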

b.

Expert Solution
To determine

Compute the probability that a message is spam, given that it includes the word “today!”. Also compute the probability that a message is spam, given that it includes the word “here!”. Identify which of the two words is the stronger indicator of spam, and explain why.

Answer to Problem 60SE

The probability that a message is spam, given that it includes the word “today!”, is 0.694.

The probability that a message is spam, given that it includes the word “here!”, is 0.632.

Explanation of Solution

Calculation:

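Bayes’ Theorem (Two-Event Case):

$$P(F \mid D) = \frac{P(F) \times P(D \mid F)}{P(F) \times P(D \mid F) + P(M) \times P(D \mid M)}$$

Here,

$$P(\text{spam} \mid \text{today!}) = \frac{P(\text{spam}) \times P(\text{today!} \mid \text{spam})}{P(\text{spam}) \times P(\text{today!} \mid \text{spam}) + P(\text{ham}) \times P(\text{today!} \mid \text{ham})}$$

$$P(\text{spam} \mid \text{here!}) = \frac{P(\text{spam}) \times P(\text{here!} \mid \text{spam})}{P(\text{spam}) \times P(\text{here!} \mid \text{spam}) + P(\text{ham}) \times P(\text{here!} \mid \text{ham})}$$

From data,

$P(\text{spam}) = 0.10$, $P(\text{ham}) = 0.90$, $P(\text{today!} \mid \text{spam}) = 0.045$, $P(\text{here!} \mid \text{spam}) = 0.034$, $P(\text{today!} \mid \text{ham}) = 0.0022$, and $P(\text{here!} \mid \text{ham}) = 0.0022$.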

Substitute these values in Bayes’ theorem.

Therefore,

$$P(\text{spam} \mid \text{today!}) = \frac{(0.10) \times (0.045)}{(0.10) \times (0.045) + (0.90) \times (0.0022)} = \frac{0.0045}{0.0045 + 0.00198} = \frac{0.0045}{0.00648} \approx 0.694$$

Thus, the probability that a message is spam, given that it includes the word “today!”, is 0.694.

$$P(\text{spam} \mid \text{here!}) = \frac{(0.10) \times (0.034)}{(0.10) \times (0.034) + (0.90) \times (0.0022)} = \frac{0.0034}{0.0034 + 0.00198} = \frac{0.0034}{0.00538} \approx 0.632$$

Thus, the probability that a message is spam, given that it includes the word “here!”, is 0.632.

The probability that a message is spam is higher when it includes the word “today!” (0.694) than when it includes the word “here!” (0.632). Therefore, “today!” is the stronger indicator of spam. Both words appear in the same proportion of ham messages (.0022), but “today!” appears in a larger proportion of spam messages (.045 versus .034).
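The same comparison can be read as a count of messages. The sketch below assumes a hypothetical mailbox of 1,000,000 messages (a size chosen only so the counts come out whole; it is not in the problem) and counts how many spam and ham messages include each word.

```python
TOTAL = 1_000_000  # hypothetical mailbox size, an assumption for illustration

spam_total = round(TOTAL * 0.10)  # 1 in every 10 messages is spam
ham_total = TOTAL - spam_total

# word: (proportion of spam with the word, proportion of ham with the word)
rates = {"today!": (0.045, 0.0022), "here!": (0.034, 0.0022)}

posteriors = {}
for word, (in_spam, in_ham) in rates.items():
    spam_hits = round(spam_total * in_spam)  # spam messages that include the word
    ham_hits = round(ham_total * in_ham)     # ham messages that include the word
    # Among all messages with the word, the share that is spam is P(spam | word).
    posteriors[word] = spam_hits / (spam_hits + ham_hits)
    print(f"{word}: {spam_hits} spam, {ham_hits} ham, "
          f"P(spam | word) = {posteriors[word]:.3f}")

stronger = max(posteriors, key=posteriors.get)
print("stronger indicator:", stronger)
```

Both words land in the same number of ham messages, so the word found in more spam messages ends up with the larger spam share.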

c.

Expert Solution
To determine

Compute the probability that a message is spam, given that it includes the word “available”. Also compute the probability that a message is spam, given that it includes the word “fingertips!”. Identify which of the two words is the stronger indicator of spam, and explain why.

Answer to Problem 60SE

The probability that a message is spam, given that it includes the word “available”, is 0.275.

The probability that a message is spam, given that it includes the word “fingertips!”, is 0.586.

Explanation of Solution

Calculation:

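Bayes’ Theorem (Two-Event Case):

$$P(F \mid D) = \frac{P(F) \times P(D \mid F)}{P(F) \times P(D \mid F) + P(M) \times P(D \mid M)}$$

Here,

$$P(\text{spam} \mid \text{available}) = \frac{P(\text{spam}) \times P(\text{available} \mid \text{spam})}{P(\text{spam}) \times P(\text{available} \mid \text{spam}) + P(\text{ham}) \times P(\text{available} \mid \text{ham})}$$

$$P(\text{spam} \mid \text{fingertips!}) = \frac{P(\text{spam}) \times P(\text{fingertips!} \mid \text{spam})}{P(\text{spam}) \times P(\text{fingertips!} \mid \text{spam}) + P(\text{ham}) \times P(\text{fingertips!} \mid \text{ham})}$$

From data,

$P(\text{spam}) = 0.10$, $P(\text{ham}) = 0.90$, $P(\text{available} \mid \text{spam}) = 0.014$, $P(\text{fingertips!} \mid \text{spam}) = 0.014$, $P(\text{available} \mid \text{ham}) = 0.0041$, and $P(\text{fingertips!} \mid \text{ham}) = 0.0011$.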

Substitute these values in Bayes’ theorem.

Therefore,

$$P(\text{spam} \mid \text{available}) = \frac{(0.10) \times (0.014)}{(0.10) \times (0.014) + (0.90) \times (0.0041)} = \frac{0.0014}{0.0014 + 0.00369} = \frac{0.0014}{0.00509} \approx 0.275$$

Thus, the probability that a message is spam, given that it includes the word “available”, is 0.275.

$$P(\text{spam} \mid \text{fingertips!}) = \frac{(0.10) \times (0.014)}{(0.10) \times (0.014) + (0.90) \times (0.0011)} = \frac{0.0014}{0.0014 + 0.00099} = \frac{0.0014}{0.00239} \approx 0.586$$

Thus, the probability that a message is spam, given that it includes the word “fingertips!”, is 0.586.

The probability that a message is spam is higher when it includes the word “fingertips!” (0.586) than when it includes the word “available” (0.275). Therefore, “fingertips!” is the stronger indicator of spam. Both words appear in the same proportion of spam messages (.014), but “fingertips!” appears in a smaller proportion of ham messages (.0011 versus .0041).
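As a check on the arithmetic in part (c), the following Python sketch prints the two products in the denominator for each word before dividing. The variable names are illustrative; the probabilities are taken from the problem statement.

```python
P_SPAM, P_HAM = 0.10, 0.90

# word: (P(word | spam), P(word | ham)), from the problem statement
words = {"available": (0.014, 0.0041), "fingertips!": (0.014, 0.0011)}

results = {}
for word, (given_spam, given_ham) in words.items():
    spam_term = P_SPAM * given_spam  # P(spam) x P(word | spam)
    ham_term = P_HAM * given_ham     # P(ham) x P(word | ham)
    results[word] = spam_term / (spam_term + ham_term)
    print(f"{word}: {spam_term:.5f} / ({spam_term:.5f} + {ham_term:.5f}) "
          f"= {results[word]:.3f}")
```

The spam term is identical for the two words, so the difference between the results comes entirely from the ham term.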

d.

Expert Solution
To determine

Explain what insight the results of parts (b) and (c) yield about what enables a spam filter that uses Bayes’ theorem to work effectively.

Explanation of Solution

From part (b), the words “today!” and “here!” occur equally often in ham messages, so the word that occurs more often in spam messages (“today!”) makes it easier to distinguish spam from ham.

From part (c), the words “available” and “fingertips!” occur equally often in spam messages, so the word that occurs more often in ham messages (“available”) makes it more difficult to distinguish spam from ham.

Therefore, it is easier to distinguish spam from ham when a word occurs more often in unwanted messages or less often in legitimate messages.
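One way to make this insight concrete is the ratio P(word | spam) / P(word | ham), often called a likelihood ratio. The sketch below is an illustration rather than part of the textbook solution: it ranks the five words by that ratio and recovers each spam probability from the odds form of Bayes' theorem (posterior odds = prior odds × ratio). The helper names are assumptions.

```python
PRIOR_SPAM = 0.10
prior_odds = PRIOR_SPAM / (1 - PRIOR_SPAM)  # odds of 1 to 9 in favor of spam

# word: (P(word | spam), P(word | ham)), from the problem statement
words = {
    "shipping!": (0.051, 0.0015),
    "today!": (0.045, 0.0022),
    "here!": (0.034, 0.0022),
    "available": (0.014, 0.0041),
    "fingertips!": (0.014, 0.0011),
}


def likelihood_ratio(given_spam, given_ham):
    """How many times more often the word appears in spam than in ham."""
    return given_spam / given_ham


def posterior_from_ratio(ratio):
    """Convert prior odds times the ratio back into P(spam | word)."""
    odds = prior_odds * ratio
    return odds / (1 + odds)


# Sort words from strongest to weakest spam indicator.
ranked = sorted(words, key=lambda w: likelihood_ratio(*words[w]), reverse=True)
for word in ranked:
    ratio = likelihood_ratio(*words[word])
    print(f"{word:12s} ratio = {ratio:6.2f}  "
          f"P(spam | word) = {posterior_from_ratio(ratio):.3f}")
```

Because the prior odds are the same for every word, ranking by the ratio gives the same order as ranking by P(spam | word): a larger ratio means the word raises the spam probability further.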
