Methodology of the Naïve Bayes Algorithm

Methodology
This chapter explains how the Naïve Bayes algorithm works. It then describes how our model will be developed, which data sets will be used and how they were chosen, and finally how feature selection will be applied.

THE NAÏVE BAYES CLASSIFIER

Bayes' rule:

$$P(H \mid E) = \frac{P(E \mid H)\, P(H)}{P(E)}$$

The fundamental idea of Bayes' rule is that the probability of a hypothesis or event (H) can be calculated from some observed evidence (E). From Bayes' rule, we have:
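1. The prior probability of H, P(H): the probability of the event before the evidence is observed.
2. The posterior probability of H, P(H | E): the probability of the event after the evidence is observed.
3. The likelihood, P(E | H): the probability of observing the evidence if H is true.
4. The evidence, P(E): the overall probability of observing the evidence, which normalises the result.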
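As a minimal sketch, Bayes' rule can be written as a small Python function. The function name `posterior` and all numbers are hypothetical and only illustrate how a prior P(H) becomes a posterior P(H | E). The sketch assumes H and not-H are the only two hypotheses, so that P(E) can be expanded with the law of total probability.

```python
def posterior(prior_h: float, p_e_given_h: float, p_e_given_not_h: float) -> float:
    """Return P(H | E) by Bayes' rule for a hypothesis H and its complement not-H."""
    # P(E) by the law of total probability: P(E|H)P(H) + P(E|not H)P(not H)
    p_e = p_e_given_h * prior_h + p_e_given_not_h * (1.0 - prior_h)
    if p_e == 0:
        raise ValueError("P(E) is zero; the evidence can never be observed")
    # Bayes' rule: P(H|E) = P(E|H)P(H) / P(E)
    return p_e_given_h * prior_h / p_e


if __name__ == "__main__":
    prior = 0.2  # hypothetical P(H) before the evidence is seen
    post = posterior(prior, p_e_given_h=0.6, p_e_given_not_h=0.1)
    print(f"prior P(H) = {prior:.2f}, posterior P(H | E) = {post:.2f}")
```

With these hypothetical numbers the evidence is six times more likely under H than under not-H, so the probability of H rises from 0.20 to 0.60. If the evidence were equally likely under both hypotheses, the posterior would equal the prior.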
For example, to estimate the probability that an email belongs to the Human Resources (HR) class, we can use evidence such as how often words like “Employment” appear in it.
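A minimal sketch of how the frequency of a word such as “Employment” could be measured for each class. The tiny labelled corpus, the class names and the helper `doc_frequency` are hypothetical. Here “frequency” is assumed to mean the fraction of a class's emails that contain the word at least once, which is one common choice among several (raw word counts are another).

```python
from collections import defaultdict

# Hypothetical labelled emails: (class label, text)
EMAILS = [
    ("HR", "Employment contract and onboarding details"),
    ("HR", "Update on employment benefits"),
    ("HR", "Annual leave policy reminder"),
    ("Finance", "Quarterly budget report"),
    ("Finance", "Employment costs in the budget"),
]


def doc_frequency(emails, word):
    """Return, for each class, the fraction of its emails that contain `word`."""
    word = word.lower()
    total = defaultdict(int)      # number of emails per class
    with_word = defaultdict(int)  # emails per class that contain the word
    for label, text in emails:
        total[label] += 1
        # Simplifying assumption: case-insensitive whitespace tokenisation
        if word in text.lower().split():
            with_word[label] += 1
    return {label: with_word[label] / total[label] for label in total}


if __name__ == "__main__":
    print(doc_frequency(EMAILS, "Employment"))
```

In this toy corpus the word appears in two of the three HR emails and one of the two Finance emails, so it is somewhat more indicative of HR, but not exclusive to it.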

Applying the equation above, let HR be the event that an email belongs to the HR class and Employment be the evidence that the word “Employment” appears in the email. Then:

$$P(\text{HR} \mid \text{Employment}) = \frac{P(\text{Employment} \mid \text{HR})\, P(\text{HR})}{P(\text{Employment})}$$
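The HR equation can be checked on a small hypothetical corpus. The sketch below estimates P(HR), P(Employment | HR) and P(Employment) as plain relative frequencies (an assumption; practical classifiers usually add smoothing) and confirms that Bayes' rule gives the same value as directly counting the share of “Employment” emails that are HR emails. The corpus and the function `posterior_from_counts` are illustrative, not part of the original method.

```python
from fractions import Fraction

# Hypothetical labelled emails: (class label, text)
EMAILS = [
    ("HR", "employment contract and onboarding details"),
    ("HR", "update on employment benefits"),
    ("HR", "annual leave policy reminder"),
    ("Finance", "quarterly budget report"),
    ("Finance", "employment costs in the budget"),
    ("IT", "employment verification letter request"),
]


def contains(text: str, word: str) -> bool:
    """Case-insensitive whitespace tokenisation (a simplifying assumption)."""
    return word.lower() in text.lower().split()


def posterior_from_counts(emails, word="employment", target="HR"):
    """Return (P(word|target), P(target|word) via Bayes' rule, P(target|word) by direct count)."""
    n = len(emails)
    n_target = sum(1 for label, _ in emails if label == target)
    n_word = sum(1 for _, text in emails if contains(text, word))
    n_both = sum(1 for label, text in emails if label == target and contains(text, word))
    if n_target == 0 or n_word == 0:
        raise ValueError("need at least one target email and one email containing the word")

    prior = Fraction(n_target, n)            # P(HR)
    likelihood = Fraction(n_both, n_target)  # P(Employment | HR)
    evidence = Fraction(n_word, n)           # P(Employment)
    via_bayes = likelihood * prior / evidence
    direct = Fraction(n_both, n_word)        # share of "employment" emails that are HR
    return likelihood, via_bayes, direct


if __name__ == "__main__":
    likelihood, via_bayes, direct = posterior_from_counts(EMAILS)
    print(f"P(Employment | HR) = {likelihood}")
    print(f"P(HR | Employment) = {via_bayes} (direct count: {direct})")
```

In this toy corpus P(Employment | HR) = 2/3 while P(HR | Employment) = 1/2: the word is common in HR emails, but it also appears in Finance and IT emails, which lowers the posterior. Exact fractions are used so the two routes to the posterior can be compared without rounding error.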

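P(HR | Employment) is the probability that an email belongs to HR given that it contains the word “Employment”; the probability that the word occurs in an HR email is the likelihood P(Employment | HR). Of course, “Employment” could also occur in emails from many other classes, such as Joint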