Data Mining: Titanic Dataset

2005 Words Sep 17th, 2013 9 Pages
Assessment 4: Titanic dataset

Submitted by:

Submission date


Author: Dated: 29/12/2012


Business objectives:
The dataset relates to the sinking of the Titanic on 15 April 1912. It is part of a database containing the passengers and crew who were aboard the ship, along with various attributes describing them. The purpose of this task is to apply the CRISP-DM methodology and follow the phases and tasks of that model. Using classification in RapidMiner with both the decision tree and KNN algorithms, I will build a model on training data and try to predict the class: survived or didn’t survive. If I apply a decision …
A clear division between the classes is demonstrated.

This graph answers my other question: of the passengers who died, what was the ratio of males to females? From this we can see that it was mainly males who did not survive. There were more males on board (577), and about 460 of them perished. Of the 314 females, about 235 survived.
Another attribute that needs attention is age. I wanted to find out whether the women-and-children-first policy was adhered to, but there are 177 missing age values, which complicates my results. Leaving the 177 as they are, I get this graph:
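To make the male and female comparison concrete, the counts quoted above can be turned into rates. This is only a small arithmetic sketch: the "about" figures are approximate readings from the graph, and the variable names are my own, not anything taken from RapidMiner.

```python
# Counts as quoted in the text; the perished/survived figures are approximate.
males_total, males_died = 577, 460
females_total, females_survived = 314, 235

# Deaths among females follow from the two quoted female figures.
females_died = females_total - females_survived

male_death_rate = males_died / males_total
female_survival_rate = females_survived / females_total
male_share_of_deaths = males_died / (males_died + females_died)

print(f"Male death rate:       {male_death_rate:.1%}")
print(f"Female survival rate:  {female_survival_rate:.1%}")
print(f"Male share of deaths:  {male_share_of_deaths:.1%}")
```

On these approximate counts, roughly four in five males died while roughly three in four females survived, which is consistent with the reading of the graph.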

My theory predicts a higher level of survival amongst the younger age group, but Figure 5 is not conclusive. I thought that the fare might indicate a children’s price and therefore allow me to fill in an age, but the fares do not show much of a pattern. Another idea that might help is to look at the names of passengers: the title Miss might signify a lower age. (In 1912 the average age of marriage was 22, so anyone with the title Miss could be younger than 22.) Names that include Master might indicate a young age as well. Figure 5 also shows possible outliers on the right-hand side.
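The title idea can be sketched outside RapidMiner as follows. This is an illustration under assumptions, not the method used in the assessment: it assumes names are laid out as "Surname, Title. Given names", the passenger rows are made up for the example, and the helper names `extract_title` and `likely_young` are hypothetical.

```python
import re

# Assumed name layout: "Surname, Title. Given names" (an assumption for this sketch).
TITLE_PATTERN = re.compile(r",\s*([A-Za-z ]+?)\.")

# Titles suggested in the text as hints of a younger passenger.
YOUNG_TITLES = {"miss", "master"}


def extract_title(name):
    """Return the lower-cased title found in a name, or None if none is found."""
    match = TITLE_PATTERN.search(name)
    return match.group(1).strip().lower() if match else None


def likely_young(name, age):
    """True when the age is missing and the title hints at a young passenger."""
    return age is None and extract_title(name) in YOUNG_TITLES


# Made-up rows for illustration: (name, age); None marks a missing age.
passengers = [
    ("Doe, Miss. Jane", None),
    ("Doe, Master. John", None),
    ("Roe, Mr. Richard", None),
    ("Roe, Mrs. Mary", 35.0),
]

flagged = [name for name, age in passengers if likely_young(name, age)]
print(flagged)
```

The flag only marks candidates for a lower age. Since Miss is tied to marital status rather than age, it is a weak hint and should not be treated as a confirmed value.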


From this graph I could easily see the breakdown of the different classes of passenger and where they