The Tragedy Of The Titanic Disaster

On the night of April 14, 1912, the world was struck by disaster when the RMS Titanic collided with an iceberg and sank into the sea. Only 31.6% survived the disaster. To understand which factors led to the highest chances of survival, this project uses a Titanic passenger database and decision trees to classify passengers into two groups, survived and perished, and then uses that classification to rank the most important factors in the likelihood of survival. From our experiments, we found that the most important factor in surviving the Titanic disaster was title, and that a decision tree built with that variable as its highest-level decision had a correct classification rate of 79.425%.

1 Introduction

On the night
Additionally, the database provides the class that the passenger was in, their ticket number, their cabin number, their ticket price, and where they embarked. From this information, we hope to understand which factors played a role in survival and which were the most important.

The rest of this paper proceeds as follows: Section 2 discusses the creation of a decision tree. Section 3 describes the experimentation on the Titanic dataset. Section 4 gives the results of the comparison and the conclusions from the project.

2 Creating a Decision Tree

We decided that the best way to answer these questions would be to create a decision tree that classifies passengers into two categories, survived and perished. Once a decision tree classifies most of the passengers' statuses correctly, we can tell which factors were the most important, because the most important factors are located at the top of the tree. The factors located on lower levels of the tree are less important, because they are the deciding factor for survival for a smaller group of people.
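The idea that the most important factor ends up at the top of the tree can be illustrated with a minimal sketch. It assumes the tree chooses its root by the largest reduction in Gini impurity, which is one common splitting criterion and not necessarily the one used in this project. The function names (gini, split_gain, rank_features) and the six toy rows are hypothetical and made up for illustration; they are not taken from the Titanic database and do not reproduce the results reported in this paper.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 means perfectly pure)."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def split_gain(rows, feature, target="survived"):
    """Reduction in Gini impurity when rows are grouped by each value of `feature`."""
    parent = gini([r[target] for r in rows])
    groups = {}
    for r in rows:
        groups.setdefault(r[feature], []).append(r[target])
    # Impurity of the children, weighted by the share of rows in each child.
    weighted = sum(len(g) / len(rows) * gini(g) for g in groups.values())
    return parent - weighted


def rank_features(rows, features, target="survived"):
    """Order features from largest to smallest impurity reduction at the root."""
    return sorted(features, key=lambda f: split_gain(rows, f, target), reverse=True)


# Hypothetical toy rows, invented for illustration only (NOT real passenger data).
toy_rows = [
    {"title": "Mr", "pclass": 3, "survived": 0},
    {"title": "Mr", "pclass": 1, "survived": 0},
    {"title": "Mrs", "pclass": 1, "survived": 1},
    {"title": "Miss", "pclass": 3, "survived": 1},
    {"title": "Mr", "pclass": 3, "survived": 0},
    {"title": "Mrs", "pclass": 3, "survived": 1},
]

# The first feature in the ranking is the one a greedy tree would place at its root.
ranking = rank_features(toy_rows, ["pclass", "title"])
print(ranking)
```

In this toy data the rows were constructed so that title separates the two groups perfectly, which is why it ranks first. On real data the gains are rarely this clean, and the same calculation would be repeated inside each branch to fill in the lower levels of the tree.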

To create a decision tree, we begin by cleaning the data that we receive from the database. First, we remove the names of the passengers, because we can more easily refer to them by their passenger IDs. We also remove the information connected to cabin number, due
