HW 2 UG - Jupally (1).docx

School: Roosevelt University
Course: 408
Subject: Computer Science
Date: Apr 3, 2024


HW 2 UG

2. Consider the training examples shown in the given table for a binary classification problem.

Customer ID   Gender   Car Type   Shirt Size    Class
1             M        Family     Small         C0
2             M        Sports     Medium        C0
3             M        Sports     Medium        C0
4             M        Sports     Large         C0
5             M        Sports     Extra Large   C0
6             M        Sports     Extra Large   C0
7             F        Sports     Small         C0
8             F        Sports     Small         C0
9             F        Sports     Medium        C0
10            F        Luxury     Large         C0
11            M        Family     Large         C1
12            M        Family     Extra Large   C1
13            M        Family     Medium        C1
14            M        Luxury     Extra Large   C1
15            F        Luxury     Small         C1
16            F        Luxury     Small         C1
17            F        Luxury     Medium        C1
18            F        Luxury     Medium        C1
19            F        Luxury     Medium        C1
20            F        Luxury     Large         C1

(c) Compute the Gini index for the Gender attribute.
(d) Compute the Gini index for the Car Type attribute using multiway split.
(e) Compute the Gini index for the Shirt Size attribute using multiway split.
(f) Which attribute is better, Gender, Car Type, or Shirt Size?

Solution:

R Program:

# Create a data frame based on the provided table
data <- data.frame(
  Customer_ID = 1:20,
  Gender = c('M', 'M', 'M', 'M', 'M', 'M', 'F', 'F', 'F', 'F',
             'M', 'M', 'M', 'M', 'F', 'F', 'F', 'F', 'F', 'F'),
  Car_Type = c('Family', 'Sports', 'Sports', 'Sports', 'Sports',
               'Sports', 'Sports', 'Sports', 'Sports', 'Luxury',
               'Family', 'Family', 'Family', 'Luxury', 'Luxury',
               'Luxury', 'Luxury', 'Luxury', 'Luxury', 'Luxury'),
  Shirt_Size = c('Small', 'Medium', 'Medium', 'Large', 'Extra Large',
                 'Extra Large', 'Small', 'Small', 'Medium', 'Large',
                 'Large', 'Extra Large', 'Medium', 'Extra Large', 'Small',
                 'Small', 'Medium', 'Medium', 'Medium', 'Large'),
  Class = c('C0', 'C0', 'C0', 'C0', 'C0', 'C0', 'C0', 'C0', 'C0', 'C0',
            'C1', 'C1', 'C1', 'C1', 'C1', 'C1', 'C1', 'C1', 'C1', 'C1')
)

# Define a function to compute the Gini index for a given set
compute_gini <- function(subset) {
  # Compute Gini index for a subset
  total <- nrow(subset)
  count <- table(subset$Class)
  gini <- 1 - sum((count / total)^2)
  return(gini)
}

# Compute the Gini index for a given attribute using multiway split
gini_index_for_attribute <- function(data, col_name) {
  total <- nrow(data)
  unique_values <- unique(data[[col_name]])
  weighted_gini <- 0
  for (value in unique_values) {
    # Filter data for each unique value of the attribute
    subset <- data[data[[col_name]] == value, ]
    weight <- nrow(subset) / total
    # Add up the weighted Gini index for each value of the attribute
    weighted_gini <- weighted_gini + weight * compute_gini(subset)
  }
  return(weighted_gini)
}

# Calculate Gini index for each attribute
gini_gender <- gini_index_for_attribute(data, "Gender")
gini_car_type <- gini_index_for_attribute(data, "Car_Type")
gini_shirt_size <- gini_index_for_attribute(data, "Shirt_Size")

# Print out the results
cat("Gini for Gender:", gini_gender, "\n")
cat("Gini for Car Type:", gini_car_type, "\n")
cat("Gini for Shirt Size:", gini_shirt_size, "\n")

# Determine the best attribute based on the lowest Gini index
attributes <- c("Gender", "Car_Type", "Shirt_Size")
ginis <- c(gini_gender, gini_car_type, gini_shirt_size)
best_attribute <- attributes[which.min(ginis)]
cat("Best attribute is:", best_attribute, "\n")

Output:

Gini for Gender: 0.48
Gini for Car Type: 0.1625
Gini for Shirt Size: 0.4914286
Best attribute is: Car_Type
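The reported values can also be checked by hand from the class counts in the table. The worked calculation below uses the standard Gini definitions. The symbols n_v (number of records with attribute value v) and p(c | t) (fraction of class c in node t) are notation introduced here for illustration, not taken from the original solution.

```latex
\begin{align*}
\mathrm{Gini}(t) &= 1 - \sum_{c} p(c \mid t)^2 \\
\mathrm{Gini}_{\text{split}} &= \sum_{v} \frac{n_v}{n}\,\mathrm{Gini}(v) \\[6pt]
\mathrm{Gini}(\text{Gender}=\text{M}) &= 1 - (6/10)^2 - (4/10)^2 = 0.48 \\
\mathrm{Gini}(\text{Gender}=\text{F}) &= 1 - (4/10)^2 - (6/10)^2 = 0.48 \\
\mathrm{Gini}_{\text{split}}(\text{Gender}) &= \tfrac{10}{20}(0.48) + \tfrac{10}{20}(0.48) = 0.48 \\[6pt]
\mathrm{Gini}(\text{Family}) &= 1 - (1/4)^2 - (3/4)^2 = 0.375 \\
\mathrm{Gini}(\text{Sports}) &= 1 - (8/8)^2 - (0/8)^2 = 0 \\
\mathrm{Gini}(\text{Luxury}) &= 1 - (1/8)^2 - (7/8)^2 = 0.21875 \\
\mathrm{Gini}_{\text{split}}(\text{Car Type}) &= \tfrac{4}{20}(0.375) + \tfrac{8}{20}(0) + \tfrac{8}{20}(0.21875) = 0.1625 \\[6pt]
\mathrm{Gini}(\text{Small}) &= 1 - (3/5)^2 - (2/5)^2 = 0.48 \\
\mathrm{Gini}(\text{Medium}) &= 1 - (3/7)^2 - (4/7)^2 = 24/49 \approx 0.4898 \\
\mathrm{Gini}(\text{Large}) &= 1 - (2/4)^2 - (2/4)^2 = 0.5 \\
\mathrm{Gini}(\text{Extra Large}) &= 1 - (2/4)^2 - (2/4)^2 = 0.5 \\
\mathrm{Gini}_{\text{split}}(\text{Shirt Size}) &= \tfrac{5}{20}(0.48) + \tfrac{7}{20}(24/49) + \tfrac{4}{20}(0.5) + \tfrac{4}{20}(0.5) \approx 0.4914
\end{align*}
```

Car Type has the lowest weighted Gini index (0.1625), so it produces the purest child nodes and is the better attribute for part (f), in agreement with the program's result.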
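As an independent cross-check on the loop-based program, the weighted Gini index can be sketched directly from a matrix of class counts. The helper name weighted_gini_from_counts and the hand-entered count matrices are assumptions made for this illustration. They are not part of the original solution.

```r
# Hypothetical helper (not part of the original solution): weighted Gini index
# computed from a matrix of class counts, with one row per attribute value
# and one column per class.
weighted_gini_from_counts <- function(counts) {
  n_per_value <- rowSums(counts)                   # records in each child node
  class_props <- counts / n_per_value              # row-wise class proportions
  gini_per_value <- 1 - rowSums(class_props^2)     # Gini index of each child node
  sum(n_per_value / sum(counts) * gini_per_value)  # size-weighted average
}

# Class counts (C0, C1) read off the training table by hand.
gender_counts <- matrix(c(6, 4,
                          4, 6),
                        ncol = 2, byrow = TRUE,
                        dimnames = list(c("M", "F"), c("C0", "C1")))

car_type_counts <- matrix(c(1, 3,
                            8, 0,
                            1, 7),
                          ncol = 2, byrow = TRUE,
                          dimnames = list(c("Family", "Sports", "Luxury"),
                                          c("C0", "C1")))

shirt_size_counts <- matrix(c(3, 2,
                              3, 4,
                              2, 2,
                              2, 2),
                            ncol = 2, byrow = TRUE,
                            dimnames = list(c("Small", "Medium", "Large", "Extra Large"),
                                            c("C0", "C1")))

print(weighted_gini_from_counts(gender_counts))
print(weighted_gini_from_counts(car_type_counts))
print(weighted_gini_from_counts(shirt_size_counts))
```

The three printed values should agree with the Gini values listed in the Output section. With the data frame from the program available, a call such as table(data$Car_Type, data$Class) should yield an equivalent count table, so the counts need not be typed by hand.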
