CS484_IML_Assignment_4_Answer_Key.docx
School: Illinois Institute Of Technology
Course: 584
Subject: Computer Science
Date: Dec 6, 2023
Type: docx
Pages: 9
Uploaded by ColonelTeamCamel27
CS 484: Introduction to Machine Learning
Fall Semester 2023 Assignment 4 Answer Key
Question 1 (50 points)
The Homeowner_Claim_History.xlsx file contains the claim history of 27,513 homeowner policies. The following table describes the eleven columns in the HOCLAIMDATA sheet.

Name                   Description                             Categories
policy                 Policy Identifier
exposure               Duration a Policy is Exposed to Risk    Measured in Portion of a Year
num_claims             Number of Claims in a Year
amt_claims             Total Claim Amount in a Year
f_primary_age_tier     Age Tier of Primary Insured             < 21, 21 - 27, 28 - 37, 38 - 60, > 60
f_primary_gender       Gender of Primary Insured               Female, Male
f_marital              Marital Status of Primary Insured       Not Married, Married, Un-Married
f_residence_location   Location of Residence Property          Urban, Suburban, Rural
f_fire_alarm_type      Fire Alarm Type                         None, Standalone, Alarm Service
f_mile_fire_station    Distance to Nearest Fire Station        < 1 mile, 1 - 5 miles, 6 - 10 miles, > 10 miles
f_aoi_tier             Amount of Insurance Tier                < 100K, 100K - 350K, 351K - 600K, 601K - 1M, > 1M
We want to predict the Frequency, which is the number of claims per unit of exposure, using the above features. We first divide the reported number of claims by the exposure. This gives the Frequency. Next, we put the policies into five groups according to their Frequency values.
Frequency Group   Frequency Value
0                 Frequency = 0
1                 0 < Frequency <= 1
2                 1 < Frequency <= 2
3                 2 < Frequency <= 3
4                 3 < Frequency
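The grouping rule in the table above can be sketched in a few lines of Python. This is only an illustration: the function name frequency_group is made up here, and treating a missing or non-positive exposure as a missing target is an assumption, not something the assignment states.

```python
import math

def frequency_group(num_claims, exposure):
    """Return the Frequency Group (0 to 4) for one policy, or None if undefined."""
    # Assumption: a missing or non-positive exposure yields a missing target.
    if num_claims is None or exposure is None or exposure <= 0:
        return None
    frequency = num_claims / exposure
    if frequency == 0:
        return 0
    # ceil maps (0, 1] to 1, (1, 2] to 2, (2, 3] to 3; anything above 3 is capped at 4.
    return min(math.ceil(frequency), 4)
```

For example, a policy with one claim over half a year of exposure has Frequency 2 and lands in group 2.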
We will use the above Frequency Group as our target variable, which has five levels (0 through 4).
After dropping the observations with missing target values, we will divide the remaining observations into the training and the testing partitions. Observations whose Policy Identifier starts with the letter A, G, or P will go to the training partition. The remaining observations go to the testing partition.
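A minimal sketch of this partition rule, in plain Python. The helper name split_by_policy_prefix and the sample identifiers in the usage note are hypothetical, and the comparison is assumed to be case-sensitive.

```python
def split_by_policy_prefix(policy_ids, train_letters=("A", "G", "P")):
    """Split policy identifiers into (training, testing) lists by first letter."""
    training, testing = [], []
    for policy_id in policy_ids:
        # str.startswith accepts a tuple and returns True if any prefix matches.
        if str(policy_id).startswith(train_letters):
            training.append(policy_id)
        else:
            testing.append(policy_id)
    return training, testing
```

Calling split_by_policy_prefix(["A-001", "B-002", "G-003", "P-004", "Z-005"]) on these made-up identifiers returns (["A-001", "G-003", "P-004"], ["B-002", "Z-005"]).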
Since we have sufficient computing resources, we will train multinomial logistic models for all possible subsets of the seven categorical predictors, namely, f_aoi_tier, f_fire_alarm_type, f_marital, f_mile_fire_station, f_primary_age_tier, f_primary_gender, and f_residence_location. All models must include the Intercept term. To help us select our “optimal” model, we will calculate the AIC and the BIC criteria on the Training partition, and the Accuracy and the Root Average Squared Error on the Testing partition.
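The all-subsets search can be enumerated with itertools.combinations. The sketch below only lists the candidate predictor sets; fitting each model is left out. Whether the intercept-only model (the empty subset) is counted among the candidates is an assumption here; including it gives 2^7 = 128 specifications.

```python
from itertools import combinations

PREDICTORS = [
    "f_aoi_tier",
    "f_fire_alarm_type",
    "f_marital",
    "f_mile_fire_station",
    "f_primary_age_tier",
    "f_primary_gender",
    "f_residence_location",
]

def all_model_specs(predictors):
    """Return every subset of predictors as a tuple, smallest subsets first."""
    specs = []
    for size in range(len(predictors) + 1):
        # size == 0 yields the empty tuple, i.e. the intercept-only model.
        specs.extend(combinations(predictors, size))
    return specs

MODEL_SPECS = all_model_specs(PREDICTORS)
```

Each tuple in MODEL_SPECS names the predictors to add to the Intercept term for one candidate model.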
The string predictor f_fire_alarm_type contains the word “None”. Unfortunately, pandas reads it as NaN. Therefore, we need to call the fillna() function to replace each NaN with the word “None”.
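The fillna() repair might look like the sketch below. The small DataFrame is made up to stand in for the HOCLAIMDATA sheet after loading; only the column name f_fire_alarm_type comes from the assignment.

```python
import pandas as pd

# Made-up rows standing in for the loaded sheet; NaN marks where "None" was lost.
claims = pd.DataFrame({
    "policy": ["A-001", "B-002", "G-003"],
    "f_fire_alarm_type": [float("nan"), "Standalone", "Alarm Service"],
})

# Restore the category label that was parsed as a missing value.
claims["f_fire_alarm_type"] = claims["f_fire_alarm_type"].fillna("None")
```

Doing this before dropping missing target values keeps policies without a fire alarm from being mistaken for incomplete records.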
(a) (10 points) How many policies are in each of the five groups in the Training partition? Also, in the Testing partition?

Partition            Training   Testing
Number of Policies   20,661     6,852
(b) (10 points) What is the lowest AIC value on the Training partition? Also, which model produces that AIC value?
The lowest AIC value on the Training partition is 47836.1176.
The model that produces this AIC value is:
Intercept + f_aoi_tier + f_fire_alarm_type + f_mile_fire_station + f_primary_age_tier + f_residence_location.
(c) (10 points) What is the lowest BIC value on the Training partition? Also, which model produces that BIC value?
The lowest BIC value on the Training partition is 48344.0218.
The model that produces this BIC value is:
Intercept + f_aoi_tier + f_fire_alarm_type + f_mile_fire_station + f_primary_age_tier + f_residence_location.
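As a consistency check on the AIC and BIC answers, the standard definitions AIC = -2 LL + 2k and BIC = -2 LL + k ln(n) can be applied to the reported numbers. The parameter count k below is an assumption: it uses reference-cell coding with the category counts from the column table and the five Frequency Group levels, which gives 16 parameters per equation and 4 equations. The log-likelihood is backed out of the reported AIC rather than taken from the answer key.

```python
import math

def aic(log_likelihood, n_params):
    """Akaike information criterion."""
    return -2.0 * log_likelihood + 2.0 * n_params

def bic(log_likelihood, n_params, n_obs):
    """Bayesian information criterion."""
    return -2.0 * log_likelihood + n_params * math.log(n_obs)

# Category counts of the predictors in the selected model (from the column table).
levels = {
    "f_aoi_tier": 5,
    "f_fire_alarm_type": 3,
    "f_mile_fire_station": 4,
    "f_primary_age_tier": 5,
    "f_residence_location": 3,
}
# Assumption: intercept plus (levels - 1) dummies per predictor, in each of
# the (5 - 1) non-reference target equations.
per_equation = 1 + sum(n - 1 for n in levels.values())
n_params = per_equation * (5 - 1)

reported_aic = 47836.1176
n_train = 20661
# Back out the log-likelihood implied by the reported AIC.
log_likelihood = -(reported_aic - 2.0 * n_params) / 2.0
implied_bic = bic(log_likelihood, n_params, n_train)
```

Under these assumptions n_params is 64 and implied_bic agrees with the reported 48344.0218 to four decimals, which suggests that both criteria were computed on the 20,661 training observations with a five-level target.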
(d) (10 points) What is the highest Accuracy value on the Testing partition? Also, which model produces that Accuracy value?
The highest Accuracy value on the Testing partition is 0.5641.
The model that produces this Accuracy value is:
Intercept + f_aoi_tier + f_fire_alarm_type + f_marital + f_mile_fire_station + f_primary_age_tier.
(e) (10 points) What is the lowest Root Average Squared Error value on the Testing partition? Also, which model produces that RASE value?
The lowest Root Average Squared Error value on the Testing partition is 0.3430.
The model that produces this RASE value is:
Intercept + f_aoi_tier + f_fire_alarm_type + f_marital + f_mile_fire_station + f_primary_age_tier + f_residence_location.
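The two Testing-partition metrics can be computed from predicted class probabilities as sketched below. The answer key does not spell out its RASE formula, so the version here is an assumption: it averages the squared difference between the 0/1 class indicator and the predicted probability over all observations and all target categories. Accuracy assigns each observation to its highest-probability category.

```python
import math

def accuracy(y_true, probs):
    """Share of observations whose most probable category equals the truth."""
    hits = 0
    for y, p in zip(y_true, probs):
        predicted = max(range(len(p)), key=lambda j: p[j])
        hits += int(predicted == y)
    return hits / len(y_true)

def rase(y_true, probs):
    """Root average squared error over observations and categories (assumed form)."""
    total, count = 0.0, 0
    for y, p in zip(y_true, probs):
        for j, p_j in enumerate(p):
            indicator = 1.0 if j == y else 0.0
            total += (indicator - p_j) ** 2
            count += 1
    return math.sqrt(total / count)
```

Here y_true holds integer category indices and probs holds one list of predicted probabilities per observation, in category order.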
Question 2 (50 points)
The Center for Machine Learning and Intelligent Systems at the University of California, Irvine manages the Machine Learning Repository (https://archive.ics.uci.edu/ml/index.php). We will use two of the datasets in the repository for analyses, namely, WineQuality_Train.csv for training and WineQuality_Test.csv for testing.
The categorical target variable is quality_grp. It has two categories, namely, 0 and 1. The input features are alcohol, citric_acid, free_sulfur_dioxide, residual_sugar, and sulphates. These five input features are considered interval variables.
We will train a Multi-Layer Perceptron neural network with the following specifications.
1. Perform a grid search to select the most desired network structure.
2. The maximum number of iterations is 10000.
3. The random seed is 2023484.
4. Try all the Hyperbolic Tangent, the Identity, and the Linear Rectifier activation functions.
5. Try the number of layers from 1 to 10 inclusively with an increment of 1.
6. Try the common number of neurons per layer from 2 to 10 inclusively with an increment of 2.
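The grid defined by these specifications has 3 x 10 x 5 = 150 network structures. The sketch below enumerates them and shows how one structure might be turned into a scikit-learn MLPClassifier. Mapping the three activation names to 'tanh', 'identity', and 'relu' is an assumption about the intended scikit-learn options, and the helper names grid_specs and build_mlp are made up for illustration.

```python
from itertools import product

ACTIVATIONS = ["tanh", "identity", "relu"]   # assumed scikit-learn names
N_LAYERS = range(1, 11)                      # 1, 2, ..., 10
N_NEURONS = range(2, 11, 2)                  # 2, 4, 6, 8, 10

def grid_specs():
    """List every (activation, layers, neurons) combination as a dict."""
    specs = []
    for activation, n_layers, n_neurons in product(ACTIVATIONS, N_LAYERS, N_NEURONS):
        specs.append({
            "activation": activation,
            # The same neuron count is repeated for every hidden layer.
            "hidden_layer_sizes": (n_neurons,) * n_layers,
        })
    return specs

def build_mlp(spec):
    """Create an unfitted classifier for one grid entry."""
    # Imported here so that listing the grid does not require scikit-learn.
    from sklearn.neural_network import MLPClassifier
    return MLPClassifier(
        activation=spec["activation"],
        hidden_layer_sizes=spec["hidden_layer_sizes"],
        max_iter=10000,
        random_state=2023484,
    )
```

After fitting, the n_iter_ and best_loss_ attributes of each classifier supply two of the columns a grid search summary would need.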
We will predict an observation with quality_grp of 1 if Prob(quality_grp = 1) ≥ 1.5 c, where c is the proportion of observations where quality_grp = 1 in the training partition. Otherwise, the predicted quality_grp is 0.
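The decision rule can be sketched as a one-line comparison. The comparison symbol is not legible in this copy of the assignment, so the "greater than or equal to" used below is an assumption. The function name predict_quality_grp is also made up.

```python
def predict_quality_grp(prob_one, c, multiplier=1.5):
    """Label an observation 1 when Prob(quality_grp = 1) reaches multiplier * c."""
    threshold = multiplier * c
    # Assumption: the rule is inclusive (>=) at the threshold.
    return [1 if p >= threshold else 0 for p in prob_one]
```

With the training proportion c = 0.1962 reported in part (a), the threshold is 1.5 x 0.1962 = 0.2943, so a predicted probability of 0.30 is labeled 1 while 0.29 is labeled 0.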
(a) (10 points). What is the proportion of observations where quality_grp = 1 in the training partition?
The proportion of observations where quality_grp = 1 in the training partition is 0.1962.
(b) (10 points). What is the proportion of observations where quality_grp = 1 in the testing partition?
The proportion of observations where quality_grp = 1 in the testing partition is 0.1974.
(c) (10 points). Show your grid search results in a table. The table should contain (1) the activation function type, (2) the number of layers, (3) the common number of neurons per layer, (4) the number of iterations performed (n_iter_ attribute), (5) the best loss value (best_loss_ attribute), (6) the root average squared error of the testing partition, (7) the misclassification rate of the testing partition, and (8) the elapsed time in seconds.
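The testing-partition columns and the elapsed time for one row of such a table could be produced as sketched below. Both helper names are hypothetical. The RASE here is computed from Prob(quality_grp = 1) only, which is an assumed convention for a binary target, and the inclusive threshold comparison is likewise an assumption.

```python
import math
import time

def score_test_partition(y_true, prob_one, threshold):
    """Return the RASE and misclassification rate for a binary target."""
    n = len(y_true)
    squared_error = sum((y - p) ** 2 for y, p in zip(y_true, prob_one))
    # Assumption: an observation is labeled 1 when its probability reaches the threshold.
    n_wrong = sum(int((1 if p >= threshold else 0) != y)
                  for y, p in zip(y_true, prob_one))
    return {
        "rase": math.sqrt(squared_error / n),
        "misclassification_rate": n_wrong / n,
    }

def timed(func, *args):
    """Run func(*args) and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = func(*args)
    return result, time.perf_counter() - start
```

Wrapping the fit of each network in timed() gives the elapsed-time column, and score_test_partition() fills the two testing-partition columns.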