Question
Predicting Housing Median Prices. The file BostonHousing.csv contains information on 506 census tracts in Boston, with multiple variables recorded for each tract. The last column (CAT.MEDV) was derived from MEDV: it takes the value 1 if MEDV > 30 and 0 otherwise. The first goal is to predict the median value (MEDV) of a tract from the information in the first 12 columns. The second goal is to classify the tract using the last column, CAT.MEDV.
Partition the data into training (60%) and validation (40%) sets.
a1. Perform a k-NN prediction with all 12 predictors (columns 1-12) and MEDV (column 13) as the outcome variable. (Ignore the CAT.MEDV column in this step.) Try values of k from 1 to 10. Make sure to normalize the data (preprocess), and use the knn() function from the class package rather than the one from FNN. [To make sure R uses the class package when both packages are loaded, call class::knn().] What is the best k? What does it mean?
a2. Perform a k-NN classification with all 12 predictors and CAT.MEDV as the outcome, trying values of k from 1 to 10.
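The partition, normalization, and k sweep described in a1 and a2 can be sketched roughly as follows. This is a minimal illustration, not a reference solution: the data frame `housing` is a synthetic stand-in with made-up values (so the numbers it prints say nothing about the real file), and the seed, the RMSE and accuracy metrics, and the variable names are assumptions. Note that class::knn() is a classifier, so it treats each distinct MEDV value as a class and returns the majority vote among the neighbors rather than their average.

```r
library(class)  # provides knn(); called explicitly as class::knn() below

set.seed(1)  # arbitrary seed, used only for reproducibility

# Synthetic stand-in with the exercise's layout: 12 predictors, MEDV, CAT.MEDV.
# All values are made up; replace with read.csv("BostonHousing.csv") in practice.
n <- 506
housing <- as.data.frame(matrix(rnorm(n * 12), nrow = n))
names(housing) <- c("CRIM", "ZN", "INDUS", "CHAS", "NOX", "RM",
                    "AGE", "DIS", "RAD", "TAX", "PTRATIO", "LSTAT")
housing$MEDV <- round(22 + 5 * housing$RM - 4 * housing$LSTAT + rnorm(n), 1)
housing$CAT.MEDV <- as.integer(housing$MEDV > 30)

# 60% training / 40% validation partition
train_idx <- sample(n, size = round(0.6 * n))
train <- housing[train_idx, ]
valid <- housing[-train_idx, ]

# Normalize both partitions with the training means and standard deviations
mu <- colMeans(train[, 1:12])
sdev <- apply(train[, 1:12], 2, sd)
train_x <- scale(train[, 1:12], center = mu, scale = sdev)
valid_x <- scale(valid[, 1:12], center = mu, scale = sdev)

results <- data.frame(k = 1:10, rmse = NA_real_, accuracy = NA_real_)
for (k in results$k) {
  # a1: MEDV as outcome; predictions come back as factor levels
  pred_medv <- class::knn(train_x, valid_x, cl = factor(train$MEDV), k = k)
  pred_medv <- as.numeric(as.character(pred_medv))
  results$rmse[k] <- sqrt(mean((valid$MEDV - pred_medv)^2))

  # a2: CAT.MEDV as outcome; validation accuracy
  pred_cat <- class::knn(train_x, valid_x, cl = factor(train$CAT.MEDV), k = k)
  results$accuracy[k] <- mean(as.character(pred_cat) == as.character(valid$CAT.MEDV))
}

best_k <- results$k[which.min(results$rmse)]  # k with the lowest validation RMSE
print(results)
```

Choosing k by the lowest validation RMSE is one reasonable reading of "best k"; it is the value that generalized best to records the model did not see.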
 
b. Predict the MEDV for a tract with the following information, using the best k:
CRIM = 0.2
ZN = 0
INDUS = 7
CHAS = 0
NOX = 0.538
RM = 6
AGE = 62
DIS = 4.7
RAD = 4
TAX = 307
PTRATIO = 21
LSTAT = 10
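Scoring the tract above might look like the sketch below. The key step is that the new record must be normalized with the training partition's means and standard deviations before it is passed to class::knn(). The training data here is a synthetic stand-in with made-up values and ranges, and `best_k <- 3` is a hypothetical placeholder, so the printed value is not an answer for the real data.

```r
library(class)

set.seed(2)  # arbitrary seed for reproducibility

predictors <- c("CRIM", "ZN", "INDUS", "CHAS", "NOX", "RM",
                "AGE", "DIS", "RAD", "TAX", "PTRATIO", "LSTAT")

# Synthetic stand-in for the training partition; ranges and values are made up.
n_train <- 304
lo <- c(0, 0, 0, 0, 0.4, 4, 0, 1, 1, 180, 12, 2)
hi <- c(5, 100, 28, 1, 0.9, 9, 100, 12, 24, 720, 22, 38)
train_x_raw <- as.data.frame(mapply(function(a, b) runif(n_train, a, b), lo, hi))
names(train_x_raw) <- predictors
train_medv <- round(5 * train_x_raw$RM - 0.5 * train_x_raw$LSTAT + rnorm(n_train), 1)

# The tract described in part b
new_tract <- data.frame(CRIM = 0.2, ZN = 0, INDUS = 7, CHAS = 0, NOX = 0.538,
                        RM = 6, AGE = 62, DIS = 4.7, RAD = 4, TAX = 307,
                        PTRATIO = 21, LSTAT = 10)

# Normalize the new record with the training statistics, not its own
mu <- colMeans(train_x_raw)
sdev <- apply(train_x_raw, 2, sd)
train_x <- scale(train_x_raw, center = mu, scale = sdev)
new_x <- scale(new_tract[, predictors], center = mu, scale = sdev)

best_k <- 3  # hypothetical placeholder; use the k chosen on the validation set

pred <- class::knn(train_x, new_x, cl = factor(train_medv), k = best_k)
pred_medv <- as.numeric(as.character(pred))  # factor level back to a number
print(pred_medv)
```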
 
c. If we used the above k-NN algorithm to score the training data, what would be the error of the training set?
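One way to reason about part c is to score the training records against themselves. With k = 1, every training record is its own nearest neighbor (distance zero), so the training error is zero as long as no two records share identical predictors with different outcomes; with larger k the error is generally nonzero but still optimistic, because each record still votes for itself. The sketch below illustrates this on synthetic, made-up data; the function name `training_rmse` is an assumption.

```r
library(class)

set.seed(3)  # arbitrary seed for reproducibility

# Synthetic, already-normalized stand-in for the training partition
n_train <- 304
train_x <- scale(matrix(rnorm(n_train * 12), nrow = n_train))
train_medv <- round(22 + 5 * train_x[, 6] - 4 * train_x[, 12] + rnorm(n_train), 1)

# Score the training set with a model built on that same training set
training_rmse <- function(k) {
  pred <- class::knn(train_x, train_x, cl = factor(train_medv), k = k)
  sqrt(mean((train_medv - as.numeric(as.character(pred)))^2))
}

rmse_k1 <- training_rmse(1)  # each record matches itself exactly
rmse_k5 <- training_rmse(5)  # other neighbors now take part in the vote
print(c(k1 = rmse_k1, k5 = rmse_k5))
```

This is why training error is not a useful guide for choosing k: it rewards memorization, whereas the validation error measures performance on unseen tracts.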