CSE4_574_Project1

Using UB Brightspace Submission: This is a group project for up to two students. Please enroll yourself in Brightspace groups. All project group members must join the same group and submit one solution under your group.

CSE4/574 Spring 2024
Introduction to Machine Learning
Programming Project 1: Linear Models

Due Date: Mar 8th, 2024
Maximum Score: 100

Note: Do not use any Python libraries/toolboxes, built-in functions, or external tools/libraries that directly perform classification, regression, or function fitting. Using any external code will result in 0 points for the corresponding problem. Also submit a project report (PDF file) summarizing your findings. In the problem statements below, the portions marked REPORT need to be discussed in the project report.

Data Set: Load the California housing dataset using the following code.

    from sklearn.datasets import fetch_california_housing
    california_housing = fetch_california_housing(as_frame=True).frame

Submission

You are required to submit a single file called proj1.zip using Brightspace. The file proj1.zip must contain two files: report.pdf and script.py. Submit your report in PDF format. Please list the team members at the top of the report. The code file should contain all implemented functions.

Problem 1: Exploratory data analysis (5 code + 5 report = 10 Points)

Explore and visualize the dataset to understand the distribution of the features and the target variable. Summarize your findings/observations in the report.
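As a starting point for the exploration in Problem 1, the sketch below tabulates per-column statistics and each column's correlation with the target. It is only an illustration: `summarize_features` is a hypothetical helper name, pandas is assumed to be installed, and the target column is assumed to be `MedHouseVal`, which is the name used in the frame returned by `fetch_california_housing(as_frame=True)`.

```python
import pandas as pd


def summarize_features(frame: pd.DataFrame, target: str = "MedHouseVal") -> pd.DataFrame:
    """Return one row per column: basic statistics, missing count, target correlation."""
    # describe() gives count/mean/std/min/quartiles/max; keep a compact subset.
    summary = frame.describe().T[["mean", "std", "min", "max"]].copy()
    # Number of missing values per column (aligned on column name).
    summary["missing"] = frame.isna().sum()
    # Pearson correlation of every column with the target column.
    summary["corr_with_target"] = frame.corr()[target]
    return summary


# Usage with the handout's dataset (downloads the data on first call):
#   from sklearn.datasets import fetch_california_housing
#   california_housing = fetch_california_housing(as_frame=True).frame
#   print(summarize_features(california_housing))
#   california_housing.hist(bins=50, figsize=(12, 8))  # requires matplotlib
```

The table is a summary only; histograms and scatter plots against the target are still needed to see skew, capped values, and outliers.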
Problem 2: Linear Regression (10 code + 5 report = 15 Points)

1. Implement simple linear regression using a single feature (use MedInc, the median income in the block group, as the feature).
2. Evaluate model performance using appropriate metrics.
3. Plot the errors on the train and test data.
4. Summarize your findings/observations in the report.

Problem 3: Multiple Linear Regression (20 code + 5 report = 25 Points)

1. Extend the model to include multiple features.
2. Compare the performance of multiple linear regression with simple linear regression.
3. Summarize your findings/observations in the report.

Problem 4: Locally Weighted Linear Regression (25 code + 5 report = 30 Points)

1. Choose a Kernel Function: LWLR assigns weights to data points based on their distance from the prediction point. You will need to choose a kernel function (e.g., a Gaussian kernel) to determine these weights.
2. Implement Locally Weighted Linear Regression: Create a function that takes a prediction point, the dataset, and the chosen kernel function. For each data point in the dataset, calculate the weight based on its distance from the prediction point using the kernel function.
3. Use these weights to fit a weighted linear regression model to the local data points, giving more influence to nearby observations.
4. Predict the target variable for the given prediction point using the locally fitted model.
5. Tune Hyperparameters: Experiment with different bandwidth (tau) values in the kernel function to control the degree of locality. This hyperparameter determines how much influence nearby points have on the prediction.
6. Evaluate the Model: Evaluate the performance of the LWLR model using appropriate metrics, such as mean squared error or R-squared.
7. Summarize your findings/observations in the report.
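The weighting, fitting, and prediction steps of Problem 4 can be sketched as follows. This is a minimal illustration under stated assumptions, not a reference solution: it assumes a Gaussian kernel of the form exp(-d^2 / (2 tau^2)), solves the weighted normal equations with NumPy's pseudo-inverse (a generic linear-algebra routine rather than a regression function), and uses hypothetical function names. Passing no weights reduces the fit to ordinary least squares, which is the setting of Problems 2 and 3.

```python
import numpy as np


def gaussian_kernel(sq_distances, tau):
    """Gaussian weights exp(-d^2 / (2 * tau^2)) from squared distances."""
    return np.exp(-sq_distances / (2.0 * tau ** 2))


def as_2d(X):
    """Treat a 1-D array as a single-feature design matrix."""
    X = np.asarray(X, dtype=float)
    return X[:, None] if X.ndim == 1 else X


def fit_weighted_least_squares(X, y, weights=None):
    """Solve (X^T W X) theta = X^T W y, with an intercept column prepended.

    weights=None uses unit weights, i.e. ordinary least squares.
    """
    X = as_2d(X)
    Xb = np.hstack([np.ones((X.shape[0], 1)), X])
    y = np.asarray(y, dtype=float)
    if weights is None:
        weights = np.ones(Xb.shape[0])
    XtW = Xb.T * weights  # scales column i of X^T by w_i, equal to X^T W
    # The pseudo-inverse tolerates near-singular systems when tau is small.
    return np.linalg.pinv(XtW @ Xb) @ (XtW @ y)


def lwlr_predict(x_query, X, y, tau=1.0):
    """Predict at one query point from a model fitted locally around it."""
    X = as_2d(X)
    x_query = np.atleast_1d(np.asarray(x_query, dtype=float))
    sq_distances = np.sum((X - x_query) ** 2, axis=1)
    weights = gaussian_kernel(sq_distances, tau)
    theta = fit_weighted_least_squares(X, y, weights)
    return float(np.concatenate([[1.0], x_query]) @ theta)
```

Each prediction refits the model, so the cost grows with both the number of query points and the training set size. Because the weights depend on Euclidean distance, features measured on very different scales will likely need to be standardized before a single tau is meaningful.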

Problem 5: Model Comparison and Selection (10 code + 10 report = 20 Points)

Using the results obtained from the previous four problems, make final recommendations for anyone using regression to predict housing prices from these input features. Compare the various approaches in terms of training and testing error. What metric should be used to choose the best setting?
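Comparing training and testing error requires a held-out split and a common metric computed the same way for every model. The sketch below is one possible setup, assuming an 80/20 random split and the usual definitions of mean squared error and R-squared; the function names and the default split fraction are illustrative choices, not requirements of the handout.

```python
import numpy as np


def train_test_split_indices(n_samples, test_fraction=0.2, seed=0):
    """Return (train_idx, test_idx) from a seeded random permutation."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_samples)
    n_test = int(round(n_samples * test_fraction))
    return order[n_test:], order[:n_test]


def mse(y_true, y_pred):
    """Mean squared error: average of squared residuals."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean((y_true - y_pred) ** 2))


def r_squared(y_true, y_pred):
    """R^2 = 1 - SS_res / SS_tot (undefined when y_true is constant)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)
```

Reusing the same index split for every model keeps the comparison fair. A large gap between training and testing error for one setting, such as a very small tau, is a sign of overfitting.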