ISYE 7406 HW4
School: Georgia Institute of Technology
Course: ISyE 7406
Subject: Computer Science
Date: Dec 6, 2023
Pages: 10
ISyE 7406: Data Mining & Statistical Learning
HW#4
INTRODUCTION

The goal of this homework is to better understand the statistical properties and computational challenges of local smoothing methods such as LOESS, Nadaraya-Watson (NW) kernel smoothing, and spline smoothing. For this purpose, we compute the empirical bias, empirical variance, and empirical mean squared error (MSE) based on m = 1000 Monte Carlo runs. In each run we simulate a data set of n = 101 observations from the additive noise model

    Yi = f(xi) + εi,

with the famous Mexican hat function

    f(x) = (1 − x²) exp(−0.5 x²),  −2π ≤ x ≤ 2π,

where ε1, ..., εn are independent and identically distributed (iid) N(0, 0.2²). This function is known to pose a variety of estimation challenges, and below we explore the difficulties inherent in it.

EXPLORATORY DATA ANALYSIS

In the equidistant design, the x-values are generated as equally spaced points between −2π and 2π, giving a fixed design of 101 points separated by an interval of 4π/100 ≈ 0.1256637. In the non-equidistant design, the x-values are also generated between −2π and 2π, but the spacings between consecutive points vary, so that x[2] − x[1] is not equal to x[3] − x[2], and so on up to x[101].

Figure 1: Plot of the equidistant design
Figure 2: Plot of the non-equidistant design
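The report's simulation was carried out in R; as an illustrative sketch only, the Mexican hat function and the two designs can be written in Python/NumPy. The function name `mexican_hat`, the seed, and the sorted-uniform scheme for the non-equidistant design are my own assumptions, since the report does not specify exactly how the non-equidistant points were drawn.

```python
import numpy as np

def mexican_hat(x):
    """Mexican hat function f(x) = (1 - x^2) * exp(-0.5 * x^2)."""
    return (1 - x**2) * np.exp(-0.5 * x**2)

n = 101

# Equidistant design: 101 equally spaced points on [-2*pi, 2*pi],
# with spacing 4*pi/100 ~= 0.1256637.
x_equi = np.linspace(-2 * np.pi, 2 * np.pi, n)

# Non-equidistant design (assumed scheme): sorted uniform draws on
# the same interval, so consecutive spacings all differ.
rng = np.random.default_rng(7406)
x_nonequi = np.sort(rng.uniform(-2 * np.pi, 2 * np.pi, n))

# One simulated data set: y_i = f(x_i) + eps_i, eps_i iid N(0, 0.2^2).
y = mexican_hat(x_equi) + rng.normal(0.0, 0.2, n)
```

Note that f(0) = 1 is the peak of the hat, which is the point the results below identify as hardest to estimate.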
Two datasets were generated through a Monte Carlo simulation comprising 1000 runs for each smoothing model. The first dataset consisted of 101 equidistant points, while the second consisted of 101 non-equidistant points randomly generated in R. In each run, the three local smoothing methods (LOESS, NW kernel smoothing, and spline smoothing) were applied to both datasets, and the resulting fitted values were recorded. The analysis involved computing and visualizing the empirical bias, empirical variance, and empirical mean squared error (MSE). These investigations assess the performance and statistical properties of the three smoothing methods on simulated datasets that pose a known estimation challenge because of the Mexican hat function.

METHODOLOGY

A Monte Carlo simulation with 1000 runs per smoothing model was employed. The first model chosen was LOESS, a method that uses local smoothing to fit a polynomial surface based on one or more predictors. While cross-validation is typically performed to determine the optimal span, a span of 0.75 was pre-specified for this simulation. Performance could potentially be improved by selecting the span via k-fold (or leave-one-out) cross-validation, choosing the model with the lowest root mean square error of prediction (RMSEP).

Here is a brief overview of the local smoothing models used:

1. LOESS (Locally Weighted Scatterplot Smoothing): LOESS is a non-parametric regression technique that combines linear regression with local weighted smoothing to fit a smooth curve to a scatterplot. It estimates the value at each point by fitting a weighted regression model to a local subset of the data, with the weights determined by a kernel function. The degree of smoothing is controlled by the span parameter, which governs the size of the local subset.

2. Nadaraya-Watson (NW) Kernel Smoothing: NW kernel smoothing is another non-parametric regression technique that estimates the value at each point as a weighted average of its neighbors, with the weights determined by a kernel function. The degree of smoothing is controlled by a bandwidth parameter, which determines the effective size of the neighborhood. NW kernel smoothing is simple and computationally efficient, although like other local methods its performance degrades in high dimensions.

3. Spline Smoothing: Spline smoothing is a non-parametric regression technique that fits a piecewise polynomial function to the data, with the roughness of the fit governed by a smoothing parameter. It can handle data with complex nonlinear relationships but requires more computation than the other two methods.

RESULTS AND FINDINGS

1.
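As a sketch of the second method above, the NW estimator with a Gaussian kernel takes only a few lines of NumPy. This is a minimal illustration, not the R `ksmooth` implementation actually used in the homework; in particular, the `bandwidth` argument here is the Gaussian kernel's standard deviation, which differs from R's `ksmooth` bandwidth convention.

```python
import numpy as np

def nw_smooth(x_train, y_train, x_eval, bandwidth=0.2):
    """Nadaraya-Watson kernel regression with a Gaussian kernel.

    The fitted value at each evaluation point is a weighted average of
    the training responses, with weights proportional to a Gaussian
    density centered at that evaluation point.
    """
    # Pairwise scaled distances: rows index evaluation points.
    u = (x_eval[:, None] - x_train[None, :]) / bandwidth
    w = np.exp(-0.5 * u**2)            # Gaussian kernel weights
    w /= w.sum(axis=1, keepdims=True)  # each row of weights sums to 1
    return w @ y_train
```

Because each row of weights sums to one, the estimator reproduces a constant function exactly on noiseless data, a quick sanity check for any implementation.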
Equidistant design

The comparison of empirical bias values reveals a significant challenge at x = 0 relative to other x-values, primarily because of the broader range of response values at this point. In accordance with the bias-variance trade-off, smaller empirical bias values typically coincide with larger empirical variances, and vice versa. Notably, the LOESS estimator outperforms the other two local smoothing methods in terms of empirical bias, likely due to the relatively large span parameter (0.75), although this choice could also lead to a degree of over-smoothing.
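For reference, the empirical quantities discussed here are computed pointwise over the m = 1000 Monte Carlo fits. Writing \(\hat{f}^{(j)}(x_i)\) for the fitted value at \(x_i\) in run \(j\), these are the standard definitions, which the report uses implicitly:

```latex
\bar{f}(x_i) = \frac{1}{m}\sum_{j=1}^{m} \hat{f}^{(j)}(x_i),
\qquad
\mathrm{Bias}(x_i) = \bar{f}(x_i) - f(x_i),
```

```latex
\mathrm{Var}(x_i) = \frac{1}{m}\sum_{j=1}^{m}\bigl(\hat{f}^{(j)}(x_i) - \bar{f}(x_i)\bigr)^2,
\qquad
\mathrm{MSE}(x_i) = \frac{1}{m}\sum_{j=1}^{m}\bigl(\hat{f}^{(j)}(x_i) - f(x_i)\bigr)^2 .
```

With these (1/m) conventions the decomposition \(\mathrm{MSE}(x_i) = \mathrm{Bias}(x_i)^2 + \mathrm{Var}(x_i)\) holds exactly, which explains the inverse bias-variance relationship observed in the plots.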
Conversely, spline smoothing exhibits superior performance in terms of empirical MSE, but this advantage may be attributable to its default tuning via generalized cross-validation. In practice, cross-validation would typically be used to fine-tune the parameters of all three models, so the comparison may not be entirely fair: fixed tuning values were used for the other two local smoothing methods, potentially leading to suboptimal performance from insufficient parameter tuning. Plots of the equidistant design's fitted mean, empirical bias, empirical variance, and empirical MSE are shown below.

Figure 3: Fitted mean
Figure 4: Bias
Figure 5: Variance
Figure 6: MSE

The fitted values for the LOESS estimator with a span of 0.75, NW kernel smoothing using a Gaussian kernel with a bandwidth of 0.2, and spline smoothing are shown as the black, red, and blue lines, respectively.

2.
Non-equidistant design

In these plots we examine the empirical bias and mean squared error (MSE) of the three smoothing methods (spline smoothing, kernel smoothing, and LOESS) applied to the Mexican hat data, for both the equidistant and non-equidistant x-values. A notable observation is that x = 0 stands out, exhibiting significantly larger empirical bias and MSE than other x-values; this highlights the inherent difficulty these methods face in estimating the function in that region. An inverse relationship between empirical bias and empirical variance is again evident across all three estimators.

For the non-equidistant dataset, spline smoothing shows a slightly higher empirical bias than in the equidistant case, possibly reflecting over-smoothing caused by a relatively large spar parameter. Conversely, the LOESS model in the non-equidistant setup shows generally smaller empirical bias and MSE than its equidistant counterpart, likely because of the use of a smaller LOESS span, which enhances the local fit and reduces both bias and MSE.
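The full Monte Carlo pipeline can be sketched end to end. This is a minimal Python illustration of the procedure described above, not the R code the report actually ran: the NW smoother, seed, and bandwidth below are my own simplifications standing in for the three estimators.

```python
import numpy as np

def mexican_hat(x):
    return (1 - x**2) * np.exp(-0.5 * x**2)

def nw_smooth(x, y, bandwidth=0.2):
    # Nadaraya-Watson fit with a Gaussian kernel, evaluated at the
    # design points themselves.
    u = (x[:, None] - x[None, :]) / bandwidth
    w = np.exp(-0.5 * u**2)
    w /= w.sum(axis=1, keepdims=True)
    return w @ y

m, n, sigma = 1000, 101, 0.2
x = np.linspace(-2 * np.pi, 2 * np.pi, n)
truth = mexican_hat(x)
rng = np.random.default_rng(0)

fits = np.empty((m, n))
for j in range(m):
    # One Monte Carlo data set: fresh noise, same fixed design.
    y = truth + rng.normal(0.0, sigma, n)
    fits[j] = nw_smooth(x, y)

mean_fit = fits.mean(axis=0)
bias = mean_fit - truth                   # empirical bias at each x_i
var = fits.var(axis=0)                    # empirical variance at each x_i
mse = ((fits - truth) ** 2).mean(axis=0)  # empirical MSE at each x_i
```

With the 1/m conventions used here, `mse` equals `bias**2 + var` exactly at every design point, so plotting the three curves gives a direct view of the bias-variance trade-off the report discusses.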