
Question
Question 3.1. Write a function called simulate_estimates. It should take 4 arguments:
• original_df: A DataFrame from which the data should be sampled, with 1 column named 'serial_number'.
• sample_size: The size of each sample, an integer. (For example, to do resampling, we would pass the number of rows in original_df for this argument.)
• statistic: A function that computes a statistic on a sample. This argument is the name of a function that takes a Series of serial numbers as its argument and returns a number (e.g. calculate_mean_based_estimate).
• repetitions: The number of repetitions to perform (i.e. the number of resamples to create).
It should simulate repetitions samples with replacement from the given DataFrame. For each of those samples, it should compute the statistic on that sample. Then it should return an array containing the value of that statistic for each sample (this means that the length of the returned array should be equal to repetitions).
The code below provides an example use of your function and describes how you can verify that you've written it correctly.
Check your answer. The histogram you see should be a bell-shaped curve centered at 1000 with most of its mass in [800, 1200].
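As a rough sanity check on those numbers, the expected center and spread can be worked out by hand. This is a sketch under two assumptions: that calculate_mean_based_estimate returns twice the sample mean (as the "twice-mean estimates" comment in the code suggests; its definition is not shown here), and that the serial numbers are 1 through 1000, each drawn with equal probability.

```latex
\begin{aligned}
X &\sim \text{Uniform}\{1, \dots, N\}, \quad N = 1000 \\
\mathbb{E}[X] &= \frac{N+1}{2} = 500.5, \qquad
\operatorname{SD}(X) = \sqrt{\frac{N^2 - 1}{12}} \approx 288.7 \\
\mathbb{E}\!\left[2\bar{X}_{50}\right] &= N + 1 = 1001, \qquad
\operatorname{SD}\!\left(2\bar{X}_{50}\right) = \frac{2 \cdot 288.7}{\sqrt{50}} \approx 81.6 \\
1001 \pm 2 \cdot 81.6 &\approx [838,\ 1164] \subset [800,\ 1200]
\end{aligned}
```

By the central limit theorem the estimates are approximately normal, so about 95% of them should land within two standard deviations of 1001, which is consistent with a bell shape centered near 1000 and most of the mass in [800, 1200].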
In [ ]: import numpy as np
import babypandas as bpd

def simulate_estimates(original_df, sample_size, statistic, repetitions):
    # Our implementation of this function took 4 to 5 short lines of code.
    new_statistic_5 = np.array([])
    for i in range(repetitions):
        # Draw one sample with replacement and compute the statistic on its serial numbers.
        statistic_5 = statistic(original_df.sample(sample_size, replace=True).get('serial_number'))
        new_statistic_5 = np.append(new_statistic_5, statistic_5)
    return new_statistic_5
# This should generate an empirical histogram of twice-mean estimates
# of N from samples of size 50 if N is 1000.
# Notice that the statistic argument is calculate_mean_based_estimate.
example_estimates = simulate_estimates(
    bpd.DataFrame().assign(serial_number=np.arange(1, 1000+1)),
    50,
    calculate_mean_based_estimate,
    10000)
bpd.DataFrame().assign(mean_based_estimate=example_estimates).plot(kind='hist', density=True, bins=np.arange(500, 1
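The resampling loop can also be checked without babypandas or a plot. The following is a minimal, dependency-free sketch, not the course's reference solution: simulate_estimates_plain, twice_mean, and the seed parameter are hypothetical names introduced here, a plain list stands in for the 'serial_number' column, and twice_mean is only an assumed stand-in for calculate_mean_based_estimate, whose definition is not shown.

```python
import random
import statistics


def twice_mean(sample):
    """Hypothetical stand-in for calculate_mean_based_estimate: 2 * sample mean."""
    return 2 * statistics.fmean(sample)


def simulate_estimates_plain(serial_numbers, sample_size, statistic, repetitions, seed=None):
    """Resample with replacement `repetitions` times; return one statistic per resample."""
    rng = random.Random(seed)  # seed is an added, optional argument for reproducibility
    estimates = []
    for _ in range(repetitions):
        # random.choices draws with replacement, like .sample(..., replace=True)
        resample = rng.choices(serial_numbers, k=sample_size)
        estimates.append(statistic(resample))
    return estimates


# Serial numbers 1..1000, samples of size 50, 10000 repetitions (as in the example call).
population = list(range(1, 1000 + 1))
estimates = simulate_estimates_plain(population, 50, twice_mean, 10000, seed=0)

# Numeric versions of the two checks the histogram is meant to show.
center = statistics.fmean(estimates)
share_in_range = sum(800 <= e <= 1200 for e in estimates) / len(estimates)
print(len(estimates), round(center), share_in_range)
```

If the function is written correctly, the printed length equals the number of repetitions, the center sits close to 1000, and most of the estimates fall in [800, 1200]. Collecting results in a list and converting once at the end also avoids the repeated array copying that np.append performs inside a loop.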