asg2

.pdf

School

University of Alberta

Course

151

Subject

Statistics

Date

Feb 20, 2024

LAB 2 ASSIGNMENT

Due: Friday, February 16 at 9:59 pm

IMPORTANT:
1) In this lab, you will need to use R (or R Commander) to generate the outputs.
2) For all graphs and charts, please label the axes and ensure proper titles are used.
3) For all tables, please ensure the correct variable name(s) are used.
4) Each group will be expected to create a Google document for the lab report where students will type their answers (in full sentences) and paste the R Commander output (where necessary) for each lab question.
5) Completed assignments will be saved as a PDF file, submitted, graded, and returned on eClass.
6) Each lab group MUST upload and submit only ONE lab report, so students MUST work together to complete the lab assignment.
7) Please see the Lab Submission Info tab through the Lab Information link in the Labs section on eClass for details on how to submit your lab report on eClass.

SAMPLING DISTRIBUTIONS. CENTRAL LIMIT THEOREM.

In this lab assignment, you will use an applet (a small, self-contained program that runs in web pages) to explore various aspects of the sampling distribution of the total points that show up in a simulated dice-rolling experiment. In particular, you will use the applet to demonstrate the central limit theorem in an interactive, dynamic way. You will investigate how the sampling distribution of the total is affected by sample size and the population distribution.

Dice Experiment

When you roll a standard six-sided fair die numbered 1 through 6, the number of points showing up follows the probability distribution that assigns probability 1/6 to each of the 6 equally likely outcomes: 1, 2, 3, 4, 5, and 6. Thus, if you were to throw the die a large number of times, the numbers of ones, twos, threes, ..., and sixes would be approximately the same. In general, the die does not need to be fair, and the assignment of probabilities can be different from the one specified above.
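The population distribution described above can be tabulated and summarized in a few lines. The following is a minimal sketch in Python (the lab itself requires R or R Commander, so this is only an illustration). The names `fair_die`, `pmf_mean`, and `pmf_variance` are invented for this example and do not come from the lab materials.

```python
from fractions import Fraction
import math

def pmf_mean(pmf):
    """Expected value of a discrete distribution given as {outcome: probability}."""
    return sum(x * p for x, p in pmf.items())

def pmf_variance(pmf):
    """Variance of a discrete distribution given as {outcome: probability}."""
    mu = pmf_mean(pmf)
    return sum((x - mu) ** 2 * p for x, p in pmf.items())

# Fair six-sided die: each face 1..6 has probability 1/6 (as stated in the handout).
fair_die = {face: Fraction(1, 6) for face in range(1, 7)}

mean = pmf_mean(fair_die)          # exact value as a fraction
variance = pmf_variance(fair_die)  # exact value as a fraction
sd = math.sqrt(variance)           # floating-point standard deviation

# Print the distribution as a small table, then the summary values.
for face, prob in fair_die.items():
    print(face, prob)
print("mean =", mean, " variance =", variance, " sd =", round(sd, 4))
```

Exact fractions are used so that the probabilities sum to exactly 1. An unfair die can be described by changing only the probabilities in the dictionary.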
Consider the experiment of rolling a fair die 100 times successively and observing the sum of points showing up. The sum (or total) is between the minimum 100 (all ones) and the maximum 600 (all sixes). However, in a real experiment the total is likely to be closer to 350 than to either of the two extremes. What is the distribution of the total for the 100 rolls of the die in a very long series of repetitions of the experiment? How will the distribution change when the number of rolls increases? How will the distribution be affected if the die is unfair? You will answer these and some other questions with theory and simulation.

An experiment is conducted using a dice-rolling simulation. The simulation involves rolling a fair 6-sided die n = 100 times and calculating the sum (or total) of all the numbers that were rolled. For example, for the first three rolls, the simulation may produce a value of 5 for the first roll, a value of 6 for the second roll, and a value of 1 for the third roll. The simulation would then report a running sum (or total) of 5 after the first roll, 11 after the second roll (5 + 6), and 12 after the third roll (5 + 6 + 1).

A statistics researcher decided to run this simulation 10,000 times, obtaining and recording the sum after each die roll. Therefore, each run of the simulation produced 100 sum values and, altogether, 1,000,000 data values were obtained (10,000 × 100). These data can be found in the data file Lab2 - Q1.txt on eClass.
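The running-sum simulation described above can be sketched as follows. This is an illustrative Python version, not the researcher's actual program and not a reader for Lab2 - Q1.txt, whose layout is not described here. The function name `simulate_running_sums` and its parameters are assumptions made for this example, and a small number of runs is used to keep it quick.

```python
import random
from itertools import accumulate

def simulate_running_sums(n_rolls=100, n_runs=10_000, seed=None):
    """Return one list of running sums per run (hypothetical helper).

    Each run rolls a fair six-sided die n_rolls times; entry k of a run
    is the total of the first k + 1 rolls.
    """
    rng = random.Random(seed)
    runs = []
    for _ in range(n_runs):
        rolls = [rng.randint(1, 6) for _ in range(n_rolls)]
        runs.append(list(accumulate(rolls)))
    return runs

# The handout's example: rolls of 5, 6, 1 give running sums 5, 11, 12.
example = list(accumulate([5, 6, 1]))
print(example)

# A small simulation: 200 runs of 100 rolls each (the lab data use 10,000 runs).
runs = simulate_running_sums(n_rolls=100, n_runs=200, seed=1)
totals = [run[-1] for run in runs]  # the sum after all 100 rolls
print("average total over runs:", sum(totals) / len(totals))
```

Taking position n - 1 of every run gives the simulated sums for n rolls, which is the quantity examined for n = 1, 2, 3, 10, 20, 30, and 100 in the questions that follow.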
In questions 1-4, you will use the sum of points showing up on the die as the statistic of interest.

1. In this question, you will compare the histograms for the simulated data with the population distribution. In addition, the sample mean and standard deviation of the simulated data are compared with the mean and standard deviation of the population distribution.

(a) Obtain a relative frequency histogram of the 10,000 simulated observations for n = 1 with the bins starting at 0.5, ending at 6.5, and using a width of 1. This puts the numbers 1 to 6 at the middle of the bins. Be sure to select “Percentages” in the options when creating the histogram. Give proper titles and labels to the axes. Paste the histogram into your report. Describe the shape of the histogram. Give the population distribution of rolling a fair 6-sided die (in the form of a table) and compare the histogram with the population distribution.

(b) Obtain the relative frequency histograms of the 10,000 simulated observations for n = 2, 3, 10, 20, 30, and 100. Recall that the sum of points showing up when the researcher rolled a die n times is between the minimum n (all ones) and the maximum 6n (all sixes). Make sure the numbers n to 6n are at the middle of the bins. To do this, your bins should start at n - 0.5 and end at 6n + 0.5, using a width of 1. Use the partition function, par(mfrow=c(3,2)), to have all six graphs on one screen. Describe the shape of the histograms. What are the centers of the histograms? (Hint: for Q1 (b), give an expression in terms of n to describe the centers.) How does increasing sample size n affect the variation of simulated data? How does increasing sample size n affect the shape of the histograms? Give the population distribution of the sum of points showing up for rolling a fair 6-sided die twice (in the form of a table) and compare the histogram for n = 2 with the population distribution.
(c) Obtain the mean and the standard deviation (SD) of the simulated data for n = 1, 2, 3, 10, 20, 30, and 100 and fill out the following table (Hint: Use four decimal places):

Statistics   n = 1   n = 2   n = 3   n = 10   n = 20   n = 30   n = 100
Mean
SD

What are the mean and the standard deviation of the population distribution of rolling a fair 6-sided die? Then obtain (theoretically) the mean and standard deviation of the sum of points showing up when rolling a fair 6-sided die n times for n = 1, 2, 3, 10, 20, 30, and 100 and add them to the above table. How do the values for simulated data compare to the mean and the standard deviation of the population distribution? How does increasing sample size n affect the mean and standard deviation of the sum of points showing up?

(d) Find the relative frequency of observations of at least 360 for n = 100. Then use a normal approximation to find the probability that the sum of points showing up when rolling a fair 6-sided die n = 100 times is at least 360. Compare the relative frequency with the probability obtained from the normal approximation.

(e) Consider a game in which a fair 6-sided die is rolled n times, where n is even, and you bet on the sum of points showing up. If your bet is correct, you win n dollars; otherwise, you lose n dollars. Based on the above results, which number should you bet on? How does increasing sample size n affect the chance of winning the game? What are your answers if you bet on the average of points showing up instead of the sum of points? Explain.

2. Now consider the experiment of rolling an unfair die 100 times successively and observing the sum of points showing up, where the chance of showing an even number is twice that of showing an odd number. That is, P(2) = P(4) = P(6) = 2P(1) = 2P(3) = 2P(5). The researcher decided to run this simulation using the unfair die 10,000 times, obtaining and recording the total sum after each die roll.
The data can be found in the data file Lab2 - Q2.txt on eClass. Repeat question 1, parts (a) to (d) for this data.
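The condition P(2) = P(4) = P(6) = 2P(1) = 2P(3) = 2P(5) determines the unfair die's distribution: if each odd face has probability p, then 3p + 3(2p) = 9p = 1, so p = 1/9. The following Python sketch (again only an illustration, since the lab itself uses R) builds that distribution and applies the standard results for a sum of n independent rolls, namely mean n·μ and standard deviation √n·σ. The names `unfair_die` and `sum_mean_sd` are invented for this example.

```python
from fractions import Fraction
import math

# Each odd face has probability 1/9 and each even face 2/9,
# which follows from 3p + 3(2p) = 1.
odd_p = Fraction(1, 9)
unfair_die = {
    face: (2 * odd_p if face % 2 == 0 else odd_p)
    for face in range(1, 7)
}

# Mean and variance of a single roll, kept as exact fractions.
mu = sum(x * p for x, p in unfair_die.items())
var = sum((x - mu) ** 2 * p for x, p in unfair_die.items())

def sum_mean_sd(n):
    """Mean and SD of the sum of n independent rolls (hypothetical helper)."""
    return float(n * mu), math.sqrt(n * var)

print("single roll: mean =", mu, " variance =", var)
for n in (1, 2, 3, 10, 20, 30, 100):
    mean_n, sd_n = sum_mean_sd(n)
    print(n, round(mean_n, 4), round(sd_n, 4))
```

Replacing the dictionary with six probabilities of 1/6 gives the corresponding theoretical values for the fair die in question 1.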