Week 3 practice questions

School

Skyline College

Course

100

Subject

Statistics

Date

Feb 20, 2024

Uploaded by BailiffAnt8330

Suppose you have a dataset, named students_data, with 60 rows. There are 4 variables in this dataset:

Name: name of the student
Sex: male or female
Handedness: right or left
Extroverted: students' self-rated score of how extroverted they are, on a scale of 1 - 10, with 10 being the most extroverted

1. What does each row of the dataset represent? [Choose all that apply]
a. An observation
b. A variable
c. A student
d. A rating of how extroverted they are

2. If we added a new summary variable named Extroverted_binary to the dataset that categorized whether students' extroverted scores were above 5 or below 5, how would that affect the number of rows? How would that affect the number of columns?
a. It would increase the number of rows, but NOT change the number of columns
b. It would decrease the number of rows, but NOT change the number of columns
c. It would NOT change the number of rows, but increase the number of columns
d. It would NOT change the number of rows, but decrease the number of columns
e. Neither the number of rows nor the number of columns will change

3. The table below shows the first six rows of the dataset:

Name       Sex     Handedness  Extroverted
Adam       Male    Left        10
Emma       Female  Right       9
Dave       Male    Right       4
Joseph     Male    Right       9
Elizabeth  Female  Left        1
Amelia     Female  Right       3

Suppose we run the code below:

    students_data_new <- filter(students_data, Sex == "male")

Compared to the original students_data, the number of rows in students_data_new would:
a. Increase
b. Decrease
c. Stay the same
d. We cannot know based on the 6 rows of the dataset provided

4. Look at this line of code:

    students_data$Sex_shuffled <- shuffle(students_data$Sex)

a. What is this line of code doing?
This line of code randomly reorders the values of the Sex variable and stores the result in a new variable named Sex_shuffled in the students_data dataset.
b. If you know a student's value on Sex_shuffled, do you think it would help you make a better guess of that student's Extroverted score? Why or why not?
No, knowing a student's value on Sex_shuffled would not help make a better guess of that student's Extroverted score, because the shuffled values are assigned at random, so there is no relationship between Sex_shuffled and Extroverted.

5. We ran the code below.

    students_data$Sex_shuffled <- shuffle(students_data$Sex)
    gf_histogram(~Extroverted, bins = 10, data = students_data) %>%
      gf_facet_grid(Sex_shuffled ~ .) %>%
      gf_model(Extroverted ~ Sex_shuffled)

Here's a histogram that shows the relationship between Sex_shuffled and Extroverted. It looks like Sex_shuffled predicts some variation in Extroverted. Why do you think this is? Do you think this relationship exists in the DGP?
The apparent relationship is purely due to random chance in this particular shuffle. This relationship does not exist in the DGP.

6. Aliya used the shuffle() function to shuffle the Extroverted variable from students_data. Each time Aliya shuffled, they calculated the mean difference in Extroverted scores between the female group and the male group. Aliya then repeated this process 1000 times and made a distribution of these 1000 shuffled mean differences. Explain in your own words where this distribution will be centered (i.e. what will be the mean of the 1000 shuffled mean differences) and why.
The distribution will be centered around 0. Because shuffle() breaks the relationship between Sex and Extroverted, each shuffle gives Aliya a sample from a DGP where there is no relationship between the two variables. When there is no relationship between Sex and Extroverted, I do not expect males and females to differ on average, so the mean of the 1000 shuffled mean differences should be around 0.

Pretest

7. Suppose you run the code below:

    resampled_students_data <- resample(students_data, 60)

a. What is this line of code doing?
This line of code samples 60 observations, with replacement, from students_data.
b. Is it possible for the resampled_students_data dataset to have an observation that is not in students_data? Why or why not?
No, because every observation is drawn from students_data, so each one had to exist in the original dataset.
c. Is it possible for the resampled_students_data dataset to have duplicate observations (i.e. the same observation sampled twice from students_data)?
Yes, because we are sampling with replacement, which means each sampled observation is put back into the dataset and can be drawn again.
d. Suppose you run the code again. Are you more likely to get the same dataset, or a different dataset? Why?
I'm more likely to get a different dataset, because the sampling is random: on every draw each observation has an equal chance of being picked, so it is very unlikely that all 60 draws repeat the previous result.

8. Let's say you had a data set made up of 9 females and 1 male. If you drew 5 observations with replacement from this data set, is it possible to end up with a sample with 2 males and 3 females?
a. Yes
b. No

9. We made a histogram to visualize the relationship between Handedness and Extroverted:
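
As a rough illustration of the shuffling idea in questions 4 to 6, the sketch below simulates 1000 shuffled mean differences. It is only a sketch under stated assumptions: the scores are made-up stand-in data (the real students_data is not available here), the 30/30 split between groups is hypothetical, and base R's sample() is used as a stand-in for the shuffle() function named in the questions.

```r
set.seed(1)

# Hypothetical stand-in for students_data: 60 rows, made-up scores from 1 to 10.
students_data <- data.frame(
  Sex = rep(c("Male", "Female"), each = 30),
  Extroverted = sample(1:10, 60, replace = TRUE)
)

# One shuffled mean difference (female mean minus male mean).
shuffled_mean_diff <- function(data) {
  # sample() with no size argument permutes the vector,
  # which breaks any link between Sex and Extroverted.
  shuffled <- sample(data$Extroverted)
  mean(shuffled[data$Sex == "Female"]) - mean(shuffled[data$Sex == "Male"])
}

# Repeat the shuffle 1000 times and collect the mean differences.
diffs <- replicate(1000, shuffled_mean_diff(students_data))

mean(diffs)   # center of the distribution of shuffled mean differences
range(diffs)  # individual shuffles still vary around that center
```

Any single shuffle can produce a nonzero difference, which is the point of question 5, while the average over many shuffles should sit close to 0, which is the point of question 6.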

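For the resampling questions (7 and 8), a small simulation can make sampling with replacement concrete. This is a sketch, not the worksheet's own method: it uses base R's sample(..., replace = TRUE) in place of the resample() function named in question 7, and the vector sex is a hypothetical encoding of the 9-females-and-1-male data set from question 8.

```r
set.seed(2)

# Hypothetical encoding of the data set in question 8: 9 females and 1 male.
sex <- c(rep("Female", 9), "Male")

# Draw 5 observations with replacement, many times,
# and count how many males appear in each draw.
n_males <- replicate(10000, sum(sample(sex, 5, replace = TRUE) == "Male"))

table(n_males)      # how often each count of males occurred
mean(n_males == 2)  # simulated share of draws with exactly 2 males

# Exact probability of exactly 2 males in 5 draws,
# where each draw is male with probability 1/10.
dbinom(2, size = 5, prob = 0.1)  # 0.0729
```

Because the single male goes back into the pool after each draw, he can be drawn more than once, so a sample with 2 males and 3 females is possible even though it is uncommon.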