GPHP 2000: Assignment #3: LLN/CLT/CI
Sonam Christopher
Due: November 15, 2022

1. What does the Law of Large Numbers tell us?

##As the sample size increases, the sample mean converges to the true population mean.##

2. What does the Central Limit Theorem tell us?

##If you repeat a study over and over again, the distribution of the sample means approaches a normal distribution, regardless of the shape of the underlying population.##

3. How do the Law of Large Numbers and the Central Limit Theorem work together?

##The LLN tells us that a large enough sample gives a reasonable estimate of the true mean. The CLT tells us that if we repeated the study over and over, the sample means would be approximately normally distributed. Together, they justify collecting a good-sized sample and then using the normal distribution to attach probabilities to our estimate.##

4. What are 3 reasons we would want to create a confidence interval?

##(1) To report a range of plausible values for a population parameter rather than a single point estimate; (2) to quantify how much uncertainty is attached to that estimate at a stated confidence level; (3) to test hypotheses, since a hypothesized value that falls outside the interval is implausible at that confidence level.##

5. How do we know whether or not we can get our critical value in a confidence interval from the Z distribution or a t distribution?

##It depends on whether the population variance is known. If the population standard deviation is known, we can use the Z distribution; if it must be estimated from the sample, we use the t distribution, whose heavier tails account for that extra uncertainty. For large samples the two give nearly identical critical values.##

6. Generate 1000 random values from a N(2, 4) distribution.
a. Create and interpret a 90% confidence interval using the Z distribution.
b. Create and interpret a 90% confidence interval using the t distribution.

```r
library(BSDA)
## Loading required package: lattice
## Attaching package: 'BSDA'
## The following object is masked from 'package:datasets':
##     Orange

# Note: rnorm() takes a standard deviation, so this draws from a normal
# with sd = 4, while z.test() below assumes sigma = 2 (N(2, 4) read as
# variance 4). The generating sd and the assumed sigma should agree.
data <- rnorm(1000, 2, 4)
dist_z <- z.test(data, mu = 2, sigma.x = 2, conf.level = 0.9)
dist_z$conf.int
```
```r
## [1] 1.704461 1.912520
## attr(,"conf.level")
## [1] 0.9

dist_t <- t.test(data, mu = 2, conf.level = 0.9)
dist_t$conf.int
## [1] 1.601060 2.015921
## attr(,"conf.level")
## [1] 0.9
```

c. How do these compare?

##The confidence interval from the z distribution is (1.704, 1.913), and the confidence interval from the t distribution is (1.601, 2.016). The z interval is narrower because it treats the standard deviation as known (sigma.x = 2) rather than estimating it from the data; when the population standard deviation truly is known, the z interval gives the more precise estimate.##

Data Problems

We will work with the BRFSS data which was used in the modules.

```r
load("~/Desktop/R Studio/course_data (13).RData")
```

7. What is the mean number of poor mental health days (menthlth)?

##The mean number of poor mental health days is 5.508428 days.##

```r
library(dplyr)
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##     filter, lag
## The following objects are masked from 'package:base':
##     intersect, setdiff, setequal, union
library(magrittr)

brfss2 %>% summarise(mean(menthlth, na.rm = TRUE))
## mean(menthlth, na.rm = TRUE)
## 1                   5.508428
```

8. Graph and describe the distribution of poor mental health days.

##The distribution of poor mental health days over a 30-day period is not normal; it is strongly right-skewed. The highest density of individuals reported between 1 and 5 days of poor mental health in the past 30 days. The histogram looks like the New York City skyline: the tallest bar, about 800 individuals, is at 2 days, with spikes at 5, 7, 10, 15, 20, 25, and 30 days. These spikes fall on round numbers and familiar time frames, such as a week (7) and the whole month (30), that were likely easier for people to report.##

```r
library(ggplot2)

dmh <- brfss2 %>% filter(!(is.na(menthlth) | menthlth > 30 | menthlth == 0))
ggplot(data = dmh, aes(x = menthlth)) +
  geom_histogram(binwidth = 1) +
  xlab("Poor Mental Health Days") +
  ylab("Number of Individuals")
```

9. Create and interpret a confidence interval for poor mental health days.

a. Using the t distribution.

```r
# Note: 2.26 is the 95% t critical value for 9 degrees of freedom; with
# n in the thousands the multiplier should be about 1.96. Also, length()
# counts NA values, so n is overstated here.
mn <- mean(brfss2$menthlth, na.rm = TRUE)
std.dev <- sd(brfss2$menthlth, na.rm = TRUE)
n <- length(brfss2$menthlth)
low <- mn - 2.26 * std.dev / sqrt(n)
high <- mn + 2.26 * std.dev / sqrt(n)
low
## [1] 5.269694
high
## [1] 5.747162
```
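The hard-coded 2.26 used in part (a) is the 95% t critical value for only 9 degrees of freedom; for a sample the size of the BRFSS data the correct multiplier comes from qt() and is about 1.96. A minimal sketch of the corrected calculation, using simulated stand-in data since brfss2 is not available here (the mean and sd below are hypothetical):

```r
# Hedged sketch: compute the t critical value from the actual degrees of
# freedom instead of hard-coding 2.26, and count only non-missing values.
set.seed(42)
x <- rnorm(6706, mean = 5.5, sd = 10)  # hypothetical stand-in for brfss2$menthlth
n <- sum(!is.na(x))                    # non-missing count (length() would include NAs)
crit <- qt(0.975, df = n - 1)          # about 1.96 for n this large, not 2.26
mn <- mean(x, na.rm = TRUE)
se <- sd(x, na.rm = TRUE) / sqrt(n)
c(lower = mn - crit * se, upper = mn + crit * se)
```

Using 2.26 instead of 1.96 inflates the interval's width by roughly 15%, which is why the interval in part (a) is wider than it should be.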
b. Using the bootstrap approach with 1000 bootstraps.

```r
library(purrr)
## Attaching package: 'purrr'
## The following object is masked from 'package:magrittr':
##     set_names
library(rsample)
library(boot)
## Attaching package: 'boot'
## The following object is masked from 'package:lattice':
##     melanoma
library(bootstrap)
library(dplyr)

brfss4 <- brfss2 %>% select(menthlth)
set.seed(123)
bt_data <- bootstraps(brfss4, times = 1000)
bt_data
## # Bootstrap sampling
## # A tibble: 1,000 × 2
##    splits              id
##    <list>              <chr>
##  1 <split [6706/2464]> Bootstrap0001
##  2 <split [6706/2468]> Bootstrap0002
##  3 <split [6706/2425]> Bootstrap0003
##  4 <split [6706/2468]> Bootstrap0004
##  5 <split [6706/2460]> Bootstrap0005
##  6 <split [6706/2451]> Bootstrap0006
##  7 <split [6706/2419]> Bootstrap0007
##  8 <split [6706/2470]> Bootstrap0008
##  9 <split [6706/2501]> Bootstrap0009
## 10 <split [6706/2543]> Bootstrap0010
## # … with 990 more rows

bt_data$splits[[1]]
## <Analysis/Assess/Total>
## <6706/2464/6706>
```
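The resamples above still need a statistic computed on each split before an interval can be reported (rsample offers int_pctl() for that step). As a minimal base-R sketch of the same percentile idea, run on hypothetical stand-in data since brfss2 is not available here:

```r
# Hedged sketch of the percentile bootstrap: resample with replacement,
# compute the mean of each resample, then take the 2.5th and 97.5th
# percentiles of those means as a 95% interval.
set.seed(123)
x <- rpois(500, lambda = 5.5)  # skewed stand-in counts, like menthlth
boot_means <- replicate(1000, mean(sample(x, replace = TRUE)))
quantile(boot_means, probs = c(0.025, 0.975))  # 95% percentile interval
```

Because the percentile interval comes straight from the resampled means, it does not assume normality, which suits a variable as skewed as poor mental health days.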