SOC 3811: Social Statistics Final Assignment (Due: Dec-17-23; 11:59 PM)

Your final project will, once again, place you in the role of “Lead Data Scientist” at the Missouri Institute of Social Research (M-ISR). This time around, though, you’ll be going national: the United States Centers for Disease Control and Prevention (CDC) has noticed a troubling uptick in high blood pressure among young adults across the country and is desperate for answers. In this project, you’ll serve as a consultant, helping the CDC understand how to draw insights about blood pressure levels among all young adults across the country, using just a sample. This task will give you the opportunity to apply and demonstrate inferential statistical tools in a real-world setting. Assignments should be uploaded to Canvas no later than 11:59 PM on Dec. 17th.

Overview

As narrative background: the CDC saw your careful report describing the social factors that shaped unequal life expectancy across Missouri communities. Policymakers at the CDC were so impressed by your analysis that they’ve asked you to serve as a consultant for a project that they’ve been working on. In more detail, the CDC has collected data from nearly 5,000 U.S. adults between the ages of 25 and 34. In these data, they’ve noticed a troubling trend: compared to past years, blood pressure seems to be on the rise among young adults. They’ve conducted some data analyses to better understand this phenomenon, but could use your help in a few key areas to better understand what their sample data suggest about blood pressure levels among the entire population of young adults in the US. Below are a number of questions and concerns that the CDC could use your expert guidance on.
Your task is to provide responses to each of these questions to help officials better understand the prevalence and determinants of systolic blood pressure levels among the U.S. population of young adults (i.e., people between 25 and 34 years old). As always, remember that your audience does not consist of statistical experts! Your aim should be to produce detailed responses to each of the CDC’s questions in a way that almost anyone can understand!

Instructions

On the Canvas website, I have provided you with the CDC’s sample of 4,693 young adults between the ages of 25 and 34. These data are located in a .csv file called ah data, under the Files/Data tab. The outcome variable sbp measures the systolic blood pressure level of each individual in the sample. For context, the CDC provides you with the following fact sheet on systolic blood pressure and what different levels of this outcome
indicate about cardiovascular health: https://bit.ly/3nsO0ZM. The overarching idea is that higher systolic blood pressure levels indicate worse cardiovascular health. Please use these data to address each of the following questions:

1. The CDC wants you to first provide a descriptive analysis of how blood pressure levels are distributed among the 4,693 young adults in their sample. To accomplish this, please provide a histogram and the median, range, and standard deviation of the sbp variable. Combine these descriptive statistical tools into a clear and intuitive narrative that helps CDC officials understand what systolic blood pressure looks like across the young adults within their sample.

2. Next, the CDC is interested in understanding if and how the statistics in their sample describe the population of young adults in the U.S. To assist them in this, estimate the population mean of systolic blood pressure from the ah data sample and calculate a 95% confidence interval for it. Describe what these statistics tell us about the average blood pressure level among the whole population of young adults in the U.S. According to the background information on systolic blood pressure provided to you above, is your analysis cause for concern? Be specific and as approachable as possible in your explanation.

3. A CDC official asks for more information here. In particular, they want you to use their sample data to provide an informed guess at the proportion of all young adults across the US who have systolic blood pressure levels high enough to be considered at least “Hypertension – Stage 1.” Calculate an estimate of this population proportion from your sample and a corresponding 95% confidence interval.[1] Describe what your calculations suggest about the proportion of young adults with dangerously high blood pressure levels in the US.

4.
Prior to reaching out to you, the CDC reached out to another statistical consultant to assist them in their analysis of blood pressure among young adults. This researcher produced a simple linear regression model describing the relationship between an individual’s age (measured in years) and their systolic blood pressure level. The results of this analysis are presented in the regression table below:

parameter         estimate   95% CI          p-value
Intercept         111.35     [104, 117]      < 0.001
Age (in years)    0.516      [0.29, 0.74]    < 0.001

Table 1: Linear regression model of the relationship between systolic blood pressure and age

[1] Note that the procedure for calculating the confidence interval for a proportion is a little different from the procedure for calculating the confidence interval for a mean. See https://bit.ly/3x2Vxle for more detail.

Before writing up a summary of this analysis, the other researcher quit in a fit of rage! The officials at the CDC aren’t statistical experts, so they can’t make sense of what this data analysis demonstrates. To help them out, please provide a detailed summary of what this regression model communicates about the relationship between age and blood pressure levels among the population of young adults in the US. Be as specific and detailed as possible in explaining the story conveyed by this analysis. Make sure to describe every piece of information presented in the regression table.

5. Before they quit, the same researcher also performed a simple linear regression analysis describing the relationship between race(-ism)[2] and systolic blood pressure levels. Here, the variable race measures an individual’s racial identity as either: White; Black; American Indian/Native Alaskan; Asian; Latinx; or Other. Below is a regression table that displays the results of this analysis:

parameter             estimate   95% CI           p-value
Intercept             125.08     [124, 126]       < 0.001
Race: White [ref.]    -          -                -
Race: AI/AN           4.07       [-2.07, 10.2]    0.202
Race: Asian           0.02       [-2.44, 2.47]    0.991
Race: Black           6.86       [5.86, 7.83]     < 0.001
Race: Latinx          1.64       [0.31, 2.98]     0.015
Race: Other           -1.55      [-3.43, 0.33]    0.111

Table 2: Linear regression model of the relationship between systolic blood pressure and race. Note: “ref.” stands for “reference category.” “AI/AN” stands for “American Indian/Native Alaskan.”

CDC officials are truly puzzled by this regression analysis. They know that it suggests something about racialized disparities in blood pressure among young adults, but are not sure what exactly it’s communicating. Part of the difficulty here is that these officials have no idea how to interpret a linear regression model where the predictor is a nominal categorical variable (like race).
To help the CDC out here, please provide a careful summary of what this analysis suggests about racial disparities in blood pressure among the full population of young adults in the US. Point to specific evidence from the above regression model to support your explanation.[3]

[2] Always remember that “race” is a proxy for a broad system of unequal social experiences, not biology!
[3] Hint: read through this document, particularly the section “Categorical variables with two levels,” to help ground your thinking: https://bit.ly/2YZuRFq.

6. A very eager, if somewhat unusual, CDC official has a strange theory of what explains why some young adults have higher blood pressure than others. Indeed, he believes that the single most important factor in determining someone’s systolic blood pressure level is the
number of times that they blink per second. To investigate this idea, he builds a simple linear regression model describing the relationship between the outcome variable sbp and a predictor variable, blinks per second, which measures how many times each of the young adults in our sample typically blinks every second. (Values of this variable range from 0 to 5.) Below is a regression table summarizing the findings from his analysis:

parameter              estimate   95% CI           p-value
Intercept              111.35     [104, 117]       < 0.001
Blinks (per second)    1.00       [-4.00, 6.00]    1.000

Table 4: Linear regression model of the relationship between systolic blood pressure and blinks per second

The researcher in question enthusiastically presents his findings to the rest of the group. He points to the slope of this regression model and says:

    “I was totally right! My model provides strong, data-driven evidence that young people who blink more per second have higher levels of blood pressure than those who blink less. Indeed, my model strongly suggests that every extra blink per second that someone does increases their blood pressure by a full point. If we just enact blinking limits on young folks, then we can improve the cardiovascular health of this population wholesale! Let’s get to work!”

The rest of the CDC members are (to put it lightly) unsure of this interpretation, and turn to you to help adjudicate. How do you interpret the eccentric researcher’s claims? Does his regression analysis provide compelling support for the idea that more blinking = higher blood pressure levels among the full population of young adults? Why or why not? Explain your thinking in detail.

7. One CDC official has been reading about new public health theories which suggest that childhood trauma is an important social determinant of cardiovascular health.
The general idea here is that experiencing traumatic events as a child increases chronic stress and overburdens people’s stress response systems in ways that put significant wear and tear on their cardiovascular system as they age. To test this idea, the CDC researcher asks you to look into what their sample data have to say about the relationship between systolic blood pressure and three forms of violent exposure: (1) Childhood physical abuse, which measures the number of times that someone experienced severe physical abuse as a child; (2) Police violence, which measures the number of times a person witnessed aggressive, unfair policing within their community as a child; and (3) Evictions, which captures the number of times a person’s family was evicted from their residence as a youth. Build a multiple linear regression model describing systolic blood pressure’s relationship to each of these three variables among the population of U.S. young adults. Present your
results in a regression table. Do these results support the official’s theory that childhood physical abuse, traumatic police contact, and experiences with evictions explain why some young adults in the U.S. population have higher levels of systolic blood pressure than others? Explain in detail; remember to describe your results in as intuitive and thorough a way as possible.

8. Using your model from the prior analysis, calculate expected systolic blood pressure for the following hypothetical young adults:

  • An individual who experienced no childhood abuse; no police violence; and no evictions.
  • An individual who experienced 3 instances of childhood abuse; no police violence; and no evictions.
  • An individual who experienced no childhood abuse; 3 instances of police violence; and no evictions.
  • An individual who experienced no childhood abuse; no instances of police violence; and 3 evictions.

What do these expected values suggest policy-wise? That is, use these calculations to communicate to the CDC which of these factors they should intervene on to most effectively prevent high blood pressure from emerging in future populations of young adults in the U.S. Use the numbers you’ve calculated above to make your case. Explain at least one other analysis you might perform to investigate the effectiveness of these policy interventions in more detail.

Grading

Remember, your assignment should be written with non-statistician policymakers in mind. Use full sentences and organize your thoughts into paragraphs, as if you were writing to an official committee, in your response to each question. Responses that are structured otherwise (e.g., in disjointed sentence fragments, as bullet points, etc.) will not be accepted. Similarly, reports that are not written with a non-stats expert in mind will be severely penalized. The goal is to get you familiar with presenting technical findings in an approachable way.
Each of the questions above is worth 10 points, making the entire assignment worth 80 points overall. A full 10 points will be awarded for each question that is addressed thoroughly and accurately, while 0 points will be awarded for each question that goes entirely unaddressed. Partial points will be awarded for efforts that fall somewhere in between. Please also provide an R script containing any R code that you used for this assignment. Questions that require R coding but lack the corresponding script will be penalized.
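
As a rough illustration of the arithmetic behind Questions 2 and 3, the sketch below computes a 95% confidence interval for a mean and for a proportion using the normal approximation (critical value 1.96). It is written in Python only to show the calculation; the assignment itself asks for an R script. The function names, the toy readings, and the cutoff of 130 are all assumptions made for this example: the readings are invented, and the actual “Hypertension – Stage 1” threshold should be taken from the CDC fact sheet.

```python
import math
import statistics

Z_95 = 1.96  # critical value for a 95% interval under the normal approximation


def mean_ci(values, z=Z_95):
    """Return (sample mean, lower bound, upper bound) for the population mean."""
    n = len(values)
    mean = statistics.fmean(values)
    # Standard error of the mean: sample standard deviation / sqrt(n)
    std_error = statistics.stdev(values) / math.sqrt(n)
    return mean, mean - z * std_error, mean + z * std_error


def proportion_ci(values, threshold, z=Z_95):
    """Return (sample proportion, lower, upper) for the share at or above threshold."""
    n = len(values)
    p_hat = sum(v >= threshold for v in values) / n
    # Standard error of a proportion: sqrt(p * (1 - p) / n)
    std_error = math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat, p_hat - z * std_error, p_hat + z * std_error


# Invented readings for illustration only; these are NOT the CDC sample.
toy_sbp = [112, 118, 121, 125, 127, 130, 133, 136, 141, 147]
HYPOTHETICAL_CUTOFF = 130  # placeholder; use the threshold given in the fact sheet

mean, mean_lo, mean_hi = mean_ci(toy_sbp)
share, share_lo, share_hi = proportion_ci(toy_sbp, HYPOTHETICAL_CUTOFF)
print(f"mean = {mean:.1f}, 95% CI = [{mean_lo:.1f}, {mean_hi:.1f}]")
print(f"share = {share:.2f}, 95% CI = [{share_lo:.2f}, {share_hi:.2f}]")
```

The two intervals differ only in how the standard error is computed, which is the point of the footnote on proportions. With only ten toy readings the normal approximation is crude; it is far more reasonable for a sample as large as the 4,693 young adults described above.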