Final Project
Probability Theory & Introductory Statistics
Ayeshabi W Tigdikar
Master of Science in Project Management, Northeastern University
Professor Tom Breur
June 30th, 2023
TABLE OF CONTENTS
INTRODUCTION
VARIABLES
SUMMARY OF EDA
QUESTIONS
HYPOTHESIS FORMULATION & TESTING
ANALYSIS
RESULTS
REFERENCES
APPENDIX
INTRODUCTION
The data is taken from the COVID-19 State Dashboard for California. Hospitalization counts include all patients who had a COVID-19 diagnosis during their stay. This does not necessarily mean that they had COVID-19-related complications or symptoms when they were admitted to the hospital. Note: Because hospitals record the total number of patients each day (as opposed to new patients), cumulative totals are not available.

VARIABLES
According to the data dictionary on the website, the variables are:
1. county: The county where the hospital is located. None of the consolidated reporters had hospitals in different counties.
2. todays_date: The date on which the counts were recorded.
3. hospitalized_covid_confirmed_patients: The total number of inpatients with a COVID diagnosis who are hospitalized and occupy a bed. This value is not cumulative. It includes all inpatients (including those in ICUs and Medical/Surgical units), but excludes patients who are awaiting an inpatient bed at connected clinics, outpatient clinics, emergency rooms, and overflow locations. COVID ED patients were no longer included in the Hospitalized COVID total as of April 21, 2020, and were instead counted separately.
4. hospitalized_suspected_covid_patients: The number of patients admitted to a hospital with an inpatient bed who, in accordance with the CDC's Interim Public Health Guidance for Evaluating Persons Under Investigation (PUIs), have signs and symptoms consistent with COVID (most patients with confirmed COVID have a fever and/or symptoms of an acute respiratory illness, such as cough, shortness of breath, or myalgia/fatigue). It includes all inpatients (including those in ICUs and Medical/Surgical units), but excludes patients waiting for inpatient beds in overflow facilities, connected clinics, outpatient clinics, and emergency departments.
5. hospitalized_covid_patients: The number of patients currently hospitalized in an inpatient bed who have suspected or confirmed COVID.
6. all_hospital_beds: The facility's total bed count, which includes all surge beds, inpatient and outpatient post-surgical beds, labor and delivery unit beds, and observation beds. This covers all beds for which the hospital might supply personnel and resources; it does not necessarily reflect the number of beds staffed at the time the facility submits its report. Emergency department (ED) bays are not included in this field.
7. icu_covid_confirmed_patients: The total number of COVID patients in the hospital's ICU with laboratory confirmation of a positive result. All ICU beds (NICU, PICU, and adult) are included.
8. icu_suspected_covid_patients: The number of symptomatic patients in the hospital's ICU whose COVID tests are still awaiting laboratory confirmation. All ICU beds (NICU, PICU, and adult) are included.
9. icu_available_beds: The number of ICU beds that the hospital has available. All ICU beds (NICU, PICU, and adult) are included.
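To make the data dictionary concrete, the following is a minimal Python sketch of one county-day row. The field names come from the variable list above; the class name, the date format, the example values, and the check that the combined count equals confirmed plus suspected are assumptions made for illustration and are not confirmed by the source.

```python
from dataclasses import dataclass


@dataclass
class HospitalRecord:
    """One county-day row; field names follow the data dictionary."""
    county: str
    todays_date: str  # kept as text; the date format is an assumption
    hospitalized_covid_confirmed_patients: int
    hospitalized_suspected_covid_patients: int
    hospitalized_covid_patients: int
    all_hospital_beds: int
    icu_covid_confirmed_patients: int
    icu_suspected_covid_patients: int
    icu_available_beds: int

    def total_matches_parts(self) -> bool:
        # Assumption: the combined count equals confirmed plus suspected.
        return self.hospitalized_covid_patients == (
            self.hospitalized_covid_confirmed_patients
            + self.hospitalized_suspected_covid_patients
        )


# Hypothetical values for illustration only; not taken from the dataset.
example = HospitalRecord("Lake", "2021-01-15", 12, 3, 15, 60, 4, 1, 2)
print(example.total_matches_parts())
```

A check like this could flag rows where the reported total and its parts disagree, for instance because of missing values.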
SUMMARY OF EDA
The initial EDA involved loading the dataset, checking its structure, and reviewing the variables present. The dataset was then cleaned by renaming columns, handling missing values, and summarizing the data. The frequency of each county was determined, and the 10 counties with the highest number of hospitalized COVID-19 patients were identified. A time series analysis was performed to observe the trend in available ICU beds over the years. A linear regression analysis was conducted to explore the relationship between the number of hospitalized COVID-19 patients and the availability of ICU beds. Finally, a t-test was performed to compare the number of ICU beds between two specific counties, Lake and Colusa.

QUESTIONS
The following questions were explored during the analysis:
a) What is the frequency distribution of counties in the dataset, and which counties have the highest number of hospitalized COVID-19 patients?
b) How does the availability of ICU beds change over time?
c) Is there a relationship between the number of hospitalized COVID-19 patients and the availability of ICU beds?
d) Is there a significant difference in the number of ICU beds between Lake and Colusa counties, given that fewer patients are hospitalized there?

HYPOTHESIS FORMULATION & TESTING
Hypothesis 1:
Null Hypothesis (H0): There is no significant relationship between the number of hospitalized COVID-19 patients and the availability of ICU beds.
Alternative Hypothesis (Ha): There is a significant relationship between the number of hospitalized COVID-19 patients and the availability of ICU beds.
The hypothesis was tested using linear regression analysis, with the number of hospitalized COVID-19 patients (Covid_Patients_H) as the dependent variable and the availability of ICU beds (ICU_Beds) as the independent variable.
Hypothesis 2:
Null Hypothesis (H0): There is no significant difference in the number of ICU beds between Lake and Colusa counties.
Alternative Hypothesis (Ha): There is a significant difference in the number of ICU beds between Lake and Colusa counties.
The hypothesis was tested using a two-sample t-test, comparing the number of ICU beds in Lake County (Lake_data$ICU_Beds) with that in Colusa County (Colusa_data$ICU_Beds).

In hypothesis testing, the null hypothesis is assumed to be true until there is enough evidence to reject it. A statistical test computes the probability of observing data at least as extreme as the sample if the null hypothesis were true (the p-value). The null hypothesis is rejected in favor of the alternative hypothesis if this probability is lower than a preset significance level, which is often set at 0.05.
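The decision rule described above can be sketched in a few lines of Python. The function name `decide` and the two example p-values are hypothetical and chosen only for illustration; the 0.05 default is the significance level mentioned in the text.

```python
def decide(p_value: float, alpha: float = 0.05) -> str:
    """Return the test decision for a p-value at significance level alpha."""
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("p_value must lie between 0 and 1")
    # A p-value below alpha counts as evidence against the null hypothesis.
    if p_value < alpha:
        return "reject H0"
    # Otherwise the null is retained, which is not the same as proving it.
    return "fail to reject H0"


# Hypothetical p-values, used only to show both branches.
print(decide(0.01))
print(decide(0.20))
```

Note that the second outcome is worded as "fail to reject" rather than "accept": a large p-value means the data do not contradict the null hypothesis, not that the null hypothesis has been shown to be true.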
ANALYSIS
1. Does the number of hospitalized COVID-19 patients have a significant relationship with the availability of ICU beds?
A linear regression of the number of COVID-19 patients requiring high-level care (Covid_Patients_HC) on the number of ICU beds (ICU_Beds) produced the model summary listed as Figure 6. According to the model, the number of ICU beds has a statistically significant association with the number of COVID-19 patients requiring high-level care. The estimated intercept, 33.077523, is the predicted number of patients when there are no ICU beds. The coefficient for ICU_Beds means the model predicts 1.166171 more COVID-19 patients in high-level care for each additional ICU bed. Both the intercept and the coefficient are highly significant (p-value 2.2e-16), so it is very unlikely that their true values are zero. The model's R-squared value of 0.2256 means that the linear association with ICU_Beds accounts for about 22.56% of the variation in Covid_Patients_HC. Overall, the findings indicate that more ICU beds are associated with more COVID-19 patients receiving high-level care. This is an association, not evidence that adding beds causes more patients.

2. Is there a significant difference in the availability of ICU beds between Lake and Colusa counties?
To answer this question, a two-sample t-test compares the means of the available ICU beds in the two counties. The Welch two-sample t-test (Figure 7) tests the hypothesis that the mean ICU beds in Lake County and Colusa County are equal. The test's p-value is 0.09448, which is higher than the 0.05 significance level. Because there is insufficient evidence to conclude that the mean ICU beds in Lake County and Colusa County are different, we fail to reject the null hypothesis. The 95% confidence interval for the mean difference runs from -0.33910530 to 0.02688407. In other words, we are 95% confident that the true mean difference lies between -0.339 and 0.027. The interval contains zero, which agrees with the non-significant p-value.
The sample estimates of the means for Lake County and Colusa County are 1.502230 and 1.658341, respectively. Lake County's mean ICU bed count is therefore marginally lower than Colusa County's in this sample. Overall, the Welch two-sample t-test results indicate that there is insufficient evidence to conclude that the mean ICU beds in Lake County and Colusa County differ.

3. Check for correlation between the hospitalized confirmed COVID cases in Lake and Colusa counties.
A correlation coefficient of 0.36 indicates a weak positive correlation between the two variables. As the value of one variable increases, the value of the other tends to increase as well, although not always. For reference, a perfect positive correlation has a coefficient of 1, a perfect negative correlation has a coefficient of -1, and a coefficient of 0 indicates no linear association between the two variables.
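The correlation coefficient discussed above can be illustrated with a minimal Python sketch of the Pearson formula. The function name `pearson_r` and the three short series are hypothetical; they are not the Lake or Colusa data, and the report's own value of 0.36 was presumably computed with the author's statistical software rather than with this code.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length, at least 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sum of cross-products of deviations from the means.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical series chosen to show the two extreme cases.
series_a = [2, 4, 6, 8]
series_up = [1, 2, 3, 4]    # rises with series_a
series_down = [4, 3, 2, 1]  # falls as series_a rises
print(pearson_r(series_a, series_up))
print(pearson_r(series_a, series_down))
```

The two calls show the boundary cases named in the text: series that move together in a perfectly linear way give a coefficient of 1, and series that move in exactly opposite directions give -1.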
4. Check for a correlation between the confirmed hospitalized COVID cases and all hospital beds.
A correlation coefficient of 0.63 indicates a moderate positive correlation between the two variables. As the value of one variable increases, the value of the other tends to increase as well, although not always.

RESULTS
The findings indicate an association between the number of COVID-19 patients receiving hospital care and the availability of ICU beds. The linear relationship between ICU_Beds and Covid_Patients_HC explains roughly 22.56% of the variation in Covid_Patients_HC. Because the estimated coefficient is positive, the number of ICU beds is positively, not inversely, associated with the number of COVID-19 patients receiving high-level care. For Lake and Colusa counties, the Welch two-sample t-test shows that there is not enough evidence to establish a difference between the mean ICU beds in the two counties. In addition, there is a weak positive correlation between the hospitalized confirmed COVID patients in Lake and Colusa counties, and a moderate positive correlation between confirmed hospitalized COVID cases and all hospital beds.

REFERENCES
COVID-19 hospital data. COVID-19 Hospital Data | CA Open Data. (n.d.). https://lab.data.ca.gov/dataset/covid-19-hospital-data1
COVID-19 hospitalizations by county data dictionary. COVID-19 Hospitalizations by County data dictionary | CA Open Data. (n.d.). https://lab.data.ca.gov/dataset/covid-19-hospital-data1/8f989799-b959-46ca-b3c5-0e67e95b584e

APPENDIX
Figure 1: Structure of the Dataset
Figure 2: Summary of the Dataset
Figure 3: Barplot of Hospitalized Covid-19 Patients by County
Figure 4: Scatterplot between Hospitalized Confirmed Cases and Hospital Beds
Figure 5: Time Series of the ICU Beds Available over Time
Figure 6: Summary for the Linear Regression Model between the Hospitalized Confirmed Cases and the ICU Beds
Figure 7: The Two-Sample t-Test between the ICU Beds in Lake and Colusa County
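The numbers quoted in the Analysis section for the regression and the t-test can be cross-checked with a short Python sketch. All constants are copied from the report; the function name `predicted_patients` and the example of 10 ICU beds are hypothetical. The sketch relies on two standard facts: a Welch confidence interval is centered on the difference of the sample means, and in a regression with one predictor the square root of R-squared equals the absolute value of the correlation coefficient.

```python
import math

# Values as reported in the Analysis section of this report.
INTERCEPT = 33.077523
SLOPE = 1.166171
R_SQUARED = 0.2256
MEAN_LAKE = 1.502230
MEAN_COLUSA = 1.658341
CI_LOW, CI_HIGH = -0.33910530, 0.02688407


def predicted_patients(icu_beds: float) -> float:
    """Fitted value from the reported regression line."""
    return INTERCEPT + SLOPE * icu_beds


# Difference of sample means (Lake minus Colusa) and the interval midpoint.
mean_diff = MEAN_LAKE - MEAN_COLUSA
ci_midpoint = (CI_LOW + CI_HIGH) / 2

# Correlation implied by R-squared when there is a single predictor.
abs_r = math.sqrt(R_SQUARED)

print(round(predicted_patients(10), 3))  # hypothetical input of 10 beds
print(round(mean_diff, 4), round(ci_midpoint, 4))
print(CI_LOW < 0 < CI_HIGH)  # interval contains zero
print(round(abs_r, 3))
```

The difference of means and the interval midpoint agree to four decimal places, and the interval contains zero, which is consistent with the reported p-value of 0.09448 being above 0.05.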