School: University of Pittsburgh
Course: 0015
Subject: Industrial Engineering
Date: Dec 6, 2023
IE 0015: Homework 2
Post date: 2/21/2023
Due date: Midnight on 3/24/2023
● Please record your answers in this Microsoft Word document. Also, please note who (if anyone) you worked with.
● In this homework you will mainly practice data visualization and manipulation.
● For this homework you will need the file covid.csv that is posted on Canvas. To get things started, run the following lines of code:
covid = read.csv("covid.csv")
covid$date = as.Date(covid$date)
covid[is.na(covid)] = 0
The last line of code replaces all missing values in the dataset with zero. (This is not always a good idea, but it will not cause any issues in this homework.)
Problem 1
In this problem we will investigate the relationship between the variables new_cases and new_deaths.
a. Plot the variable new_cases against the variable new_deaths. Use geom_point() and make sure to appropriately label the axes.
b. In the plot, do the variables appear to be positively or negatively correlated? Explain. Does this make sense intuitively?
c. Compute and report the numerical value of the correlation between the two variables.
Please report all of your code below.
Answer:
b. Positively correlated: the data points form a roughly straight line running from near the origin out to high y-values (similar in shape to y = x). This makes sense intuitively, because as the number of new cases increases, more deaths will occur.
c. 0.9228248
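The code for this problem is not shown, so the following is only a minimal sketch of how parts a and c could be done. It assumes the ggplot2 package is installed, and it uses a small made-up data frame in place of covid.csv, so its correlation value will not match the one reported above.

```r
library(ggplot2)

# Hypothetical toy data standing in for covid.csv (all values are made up).
covid <- data.frame(
  new_cases  = c(10, 50, 120, 300, 800),
  new_deaths = c(0, 2, 5, 11, 30)
)

# Part a: new_cases on the y-axis plotted against new_deaths on the x-axis.
p <- ggplot(covid, aes(x = new_deaths, y = new_cases)) +
  geom_point() +
  labs(x = "New deaths", y = "New cases",
       title = "New cases vs. new deaths")
print(p)

# Part c: cor() computes the Pearson correlation by default.
r <- cor(covid$new_cases, covid$new_deaths)
print(r)
```

With the real data, the same two calls would be run on the covid data frame created by the setup lines at the top of the homework.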
Problem 2
In this problem we will restrict our attention to covid cases in the United States and further investigate the relationship between the variables new_cases and new_deaths.
a. Create a dataframe that satisfies the following conditions.
   i. It contains the variables date, location, new_deaths, and new_cases.
   ii. Each observation has a United States location value.
   iii. Each observation has a new_cases value that is greater than 0.
b. In one graphic, plot the variables new_cases and new_deaths against the variable date. Use different colors for each plot, use geom_line() for each plot, and make sure to appropriately label the graphic.
c. The graphic from part b does not provide a lot of insight. Create a new variable mortality_rate = (new_deaths)/(new_cases), and plot it against the variable date. Use geom_line() and appropriately label the axes.
Please report all of your code below.
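One possible way to approach parts a through c is sketched below. It assumes the dplyr and ggplot2 packages, and the toy rows are made up for illustration; they are not taken from covid.csv.

```r
library(dplyr)
library(ggplot2)

# Hypothetical toy rows standing in for covid.csv (all values are made up).
covid <- data.frame(
  date       = as.Date(c("2020-03-01", "2020-03-02", "2020-03-03",
                         "2020-03-01", "2020-03-02")),
  location   = c("United States", "United States", "United States",
                 "Russia", "Russia"),
  new_cases  = c(0, 40, 100, 5, 8),
  new_deaths = c(0, 1, 4, 0, 1)
)

# Part a: keep the four variables, US rows only, and new_cases > 0.
us <- covid %>%
  select(date, location, new_deaths, new_cases) %>%
  filter(location == "United States", new_cases > 0)

# Part b: two lines in one graphic, distinguished by color.
p_b <- ggplot(us, aes(x = date)) +
  geom_line(aes(y = new_cases, color = "New cases")) +
  geom_line(aes(y = new_deaths, color = "New deaths")) +
  labs(x = "Date", y = "Count", color = "Series",
       title = "US new cases and new deaths over time")

# Part c: the new_cases > 0 filter from part a avoids division by zero.
us <- us %>% mutate(mortality_rate = new_deaths / new_cases)
p_c <- ggplot(us, aes(x = date, y = mortality_rate)) +
  geom_line() +
  labs(x = "Date", y = "Mortality rate (new deaths / new cases)")
```

Mapping a constant string to color inside aes() is one way to get a legend entry for each line without reshaping the data to long format.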
Answer:
Problem 3
In this problem we will investigate which location has the highest number of new cases during a specific time period.
a. Create a dataframe that satisfies the following conditions.
   i. It contains the variables date, new_cases, and location.
   ii. Each observation is from between May 25, 2020 and June 3, 2020. (This includes May 25, 2020 and June 3, 2020.)
   iii. The tibble is ordered by the variable new_cases (in decreasing order).
b. Which date and location had the highest number of new cases during this time interval?
   i. Date: 2020-05-30
      Location: World
      Count: 136515
c. Now further restrict your attention to observations that have a United States or Russia location. Which date and location have the highest number of new cases?
   i. Date: 2020-05-29
      Location: United States
      Count: 24472
Please report all of your code below.
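The filtering and ordering steps described above could be sketched as follows. This assumes the dplyr package and uses a made-up toy tibble, so the counts differ from those reported in parts b and c.

```r
library(dplyr)

# Hypothetical toy rows standing in for covid.csv (all values are made up).
covid <- tibble(
  date = as.Date(c("2020-05-24", "2020-05-25", "2020-05-30",
                   "2020-06-03", "2020-06-04", "2020-05-29")),
  location  = c("World", "Russia", "World",
                "United States", "World", "United States"),
  new_cases = c(900, 80, 1000, 200, 1100, 250)
)

# Part a: >= and <= make both endpoints of the date range inclusive.
window <- covid %>%
  select(date, new_cases, location) %>%
  filter(date >= as.Date("2020-05-25"), date <= as.Date("2020-06-03")) %>%
  arrange(desc(new_cases))

# Part b: after arrange(desc(...)), the first row has the largest new_cases.
top_all <- head(window, 1)

# Part c: filter() preserves the row order, so the first row is still the max.
top_us_ru <- window %>%
  filter(location %in% c("United States", "Russia")) %>%
  head(1)
```

Comparing against as.Date() values rather than character strings keeps the comparison a true date comparison.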
Answer:
Problem 4
In this problem you will group the data by location.
a. Create a new tibble that groups the data by location. The tibble should contain a variable called max_new that equals the maximum value of the variable new_cases for each location. Order the tibble by the variable max_new in decreasing order.
b. Which location has the largest value of max_new?
Please report all of your code below.
Answer:
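A minimal sketch of the grouping step, assuming the dplyr package. The locations "A", "B", and "C" and their values are hypothetical placeholders, not rows from covid.csv.

```r
library(dplyr)

# Hypothetical toy data (locations and values are made up).
covid <- tibble(
  location  = c("A", "A", "B", "B", "C"),
  new_cases = c(5, 12, 30, 7, 9)
)

# Part a: one row per location holding its maximum new_cases, largest first.
by_location <- covid %>%
  group_by(location) %>%
  summarise(max_new = max(new_cases)) %>%
  arrange(desc(max_new))

# Part b: the first row of the ordered tibble has the largest max_new.
largest <- by_location$location[1]
print(by_location)
```

The answers in this homework suggest the real data includes an aggregate "World" location. If only individual countries are of interest, one option is to add filter(location != "World") before grouping.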
b. World, with max_new = 879905
Problem 5
In this problem you will group the data by date.
a. Create a tibble that groups the data by date. The tibble should contain a variable called avg_new that equals the mean of the variable new_cases for each date.
b. Plot the variable avg_new against the variable date. Use geom_smooth(), and make sure to appropriately label axes.
Please report all of your code below.
Answer:
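A minimal sketch of both parts, assuming the dplyr and ggplot2 packages. The toy tibble (ten dates with two made-up observations each) stands in for covid.csv.

```r
library(dplyr)
library(ggplot2)

# Hypothetical toy data: 10 dates, 2 made-up observations per date.
covid <- tibble(
  date      = rep(as.Date("2020-03-01") + 0:9, each = 2),
  new_cases = seq(2, 40, by = 2)
)

# Part a: one row per date holding the mean of new_cases on that date.
by_date <- covid %>%
  group_by(date) %>%
  summarise(avg_new = mean(new_cases))

# Part b: geom_smooth() draws a fitted trend line rather than the raw values.
p <- ggplot(by_date, aes(x = date, y = avg_new)) +
  geom_smooth() +
  labs(x = "Date", y = "Average new cases across locations")
```

Because the setup code replaces missing values with zero, those zeros are included in each date's mean on the real data and can pull avg_new down.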