BIO259-Graphical-Data-Summaries_Tutorial

pdf

School

Toronto Metropolitan University *

*We aren’t endorsed by this school

Course

259

Subject

Statistics

Date

Apr 3, 2024

Type

pdf

Pages

31

BIO259-Graphical-Data-Summaries_Tutorial

October 16, 2023

Our ability to generate visually stimulating graphical representations that evoke appropriate responses from the audiences of our data is arguably the most important aspect of biological data science. These graphical representations not only help us communicate our results to our audience, but also enable us to gain greater insight into our own data through simplification. By leveraging the power of computer programming, we can generate and manipulate figures to produce an extremely wide range of plots that simply aren't accessible through most graphical user interfaces. We can also heavily parallelize our analyses and incorporate these visualizations into pipelines that generate the same plots for different datasets.

In today's tutorial we will learn how to generate several different types of graphical data summaries in R using the ggplot2 package. We will also learn to perform some basic manipulations and customize these plots where applicable. Our tutorial is divided into seven primary sections, each of which will cover how to generate and customize a different type of plot:

1. Line Graphs
2. Bar Charts
3. Box and Whisker Plots (with dot plots)
4. Scatter Plots
5. Pie Charts
6. Histograms
7. Heat Maps

#1. Line graphs are typically used for time data
#2. Bar charts are good for looking at categorical data and their counts
#3. Box and whisker plots show the spread of your data
#4. Scatter plots show the relationship between 2 variables
#5. Pie charts visualize proportions or percentages
#6. Histograms show distributions; the y axis is typically a count
#7. Heat maps let you visually infer the correlation between 2 variables

[1]:
# Run the code block below to import our dataset, select a subset of the
# columns, and filter out undesirable location data.
# This will be the data that we will work with throughout this tutorial and in
# our practical.
library(dplyr)
library(ggplot2)

covid_df <- read.table("/var/biojupyterhubdata/BIO259/owid-covid-data-15.06.2022.csv",
                       sep = ",", header = TRUE, quote = "")

# Keep only the columns used in this tutorial and in the practical
covid_df <- covid_df %>%
  select(c("continent", "location", "date", "total_cases", "new_cases",
           "total_deaths", "new_deaths", "total_cases_per_million",
           "new_cases_per_million", "total_deaths_per_million",
           "new_deaths_per_million", "icu_patients", "icu_patients_per_million",
           "hosp_patients", "hosp_patients_per_million", "total_vaccinations",
           "total_tests", "new_tests", "positive_rate", "people_vaccinated",
           "people_fully_vaccinated", "total_boosters", "new_vaccinations",
           "population", "population_density", "median_age", "gdp_per_capita",
           "diabetes_prevalence", "life_expectancy"))

# Drop aggregate rows (continents, income groups, etc.) so only countries remain
covid_df <- covid_df %>%
  filter(!location %in% c('Africa', 'Asia', 'Europe', 'North America',
                          'South America', 'Oceania', 'Low income',
                          'Lower middle income', 'Upper middle income',
                          'High income', 'European Union', 'International',
                          'Northern Cyprus', 'World', 'Grand Total'))

covid_df <- covid_df %>%
  mutate(date = as.Date(date, format = "%Y-%m-%d"))  # convert column to date object
covid_df[is.na(covid_df)] <- 0  # replace NA values with 0

Attaching package: ‘dplyr’

The following objects are masked from ‘package:stats’:

    filter, lag

The following objects are masked from ‘package:base’:

    intersect, setdiff, setequal, union

0.1 Part 1: Line Graphs

Line graphs are an effective means of displaying changes in a response variable through time. In this section, we will use our Covid-19 dataset to plot the number of SARS-CoV-2 cases detected each day in order to compare the pandemic curves for Canada and the USA.
[2]:
# Generate a line graph of daily cases (per million) in Canada using geom_line().
# The ggplot process is usually: create a data frame with the part of the data
# you want, then add a type of graph with aesthetic parameters (such as the
# values of x, the values of y, etc.).
# colo(u)r will colour the lines depending on the location column.
# The answer should look like this, but with some 'corrections'
ggplot(covid_df %>% filter(location == "Canada")) +  # create subset line graph
  geom_line(aes(x = date, y = new_cases_per_million, group = location,
                colour = location))
# Write your code here. Be careful to follow the instructions!

[3]:
# Compare the line graph from Canada to one from the USA, normalizing for
# population. Note the change in scale.
# The answer should look like this, but with some 'corrections'
ggplot(covid_df %>% filter(location == "Canada" | location == "United States")) +
  geom_line(aes(x = date, y = new_cases_per_million, group = location,
                colour = location))
# Write your code here. Be careful to follow the instructions!

[4]:
# Adjust your axes, legends, and colours to improve figure readability.
ggplot(covid_df %>% filter(location == "Canada" | location == "United States")) +
  geom_line(aes(x = date, y = new_cases_per_million, group = location,
                colour = location)) +
  # each option in theme() has several underlying options; here we are
  # manipulating several aspects of our axes
  theme(axis.title = element_text(size = 15),
        axis.text = element_text(size = 15),
        axis.ticks.length = unit(.1, "cm")) +
  xlab("Date") + ylab("New Cases Per Million") +  # replace axis labels
  theme(legend.position = "bottom",
        legend.text = element_text(size = 15),
        legend.title = element_blank()) +  # manipulate legend
  scale_color_manual(values = c("#ff0000", "#0000FF"))  # specify different "national" colours for each group
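The per-million columns used above are what make the Canadian and American curves comparable. As a minimal sketch of that normalization in base R, using a hypothetical data frame `toy` with made-up counts and populations (none of these values come from the tutorial dataset):

```r
# Hypothetical daily counts and populations (illustrative values only)
toy <- data.frame(
  location   = c("Country A", "Country B"),
  new_cases  = c(500, 5000),
  population = c(1e6, 2e7)
)

# Cases per million = cases / population * 1,000,000
toy$new_cases_per_million <- toy$new_cases / toy$population * 1e6

print(toy)
```

Country B reports ten times as many raw cases, yet its per-million rate is half that of Country A, which is why raw counts alone can mislead when populations differ.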
0.2 Part 2: Bar Charts

Bar charts compare categorical data using rectangular bars whose lengths are proportional to the values they represent. Error bars are commonly added to bar charts to display variance. In this section, we will compare the average number of hospitalized Covid-19 patients per million across eight countries in Europe and learn how to include error bars on these plots.

[5]:
# We'll pivot our covid_df in order to calculate summary mean, sd, and sem
# values for a subset of European countries.
# Our countries of interest are Belgium, France, Italy, Iceland, Netherlands,
# Portugal, Sweden, and the United Kingdom.
pivot_covid_df <- covid_df %>%
  filter(location %in% c('Belgium', 'France', 'Italy', 'Iceland', 'Netherlands',
                         'Portugal', 'Sweden', 'United Kingdom')) %>%
  group_by(location) %>%
  summarise(mean_hosp_patients = mean(hosp_patients_per_million),
            sd_hosp_patients = sd(hosp_patients_per_million),
            sem_hosp_patients = sd(hosp_patients_per_million) / sqrt(n()))
pivot_covid_df

A tibble: 8 × 4
location        mean_hosp_patients  sd_hosp_patients  sem_hosp_patients
<chr>           <dbl>               <dbl>             <dbl>
Belgium         161.27678           133.90926         4.558324
France          244.34309           143.84840         4.865745
Iceland          36.31031            53.97307         1.863357
Italy           201.03494           177.41961         6.025486
Netherlands      59.19184            41.76282         1.440954
Portugal        110.12235           130.67054         4.516635
Sweden           87.08845            84.02470         2.855275
United Kingdom  136.17966           122.06000         4.145375

[6]:
# Generate a simple bar chart of mean hospitalized patients in each country.
# You can play around with geom_bar settings to see what they each do.
# Again, the trick is to load the data into ggplot, then plot a bar graph with
# specific aesthetics.
# geom_bar() can perform summaries for us, but we can also plot the pivot table
# directly.
# stat = "identity" means we are giving it the y values to plot.
# The answer should look like this, but with some 'corrections'
ggplot(pivot_covid_df) +
  geom_bar(aes(x = location, y = mean_hosp_patients), stat = "identity",
           fill = "gray", colour = "black", size = 1, alpha = 1)
# Write your code here. Be careful to follow the instructions!
Warning message:
“Using `size` aesthetic for lines was deprecated in ggplot2 3.4.0.
ℹ Please use `linewidth` instead.”

[7]:
# Add error bars. You can play around with geom_errorbar settings to see what
# they do.
# Adding elements to graphs with ggplot is simple: just pile on ggplot
# graphing commands.
# Error bars are the 95% CI: [mean - 1.96*SEM, mean + 1.96*SEM]
# The answer should look like this, but with some 'corrections'
ggplot(pivot_covid_df) +
  geom_bar(aes(x = location, y = mean_hosp_patients), stat = "identity",
           fill = "gray", colour = "black", size = 1, alpha = 1) +
  geom_errorbar(aes(x = location,
                    ymin = mean_hosp_patients - 1.96 * sem_hosp_patients,
                    ymax = mean_hosp_patients + 1.96 * sem_hosp_patients),
                width = 0, colour = "black", alpha = 1, size = 1)
# Write your code here. Be careful to follow the instructions!

[8]:
# Make thematic modifications to optimize your plot. The syntax for this is the
# same for different plot types. For example, let's change the axis label
# sizes etc.
# Update the X and Y axis titles.
# Change their font sizes etc.
# Look at one of the graphs we modified above for inspiration.
# The answer should look like this, but with some 'corrections'
ggplot(pivot_covid_df) +
  geom_bar(aes(x = location, y = mean_hosp_patients), stat = "identity",
           fill = "gray", colour = "black", size = 1, alpha = 1) +  # manipulate bars
  geom_errorbar(aes(x = location,
                    ymin = mean_hosp_patients - 1.96 * sem_hosp_patients,
                    ymax = mean_hosp_patients + 1.96 * sem_hosp_patients),
                width = 0, colour = "black", alpha = 1, size = 1) +  # manipulate error bars
  theme(axis.title = element_text(size = 15),
        axis.text.x = element_text(size = 15, angle = 45, hjust = 1),
        axis.text.y = element_text(size = 15),
        axis.ticks.length = unit(.1, "cm")) +  # manipulate axes
  xlab("Country") + ylab("Mean Hospitalized Patients Per Million")  # update axis labels
# Write your code here. Be careful to follow the instructions!
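The error bars above span mean ± 1.96 × SEM, where SEM = sd / sqrt(n). As a minimal sketch of that arithmetic in base R, using a hypothetical vector `x` of made-up measurements (not values from the Covid-19 dataset):

```r
# Hypothetical measurements (illustrative values only)
x <- c(10, 12, 9, 14, 15)

n   <- length(x)
m   <- mean(x)        # sample mean
s   <- sd(x)          # sample standard deviation
sem <- s / sqrt(n)    # standard error of the mean

# Approximate 95% confidence interval, using the normal quantile 1.96
ci <- c(lower = m - 1.96 * sem, upper = m + 1.96 * sem)

print(m)
print(sem)
print(ci)
```

The 1.96 multiplier assumes the sampling distribution of the mean is approximately normal; for small samples a t-based multiplier would give a wider interval.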
0.3 Part 3: Box and Whisker Plots

Like bar charts, box and whisker plots distribute categorical data on the x-axis and enable us to compare quantitative variables on the y-axis. However, they also display more information about variance, including the median, the interquartile range, and the overall range. In this section, we will compare the same data that we compared in Part 2 of this tutorial, but here we will use a box and whisker plot to do so.

[9]:
# Filter the covid_df to extract European countries of interest.
filter_covid_df <- covid_df %>%
  filter(location %in% c('Belgium', 'France', 'Italy', 'Iceland', 'Netherlands',
                         'Portugal', 'Sweden', 'United Kingdom'))

[10]:
# Generate a basic box plot using geom_boxplot()
# The answer should look like this, but with some 'corrections'
ggplot(filter_covid_df) +
  geom_boxplot(aes(x = location, y = hosp_patients_per_million))  # all default settings
# Write your code here. Be careful to follow the instructions!

[11]:
# Modify some geom_boxplot settings to enhance visual appeal.
# Change the fill colour, the size of the lines etc.
# The answer should look like this, but with some 'corrections'
ggplot(filter_covid_df) +
  geom_boxplot(aes(x = location, y = hosp_patients_per_million),
               fill = "blue", colour = "pink", size = 1, alpha = 1)
# Write your code here. Be careful to follow the instructions!

[12]:
# Make thematic modifications to optimize your plot.
# Again, change the x/y axis titles, the label sizes etc.
# The answer should look like this, but with some 'corrections'
ggplot(filter_covid_df, aes(x = location, y = hosp_patients_per_million)) +
  geom_boxplot(fill = "gray", colour = "black", size = 1, alpha = 1) +  # modify boxplot features
  theme(axis.title = element_text(size = 15),  # size must be named; the first positional argument is the font family
        axis.text.x = element_text(size = 15, angle = 45, hjust = 1),
        axis.text.y = element_text(size = 15),
        axis.ticks.length = unit(.1, "cm")) +  # modify axis features
  xlab("Country") + ylab("Hospitalized patients per million")  # re-assign axis labels
# Write your code here. Be careful to follow the instructions!
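The median, interquartile range, and outlier points that a box and whisker plot displays can be computed by hand. The sketch below uses base R on a hypothetical vector `x` (made-up values) and applies the common 1.5 × IQR rule for flagging outliers; it is an illustration of the idea, not a reproduction of ggplot2's internals.

```r
# Hypothetical observations with one extreme value (illustrative only)
x <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 50)

q   <- quantile(x, c(0.25, 0.5, 0.75))  # lower quartile, median, upper quartile
iqr <- IQR(x)                           # interquartile range = Q3 - Q1

# Points beyond 1.5 * IQR from the box are usually drawn as individual dots
upper_fence <- q[["75%"]] + 1.5 * iqr
lower_fence <- q[["25%"]] - 1.5 * iqr
outliers    <- x[x > upper_fence | x < lower_fence]

print(q)
print(c(lower_fence = lower_fence, upper_fence = upper_fence))
print(outliers)
```

Here the value 50 lies above the upper fence, so it would appear as a separate point rather than stretching the whisker.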
0.4 Part 4: Scatter Plots

Scatter plots are used to illustrate the relationship between two variables, with each dot representing an individual piece of data. In this section, we will explore the relationship between total Covid-19 deaths per million and the median age of the population in each country. The expectation is that countries with a higher median age will have experienced a greater burden of Covid-19 deaths.

[13]:
# We'll pivot our covid_df in order to extract total deaths per million and
# median age for each country.
pivot_covid_df <- covid_df %>%
  group_by(location) %>%
  summarise(total_deaths_per_million = max(total_deaths_per_million),
            median_age = max(median_age))
# Filter out rows with missing data on total deaths or median age.
pivot_covid_df[pivot_covid_df == 0] <- NA
pivot_covid_df <- na.omit(pivot_covid_df)
pivot_covid_df

A tibble: 188 × 3
location                total_deaths_per_million  median_age
<chr>                   <dbl>                     <dbl>
Afghanistan              193.546                  18.6
Albania                 1217.223                  38.0
Algeria                  154.091                  29.1
Angola                    55.992                  16.8
Antigua and Barbuda     1418.037                  32.1
Argentina               2828.455                  31.9
Armenia                 2907.220                  35.7
Aruba                   2528.103                  41.2
Australia                357.450                  37.9
Austria                 2208.873                  44.4
Azerbaijan               950.081                  32.4
Bahamas                 2053.342                  34.3
Bahrain                  852.259                  32.4
Bangladesh               175.168                  27.5
Barbados                1637.076                  39.8
Belarus                  738.970                  40.3
Belgium                 2736.768                  41.8
Belize                  1674.425                  25.0
Benin                     13.091                  18.8
Bhutan                    26.927                  28.6
Bolivia                 1855.076                  25.4
Bosnia and Herzegovina  4840.263                  42.5
Botswana                1130.050                  25.8
Brazil                  3124.829                  33.5
Brunei                   509.589                  32.4
Bulgaria                5395.369                  44.7
Burkina Faso              17.863                  17.6
Burundi                    3.101                  17.5
Cambodia                 180.333                  25.6
Cameroon                  70.893                  18.8
⋮                       ⋮                         ⋮
Spain                   2294.117                  45.5
Sri Lanka                768.422                  34.1
Sudan                    110.222                  19.7
Suriname                2289.633                  29.6
Sweden                  1874.872                  41.0
Switzerland             1585.338                  43.1
Syria                    172.360                  21.7
Taiwan                   190.568                  42.2
Tajikistan                12.821                  23.3
Tanzania                  13.659                  17.7
Thailand                 434.634                  40.1
Timor                     98.968                  18.0
Togo                      32.200                  19.4
Tonga                    112.403                  22.3
Trinidad and Tobago     2827.472                  36.2
Tunisia                 2400.768                  32.7
Turkey                  1164.074                  31.6
Uganda                    76.501                  16.4
Ukraine                 2587.238                  41.4
United Arab Emirates     230.706                  34.0
United Kingdom          2632.966                  40.8
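The pivot in cell [13] appears to rely on two ideas: `max()` picks the latest value of a running total, and zeros (which stand in for missing values after the setup cell) are turned back into NA so that `na.omit()` can drop incomplete rows. A minimal base R sketch of the same steps, using a hypothetical data frame `toy` with made-up values and `aggregate()` in place of the dplyr verbs:

```r
# Hypothetical running totals for two locations; location "B" has no data,
# so its values are all 0 (illustrative values only)
toy <- data.frame(
  location = c("A", "A", "A", "B", "B", "B"),
  total_deaths_per_million = c(0, 10, 25, 0, 0, 0),
  median_age = c(30, 30, 30, 0, 0, 0)
)

# One row per location, keeping the maximum of each column
summ <- aggregate(cbind(total_deaths_per_million, median_age) ~ location,
                  data = toy, FUN = max)

# Treat zeros as missing, then drop rows that contain any missing value
summ[summ == 0] <- NA
summ <- na.omit(summ)

print(summ)
```

Only location "A" survives, with its final running total of 25. Note that this approach would also discard a location whose true value is exactly zero.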
[14]:
# Generate a basic scatter plot using geom_point()
# The answer should look like this, but with some 'corrections'
ggplot(pivot_covid_df) +
  geom_point(aes(x = median_age, y = total_deaths_per_million),
             fill = "black", colour = "black", size = 2, alpha = 1)  # mostly default settings, with colour, size and transparency specified
# Write your code here. Be careful to follow the instructions!

[15]:
# Make thematic modifications to optimize your plot.
# The answer should look like this, but with some 'corrections'
ggplot(pivot_covid_df) +
  geom_point(aes(x = median_age, y = total_deaths_per_million),
             fill = "black", colour = "black", size = 2, alpha = 1) +  # alpha must lie between 0 and 1
  theme(axis.title = element_text(size = 15),
        axis.text.x = element_text(size = 15, angle = 45, hjust = 1),
        axis.text.y = element_text(size = 15),
        axis.ticks.length = unit(.1, "cm")) +  # modify axis features
  xlab("Median Age") + ylab("Total Deaths Per Million")  # re-assign axis labels
# Write your code here. Be careful to follow the instructions!

[16]:
# Add a trendline using geom_smooth().
# Notice how a trend line is something like a second 'chart' on top of this
# graph, which can have its own aesthetic.
# Although it's possible to have the aesthetic passed from the initial ggplot()
# command, it's easier to learn ggplot by specifying the aesthetic every time.
# The answer should look like this, but with some 'corrections'
ggplot(pivot_covid_df) +
  geom_point(aes(x = median_age, y = total_deaths_per_million),
             fill = "black", colour = "black", size = 2, alpha = 1) +
  theme(axis.title = element_text(size = 15),
        axis.text.x = element_text(size = 15, angle = 45, hjust = 1),
        axis.text.y = element_text(size = 15),
        axis.ticks.length = unit(.1, "cm")) +  # modify axis features
  xlab("Median Age") + ylab("Total Deaths Per Million") +
  geom_smooth(aes(x = median_age, y = total_deaths_per_million),
              method = lm)  # add simple regression line with confidence interval
# Write your code here. Be careful to follow the instructions!

`geom_smooth()` using formula = 'y ~ x'
[17]:
# Label and highlight a specific point.
# To do so, we'll pass different datasets to the graphing commands.
# The answer should look like this, but with some 'corrections'
ggplot() +
  geom_point(data = pivot_covid_df,
             aes(x = median_age, y = total_deaths_per_million),
             fill = "black", colour = "black", size = 2, alpha = 1) +
  theme(axis.title = element_text(size = 15),
        axis.text.x = element_text(size = 15, angle = 45, hjust = 1),
        axis.text.y = element_text(size = 15),
        axis.ticks.length = unit(.1, "cm")) +  # modify axis features
  xlab("Median Age") + ylab("Total Deaths Per Million") +
  geom_smooth(data = pivot_covid_df,
              aes(x = median_age, y = total_deaths_per_million), method = lm) +
  geom_point(data = pivot_covid_df %>% filter(location == "Canada"),
             aes(x = median_age, y = total_deaths_per_million),
             color = "red", size = 5) +  # colour the point for Canada red
  geom_text(data = pivot_covid_df %>% filter(location == "Canada"),
            aes(x = median_age, y = total_deaths_per_million, label = "Canada"),
            nudge_x = 3, size = 5)  # add a text label to the Canada point
# Write your code here. Be careful to follow the instructions!

`geom_smooth()` using formula = 'y ~ x'
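The message `geom_smooth()` using formula = 'y ~ x' indicates that the trend line is a straight-line fit of y on x. A minimal sketch of the same kind of fit with base R's `lm()`, using hypothetical, exactly linear values (chosen so the coefficients are easy to verify; they are not from the tutorial dataset):

```r
# Hypothetical, perfectly linear data (illustrative values only)
median_age         <- c(20, 25, 30, 35, 40)
deaths_per_million <- c(100, 300, 500, 700, 900)

# Ordinary least squares fit of y on x
fit <- lm(deaths_per_million ~ median_age)

# Intercept and slope of the fitted line
print(coef(fit))

# Predicted value for a new median age
pred <- predict(fit, newdata = data.frame(median_age = 45))
print(pred)
```

The slope is the expected change in deaths per million for each additional year of median age. Real data will scatter around the line, which is what the shaded confidence band in the plot reflects.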
0.5 Part 5: Pie Charts

Pie charts allow us to display percentage values as slices of a pie. In this section, we will use a pie chart to display the percentage of worldwide Covid-19 deaths that occurred on each continent.

[18]:
# Let's pivot our covid_df in order to calculate the total deaths that occurred
# on each continent.
pivot_covid_df <- covid_df %>%
  group_by(continent) %>%
  summarise(total_deaths = sum(new_deaths)) %>%
  mutate(percent = total_deaths / sum(total_deaths))
pivot_covid_df

A tibble: 6 × 3
continent      total_deaths  percent
<chr>          <dbl>         <dbl>
Africa          254379       0.040553667
Asia           1432580       0.228385096
Europe         1847300       0.294500682
North America  1448053       0.230851836
Oceania          13253       0.002112823
South America  1277086       0.203595896

[19]:
# A stacked bar plot to illustrate the proportion of deaths on each continent.
# x must be "" for a stacked bar plot since we want everything plotted on top of
# each other.
# The answer should look like this, but with some 'corrections'
ggplot() +
  geom_bar(data = pivot_covid_df,
           aes(x = "", y = percent, fill = continent),
           stat = "identity")  # default for a stacked bar graph
# Write your code here. Be careful to follow the instructions!
[20]:
# Convert to a pie chart.
# The answer should look like this, but with some 'corrections'
ggplot() +
  geom_bar(data = pivot_covid_df,
           aes(x = "", y = percent, fill = continent),
           stat = "identity") +
  coord_polar(theta = "y", start = 0)  # turn into a pie graph
# Write your code here. Be careful to follow the instructions!
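With `theta = "y"`, each continent's share of the stacked bar becomes a share of the full circle. The sketch below recomputes the proportions from the `total_deaths` values printed in the table for cell [18] and converts them to slice angles; the variable names `angle_deg` and `labels` are illustrative, and `sprintf()` is used here only as a base R stand-in for percentage formatting.

```r
# Total deaths per continent, copied from the tibble printed for cell [18]
total_deaths <- c(Africa = 254379, Asia = 1432580, Europe = 1847300,
                  `North America` = 1448053, Oceania = 13253,
                  `South America` = 1277086)

# Proportion of the worldwide total on each continent
percent <- total_deaths / sum(total_deaths)

# Each slice's angle is its proportion of a full 360-degree circle
angle_deg <- percent * 360

# Human-readable labels with one decimal place
labels <- sprintf("%.1f%%", percent * 100)

print(round(percent, 4))
print(round(angle_deg, 1))
print(labels)
```

Because the proportions sum to 1, the angles sum to 360 degrees, so the slices always close the circle.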
[21]:
# Clean up the rest of the chart.
# Add percentages etc.
ggplot(pivot_covid_df, aes(x = "", y = percent, fill = continent)) +
  geom_bar(stat = "identity") +
  geom_text(aes(x = 1.6, label = scales::percent(percent, accuracy = .1)),
            position = position_stack(vjust = .5), size = 5) +
  coord_polar("y", start = 0) +
  theme_minimal() +  # remove background
  theme(axis.title.x = element_blank(), axis.title.y = element_blank(),
        panel.border = element_blank(),  # remove other excess text
        panel.grid = element_blank(), axis.ticks = element_blank(),
        axis.text = element_blank()) +
  theme(legend.text = element_text(size = 15), legend.title = element_blank())  # manipulate legend

0.6 Part 6: Histograms

Histograms display frequency distributions of data from one or more variables using adjacent vertical bars. In this section, we will look at the distribution of normalized daily deaths throughout the pandemic in Canada and the United States.

[22]:
# Generate a basic histogram for daily deaths per million observed in Canada.
# We use the geom_histogram() function for this.
# The answer should look like this, but with some 'corrections'
ggplot(covid_df %>% filter(location == "Canada")) +
  geom_histogram(aes(x = new_deaths_per_million), binwidth = .2)
# Write your code here. Be careful to follow the instructions!

[23]:
# Plot histograms for daily deaths per million in both Canada and the USA on the
# same plot.
# Write your code here. Be careful to follow the instructions!
ggplot(covid_df %>% filter(location == "Canada" | location == "United States")) +
  geom_histogram(aes(x = new_deaths_per_million, color = location,
                     fill = location),
                 alpha = 0.5, binwidth = .2)
# alpha controls the transparency of the fill: the lower the alpha, the more
# transparent the plot

[24]:
# Add a mean line for each distribution.
# We've created a new dataframe with the means of the datasets.
means <- covid_df %>%
  filter(location == "Canada" | location == "United States") %>%
  group_by(location) %>%
  summarise(mean = mean(new_deaths_per_million))
means
# The answer should look like this, but with some 'corrections'
ggplot() +
  geom_histogram(data = covid_df %>%
                   filter(location == "Canada" | location == "United States"),
                 aes(x = new_deaths_per_million, colour = location,
                     fill = location),
                 alpha = 0.5, binwidth = .2) +  # alpha makes fill transparent
  geom_vline(data = means, aes(xintercept = mean, colour = location),
             linetype = "dashed")  # mean line
# Write your code here. Be careful to follow the instructions!

A tibble: 2 × 2
location       mean
<chr>          <dbl>
Canada         1.259491
United States  3.472203
[25]:
# Clean up the rest of the histogram.
ggplot() +
  geom_histogram(data = covid_df %>%
                   filter(location == "Canada" | location == "United States"),
                 aes(x = new_deaths_per_million, colour = location,
                     fill = location),
                 alpha = 0.5, binwidth = .2) +  # alpha makes fill transparent
  geom_vline(data = means, aes(xintercept = mean, colour = location),
             linetype = "dashed") +
  xlab("Daily Deaths Per Million") + ylab("Count") +  # modify axis labels
  theme(legend.position = "bottom",
        legend.text = element_text(size = 15),
        legend.title = element_blank()) +  # manipulate legend
  theme(axis.title = element_text(size = 15),
        axis.text.x = element_text(size = 15),
        axis.text.y = element_text(size = 15),
        axis.ticks.length = unit(.1, "cm")) +
  scale_color_manual(values = c("#ff0000", "#0000FF"))  # manually specify colours for each group
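A histogram's bar heights are simply counts of observations falling into fixed-width bins. The base R sketch below illustrates that counting step for a bin width of 0.2, using a hypothetical vector `x` of made-up values. It is only meant to show the idea: `geom_histogram()` chooses its bin edges differently by default, so its bars will not necessarily line up with these breaks.

```r
# Hypothetical daily values (illustrative only)
x <- c(0.05, 0.15, 0.25, 0.35, 0.45, 0.55, 0.1, 0.3)

binwidth <- 0.2
breaks   <- seq(0, 0.6, by = binwidth)  # bin edges: 0, 0.2, 0.4, 0.6

# Assign each value to a bin, then count the values per bin
bins   <- cut(x, breaks = breaks, include.lowest = TRUE)
counts <- table(bins)

print(counts)
```

A smaller bin width gives more, narrower bars and a noisier picture; a larger one smooths the distribution but can hide detail.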
0.7 Part 7: Heat Maps

Heat maps enable us to illustrate the relationship between three variables by separating two categorical variables on the x and y axes, and displaying a third variable on the 2-dimensional matrix using a colour gradient. In this section, we will compare the average number of Covid-19 cases observed each day for each month of the pandemic in Canada. Our goal is to highlight months where cases were especially high, and those where cases were especially low.

[26]:
# Modify the covid_df dataframe to have years and months as separate columns.
covid_df <- covid_df %>%
  mutate(year = format(date, "%Y")) %>%
  mutate(month = format(date, "%m"))

[27]:
# Now let's build a heatmap dataframe. This data frame should only look at data
# from Canada, and then get the average number of new cases for each month/year.
# The answer should look like this, but with some 'corrections'
heatmap_df <- covid_df %>%
  filter(location == "Canada") %>%
  group_by(year, month) %>%
  summarise(avg_cases = mean(new_cases))
head(heatmap_df)
# Write your code here. Be careful to follow the instructions!

`summarise()` has grouped output by 'year'. You can override using the `.groups` argument.

A grouped_df: 6 × 3
year   month  avg_cases
<chr>  <chr>  <dbl>
2020   01        0.4444444
2020   02        1.0000000
2020   03      344.4516129
2020   04     1550.1000000
2020   05     1124.2903226
2020   06      418.0333333

[28]:
# Generate a basic heat map with mostly default settings.
# The answer should look like this, but with some 'corrections'
ggplot(heatmap_df) +
  geom_tile(aes(year, month, fill = avg_cases), colour = "white") +
  scale_fill_gradient(low = "white", high = "dark red")  # specify gradient colours
# Write your code here. Be careful to follow the instructions!
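The year and month bookkeeping in cells [26] and [27] can be sketched with base R alone. The example below uses a hypothetical data frame `toy` with made-up dates and case counts, and `aggregate()` as a stand-in for the `group_by()` and `summarise()` pipeline; it is an illustration of the idea rather than a replacement for the tutorial code.

```r
# Hypothetical daily case counts (illustrative values only)
toy <- data.frame(
  date      = as.Date(c("2020-03-01", "2020-03-02", "2020-04-01", "2020-04-02")),
  new_cases = c(10, 30, 100, 200)
)

# format() extracts parts of a Date as character strings
toy$year  <- format(toy$date, "%Y")  # four-digit year, e.g. "2020"
toy$month <- format(toy$date, "%m")  # zero-padded month, e.g. "03"

# Average daily cases for each year/month combination
monthly <- aggregate(new_cases ~ year + month, data = toy, FUN = mean)

print(monthly)
```

Because the months are zero-padded strings ("01" to "12"), they sort in calendar order when used as a discrete axis.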
[29]:
# Clean up the heat map to make it more visually appealing.
options(repr.plot.width = 8, repr.plot.height = 4)  # this sets a new working default for the rest of the Jupyter notebook
ggplot(heatmap_df) +
  geom_tile(aes(year, month, fill = avg_cases), colour = "white") +
  scale_fill_gradient(low = "white", high = "dark red",
                      guide = guide_colorbar(frame.colour = "black",
                                             frame.linewidth = 2,
                                             ticks.colour = "black",
                                             ticks.linewidth = 2,
                                             label = TRUE,
                                             barwidth = 2,
                                             barheight = 8)) +  # modify your gradient features
  coord_flip() +  # turn the plot 90 degrees
  scale_x_discrete(limits = rev) +  # flip the order of the year axis
  theme_minimal() +  # remove background features
  theme(panel.border = element_rect(colour = "black", fill = NA, linewidth = 2),
        panel.grid = element_blank()) +  # add a panel border, remove the grid
  theme(legend.text = element_text(size = 15),
        legend.title = element_text(size = 15)) +  # manipulate legend
  theme(axis.title = element_text(size = 15),
        axis.text.x = element_text(size = 15),
        axis.text.y = element_text(size = 15),
        axis.ticks.length = unit(.5, "cm"))  # manipulate axis

0.8 Tutorial Summary

Your practical this week will apply the ggplot2 tools that we have learned during this tutorial to generate appropriate plots for a given scenario. Specifically, you will be asked to identify the best visualization tool to answer a particular question and then create the corresponding figure. These activities will be conducted with the guidance of your TAs, but they will be graded.