7+exploratory analysis inclass exercise (1)

.Rmd

School

University of Saskatchewan

Course

311

Subject

Electrical Engineering

Date

Apr 3, 2024

Type

Rmd

Uploaded by MagistrateStar1002

---
title: "Module 7 inclass exercise"
author: "Sattar"
output: html_document
---

# Preparation

A. Load the packages

- Please simply run the code chunk.

```{r setup}
# Install pacman if it is missing, then load (and install if needed) the packages used below
if (!require("pacman")) install.packages("pacman")
pacman::p_load(tidyverse, skimr, gridExtra, corrr, ggplot2)
```

B. Remove previous objects from your environment

- Please simply run the code chunk.

```{r}
rm(list = ls()) # removes all previous objects so you start this inclass exercise with a clean environment
```

C. Import the dataset

- In this exercise, you need to use the "education" dataset.
- Please run the code chunk to load the dataset into the environment.

```{r}
# choose.files() is available on Windows only; file.choose() works on macOS and Linux
education <- read.csv(choose.files(), header = TRUE, stringsAsFactors = TRUE)
```

# Task 1

- Use the skim() function to generate the summary statistics of the variables "Urban_Population" and "State" for the Southern region.
- Write it as a pipeline, and one pipeline only.

```{r}
education %>%
  filter(Region == "Southern") %>%
  select(Urban_Population, State) %>%
  skim()
```

# Task 2

- Use the my_skim() function to generate the summary statistics of the variables "Urban_Population" and "State" for each region (one pipeline only).

More information about the my_skim() function:

- my_skim() uses skim_with() to customize the output of the skim() function.
- With the following my_skim() function, we change the length of the values displayed in top_counts for factors to 100 characters. In the function, "sfl" stands for skimr function list.
- The argument of my_skim() is still a data frame, as with skim().
- You need to run the code that creates my_skim() first; the function will then appear in the environment.

```{r}
my_skim <- skim_with(factor = sfl(top_counts = ~top_counts(., max_char = 100)))

# your code in the following
education %>%
  group_by(Region) %>%
  select(Urban_Population, State) %>%
  my_skim()
```

# Task 3

- Create a summary table that displays the mean education expenditures for each region and each level of urban population.
- One pipeline only.

```{r}
education %>%
  group_by(Region, Levels_Urban_Population) %>%
  summarise(mean_expenditure = mean(Education_Expenditures))
```

# Task 4: Correlation

- Generate the correlation values between three variables: Urban_Population, Per_Capita_Income and Education_Expenditures (one pipeline only).

```{r}
education %>%
  select(Urban_Population, Per_Capita_Income, Education_Expenditures) %>%
  correlate()
```

# Task 5: Histogram

Create a histogram for each of the variables "Urban_Population" and "Education_Expenditures", and display them in a matrix of one row and two columns.

The following breakdown guides you through the task.

First,

- Please revise the following code to create a histogram for the variable "Urban_Population". You need to write a pipeline for each chart.
- Save it as plot1 in the environment.

```{r}
____ <-____%>%ggplot()+
  geom_histogram(aes(x=________))
```

Second,

- Please create a histogram for the variable "Education_Expenditures".
- Save it as plot2 in the environment.

```{r}
plot1 <- education %>% ggplot() +
  geom_histogram(aes(x = Urban_Population))
plot1

plot2 <- education %>% ggplot() +
```
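
Task 5 asks for the two histograms to be displayed side by side in one row and two columns. A minimal sketch of one way to do that with `grid.arrange()` from gridExtra is given here. The data frame `education_demo` and its random values are hypothetical stand-ins for the real "education" dataset, and `bins = 10` is an illustrative choice rather than part of the exercise. The sketch calls `ggplot()` directly on the data instead of using a pipeline so that it only needs ggplot2 and gridExtra.

```r
library(ggplot2)
library(gridExtra)

# Hypothetical stand-in for the "education" data; the values are synthetic
set.seed(1)
education_demo <- data.frame(
  Urban_Population = runif(50, min = 300, max = 900),
  Education_Expenditures = runif(50, min = 150, max = 400)
)

# One histogram per variable, each saved as its own object
plot1 <- ggplot(education_demo) +
  geom_histogram(aes(x = Urban_Population), bins = 10)

plot2 <- ggplot(education_demo) +
  geom_histogram(aes(x = Education_Expenditures), bins = 10)

# arrangeGrob() builds the one-row, two-column layout without drawing it
layout <- arrangeGrob(plot1, plot2, nrow = 1, ncol = 2)

# grid.arrange() builds the same layout and draws it on the current device
grid.arrange(plot1, plot2, nrow = 1, ncol = 2)
```

With the real dataset, the same `grid.arrange()` call could be applied to the `plot1` and `plot2` objects created in the exercise.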
