Project - Chicken
School
Montgomery College
Course
101
Subject
Industrial Engineering
Date
Apr 3, 2024
Yeabsira Alemu
Data Import
1. Download chicken.csv to your working directory. Make sure to set your working directory appropriately! This
dataset was created by modifying the R built-in dataset chickwts.
2. Import the chicken.csv data into R. Store it in a data.frame named ch_df and print out the entire ch_df to the
screen.
ch_df<- read.csv("chicken.csv")
ch_df
## weight feed
## 1 206 meatmeal
## 2 140 horsebean
## 3 <NA> <NA>
## 4 318 sunflower
## 5 332 casein
## 6 na horsebean
## 7 216 na
## 8 143 horsebean
## 9 271 soybean
## 10 315 meatmeal
## 11 227 horsebean
## 12 N/A sunflower
## 13 322 sunflower
## 14 352 casein
## 15 329 not sure
## 16 N/A linseed
## 17 379 casein
## 18 153 ?
## 19 N/A linseed
## 20 213 linseed
## 21 257
## 22 179 horsebean
## 23 380 meatmeal
## 24 327 soybean
## 25 260 linseed
## 26 168 horsebean
## 27 248 soybean
## 28 181 linseed
## 29 160 horsebean
## 30 <NA> sunflower
## 31 soybean
## 32 340 sunflower
## 33 260 casein
## 34 169 ?
## 35 171 soybean
## 36 368 casein
## 37 283 casein
## 38 334 sunflower
## 39 - unknown
## 40 309 linseed
## 41 soybean
## 42 295 ?
## 43 404 <NA>
## 44 392 sunflower
## 45 na casein
## 46 267 soybean
## 47 303 meatmeal
## 48 250 soybean
## 49 243 soybean
## 50 108 horsebean
## 51 229 linseed
## 52 <NA> horsebean
## 53 222 casein
## 54 344 meatmeal
## 55 263 unknown
## 56 148 linseed
## 57 318 casein
## 58 - meatmeal
## 59 258 meatmeal
## 60 <NA> sunflower
## 61 325 meatmeal
## 62 217
## 63 271 linseed
## 64 244 linseed
## 65 341 sunflower
## 66 141 ?
## 67 158 soybean
## 68 423 sunflower
## 69 316 <NA>
## 70 na soybean
## 71 casein
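Several entries in the printout above (na, N/A, -, ?, blank cells, not sure, unknown) look like placeholders for missing data. As a minimal sketch, and assuming those strings really are the only placeholders in the file, read.csv can convert them to NA at import time through its na.strings argument. The inline sample_csv text is a made-up stand-in for chicken.csv so that the example runs on its own.

```r
# Hypothetical sample standing in for chicken.csv (not the real file)
sample_csv <- "weight,feed
206,meatmeal
na,horsebean
N/A,sunflower
216,?
-,unknown
,soybean
329,not sure
NA,casein"

# Strings assumed to mark missing values (an assumption based on the printout)
missing_markers <- c("NA", "na", "N/A", "-", "?", "", "not sure", "unknown")

# na.strings turns every matching field into NA while the file is read
sample_df <- read.csv(text = sample_csv, na.strings = missing_markers,
                      strip.white = TRUE)
sample_df
str(sample_df)
```

With the placeholders handled at import, the weight column is read as a numeric type directly, so no later coercion step is needed.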
Clean Missing Values
There are some missing values in this dataset. Unfortunately they are represented in a number of different ways.
3. Clean up this dataset by doing the following:
Calculate how many elements in the original ch_df are recognized as NA by R.
Change all of the missing elements to NA in ch_df.
You do NOT have to fill in the missing values. Just leave them as NA.
na_df <- sum(is.na(ch_df))
ch_df[ch_df== "na"]<- NA
ch_df
## weight feed
## 1 206 meatmeal
## 2 140 horsebean
## 3 <NA> <NA>
## 4 318 sunflower
## 5 332 casein
## 6 <NA> horsebean
## 7 216 <NA>
## 8 143 horsebean
## 9 271 soybean
## 10 315 meatmeal
## 11 227 horsebean
## 12 N/A sunflower
## 13 322 sunflower
## 14 352 casein
## 15 329 not sure
## 16 N/A linseed
## 17 379 casein
## 18 153 ?
## 19 N/A linseed
## 20 213 linseed
## 21 257
## 22 179 horsebean
## 23 380 meatmeal
## 24 327 soybean
## 25 260 linseed
## 26 168 horsebean
## 27 248 soybean
## 28 181 linseed
## 29 160 horsebean
## 30 <NA> sunflower
## 31 soybean
## 32 340 sunflower
## 33 260 casein
## 34 169 ?
## 35 171 soybean
## 36 368 casein
## 37 283 casein
## 38 334 sunflower
## 39 - unknown
## 40 309 linseed
## 41 soybean
## 42 295 ?
## 43 404 <NA>
## 44 392 sunflower
## 45 <NA> casein
## 46 267 soybean
## 47 303 meatmeal
## 48 250 soybean
## 49 243 soybean
## 50 108 horsebean
## 51 229 linseed
## 52 <NA> horsebean
## 53 222 casein
## 54 344 meatmeal
## 55 263 unknown
## 56 148 linseed
## 57 318 casein
## 58 - meatmeal
## 59 258 meatmeal
## 60 <NA> sunflower
## 61 325 meatmeal
## 62 217
## 63 271 linseed
## 64 244 linseed
## 65 341 sunflower
## 66 141 ?
## 67 158 soybean
## 68 423 sunflower
## 69 316 <NA>
## 70 <NA> soybean
## 71 casein
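The printout above still contains placeholders such as N/A, -, ?, blank cells, not sure, and unknown, because only the string "na" was replaced. The following is a rough sketch of one way to count the NA values R already recognizes and then replace every placeholder. The toy_df data frame and the missing_markers vector are hypothetical; which strings count as "missing" is an assumption that should be checked against the real data.

```r
# Hypothetical toy data frame mimicking the mixed markers seen in ch_df
toy_df <- data.frame(
  weight = c("206", "na", "N/A", "216", "-", "", NA),
  feed   = c("meatmeal", "horsebean", "sunflower", "?", "unknown", "soybean", NA),
  stringsAsFactors = FALSE
)

# Number of elements R already recognizes as NA, before any cleaning
na_before <- sum(is.na(toy_df))
na_before

# Markers assumed to mean "missing" (an assumption; adjust to the real data)
missing_markers <- c("na", "N/A", "-", "?", "", "not sure", "unknown")

# Replace every matching element column by column, keeping the data frame shape
toy_df[] <- lapply(toy_df, function(col) {
  col[trimws(col) %in% missing_markers] <- NA
  col
})

# With the markers gone, weight converts to numeric without coercion warnings
toy_df$weight <- as.numeric(toy_df$weight)

na_after <- sum(is.na(toy_df))
na_after
```

Matching against a vector with %in% keeps the list of placeholders in one place, so adding another marker later is a one-line change.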
Now that the dataset is clean, let’s see what percentage of our data is missing.
4. Calculate the percentage of missing data from:
The weight column.
The feed column.
The entire dataset.
Print out each result.
weight_na <- sum(is.na(ch_df$weight))/nrow(ch_df)*100
weight_na
## [1] 9.859155
feed_na <- sum(is.na(ch_df$feed))/nrow(ch_df)*100
feed_na
## [1] 5.633803
# Divide by the total number of cells (rows * columns), not just the number of rows
missing_whole <- sum(is.na(ch_df))/(nrow(ch_df)*ncol(ch_df))*100
missing_whole
## [1] 7.746479
EXTRA CREDIT (Optional): Figure out how to create these print statements so that the name and percentage
number are not hard-coded into the statement. In other words, so that the name and percentage number are read
in dynamically (for example, from a variable, from a function call, etc.) instead of just written in the statement.
# I think that's what I did above, but I'm not sure.
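One possible reading of the extra-credit task is sketched below: the message is assembled with sprintf from the column name and the computed percentage, so neither is typed into the statement by hand. The helper report_missing and the small toy_df data frame are hypothetical names introduced only for illustration.

```r
# Hypothetical helper: builds the message from the column name and computed value
report_missing <- function(df, column) {
  pct <- mean(is.na(df[[column]])) * 100
  sprintf("%s is missing %.2f%% of its values", column, pct)
}

# Made-up data frame so the example runs on its own
toy_df <- data.frame(
  weight = c(206, NA, 318, NA),
  feed   = c("meatmeal", NA, "sunflower", "casein")
)

# Column names are read from the data frame rather than written out
for (column in names(toy_df)) {
  print(report_missing(toy_df, column))
}
```

Because the loop runs over names(toy_df), the same code would report on any extra columns without modification.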
Data Investigation
5. Group the data by feed and find the mean and median weight for each group. Your result should be a new
data frame with the group means in a column named weight_mean and the group medians in a column
named weight_median. Save this new data frame; you can name the data frame as you wish. (Remember
that variable names should be somewhat descriptive of what they contain.)
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
ch_df$weight <- as.numeric(as.character(ch_df$weight))
## Warning: NAs introduced by coercion
clean_feed<- ch_df %>%
group_by(feed)%>%
summarize(weight_mean = mean(weight, na.rm= TRUE),
weight_median = median(weight, na.rm= TRUE) )
clean_feed
## # A tibble: 11 × 3
## feed weight_mean weight_median
## <chr> <dbl> <dbl>
## 1 "" 237 237
## 2 "?" 190. 161
## 3 "casein" 314. 325
## 4 "horsebean" 161. 160
## 5 "linseed" 232. 236.
## 6 "meatmeal" 304. 315
## 7 "not sure" 329 329
## 8 "soybean" 242. 249
## 9 "sunflower" 353. 340
## 10 "unknown" 263 263
## 11 <NA> 312 316
#https://www.statology.org/group-summarize-data-r/
6. Find the feed that has the maximum median chicken weight.
max_median<- clean_feed%>%
filter(weight_median == max(weight_median))
max_median
## # A tibble: 1 × 3
## feed weight_mean weight_median
## <chr> <dbl> <dbl>
## 1 sunflower 353. 340
7. Create a quick histogram of the weight from the original data frame.
library(ggplot2)
ggplot(ch_df, aes(x=weight))+ geom_histogram(binwidth= 15)
## Warning: Removed 15 rows containing non-finite values (`stat_bin()`).
#https://stackoverflow.com/questions/51234843/error-mapping-should-be-created-with-aes-or-aes
#https://ggplot2.tidyverse.org/reference/geom_histogram.html
8. Create a box plot with feed type as the X axis.
ggplot(ch_df, aes(feed, weight)) +
geom_boxplot(varwidth = TRUE)
## Warning: Removed 15 rows containing non-finite values (`stat_boxplot()`).
#https://ggplot2.tidyverse.org/reference/geom_boxplot.html
9. What do these graphs tell you? Does the box plot confirm your median calculations? If yes, how so? Are
there any outliers displayed in either graph? Confirm this using the five number summary for specific feed
types and the IQR.
# The rows with missing (non-finite) weights are removed from both plots and from the calculations.
# Yes, there are outliers: in the weights for the unknown feed, horsebean, and soybean.
# Yes, the box plot confirms my median calculations.
fivenum(ch_df$weight)
## [1] 108.0 209.5 261.5 326.0 423.0
# The five-number summary also confirms it. I couldn't figure out how to do it for a specific feed type.
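One way to get the summary for each feed type, sketched here under assumptions, is to apply fivenum within groups using tapply and then flag values that lie more than 1.5 times the IQR beyond the quartiles. The toy_df values are illustrative and are not the full dataset, and find_outliers is a hypothetical helper. Note that fivenum reports Tukey hinges, which can differ slightly from the quantile() quartiles used in the helper.

```r
# Illustrative toy data for two feed types (not the full dataset)
toy_df <- data.frame(
  weight = c(140, 143, 160, 168, 179, 227, 108,
             318, 322, 334, 340, 341, 392, 423),
  feed   = rep(c("horsebean", "sunflower"), each = 7)
)

# Five-number summary (min, lower hinge, median, upper hinge, max) per feed
summary_by_feed <- tapply(toy_df$weight, toy_df$feed, fivenum)
summary_by_feed

# Hypothetical helper: values beyond 1.5 * IQR from the quartiles
find_outliers <- function(x) {
  x <- x[!is.na(x)]
  q <- unname(quantile(x, c(0.25, 0.75)))
  fence <- 1.5 * IQR(x)
  x[x < q[1] - fence | x > q[2] + fence]
}

outliers_by_feed <- tapply(toy_df$weight, toy_df$feed, find_outliers)
outliers_by_feed
```

An empty result for a feed type means that no value in the group falls outside the fences, which should match a box plot that shows no isolated points for that group.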