Project - Chicken

School

Montgomery College

Course

101

Subject

Industrial Engineering

Date

Apr 3, 2024

Yeabsira Alemu

Data Import

1. Download chicken.csv to your working directory. Make sure to set your working directory appropriately! This dataset was created by modifying the R built-in dataset chickwts.

2. Import the chicken.csv data into R. Store it in a data.frame named ch_df and print out the entire ch_df to the screen.

ch_df <- read.csv("chicken.csv")
ch_df

##    weight      feed
##  1    206  meatmeal
##  2    140 horsebean
##  3   <NA>      <NA>
##  4    318 sunflower
##  5    332    casein
##  6     na horsebean
##  7    216        na
##  8    143 horsebean
##  9    271   soybean
## 10    315  meatmeal
## 11    227 horsebean
## 12    N/A sunflower
## 13    322 sunflower
## 14    352    casein
## 15    329  not sure
## 16    N/A   linseed
## 17    379    casein
## 18    153         ?
## 19    N/A   linseed
## 20    213   linseed
## 21    257
## 22    179 horsebean
## 23    380  meatmeal
## 24    327   soybean
## 25    260   linseed
## 26    168 horsebean
## 27    248   soybean
## 28    181   linseed
## 29    160 horsebean
## 30   <NA> sunflower
## 31          soybean
## 32    340 sunflower
## 33    260    casein
## 34    169         ?
## 35    171   soybean
## 36    368    casein
## 37    283    casein
## 38    334 sunflower
## 39      -   unknown
## 40    309   linseed
## 41          soybean
## 42    295         ?
## 43    404      <NA>
## 44    392 sunflower
## 45     na    casein
## 46    267   soybean
## 47    303  meatmeal
## 48    250   soybean
## 49    243   soybean
## 50    108 horsebean
## 51    229   linseed
## 52   <NA> horsebean
## 53    222    casein
## 54    344  meatmeal
## 55    263   unknown
## 56    148   linseed
## 57    318    casein
## 58      -  meatmeal
## 59    258  meatmeal
## 60   <NA> sunflower
## 61    325  meatmeal
## 62    217
## 63    271   linseed
## 64    244   linseed
## 65    341 sunflower
## 66    141         ?
## 67    158   soybean
## 68    423 sunflower
## 69    316      <NA>
## 70     na   soybean
## 71           casein

Clean Missing Values

There are some missing values in this dataset. Unfortunately, they are represented in a number of different ways.

3. Clean up this dataset by doing the following:
- Calculate how many elements in the original ch_df are recognized as NA by R.
- Change all of the missing elements to NA in ch_df.
You do NOT have to fill in the missing values. Just leave them as NA.

na_df <- sum(is.na(ch_df))
ch_df[ch_df == "na"] <- NA
ch_df
##    weight      feed
##  1    206  meatmeal
##  2    140 horsebean
##  3   <NA>      <NA>
##  4    318 sunflower
##  5    332    casein
##  6   <NA> horsebean
##  7    216      <NA>
##  8    143 horsebean
##  9    271   soybean
## 10    315  meatmeal
## 11    227 horsebean
## 12    N/A sunflower
## 13    322 sunflower
## 14    352    casein
## 15    329  not sure
## 16    N/A   linseed
## 17    379    casein
## 18    153         ?
## 19    N/A   linseed
## 20    213   linseed
## 21    257
## 22    179 horsebean
## 23    380  meatmeal
## 24    327   soybean
## 25    260   linseed
## 26    168 horsebean
## 27    248   soybean
## 28    181   linseed
## 29    160 horsebean
## 30   <NA> sunflower
## 31          soybean
## 32    340 sunflower
## 33    260    casein
## 34    169         ?
## 35    171   soybean
## 36    368    casein
## 37    283    casein
## 38    334 sunflower
## 39      -   unknown
## 40    309   linseed
## 41          soybean
## 42    295         ?
## 43    404      <NA>
## 44    392 sunflower
## 45   <NA>    casein
## 46    267   soybean
## 47    303  meatmeal
## 48    250   soybean
## 49    243   soybean
## 50    108 horsebean
## 51    229   linseed
## 52   <NA> horsebean
## 53    222    casein
## 54    344  meatmeal
## 55    263   unknown
## 56    148   linseed
## 57    318    casein
## 58      -  meatmeal
## 59    258  meatmeal
## 60   <NA> sunflower
## 61    325  meatmeal
## 62    217
## 63    271   linseed
## 64    244   linseed
## 65    341 sunflower
## 66    141         ?
## 67    158   soybean
## 68    423 sunflower
## 69    316      <NA>
## 70   <NA>   soybean
## 71           casein

Now that the dataset is clean, let's see what percentage of our data is missing.

4. Calculate the percentage of missing data from:
- The weight column
- The feed column
- The entire dataset
Print out each result.

weight_na <- sum(is.na(ch_df$weight)) / nrow(ch_df) * 100
weight_na
## [1] 9.859155

feed_na <- sum(is.na(ch_df$feed)) / nrow(ch_df) * 100
feed_na
## [1] 5.633803

missing_whole <- sum(is.na(ch_df)) / nrow(ch_df) * 100
missing_whole
## [1] 15.49296

EXTRA CREDIT (Optional): Figure out how to create these print statements so that the name and percentage number are not hard-coded into the statement. In other words, the name and percentage number should be read in dynamically (for example, from a variable or from a function call) instead of just written in the statement.

# I think that's what I did above, not sure
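The printout above still shows placeholders such as N/A, -, ? and empty strings, and the whole-dataset figure divides the NA count by the number of rows rather than by the number of cells. The sketch below is illustrative only and is not the author's code: `sample_df` is a hypothetical handful of rows in the style of the printout, and `missing_tokens` is an assumed list of spellings to treat as missing. It converts every placeholder to NA, computes each percentage (dividing the whole-dataset count by all cells), and builds the print statements from the names and values in a vector instead of hard-coding them.

```r
# Hypothetical sample: a handful of rows in the style of the printout above.
sample_df <- data.frame(
  weight = c("206", "na", "N/A", "", "-", "318", NA, "153"),
  feed   = c("meatmeal", "horsebean", "sunflower", "soybean",
             "unknown", "sunflower", NA, "?"),
  stringsAsFactors = FALSE
)

# Assumed list of placeholder spellings that should count as missing.
missing_tokens <- c("na", "N/A", "", "-", "?", "not sure", "unknown")

# Replace every placeholder with a real NA, column by column.
clean_df <- sample_df
for (col in names(clean_df)) {
  clean_df[[col]][clean_df[[col]] %in% missing_tokens] <- NA
}

# Percentage missing per column, plus the share of all cells that are missing.
pct_missing <- c(colMeans(is.na(clean_df)),
                 dataset = mean(is.na(clean_df))) * 100

# The label and the number both come from the vector, not from literals.
for (name in names(pct_missing)) {
  cat(sprintf("%s: %.2f%% missing\n", name, pct_missing[[name]]))
}
```

The same tokens could instead be passed to `read.csv()` through its `na.strings` argument, so that they are read as NA at import time.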
Data Investigation

5. Group the data by feed and find the mean and median weight for each group. Your result should be a new data frame with the group means in a column named weight_mean and the group medians in a column named weight_median. Save this new data frame; you can name the data frame as you wish. (Remember that variable names should be somewhat descriptive of what they contain.)

library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
##     filter, lag
## The following objects are masked from 'package:base':
##
##     intersect, setdiff, setequal, union

ch_df$weight <- as.numeric(as.character(ch_df$weight))
## Warning: NAs introduced by coercion

clean_feed <- ch_df %>%
  group_by(feed) %>%
  summarize(
    weight_mean = mean(weight, na.rm = TRUE),
    weight_median = median(weight, na.rm = TRUE)
  )
clean_feed

## # A tibble: 11 × 3
##    feed        weight_mean weight_median
##    <chr>             <dbl>         <dbl>
##  1 ""                 237           237
##  2 "?"                190.          161
##  3 "casein"           314.          325
##  4 "horsebean"        161.          160
##  5 "linseed"          232.          236.
##  6 "meatmeal"         304.          315
##  7 "not sure"         329           329
##  8 "soybean"          242.          249
##  9 "sunflower"        353.          340
## 10 "unknown"          263           263
## 11 <NA>               312           316

# https://www.statology.org/group-summarize-data-r/
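The grouped summary above still lists "", "?", "not sure", "unknown" and NA as feed groups. As a rough sketch of how the grouping might look once unknown feeds are excluded, the example below uses base R's `tapply()` instead of dplyr so that it runs without extra packages. The `toy` data frame and all of its values are hypothetical and are not taken from chicken.csv.

```r
# Hypothetical toy data; the weights are made up for illustration.
toy <- data.frame(
  weight = c(200, 220, NA, 300, 340, 150, 170, 260),
  feed   = c("soybean", "soybean", "soybean", "casein", "casein",
             "horsebean", "horsebean", NA),
  stringsAsFactors = FALSE
)

# Keep only rows with a known feed; weights may still be NA.
known <- toy[!is.na(toy$feed), ]

# Mean and median weight per feed, ignoring missing weights.
group_means   <- tapply(known$weight, known$feed, mean,   na.rm = TRUE)
group_medians <- tapply(known$weight, known$feed, median, na.rm = TRUE)

# Assemble the result with the column names the assignment asks for.
feed_summary <- data.frame(
  feed          = names(group_means),
  weight_mean   = as.numeric(group_means),
  weight_median = as.numeric(group_medians),
  row.names     = NULL
)

# Feed with the largest median weight.
top_feed <- feed_summary$feed[which.max(feed_summary$weight_median)]

print(feed_summary)
print(top_feed)
```

Dropping the rows with an unknown feed before grouping keeps those chickens from forming groups of their own in the summary.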
6. Find the feed that has the maximum median chicken weight.

max_median <- clean_feed %>%
  filter(weight_median == max(weight_median))
max_median

## # A tibble: 1 × 3
##   feed      weight_mean weight_median
##   <chr>           <dbl>         <dbl>
## 1 sunflower        353.           340

7. Create a quick histogram of the weight from the original data frame.

library(ggplot2)
ggplot(ch_df, aes(x = weight)) +
  geom_histogram(binwidth = 15)
## Warning: Removed 15 rows containing non-finite values (`stat_bin()`).

# https://stackoverflow.com/questions/51234843/error-mapping-should-be-created-with-aes-or-aes
# https://ggplot2.tidyverse.org/reference/geom_histogram.html

8. Create a box plot with feed type as the X axis.

ggplot(ch_df, aes(feed, weight)) +
  geom_boxplot(varwidth = TRUE)
## Warning: Removed 15 rows containing non-finite values (`stat_boxplot()`).

# https://ggplot2.tidyverse.org/reference/geom_boxplot.html

9. What do these graphs tell you? Does the box plot confirm your median calculations? If yes, how so? Are there any outliers displayed in either graph? Confirm this using the five number summary for specific feed types and the IQR.

# The outliers are removed for both the plot and the calculations. Yes, there are outliers in the weights for an unknown feed, horsebean, and soybean. Yes, the box plot confirms my calculations.

fivenum(ch_df$weight)
## [1] 108.0 209.5 261.5 326.0 423.0

# The five-number summary also confirms it; I couldn't figure out how to do it for a specific feed type.
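The five-number summary can also be computed separately for each feed type. The sketch below is one possible approach, not the author's code: it applies `fivenum()` per group with base R's `tapply()`, and uses `boxplot.stats()` to list values lying more than 1.5 times the hinge spread beyond the hinges. That rule is close to the 1.5 × IQR rule behind the box plot, although the hinges from `fivenum()` may differ slightly from the quartiles ggplot2 computes. The `toy` data frame and its values are hypothetical.

```r
# Hypothetical toy data; the weights are made up for illustration.
toy <- data.frame(
  weight = c(140, 150, 160, 170, 300, 240, 250, 260, 270, 280),
  feed   = rep(c("horsebean", "soybean"), each = 5),
  stringsAsFactors = FALSE
)

# Five-number summary (min, lower hinge, median, upper hinge, max) per feed.
five_by_feed <- tapply(toy$weight, toy$feed, fivenum)

# Values lying more than 1.5 times the hinge spread beyond the hinges, per feed.
outliers_by_feed <- tapply(toy$weight, toy$feed,
                           function(w) boxplot.stats(w)$out)

print(five_by_feed)
print(outliers_by_feed)
```

On the real data, the same two calls could take `ch_df$weight` and `ch_df$feed` after the weight column has been converted to numeric. `boxplot.stats()` ignores NA weights, while `fivenum()` drops them by default (`na.rm = TRUE`).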