School

University of Saskatchewan

Course

311

Subject

Statistics

Date

Apr 3, 2024

Type

docx

Pages

4

Assignment #6: Introduction to Working with R/RStudio

Submission Instructions

Due: Friday, April 6, 2018 at 11:59 PM.

Submit the following four files through Canvas > Assignments > To-Do:
(1) The completed, working R script that produced the analysis in Steps 1 through 9
(2) The output file: descriptivesOutput.txt
(3) Another output file: histogram.pdf
(4) The completed answer sheet, provided on the last page and also as a separate Word file

If you do not follow the instructions, your assignment will be counted late.
  o Late Assignment policy: Same as before.

Evaluation

Your submission will be graded based on the correctness of the completed answer sheet, with the other files as supporting documents.

Before you start

For this assignment, you’ll run simple analyses by modifying the R script you used in ICA #11 (Descriptives.r). You will also need a new data set, OnTimeAirport2017Dec.csv, which contains actual on-time flight statistics for 83,915 flights, by airline and airport, for December 2017, collected from the Bureau of Transportation Statistics.[1]

IMPORTANT! When downloading the .csv file, please make sure that its name doesn’t change and that it is in the same folder as the Descriptives.r file that you are modifying.
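The "same folder" requirement matters because R looks for a bare file name in its current working directory. A minimal sketch of a defensive load is shown here; the helper name load_csv_checked and the demonstration file are hypothetical and are not part of Descriptives.r, and how that script actually reads its input is not shown in this handout.

```r
# Hypothetical helper (not part of Descriptives.r): read a CSV, failing early
# with a readable message if the file is not where R is looking.
load_csv_checked <- function(filename) {
  if (!file.exists(filename)) {
    stop("Cannot find '", filename, "' in ", getwd(),
         ". Keep the .csv next to the script and set that folder as the ",
         "working directory.")
  }
  # stringsAsFactors = TRUE turns text columns into factors, so summary() on
  # them reports counts per level (R 4.0+ no longer does this by default).
  read.csv(filename, stringsAsFactors = TRUE)
}

# Demonstration on a throwaway file with made-up rows, so the sketch runs
# without the real data set.
demo_path <- file.path(tempdir(), "demo_flights.csv")
write.csv(data.frame(Dest = c("ORD", "DEN", "ORD"), ArrDelay = c(5, -3, 12)),
          demo_path, row.names = FALSE)
demo <- load_csv_checked(demo_path)
print(summary(demo$Dest))
```

With the real file, the call would presumably be load_csv_checked("OnTimeAirport2017Dec.csv") after setting the working directory to the script's folder, for example with setwd() or RStudio's Session > Set Working Directory menu.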
The metadata for the OnTimeAirport2017Dec.csv spreadsheet is below:

Variable Name    Variable Description
FlightDate       The date of the flight (mm/dd/yyyy)
UniqueCarrier    The unique carrier code
CarrierlName     The name of the carrier
FlightNum        Flight number
Origin           The origin airport of the flight
OriginCity       The origin city of the flight
Dest             The destination airport of the flight
DestCity         The destination city of the flight
DepDelay         The delay in departing from the origin gate (in minutes)
TaxiOut          The minutes spent taxiing out to the runway at origin
TaxiIn           The minutes spent taxiing in from the runway at destination
ArrDelay         The delay in arriving at the destination gate (in minutes)
Cancelled        Whether the flight was cancelled (0 = no, 1 = yes)
AirTime          Flight time (in minutes)
Distance         The total distance of the flight (in miles)

[1] https://www.transtats.bts.gov/DL_SelectFields.asp?Table_ID=236
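Before running any analysis, it can help to confirm that the loaded data actually has the columns listed in the metadata. The sketch below is one way to do that; the helper missing_columns and the two-row toy data frame are made up for illustration, and the column names are copied exactly as spelled in this handout.

```r
# Column names copied from the metadata in this handout, spelled as given.
expected_cols <- c("FlightDate", "UniqueCarrier", "CarrierlName", "FlightNum",
                   "Origin", "OriginCity", "Dest", "DestCity", "DepDelay",
                   "TaxiOut", "TaxiIn", "ArrDelay", "Cancelled", "AirTime",
                   "Distance")

# Hypothetical helper: return the expected columns that are absent from df.
missing_columns <- function(df, expected) {
  setdiff(expected, names(df))
}

# Made-up data frame holding only three of the columns, for illustration.
toy <- data.frame(UniqueCarrier = c("AA", "UA"),
                  Dest = c("ORD", "DEN"),
                  ArrDelay = c(10, -4))

# Lists every expected column the toy data frame lacks.
print(missing_columns(toy, expected_cols))

# str() shows each column's type, which is a quick way to spot a delay
# column that was read as text instead of numbers.
str(toy)
```

An empty result from missing_columns() on the real data would suggest the file matches the metadata; any name it returns is worth comparing against names(dataSet) for a spelling difference.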
Modifying the Descriptives.r script

To complete the assignment, modify the Descriptives.r script (used in ICA #11) to perform an analysis of arrival delays by destination airport and airline carrier, following the instructions below, and complete the answer sheet on the last page.

1) Use OnTimeAirport2017Dec.csv as the input file.
   HINT: Line 21 of the Descriptives.r script says:
       INPUT_FILENAME <- "NBA14Salaries.csv"
   Change that line to:
       INPUT_FILENAME <- "OnTimeAirport2017Dec.csv"

2) Present the number of flights, grouped by destination airport (using Dest).
   HINT: Change line 61 to read:
       summary(dataSet$Dest)
   This presents the number of observations/rows (flights) by destination airport. You will need the output from this command to answer the first question on the answer sheet on the last page.

3) Present summary statistics for arrival delay (using ArrDelay).
   HINT: In line 66, replace Salary with ArrDelay:
       describe(dataSet$ArrDelay)

4) Present summary statistics for arrival delay (using ArrDelay), grouped by airline carrier (using UniqueCarrier).
   HINT: Check line 73 in the script:
       describeBy(dataSet$Salary,dataSet$Position)
   This presents summary statistics for salary by position (for the NBA salary data). Now that we are using a different data set, you should be able to figure out how to change line 73 to present summary statistics for arrival delay (ArrDelay), grouped by airline carrier (UniqueCarrier). Once you have that, you will be able to answer questions 2 through 4 on the answer sheet!

5) Compare, using a t-test, the arrival delays for two airline carriers (using UniqueCarrier): American Airlines (AA) and United Airlines (UA).
   HINT: Now please change line 87 and line 93 on your own. Hopefully the first few steps will get you started!
   Check line 87:
       subset <- dataSet[ which(dataSet$Position=='PG' | dataSet$Position=='SF'), ]
   This creates a subset with only two positions: PG and SF (for the NBA salary data). Now that we are using a different data set, you should be able to figure out how to change this line to create a subset with only two airline carriers: AA and UA.
   Check line 93:
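To see how the counting, grouping, subsetting, and t-test pieces fit together before editing the script, the same pattern can be tried on a tiny made-up data set. The sketch below uses only base R and invented salary figures; it mirrors the NBA-style column names from the hints, not the airline data. It is not the content of Descriptives.r: table() and tapply() stand in for the script's summary(), describe(), and describeBy() calls, and the t.test() formula form is an assumption, since line 93 of the script is not shown here.

```r
# Made-up salaries (in millions) for three positions; purely illustrative.
toy <- data.frame(
  Position = factor(c("PG", "PG", "PG", "SF", "SF", "SF", "C", "C")),
  Salary   = c(4.0, 6.5, 5.2, 7.1, 8.4, 6.9, 10.0, 9.3)
)

# Number of rows per group (the same idea as summary() on a factor column).
counts <- table(toy$Position)
print(counts)

# Mean of a numeric column within each group (a base-R stand-in for the
# per-group statistics a grouped describe would report).
group_means <- tapply(toy$Salary, toy$Position, mean)
print(group_means)

# Keep only the rows for two groups, using the same which() pattern as the
# hint for line 87.
two_groups <- toy[which(toy$Position == "PG" | toy$Position == "SF"), ]

# Drop the now-unused "C" level so exactly two groups remain.
two_groups$Position <- droplevels(two_groups$Position)

# Welch two-sample t-test comparing the numeric column across the two groups.
result <- t.test(Salary ~ Position, data = two_groups)
print(result)
```

The printed t-test output reports the two group means, the t statistic, and the p-value; with the airline data, the same structure would apply with the carrier column as the grouping variable and the delay column as the measurement.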