Assignment-6-Introduction-to-working-with-R-RStudio (docx, 4 pages)
School: University of Saskatchewan
Course: 311
Subject: Statistics
Date: Apr 3, 2024
Uploaded by MagistrateStar1002
Assignment #6: Introduction to Working with R/RStudio
Submission Instructions
Due: Friday, April 6, 2018, at 11:59 PM.
Submit the following four files through Canvas > Assignments > To-Do:
(1) The completed, working R script that produced the analysis in Steps 1 through 9
(2) The output file – descriptivesOutput.txt
(3) Another output file – histogram.pdf
(4) The completed answer sheet provided on the last page (also provided as a separate Word file)
If you do not follow the instructions, your assignment will be counted late.
Late assignment policy: same as before.
Evaluation
Your submission will be graded on the correctness of the completed answer sheet, with the other files serving as supporting documents.
Before you start
For this assignment, you’ll run simple analyses by modifying the R script you used in ICA #11 (Descriptives.r). You will also need a new data set, OnTimeAirport2017Dec.csv, which contains actual on-time flight statistics for 83,915 flights, by airline and airport, for December 2017, collected from the Bureau of Transportation Statistics.[1]
IMPORTANT! When downloading the .csv file, make sure that its name doesn’t change and that it is saved in the same folder as the Descriptives.r file that you are modifying.
The metadata for the OnTimeAirport2017Dec.csv spreadsheet are listed below:
Variable Name | Variable Description
FlightDate | The date of the flight (mm/dd/yyyy)
UniqueCarrier | The unique carrier code
CarrierlName | The name of the carrier
FlightNum | The flight number
Origin | The origin airport of the flight
OriginCity | The origin city of the flight
Dest | The destination airport of the flight
DestCity | The destination city of the flight
DepDelay | The delay in departing from the origin gate (in minutes)
TaxiOut | The minutes spent taxiing out to the runway at the origin
TaxiIn | The minutes spent taxiing in from the runway at the destination
ArrDelay | The delay in arriving at the destination gate (in minutes)
Cancelled | Whether the flight was cancelled (0 = no, 1 = yes)
AirTime | The flight time (in minutes)
Distance | The total distance of the flight (in miles)
[1] https://www.transtats.bts.gov/DL_SelectFields.asp?Table_ID=236
Modifying the Descriptives.r script
To complete the assignment, modify the Descriptives.r script (used in ICA #11) to analyze arrival delays, flight distances, and taxi-out times by airline and airport, following the instructions below, and complete the answer sheet on the last page.
1) Use OnTimeAirport2017Dec.csv as the input file.
HINT: Line 21 of the Descriptives.r script reads:
INPUT_FILENAME <- "NBA14Salaries.csv"
Change that line to:
INPUT_FILENAME <- "OnTimeAirport2017Dec.csv"
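A minimal sketch of a defensive load step, assuming Descriptives.r reads INPUT_FILENAME with read.csv() (the script itself is not reproduced here). The helper load_flights() and the three-row toy file are hypothetical; the expected column names come from the metadata table. Passing stringsAsFactors = TRUE matters in R 4.0.0 and later, where read.csv() keeps text columns as character by default.

```r
# Hypothetical helper: read the flight file and check that the columns
# listed in the assignment's metadata table are present.
load_flights <- function(path) {
  if (!file.exists(path)) {
    stop("Cannot find ", path, " in ", getwd(),
         "; check the file name and the working directory.")
  }
  # Since R 4.0.0, read.csv() keeps text as character unless told otherwise;
  # factors let summary() count observations per airport or carrier.
  flights <- read.csv(path, stringsAsFactors = TRUE)
  expected <- c("UniqueCarrier", "Origin", "Dest", "ArrDelay",
                "TaxiOut", "Distance")
  missing_cols <- setdiff(expected, names(flights))
  if (length(missing_cols) > 0) {
    stop("Missing columns: ", paste(missing_cols, collapse = ", "))
  }
  flights
}

# Toy stand-in for OnTimeAirport2017Dec.csv (made-up values, illustration only).
# The blank ArrDelay field in the last row is read as NA.
toy_path <- tempfile(fileext = ".csv")
writeLines(c(
  "UniqueCarrier,Origin,Dest,ArrDelay,TaxiOut,Distance",
  "AA,PHL,EWR,12,15,80",
  "UA,EWR,PHL,-3,22,80",
  "AA,EWR,PHL,,18,80"
), toy_path)

dataSet <- load_flights(toy_path)
str(dataSet)
```

With the real file, the call would simply be load_flights(INPUT_FILENAME) once the working directory is set to the folder holding the .csv.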
2) Present the number of flights, grouped by destination airport (using Dest).
HINT: Change line 61 to read:
summary(dataSet$Dest)
This presents the number of observations (rows), that is, flights, for each destination airport. You will need the output from this command to answer the first question on the answer sheet on the last page.
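Why summary() may not show counts: it tallies categories only for a factor. On a character column (what read.csv() returns by default since R 4.0.0) it reports just Length, Class and Mode, and for a factor it lists at most 100 categories by default before lumping the rest into "(Other)". table() counts every distinct value either way. The destination codes below are made up for illustration.

```r
# Made-up destination codes standing in for dataSet$Dest.
dest <- c("PHL", "EWR", "PHL", "ATL", "PHL", "EWR")

summary(dest)            # character vector: Length / Class / Mode only
summary(factor(dest))    # factor: one count per airport

# table() counts every distinct value whether dest is character or factor.
dest_counts <- table(dest)
dest_counts[["PHL"]]     # 3
sort(dest_counts, decreasing = TRUE)   # busiest destinations first
```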
3) Present summary statistics for arrival delay (using ArrDelay).
HINT: In line 66, replace Salary with ArrDelay so that the line reads:
describe(dataSet$ArrDelay)
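Assuming the script uses describe() from the psych package, missing values are dropped by default (na.rm = TRUE). If cancelled flights have blank ArrDelay entries (an assumption about this file worth checking), base R's mean() returns NA unless told to skip them. A sketch with made-up delays, useful for cross-checking the describe() output:

```r
# Made-up arrival delays in minutes; NA marks a cancelled flight.
arr_delay <- c(12, -5, 0, 47, NA, 130, -2)

mean(arr_delay)                 # NA: one missing value makes the mean missing
mean(arr_delay, na.rm = TRUE)   # mean of the 6 observed delays
sum(is.na(arr_delay))           # how many flights have no arrival delay

# A compact base-R summary (n, mean, sd, median, min, max) that skips NAs.
observed <- arr_delay[!is.na(arr_delay)]
delay_stats <- c(n = length(observed), mean = mean(observed),
                 sd = sd(observed), median = median(observed),
                 min = min(observed), max = max(observed))
round(delay_stats, 2)
```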
4) Present summary statistics for arrival delay (using ArrDelay), grouped by airline carrier (using UniqueCarrier).
HINT: Check line 73 in the script:
describeBy(dataSet$Salary, dataSet$Position)
This presents summary statistics for salary by position (for the NBA salary data). Now that you are using a different data set, you should be able to figure out how to change line 73 to present summary statistics for arrival delay (ArrDelay), grouped by airline carrier (UniqueCarrier).
Once that works, you will be able to answer questions 2 through 4 on the answer sheet!
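describeBy(x, group) (presumably from the psych package loaded by Descriptives.r) reports the describe() statistics once per group. A base-R way to cross-check one statistic per group is tapply(), sketched here on a made-up data frame; the carrier codes and delays are illustrative only.

```r
# Made-up flights: carrier codes and arrival delays (minutes).
toy <- data.frame(
  UniqueCarrier = c("AA", "UA", "AA", "DL", "UA", "AA", "DL"),
  ArrDelay      = c(10,   25,   -4,   0,    NA,   30,   8)
)

# Mean and maximum delay per carrier; na.rm = TRUE is passed on to mean()/max().
mean_by_carrier <- tapply(toy$ArrDelay, toy$UniqueCarrier, mean, na.rm = TRUE)
max_by_carrier  <- tapply(toy$ArrDelay, toy$UniqueCarrier, max, na.rm = TRUE)

mean_by_carrier
max_by_carrier
```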
5) Use a t-test to compare the arrival delays of two airline carriers (using UniqueCarrier): American Airlines (AA) and United Airlines (UA).
HINT: This time, change lines 87 and 93 on your own. Hopefully the first few steps will get you started!
Check line 87:
subset <- dataSet[which(dataSet$Position == 'PG' | dataSet$Position == 'SF'), ]
This creates a subset with only two positions, PG and SF (for the NBA salary data). Now that you are using a different data set, you should be able to figure out how to change this line to create a subset with only two airline carriers: AA and UA.
Check line 93:
t.test(subset$Salary ~ subset$Position)
This runs a t-test using Salary as the dependent variable and Position as the grouping variable (for the NBA salary data). With the airport data, you should be able to change this line to use ArrDelay as the dependent variable and UniqueCarrier as the grouping variable.
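The subset-then-test pattern, sketched on made-up data with three hypothetical carriers. which() keeps only the rows for the two carriers being compared, and t.test() with a formula then sees exactly two groups (it drops factor levels that no longer occur). By default t.test() runs a Welch two-sample test, which does not assume equal variances; the p-value is stored in the p.value component.

```r
# Made-up arrival delays for three carriers (illustration only).
toy <- data.frame(
  UniqueCarrier = factor(c("AA", "UA", "DL", "AA", "UA", "DL", "AA", "UA", "AA", "UA")),
  ArrDelay      = c(5, 20, 3, -2, 35, 0, 14, 18, 7, 41)
)

# Keep only the two carriers being compared (same pattern as line 87).
two_carriers <- toy[which(toy$UniqueCarrier == "AA" | toy$UniqueCarrier == "UA"), ]

# Welch two-sample t-test: ArrDelay is the outcome, carrier the grouping variable.
result <- t.test(ArrDelay ~ UniqueCarrier, data = two_carriers)

result$estimate   # group means, in alphabetical order of carrier code
result$p.value    # compare with your chosen significance level, e.g. 0.05
```

An equivalent filter is toy$UniqueCarrier %in% c("AA", "UA"); printing result shows the full test output, including the t statistic and degrees of freedom.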
6) Create a properly labeled histogram of the overall distribution of arrival delays (using ArrDelay) for all flights.
HINT: You will need to change the hist() function in both line 106 and line 112. You also need to change lines 25 and 27 for the label and title of the histogram. In addition, in line 24, change the number of breaks (NUM_BREAKS) to 50 so that the histogram shows more vertical bars.
Once you’ve completed this part, add several new lines to the script that do the following 7), 8), and 9).
NOTE: Make sure you add these lines right before the sink() function (line 96) so that the results are included in your text file output.
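How sink() captures output, sketched with a temporary file in place of descriptivesOutput.txt. One caveat worth knowing: when a script is run with source() (RStudio's Source button) without echo, top-level results are not auto-printed, so wrapping new calls such as describeBy() in print() makes sure their output lands in the sink file. Whether Descriptives.r already handles this is not shown in the assignment; the numbers below are made up.

```r
out_path <- file.path(tempdir(), "descriptivesOutput.txt")

sink(out_path)                     # start diverting console output to the file
print(summary(c(4, 8, 15, 16, 23, 42)))
print(tapply(c(5, 7, 9), c("EWR", "PHL", "PHL"), mean))
sink()                             # stop diverting; output returns to the console

captured <- readLines(out_path)    # what ended up in the file
captured
```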
7) Use describeBy() to compare the flight distance (Distance) across airlines (using UniqueCarrier).
8) Use describeBy() to compare the taxi-out time (TaxiOut) across origin airports (Origin).
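Steps 7 and 8 follow the same grouped-summary pattern as step 4. Questions 7 and 8 on the answer sheet also allow ties, and base R can list every group sitting at the maximum or minimum, as sketched below on made-up flights. If you judge ties from rounded printed output, comparing rounded means such as round(mean_distance, 2) mirrors what you see.

```r
# Made-up flights: carrier, origin airport, distance (miles), taxi-out (minutes).
toy <- data.frame(
  UniqueCarrier = c("AA", "UA", "B6", "AA", "UA", "B6"),
  Origin        = c("PHL", "EWR", "EWR", "PHL", "EWR", "PHL"),
  Distance      = c(800, 1200, 1000, 1000, 800, 1000),
  TaxiOut       = c(14, 25, 19, 17, NA, 12)
)

# Mean distance per carrier (the "mean" column that describeBy() reports).
mean_distance <- tapply(toy$Distance, toy$UniqueCarrier, mean)

# Every carrier tied at the extreme, not just the first one found.
longest  <- names(mean_distance)[mean_distance == max(mean_distance)]
shortest <- names(mean_distance)[mean_distance == min(mean_distance)]

# Mean taxi-out time per origin airport, skipping missing values.
mean_taxi <- tapply(toy$TaxiOut, toy$Origin, mean, na.rm = TRUE)

longest
shortest
mean_taxi
```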
9) Answer this question using a t-test: Do planes spend more time taxiing out to the runway in Newark (EWR) or in Philadelphia (PHL) as the origin airport? (Use TaxiOut as the taxi-out time and Origin as the origin airport.)
Once you’ve completed all nine steps, set the working directory and run the script. Based on your script output, answer the 11 questions listed on the answer sheet on the next page.
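When reading t-test output for step 9 (or step 5), note how t.test() orders the groups: with a formula, groups follow factor-level order, which is alphabetical for text, so EWR comes before PHL and the reported difference is EWR minus PHL. The default two-sided test asks whether there is any difference; alternative = "greater" would test specifically whether the first group's mean is larger. A sketch on made-up taxi-out times:

```r
# Made-up taxi-out times (minutes) at two origin airports.
toy <- data.frame(
  Origin  = c("EWR", "PHL", "EWR", "PHL", "EWR", "PHL", "EWR", "PHL"),
  TaxiOut = c(28, 15, 22, 18, 31, 14, NA, 20)
)

two_sided <- t.test(TaxiOut ~ Origin, data = toy)           # any difference?
one_sided <- t.test(TaxiOut ~ Origin, data = toy,
                    alternative = "greater")                 # is EWR > PHL?

two_sided$estimate      # mean in group EWR, mean in group PHL (NA row dropped)
c(two_sided = two_sided$p.value, one_sided = one_sided$p.value)
```

When the observed difference points in the hypothesized direction, the one-sided p-value is half the two-sided one; the answer sheet's question about significance reads most naturally as the default two-sided test.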
Answer Sheet on the Next Page……
Answer Sheet for Assignment: Introduction to Working with R/RStudio
Name __________________________________
Answer the questions below based on your script output.
No. | Question | Answer
1 | How many total flights (including cancelled flights) had Philadelphia (PHL) as the destination airport during December 2017? | ______
2 | What was the average arrival delay (in minutes) across all flights during December 2017? | ______
3 | What was the average arrival delay (in minutes) for American Airlines (UniqueCarrier code AA) during December 2017? | ______
4 | What was the longest arrival delay for United Airlines (UniqueCarrier code UA) during December 2017? | ______
5 | On average, which airline (using UniqueCarrier) experienced greater arrival delays: American Airlines (AA) or United Airlines (UA)? | ______
6 | For question 5, was this difference statistically significant? What is the p-value? (Answer both questions in the blank to the right.) | ______
7 | Which airline(s) had the longest average flight distance? (You can list more than one if there is a tie.) | ______
8 | Which airline(s) had the shortest average flight distance? (You can list more than one if there is a tie.) | ______
9 | On average, which origin airport (using Origin) experienced greater taxi-out times: Newark (EWR) or Philadelphia (PHL)? | ______
10 | For question 9, was this difference statistically significant? What is the p-value? (Answer both questions in the blank to the right.) | ______
11 | Looking at the histogram, is the distribution symmetric? Are most flights delayed less than 50 minutes or more than 50 minutes? | ______