
Please write the solution as typed source code.

Assignment 4

In this assignment you will use the dataset released by the Department of Transportation. It lists flights that occurred in 2015, along with other information such as delays and flight time.

In this assignment, you will demonstrate good practices for manipulating data with Python's most popular libraries to accomplish the following:

cleaning data with pandas

making specific changes with numpy

handling date-related values with datetime

Note: please consider only the flights departing from BOS, JFK, SFO and LAX.

 

Each question is equally weighted for the total grade.

 

import os

import pandas as pd

import pandas.api.types as ptypes

import numpy as np

import datetime as dt

 

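# os.path.join avoids backslash escapes such as '\a' and '\f' that break 'assets\airlines.csv'-style paths
airlines_df = pd.read_csv(os.path.join('assets', 'airlines.csv'))
airports_df = pd.read_csv(os.path.join('assets', 'airports.csv'))
flights_df_raw = pd.read_csv(os.path.join('assets', 'flights.csv'), low_memory=False)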

 

Question 1: Data Preprocessing

For this question, perform the following:

 

remove rows with missing values

keep flights departing from airports (ORIGIN_AIRPORT) that we want to look at (BOS, JFK, SFO and LAX)

filter out flights whose departure delay (DEPARTURE_DELAY) is more than one day, i.e. more than 1440 minutes

convert FLIGHT_NUMBER column type to string

SCHEDULED_DEPARTURE is coded as a float where the first two digits indicate the hour and the last two indicate the minutes. Convert this column to datetime format by combining existing columns DAY, MONTH, YEAR and SCHEDULED_DEPARTURE

add an IS_DELAYED column: any flight with a departure delay (DEPARTURE_DELAY) of more than 15 minutes is delayed, and any other flight is not delayed

remove YEAR, MONTH, DAY columns
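A minimal sketch of the Question 1 steps, in the order listed. It assumes DEPARTURE_DELAY is in minutes (so one day is 1440 minutes), that SCHEDULED_DEPARTURE values run from 0 to 2359 (e.g. 5 for 00:05, 1430 for 14:30), and that IS_DELAYED should be a boolean. The `sample` frame is made up for illustration and is not taken from the real dataset; the sketch has not been checked against the 535744-row assertion.

```python
import numpy as np
import pandas as pd

AIRPORTS = ["BOS", "JFK", "SFO", "LAX"]
ONE_DAY_MINUTES = 24 * 60  # assumes DEPARTURE_DELAY is measured in minutes


def data_preprocess(flights_df):
    # 1. Remove rows with a missing value in any column.
    df = flights_df.dropna()
    # 2. Keep only departures from the four airports of interest.
    df = df[df["ORIGIN_AIRPORT"].isin(AIRPORTS)]
    # 3. Drop flights delayed by more than one day.
    df = df[df["DEPARTURE_DELAY"] <= ONE_DAY_MINUTES].copy()
    # 4. Flight numbers are identifiers, not quantities.
    df["FLIGHT_NUMBER"] = df["FLIGHT_NUMBER"].astype(str)
    # 5. Split HHMM with integer arithmetic (5 -> 00:05) and assemble a datetime.
    hhmm = df["SCHEDULED_DEPARTURE"].astype(int)
    df["SCHEDULED_DEPARTURE"] = pd.to_datetime(pd.DataFrame({
        "year": df["YEAR"],
        "month": df["MONTH"],
        "day": df["DAY"],
        "hour": hhmm // 100,
        "minute": hhmm % 100,
    }))
    # 6. Delayed means more than 15 minutes late.
    df["IS_DELAYED"] = df["DEPARTURE_DELAY"] > 15
    # 7. The date parts now live inside SCHEDULED_DEPARTURE.
    return df.drop(columns=["YEAR", "MONTH", "DAY"])


# Hypothetical sample data for illustration only.
sample = pd.DataFrame({
    "YEAR": [2015] * 5,
    "MONTH": [1, 1, 2, 3, 3],
    "DAY": [1, 2, 3, 4, 5],
    "ORIGIN_AIRPORT": ["BOS", "ORD", "JFK", "LAX", "SFO"],
    "FLIGHT_NUMBER": [98, 2336, 840, 258, 135],
    "SCHEDULED_DEPARTURE": [5.0, 10.0, 1430.0, 2000.0, 700.0],
    "DEPARTURE_DELAY": [20.0, 0.0, 1500.0, 3.0, np.nan],
})
clean = data_preprocess(sample)
```

Here only the BOS and LAX rows survive: ORD is not one of the four airports, the JFK flight is more than a day late, and the SFO row has a missing delay. Note that `dropna()` with no arguments drops a row if any column is missing; if that removes far more rows than expected on the real file, restricting it with `subset=` is one option.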

 

def data_preprocess(flights_df):
    # YOUR CODE HERE
    # raise NotImplementedError()
    return df

 

flights_df = data_preprocess(flights_df_raw.copy())

assert len(flights_df) == 535744, "Q1: There should be 535744 observations in the flights dataframe"

 

Question 2

 

NOTE: The columns used to merge the two dataframes are flights_df['ORIGIN_AIRPORT'] and airports_df['IATA_CODE'], and there is no ['NUM_FLIGHTS'] column in either dataframe, so you have to create it.

 

PLEASE MAKE SURE that the returned dataframe has shape (4,1) AND that the counts are not equal to 105276.

 

Merge flights_df dataframe with airports_df dataframe and return the number of departing flights (NUM_FLIGHTS) per airport (IATA_CODE) across the year.
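A minimal sketch, assuming only the two key columns named in the note (ORIGIN_AIRPORT and IATA_CODE). Because the call below passes `flights_df_raw`, the sketch filters to the four airports inside the function so the result has four rows. The sample frames are hypothetical and only illustrate the shape of the output.

```python
import pandas as pd

AIRPORTS = ["BOS", "JFK", "SFO", "LAX"]


def flights_per_airport(flights_df, airports_df):
    # Keep only departures from the four airports of interest.
    flights = flights_df[flights_df["ORIGIN_AIRPORT"].isin(AIRPORTS)]
    # Inner merge on the key columns given in the assignment note.
    merged = flights.merge(
        airports_df, left_on="ORIGIN_AIRPORT", right_on="IATA_CODE", how="inner"
    )
    # One row per airport code, one column holding the departure count.
    return merged.groupby("IATA_CODE").size().to_frame("NUM_FLIGHTS")


# Hypothetical sample data for illustration only.
flights = pd.DataFrame(
    {"ORIGIN_AIRPORT": ["BOS", "BOS", "JFK", "LAX", "SFO", "SFO", "SFO", "ORD"]}
)
airports = pd.DataFrame({
    "IATA_CODE": ["BOS", "JFK", "LAX", "SFO", "ORD"],
    "CITY": ["Boston", "New York", "Los Angeles", "San Francisco", "Chicago"],
})
counts = flights_per_airport(flights, airports)
```

Using `size()` counts rows per group regardless of missing values, and `to_frame("NUM_FLIGHTS")` keeps IATA_CODE as the index so lookups such as `counts.loc["BOS", "NUM_FLIGHTS"]` work as in the assertions.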

 

def flights_per_airport(flights_df, airports_df):
    # YOUR CODE HERE
    raise NotImplementedError()
    return df

 

num_flights_df=flights_per_airport(flights_df_raw.copy(), airports_df.copy())

 

assert num_flights_df.shape==(4,1), "Shape of DataFrame should be (4,1)"

assert num_flights_df.columns[0]=='NUM_FLIGHTS', "DataFrame should have a column which is called NUM_FLIGHTS"

assert num_flights_df.loc["BOS", "NUM_FLIGHTS"] == 105276, "The NUM_FLIGHTS for BOS is wrong"

 

PLEASE MAKE SURE that the returned dataframe has shape (4,1).

 

Question 3

 

For this question, find the names of the top three airlines that have a high number of flights and the lowest percentage of delays compared to other airlines. The result should be a dataframe with three columns: AIRLINE_NAME, NUM_FLIGHTS and PERC_DELAY.

 

NOTE: THERE ARE NO COLUMNS NAMED AIRLINE_NAME, PERC_DELAY AND NUM_FLIGHTS, so you have to create them.

 

Hint:

 

the percentage of delays for each airline can be obtained with the groupby and apply methods

merge flights_df with airlines_df to get the names of the top three airlines
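A minimal sketch under several assumptions that the assignment does not state: flights_df has an AIRLINE column holding the carrier code, and airlines_df has IATA_CODE and AIRLINE (full name) columns. The ranking rule is hypothetical: "high number of flights" is read as at or above the median NUM_FLIGHTS, and those airlines are then sorted by lowest PERC_DELAY. PERC_DELAY is kept as a fraction between 0 and 1, like the values in the Question 4 example; multiply by 100 if a true percentage is wanted. This rule may not reproduce the 'United Air Lines Inc.' assertion.

```python
import pandas as pd

AIRPORTS = ["BOS", "JFK", "SFO", "LAX"]


def top_three_airlines(flights_df, airlines_df):
    # Same airport filter and 15-minute delay rule as Question 1.
    flights = flights_df[flights_df["ORIGIN_AIRPORT"].isin(AIRPORTS)].copy()
    flights["IS_DELAYED"] = flights["DEPARTURE_DELAY"] > 15
    # Assumed schema: airlines_df has IATA_CODE and AIRLINE (name) columns.
    names = airlines_df.rename(columns={"AIRLINE": "AIRLINE_NAME"})
    merged = flights.merge(names, left_on="AIRLINE", right_on="IATA_CODE", how="inner")
    grouped = merged.groupby("AIRLINE_NAME")
    stats = grouped.size().to_frame("NUM_FLIGHTS")
    # groupby + apply, as the hint suggests: share of delayed flights per airline.
    stats["PERC_DELAY"] = grouped["IS_DELAYED"].apply(lambda s: s.mean())
    # Hypothetical rule: keep busy airlines, then rank by lowest delay share.
    busy = stats[stats["NUM_FLIGHTS"] >= stats["NUM_FLIGHTS"].median()]
    top = busy.sort_values(["PERC_DELAY", "NUM_FLIGHTS"], ascending=[True, False]).head(3)
    # reset_index turns AIRLINE_NAME into a column and gives rows 0, 1, 2.
    return top.reset_index()


# Hypothetical sample data for illustration only.
sample_flights = pd.DataFrame({
    "AIRLINE": ["UA"] * 4 + ["AA"] * 4 + ["DL"] * 3 + ["B6"] * 2 + ["NK"] * 2,
    "ORIGIN_AIRPORT": ["SFO", "SFO", "LAX", "BOS", "JFK", "LAX", "LAX", "BOS",
                       "JFK", "BOS", "SFO", "JFK", "BOS", "LAX", "ORD"],
    "DEPARTURE_DELAY": [0, 5, 30, 0, 20, 30, 0, 0, 0, 0, 0, 0, 40, 0, 100],
})
sample_airlines = pd.DataFrame({
    "IATA_CODE": ["UA", "AA", "DL", "B6", "NK"],
    "AIRLINE": ["United Air Lines Inc.", "American Airlines Inc.",
                "Delta Air Lines Inc.", "JetBlue Airways", "Spirit Air Lines"],
})
top3 = top_three_airlines(sample_flights, sample_airlines)
```

In the sample, Spirit has no delays but only one qualifying flight, so the volume filter excludes it; that trade-off between flight count and delay share is exactly where a different ranking rule would give a different answer.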

 

def top_three_airlines(flights_df, airlines_df):
    # YOUR CODE HERE
    raise NotImplementedError()
    return df

 

top_three_airlines_df = top_three_airlines(flights_df_raw.copy(), airlines_df.copy())

 

assert sorted(list(top_three_airlines_df.columns)) == sorted(['NUM_FLIGHTS', 'PERC_DELAY', 'AIRLINE_NAME']), "Dataframe doesn't have required columns"

assert top_three_airlines_df.loc[0, 'AIRLINE_NAME'] == 'United Air Lines Inc.', "Top airline name doesn't match"

 

Question 4

 

For this question, obtain the monthly percentage of delays for each ORIGIN_AIRPORT.

 

Example Result:

 

     MONTH     BOS     JFK     LAX     SFO
0    January   0.1902  0.2257  0.1738  0.xxxx
1    February  0.3248  0.xxxx  0.xxxx  0.xxxx
2    March     0.1984  0.xxxx  0.xxxx  0.xxxx
3    April     0.xxxx  0.xxxx  0.xxxx  0.xxxx
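A minimal sketch that reuses the 15-minute rule from Question 1 and assumes the raw MONTH column holds month numbers 1 to 12. It computes the share of delayed departures per (month, airport) pair, pivots airports into columns with `unstack`, and maps month numbers to names with the standard `calendar` module. The sample data is hypothetical.

```python
import calendar

import pandas as pd

AIRPORTS = ["BOS", "JFK", "SFO", "LAX"]
MONTH_NAMES = {i: calendar.month_name[i] for i in range(1, 13)}


def monthly_airport_delays(flights_df):
    flights = flights_df[flights_df["ORIGIN_AIRPORT"].isin(AIRPORTS)].copy()
    flights["IS_DELAYED"] = flights["DEPARTURE_DELAY"] > 15
    # Fraction of delayed departures for each (month, airport) pair.
    rates = flights.groupby(["MONTH", "ORIGIN_AIRPORT"])["IS_DELAYED"].mean()
    # One row per month number, one column per airport (alphabetical order).
    df = rates.unstack("ORIGIN_AIRPORT").sort_index()
    df.columns.name = None
    df = df.reset_index()
    # Replace month numbers with names only after sorting numerically.
    df["MONTH"] = df["MONTH"].map(MONTH_NAMES)
    return df


# Hypothetical sample data for illustration only.
sample = pd.DataFrame({
    "MONTH": [1, 1, 1, 1, 2, 2, 2, 2, 1],
    "ORIGIN_AIRPORT": ["BOS", "BOS", "JFK", "LAX", "BOS", "JFK", "LAX", "SFO", "SFO"],
    "DEPARTURE_DELAY": [30.0, 0.0, 16.0, 15.0, 0.0, 0.0, 45.0, 20.0, 5.0],
})
monthly = monthly_airport_delays(sample)
```

The values are left unrounded; calling `.round(4)` on the airport columns would match the four-decimal display in the example. A month with no departures from an airport would show up as NaN in that cell.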

def monthly_airport_delays(flights_df):
    # YOUR CODE HERE
    raise NotImplementedError()
    return df

 

monthly_airport_delays_df = monthly_airport_delays(flights_df_raw.copy())

 

I would like to add the csv files but can't.

 

I need help with this assignment.
