Bike-Prediction_updated

.pdf

School

Georgia Institute Of Technology *

*We aren’t endorsed by this school

Course

6414

Subject

Geography

Date

Apr 3, 2024

Type

pdf

Pages

41

Uploaded by ChancellorLlama3214

Report
Overview Background With increasing urban population, traffic congestion and saturation and/or lack of public transportation bike sharing proved to be an ingenious environment friendly solution for daily commuters. There has been steady increase in the number of bike share programs worldwide reaching 1608 bike share programs with a fleet of 18.2 million bikes in 2018. Despite the steady growth in bike sharing programs one of the key challenges faced by aggregators is to estimate the demand for bikes and allocate resources accordingly as the usage rates vary from around three to eight trips per bicycle per day globally2. The variation in usage could be due to multitude of factors one of which we believe are the prevalent weather conditions. We can expect that passengers are more likely to choose bike rides on days when the weather is pleasant without snowfall and/or heavy winds. Another important factor is time during the day. The demand is more during morning and evening peak traffic hours, and lesser during other times of the day. Further, a study carried out by Bowman Cutter and Matthew Neidell’s on the effect of voluntary information disclosure of information on air quality urging people to reduce ozone emissions found that there is an increase in people choosing alternate methods of transportation on days such warnings are issued, supporting the idea that weather parameters have an effect on individual’s behavior and choices. Data Description The response variable is: Y (Cnt): Total bikes rented by both casual & registered users together The predicting variables are: X 1 (Instant): Record index X 2 (Dteday): Day on which the observation is made X 3 (Season): Season which the observation is made (1 = Winter, 2 = Spring, 3 = Summer, 4 = Fall) X 4 (Yr): Year on which the observation is made X 5 (Mnth): Month on which the observation is made X 6 (Hr): Day on which the observation is made (0 through 23) X 7 (Holiday): Indictor of a public holiday or not (1 = public holiday, 0 = not a public holiday) X 8 (Weekday): Day of week (0 through 6) X 9 (Working day): Indicator of a working day (1 = working day, 0 = not a working day) X 10 (Weathersit): Weather condition (1 = Clear, Few clouds, Partly cloudy, Partly cloudy, 2 = Mist & Cloudy, Mist & Broken clouds, Mist & Few clouds, Mist, 3 = Light Snow, Light Rain, Thunderstorm & Scattered clouds, Light Rain & Scattered clouds, 4 = Heavy Rain, Ice Pallets, Thunderstorm & Mist, Snow & Fog) X 11 (Temp): Normalized temperature in Celsius X 12 (Atemp): Normalized feeling temperature in Celsius X 13 (Hum): Normalized humidity X 14 (Windspeed): Normalized wind speed X 15 (Casual): Bikes rented by casual users in that hour X 16 (Registered): Bikes rented by registered users in that hour 1
Exploratory Data Analysis Reading data # Set colors gtblue = rgb( 0 , 48 , 87 , maxColorValue = 255 ) techgold = rgb( 179 , 163 , 105 , maxColorValue = 255 ) buzzgold = rgb( 234 , 170 , 0 , maxColorValue = 255 ) bobbyjones = rgb( 55 , 113 , 23 , maxColorValue = 255 ) # Read the data using read.csv data = read.csv( ' /Users/Owner/Documents/ISYE6414 Fall2023/ISYE6414Module3RCode&Data-1 (1)/Bikes.csv ' ) # Show the number of observations obs = nrow(data) cat( "There are" , obs, "observations in the data" ) ## There are 17379 observations in the data Response Data Distribution # Check the distribution of the response, cnt hist(data$cnt, main= "" , xlab= "Count of Bike Shares" , border= buzzgold, col= gtblue) 2
Count of Bike Shares Frequency 0 200 400 600 800 1000 0 1000 3000 5000 The frequency of zero bike shares is high, which skews the demand data. # Check the response, cnt, against time of day boxplot(cnt~hr, main= "" , xlab= "Hour" , ylab= "Count of Bike Shares" , col= blues9, data= data) 3
0 2 4 6 8 10 12 14 16 18 20 22 0 200 400 600 800 1000 Hour Count of Bike Shares The number of bike shares between hour 0 and hour 6 is low. The majority activity as expected is focused between hour 7 and hour 23, peaking at hour 8 and hour 17. par( mfrow= c( 1 , 2 )) # Plot cnt against season boxplot(cnt~season, main= "" , xlab= "Season" , ylab= "Count of Bike Shares" , col= blues9, data= data) # Plot cnt against weather boxplot(cnt~weathersit, main= "" , xlab= "Weather" , ylab= "Count of Bike Shares" , col= blues9, data= data) 4
1 2 3 4 0 400 800 Season Count of Bike Shares 1 2 3 0 400 800 Weather Count of Bike Shares The number of bikes rented during winter are the lowest. The number of bikes decreases as the weather becomes unfavorable. plot(data$windspeed, data$cnt, xlab= "Scaled Wind Speed" , ylab= "Count of Bike Share" , main= "" , col= gtblue) abline(lm(cnt~windspeed, data= data), col= buzzgold, lty= 2 , lwd= 2 ) 0.0 0.2 0.4 0.6 0.8 0 200 400 600 800 1000 Scaled Wind Speed Count of Bike Share 5
The count of rental bikes seems to decrease as windspeed increases. <- Need to discuss this as the OLS line contradicts this statement. par( mfrow= c( 1 , 2 )) plot(data$temp, data$cnt, xlab= "Scaled Temperature" , ylab= "Count of Bike Share" , main= "" , col= gtblue) abline(lm(cnt~temp, data= data), col= buzzgold, lty= 2 , lwd= 2 ) plot(data$hum, data$cnt, xlab= "Scaled Humidity" , ylab= "Count of Bike Share" , main= "" , col= gtblue) abline(lm(cnt~hum, data= data), col= buzzgold, lty= 2 , lwd= 2 ) 0.0 0.2 0.4 0.6 0.8 1.0 0 400 800 Scaled Temperature Count of Bike Share 0.0 0.2 0.4 0.6 0.8 1.0 0 400 800 Scaled Humidity Count of Bike Share The count of rental bikes seems to decrease as humidity increases although the demand varies within similar ranges at varying humidity levels. The count of rental bikes seems to increase as temperature increases however with much wider variability at larger temperature levels. Preparing the Data # Set a seed for reproducibility set.seed( 9 ) # Remove the irrelevant columns clean_data = data[-c( 1 , 2 , 9 , 15 , 16 )] # Convert the numerical categorical variables to predictors clean_data$season = as.factor(clean_data$season) 6
Your preview ends here
Eager to read complete document? Join bartleby learn and gain access to the full version
  • Access to all documents
  • Unlimited textbook solutions
  • 24/7 expert homework help