Assignment 7


School: Northeastern University
Course: 6400
Subject: Industrial Engineering
Date: Apr 3, 2024

KumariSimran_Assignment 7
October 11, 2023

Importing the Python Libraries

[1]:
```python
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
# matplotlib.cm.get_cmap was removed in Matplotlib 3.9; pyplot.get_cmap still provides it
from matplotlib.pyplot import get_cmap
import geopandas as gpd
from shapely.geometry import Point, Polygon
import warnings

warnings.filterwarnings("ignore")
```

Q 1

[2]:
```python
# Load the earthquake catalogue and preview the first five rows
eq_csv = pd.read_csv('Earthquake_Data_2014-2023.csv')
print(eq_csv.head(5))
```
```
                       time   latitude   longitude   depth   mag magType  \
0  2013-12-30T23:57:19.646Z  53.512100 -167.173800  14.700  1.60      ml
1  2013-12-30T23:54:42.690Z  37.544667 -118.805833   4.954  0.30      md
2  2013-12-30T23:48:16.800Z -10.211200  -75.405100  27.600  4.60      mb
3  2013-12-30T23:47:15.540Z  33.707500 -116.726833  18.641  0.72      ml
4  2013-12-30T23:47:06.533Z  65.248100 -144.290200   8.200  1.00      ml

    nst    gap     dmin   rms                   updated  \
0   NaN    NaN      NaN  0.34  2023-07-19T20:48:32.372Z
1   8.0  166.0  0.04865  0.02  2017-02-02T05:46:04.803Z
2   NaN   91.0  2.25900  1.01  2014-02-28T08:44:14.000Z
3  14.0  150.0  0.01117  0.05  2016-03-12T05:59:52.919Z
4   NaN    NaN      NaN  0.65  2023-07-19T20:48:31.152Z

                                 place        type  horizontalError  depthError  \
0         58 km SW of Unalaska, Alaska  earthquake              NaN        1.50
1  8 km W of Aspen Springs, California  earthquake             1.53        1.29
2            40 km N of Oxapampa, Peru  earthquake              NaN        5.10
3             4km SSW of Idyllwild, CA  earthquake             0.63        0.51
4         43 km SSE of Central, Alaska  earthquake              NaN        0.80

   magError  magNst    status locationSource magSource
0       NaN     NaN  reviewed             ak        ak
1     0.173     7.0  reviewed             nc        nc
2     0.089     NaN  reviewed             us        us
3     0.098     7.0  reviewed             ci        ci
4       NaN     NaN  reviewed             ak        ak

[5 rows x 22 columns]
```

The data above covers earthquakes recorded from Dec 01, 2013 to Oct 10, 2023. Its attributes include the time of each earthquake, its location (a place description plus latitude and longitude), and its magnitude and depth, among others.

Q 2

[3]:
```python
print('Data Types:\n', eq_csv.dtypes)
print('\nData Info:\n')
eq_csv.info()  # info() prints its report directly and returns None
print('\n')
print('Data Summary Statistics:\n', eq_csv.describe())
```
```
Data Types:
 time                object
latitude           float64
longitude          float64
depth              float64
mag                float64
magType             object
nst                float64
gap                float64
dmin               float64
rms                float64
net                 object
id                  object
updated             object
place               object
type                object
horizontalError    float64
depthError         float64
magError           float64
magNst             float64
status              object
locationSource      object
magSource           object
dtype: object

Data Info:

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1388279 entries, 0 to 1388278
Data columns (total 22 columns):
 #   Column           Non-Null Count    Dtype
---  ------           --------------    -----
 0   time             1388279 non-null  object
 1   latitude         1388279 non-null  float64
 2   longitude        1388279 non-null  float64
 3   depth            1388277 non-null  float64
 4   mag              1387039 non-null  float64
 5   magType          1387037 non-null  object
 6   nst              869968 non-null   float64
 7   gap              1018789 non-null  float64
 8   dmin             927092 non-null   float64
 9   rms              1387862 non-null  float64
 10  net              1388279 non-null  object
 11  id               1388279 non-null  object
 12  updated          1388279 non-null  object
 13  place            1381529 non-null  object
 14  type             1388279 non-null  object
 15  horizontalError  878319 non-null   float64
 16  depthError       1387748 non-null  float64
 17  magError         961644 non-null   float64
 18  magNst           991784 non-null   float64
 19  status           1388279 non-null  object
 20  locationSource   1388279 non-null  object
 21  magSource        1388279 non-null  object
dtypes: float64(12), object(10)
memory usage: 233.0+ MB


Data Summary Statistics:
            latitude     longitude         depth           mag            nst  \
count  1.388279e+06  1.388279e+06  1.388277e+06  1.387039e+06  869968.000000
mean   3.924650e+01 -1.129268e+02  2.376709e+01  1.630511e+00      19.301795
std    2.025902e+01  6.892230e+01  5.531163e+01  1.239557e+00      16.744273
min   -8.288370e+01 -1.799997e+02 -1.000000e+01 -9.990000e+00       0.000000
25%    3.386783e+01 -1.507760e+02  3.270000e+00  8.100000e-01       8.000000
50%    3.859950e+01 -1.215040e+02  8.190000e+00  1.350000e+00      15.000000
75%    5.481405e+01 -1.165747e+02  1.688000e+01  2.100000e+00      24.000000
max    8.738600e+01  1.799994e+02  6.973600e+02  8.300000e+00     452.000000

                gap           dmin           rms  horizontalError  \
count  1.018789e+06  927092.000000  1.387862e+06    878319.000000
mean   1.223811e+02       0.658837  3.089814e-01         1.981813
std    6.769824e+01       2.322588  2.902084e-01         3.465467
min    7.000000e+00       0.000000  0.000000e+00         0.000000
25%    7.100000e+01       0.023420  1.000000e-01         0.270000
50%    1.060000e+02       0.065300  1.900000e-01         0.460000
75%    1.584000e+02       0.201300  4.900000e-01         1.340000
max    3.600000e+02     141.160000  4.624000e+01       194.584100

         depthError       magError         magNst
count  1.387748e+06  961644.000000  991784.000000
mean   2.682736e+00       0.213335      15.388853
std    1.154638e+02       0.321005      27.009770
min    0.000000e+00       0.000000       0.000000
25%    4.000000e-01       0.104000       4.000000
50%    7.200000e-01       0.160000       9.000000
75%    1.900000e+00       0.233000      17.000000
max    1.284070e+05       6.190000     941.000000
```

The output shows that the 22 columns have two data types: 12 are float64 (numeric values such as latitude, depth and mag) and 10 are object (strings such as time, place and id). The summary statistics give the count, mean, standard deviation, minimum, quartiles and maximum of each numeric column.

Q 3

[4]:
```python
# Count the missing values in each column
print(eq_csv.isnull().sum())
```
```
time                    0
latitude                0
longitude               0
depth                   2
mag                  1240
magType              1242
nst                518311
gap                369490
dmin               461187
rms                   417
net                     0
id                      0
updated                 0
place                6750
type                    0
horizontalError    509960
depthError            531
magError           426635
magNst             396495
status                  0
locationSource          0
magSource               0
dtype: int64
```

Yes, the dataset has missing values. Only a few rows lack depth (2), mag (1,240), magType (1,242) or place (6,750), while the station and error columns have large gaps, for example nst (518,311), horizontalError (509,960) and dmin (461,187). Because only a small share of rows lack mag, magType or place, those rows are dropped:

[5]:
```python
# Drop the rows that have no magnitude, magnitude type or place description
eq_csv.dropna(subset=['mag', 'magType', 'place'], inplace=True)
```
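Before choosing between dropping rows and imputing values, it can help to see each column's gaps as a share of the row count rather than as raw counts. The helper below is a minimal sketch of that check; `missing_report` and the toy DataFrame are hypothetical illustrations, not part of the assignment.

```python
import numpy as np
import pandas as pd


def missing_report(df: pd.DataFrame) -> pd.DataFrame:
    """Return the count and percentage of missing values per column, largest first."""
    counts = df.isnull().sum()
    report = pd.DataFrame({
        "missing": counts,
        "percent": (counts / len(df) * 100).round(2),
    })
    # Keep only the columns that actually have gaps, worst first
    return report[report["missing"] > 0].sort_values("missing", ascending=False)


# Hypothetical toy frame standing in for the earthquake catalogue
toy = pd.DataFrame({
    "mag": [1.6, np.nan, 4.6, 0.72],
    "place": ["58 km SW of Unalaska, Alaska", None, "40 km N of Oxapampa, Peru", None],
    "nst": [np.nan, 8.0, np.nan, 14.0],
    "net": ["ak", "nc", "us", "ci"],
})
print(missing_report(toy))
```

Run on the raw `eq_csv` (before the `dropna` in cell [5]), the report would rank nst, horizontalError and dmin first, in line with the counts printed in Q 3, and columns with no gaps, such as net, would not appear at all.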
[6]:
```python
# Fill the remaining gaps in these numeric columns with each column's mean.
# Assigning the result back avoids a chained in-place fillna, which pandas 2.x
# warns about and which no longer updates eq_csv under Copy-on-Write.
missing_values = ['nst', 'gap', 'dmin', 'rms', 'horizontalError',
                  'depthError', 'magError', 'magNst']
for i in missing_values:
    eq_csv[i] = eq_csv[i].fillna(eq_csv[i].mean())
```

After handling the missing data, every column reports zero missing values:

[7]:
```python
print(eq_csv.isnull().sum())
```
```
time               0
latitude           0
longitude          0
depth              0
mag                0
magType            0
nst                0
gap                0
dmin               0
rms                0
net                0
id                 0
updated            0
place              0
type               0
horizontalError    0
depthError         0
magError           0
magNst             0
status             0
locationSource     0
magSource          0
dtype: int64
```

Q 4

[8]:
```python
# Parse the ISO 8601 timestamps, then extract the date, year, month and day
eq_csv['time'] = pd.to_datetime(eq_csv['time'])
eq_csv['updated'] = pd.to_datetime(eq_csv['updated'])
eq_csv['time_date'] = eq_csv['time'].dt.date
eq_csv['updated_date'] = eq_csv['updated'].dt.date
eq_csv['time_year'] = eq_csv['time'].dt.year
eq_csv['time_month'] = eq_csv['time'].dt.month
eq_csv['time_day'] = eq_csv['time'].dt.day
```

[9]:
```python
# Yearly and monthly distribution charts
yearly_distribution = eq_csv['time_year'].value_counts()
monthly_distribution = eq_csv['time_month'].value_counts()
plt.figure(figsize=(12, 6))
```
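The Q 4 cells parse the `time` column and count earthquakes per year and per month. A minimal sketch of those steps with the Series `.dt` accessor follows, run on a handful of timestamps taken from the `time` and `updated` columns in the Q 1 preview; the sample values and the `df` name are illustrative assumptions, not the full dataset.

```python
import pandas as pd

# Hypothetical sample of ISO 8601 timestamps in UTC ("Z" suffix)
times = pd.Series([
    "2013-12-30T23:57:19.646Z",
    "2014-02-28T08:44:14.000Z",
    "2016-03-12T05:59:52.919Z",
    "2017-02-02T05:46:04.803Z",
    "2023-07-19T20:48:32.372Z",
    "2023-07-19T20:48:31.152Z",
])

df = pd.DataFrame({"time": pd.to_datetime(times)})

# The .dt accessor reads calendar parts directly from the datetime column
df["time_year"] = df["time"].dt.year
df["time_month"] = df["time"].dt.month
df["time_day"] = df["time"].dt.day

# value_counts() orders by frequency; sort_index() restores calendar order
yearly_distribution = df["time_year"].value_counts().sort_index()
monthly_distribution = df["time_month"].value_counts().sort_index()
print(yearly_distribution)
print(monthly_distribution)
```

Because `value_counts()` sorts by count, plotting its result directly would order the bars by frequency; sorting by the index keeps years and months in sequence, which is usually what a distribution chart needs.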