Assignment 7
School: Northeastern University
Course: 6400
Subject: Industrial Engineering
Date: Apr 3, 2024
KumariSimran_Assignment 7
October 11, 2023
Importing the Python libraries
[1]: import pandas as pd
     import seaborn as sns
     import matplotlib.pyplot as plt
     from matplotlib.cm import get_cmap
     import geopandas as gpd
     from shapely.geometry import Point, Polygon
     import warnings
     warnings.filterwarnings("ignore")
Q 1
[2]: eq_csv = pd.read_csv('Earthquake_Data_2014-2023.csv')
     print(eq_csv.head(5))
                       time   latitude   longitude   depth   mag magType  \
0  2013-12-30T23:57:19.646Z  53.512100 -167.173800  14.700  1.60      ml
1  2013-12-30T23:54:42.690Z  37.544667 -118.805833   4.954  0.30      md
2  2013-12-30T23:48:16.800Z -10.211200  -75.405100  27.600  4.60      mb
3  2013-12-30T23:47:15.540Z  33.707500 -116.726833  18.641  0.72      ml
4  2013-12-30T23:47:06.533Z  65.248100 -144.290200   8.200  1.00      ml

    nst    gap     dmin   rms  …                   updated  \
0   NaN    NaN      NaN  0.34  …  2023-07-19T20:48:32.372Z
1   8.0  166.0  0.04865  0.02  …  2017-02-02T05:46:04.803Z
2   NaN   91.0  2.25900  1.01  …  2014-02-28T08:44:14.000Z
3  14.0  150.0  0.01117  0.05  …  2016-03-12T05:59:52.919Z
4   NaN    NaN      NaN  0.65  …  2023-07-19T20:48:31.152Z

                                 place        type  horizontalError  depthError  \
0         58 km SW of Unalaska, Alaska  earthquake              NaN        1.50
1  8 km W of Aspen Springs, California  earthquake             1.53        1.29
2            40 km N of Oxapampa, Peru  earthquake              NaN        5.10
3             4km SSW of Idyllwild, CA  earthquake             0.63        0.51
4         43 km SSE of Central, Alaska  earthquake              NaN        0.80

   magError  magNst    status  locationSource  magSource
0       NaN     NaN  reviewed              ak         ak
1     0.173     7.0  reviewed              nc         nc
2     0.089     NaN  reviewed              us         us
3     0.098     7.0  reviewed              ci         ci
4       NaN     NaN  reviewed              ak         ak

[5 rows x 22 columns]
The output above shows the first five rows of the earthquake dataset, which covers Dec 01, 2013 to
Oct 10, 2023. Each of its 22 columns describes one attribute of an event, such as the time of the
earthquake, its place with latitude and longitude, and its magnitude and depth, to name a few.
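One way to check the date range of a dataset like this is to parse the time column and look at its earliest and latest values. The sketch below is only illustrative: it uses a three-row stand-in built from timestamps in the preview rather than the full CSV, and time_span is a hypothetical helper name, not part of the assignment.

```python
import pandas as pd


def time_span(df, column="time"):
    """Return the earliest and latest timestamps in an ISO-8601 string column."""
    parsed = pd.to_datetime(df[column], utc=True)
    return parsed.min(), parsed.max()


# Hypothetical stand-in for eq_csv, using timestamps from the preview output.
sample = pd.DataFrame({
    "time": [
        "2013-12-30T23:57:19.646Z",
        "2013-12-30T23:54:42.690Z",
        "2013-12-30T23:47:06.533Z",
    ]
})

start, end = time_span(sample)
print(start.date(), "to", end.date())
```

On the real data the same call would be time_span(eq_csv), assuming the time column is still stored as strings at that point.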
Q 2
[3]: print('Data Types:\n', eq_csv.dtypes)
     print('\nData Info:\n')
     print(eq_csv.info())
     print('\n')
     print('Data Summary Statistics:\n', eq_csv.describe())
Data Types:
 time                object
latitude           float64
longitude          float64
depth              float64
mag                float64
magType             object
nst                float64
gap                float64
dmin               float64
rms                float64
net                 object
id                  object
updated             object
place               object
type                object
horizontalError    float64
depthError         float64
magError           float64
magNst             float64
status              object
locationSource      object
magSource           object
dtype: object

Data Info:

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1388279 entries, 0 to 1388278
Data columns (total 22 columns):
 #   Column           Non-Null Count    Dtype
---  ------           --------------    -----
 0   time             1388279 non-null  object
 1   latitude         1388279 non-null  float64
 2   longitude        1388279 non-null  float64
 3   depth            1388277 non-null  float64
 4   mag              1387039 non-null  float64
 5   magType          1387037 non-null  object
 6   nst              869968 non-null   float64
 7   gap              1018789 non-null  float64
 8   dmin             927092 non-null   float64
 9   rms              1387862 non-null  float64
 10  net              1388279 non-null  object
 11  id               1388279 non-null  object
 12  updated          1388279 non-null  object
 13  place            1381529 non-null  object
 14  type             1388279 non-null  object
 15  horizontalError  878319 non-null   float64
 16  depthError       1387748 non-null  float64
 17  magError         961644 non-null   float64
 18  magNst           991784 non-null   float64
 19  status           1388279 non-null  object
 20  locationSource   1388279 non-null  object
 21  magSource        1388279 non-null  object
dtypes: float64(12), object(10)
memory usage: 233.0+ MB
None

Data Summary Statistics:
            latitude     longitude         depth           mag            nst  \
count  1.388279e+06  1.388279e+06  1.388277e+06  1.387039e+06  869968.000000
mean   3.924650e+01 -1.129268e+02  2.376709e+01  1.630511e+00      19.301795
std    2.025902e+01  6.892230e+01  5.531163e+01  1.239557e+00      16.744273
min   -8.288370e+01 -1.799997e+02 -1.000000e+01 -9.990000e+00       0.000000
25%    3.386783e+01 -1.507760e+02  3.270000e+00  8.100000e-01       8.000000
50%    3.859950e+01 -1.215040e+02  8.190000e+00  1.350000e+00      15.000000
75%    5.481405e+01 -1.165747e+02  1.688000e+01  2.100000e+00      24.000000
max    8.738600e+01  1.799994e+02  6.973600e+02  8.300000e+00     452.000000

                gap           dmin           rms  horizontalError  \
count  1.018789e+06  927092.000000  1.387862e+06    878319.000000
mean   1.223811e+02       0.658837  3.089814e-01         1.981813
std    6.769824e+01       2.322588  2.902084e-01         3.465467
min    7.000000e+00       0.000000  0.000000e+00         0.000000
25%    7.100000e+01       0.023420  1.000000e-01         0.270000
50%    1.060000e+02       0.065300  1.900000e-01         0.460000
75%    1.584000e+02       0.201300  4.900000e-01         1.340000
max    3.600000e+02     141.160000  4.624000e+01       194.584100

         depthError       magError         magNst
count  1.387748e+06  961644.000000  991784.000000
mean   2.682736e+00       0.213335      15.388853
std    1.154638e+02       0.321005      27.009770
min    0.000000e+00       0.000000       0.000000
25%    4.000000e-01       0.104000       4.000000
50%    7.200000e-01       0.160000       9.000000
75%    1.900000e+00       0.233000      17.000000
max    1.284070e+05       6.190000     941.000000
The above shows that every column is either of object (string) or float64 type: 12 columns are
float64 and 10 are object. The summary statistics show the count, mean, standard deviation, minimum,
maximum, and quartiles (25%, 50%, 75%) of the numeric columns.
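The dtype tally and the quartiles described above can also be pulled out directly instead of being read off the printed output. This is a minimal sketch on a hypothetical two-row frame (values taken from the preview), not the assignment's own code.

```python
import pandas as pd

# Hypothetical miniature of eq_csv with both text and numeric columns.
sample = pd.DataFrame({
    "time": ["2013-12-30T23:57:19.646Z", "2013-12-30T23:54:42.690Z"],
    "mag": [1.60, 0.30],
    "depth": [14.7, 4.954],
    "magType": ["ml", "md"],
})

# Count columns per dtype, similar to the "dtypes:" line printed by info().
dtype_counts = sample.dtypes.astype(str).value_counts()

# Quartiles of the numeric columns only, matching the 25%/50%/75% rows of describe().
quartiles = sample.select_dtypes("number").quantile([0.25, 0.5, 0.75])

print(dtype_counts)
print(quartiles)
```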
Q 3
[4]: print(eq_csv.isnull().sum())
time                    0
latitude                0
longitude               0
depth                   2
mag                  1240
magType              1242
nst                518311
gap                369490
dmin               461187
rms                   417
net                     0
id                      0
updated                 0
place                6750
type                    0
horizontalError    509960
depthError            531
magError           426635
magNst             396495
status                  0
locationSource          0
magSource               0
dtype: int64
Yes, the dataset has missing data: 12 of the 22 columns contain null values, with nst (518,311) and
horizontalError (509,960) missing the most.
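Raw null counts are easier to interpret next to the share of rows they represent. The following sketch assumes a small made-up frame in place of eq_csv; missing_report is a hypothetical helper introduced here for illustration.

```python
import numpy as np
import pandas as pd


def missing_report(df):
    """Return null counts and percentages for columns that have any nulls."""
    counts = df.isnull().sum()
    report = pd.DataFrame({
        "missing": counts,
        "percent": (counts / len(df) * 100).round(2),
    })
    # Keep only affected columns, largest gap first.
    return report[report["missing"] > 0].sort_values("missing", ascending=False)


# Hypothetical stand-in for eq_csv.
sample = pd.DataFrame({
    "mag": [1.6, np.nan, 4.6, 0.72],
    "nst": [np.nan, 8.0, np.nan, 14.0],
    "place": ["58 km SW of Unalaska, Alaska", None,
              "40 km N of Oxapampa, Peru", "4km SSW of Idyllwild, CA"],
    "status": ["reviewed"] * 4,
})

report = missing_report(sample)
print(report)
```

A report like this can help decide which columns are cheap to handle by dropping rows (few nulls) and which would lose too much data that way.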
[5]: eq_csv.dropna(subset=['mag'], inplace=True)
     eq_csv.dropna(subset=['magType'], inplace=True)
     eq_csv.dropna(subset=['place'], inplace=True)
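[6]: # Numeric columns whose remaining gaps are filled with the column mean
     missing_values = ['nst', 'gap', 'dmin', 'rms', 'horizontalError',
                       'depthError', 'magError', 'magNst']
     for i in missing_values:
         # Assign the result back: inplace=True on a column selection
         # may not update eq_csv in newer pandas versions
         eq_csv[i] = eq_csv[i].fillna(eq_csv[i].mean())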
The following is the dataset after handling the missing data.
[7]: print(eq_csv.isnull().sum())
time               0
latitude           0
longitude          0
depth              0
mag                0
magType            0
nst                0
gap                0
dmin               0
rms                0
net                0
id                 0
updated            0
place              0
type               0
horizontalError    0
depthError         0
magError           0
magNst             0
status             0
locationSource     0
magSource          0
dtype: int64
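The two-step handling used above (drop rows that lack a key field, then fill the remaining numeric gaps with the column mean) can be sketched on a toy frame. The data and the names sample and cleaned are made up for illustration; only the strategy mirrors the cells above.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for eq_csv.
sample = pd.DataFrame({
    "mag": [1.6, np.nan, 4.6, 0.72],
    "place": ["A", "B", None, "D"],
    "nst": [np.nan, 8.0, 10.0, 14.0],
    "gap": [166.0, 91.0, 150.0, np.nan],
})

# Step 1: drop rows missing a key descriptive field.
cleaned = sample.dropna(subset=["mag", "place"]).copy()

# Step 2: fill remaining numeric gaps with each column's mean,
# computed on the rows that survived step 1.
for col in ["nst", "gap"]:
    cleaned[col] = cleaned[col].fillna(cleaned[col].mean())

print(cleaned)
print(cleaned.isnull().sum())
```

Mean imputation keeps every row but pulls the filled values toward the center of the distribution, so for heavily incomplete columns it can shrink the apparent variance.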
Q 4
[8]: # Extracting the year, month, and day
     eq_csv['time'] = pd.to_datetime(eq_csv['time'])
     eq_csv['updated'] = pd.to_datetime(eq_csv['updated'])
     eq_csv['time_date'] = eq_csv['time'].dt.date
     eq_csv['updated_date'] = eq_csv['updated'].dt.date
     eq_csv['time_year'] = eq_csv['time'].dt.year
     eq_csv['time_month'] = eq_csv['time'].dt.month
     eq_csv['time_day'] = eq_csv['time'].dt.day
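As a small check of the date extraction, the sketch below parses a few illustrative timestamps (hypothetical values, not rows from the dataset) and derives the same year, month, and day columns. It also shows one detail that matters when such counts are plotted: value_counts() orders by frequency, so sort_index() is needed to get calendar order.

```python
import pandas as pd

# Hypothetical timestamps in the same ISO-8601 format as the dataset.
sample = pd.DataFrame({
    "time": [
        "2013-12-30T23:57:19.646Z",
        "2014-02-28T08:44:14.000Z",
        "2014-02-01T00:00:00.000Z",
    ]
})

sample["time"] = pd.to_datetime(sample["time"], utc=True)
sample["time_year"] = sample["time"].dt.year
sample["time_month"] = sample["time"].dt.month
sample["time_day"] = sample["time"].dt.day

# value_counts() sorts by count; sort_index() restores chronological order.
yearly = sample["time_year"].value_counts().sort_index()
print(yearly)
```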
[9]: # Yearly and Monthly Distribution Charts
     yearly_distribution = eq_csv['time_year'].value_counts()
     monthly_distribution = eq_csv['time_month'].value_counts()
     plt.figure(figsize=(12, 6))