DASC_6025_Assignment_2.pdf

School: East Carolina University
Course: 6905
Subject: Computer Science
Date: Dec 6, 2023
Type: pdf
Pages: 7

Uploaded by SuperHumanWorld12870

Data Cleaning CSCI6905 & DASC6025 Homework/Assignment 2

1. Construct a histogram for the age and income data given in the employee records table. Compare the two histogram charts you have created and give an explanation. You can use Excel to draw the histogram charts. (20 points)

The Age column predominantly contains ages typical of adults, ranging from 25 to 41. There is also a single data point with a very young age of 1, which suggests that most of the data comes from adults, with one data point from an infant or toddler. However, the age of 1,000 is an evident anomaly: no human lives to 1,000, so the value is probably an error or a placeholder in the dataset. This outlier would significantly distort any average or other statistic computed on the Age column. It should be addressed by correcting it (if it is an error) or by omitting it from the affected analyses.

The income values appear to fall in a middle-class range, depending on the currency and the country's economic standards. Most incomes are closely clustered, with no drastic outliers. Incomes of 80 and 120 are the most common, each appearing twice. The range is 60 units (from 70 to 130), which indicates some variability but no significant disparity among the data points.

In general, the Age data contains a prominent outlier of 1,000 years, which makes it heavily skewed and hard to interpret, while the Income data is spread fairly evenly between 70 and 130 with no extreme outliers, indicating a relatively consistent income range among the data points.

Kindly refer to Employees. All the results and calculations are there.
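The claim that the 1,000 value distorts averages on the Age column can be checked with a short sketch. The ages below are hypothetical values chosen only to match the description in the answer (adults from 25 to 41, one value of 1, one value of 1,000); they are not the actual employee records. The helper name `iqr_outliers` and the 1.5 × IQR rule are assumptions made for illustration, not a method the assignment prescribes.

```python
import statistics

# HYPOTHETICAL ages shaped like the description in the answer
# (adults 25-41, one value of 1, one value of 1000).
# These are NOT the real employee records.
ages = [25, 28, 30, 32, 35, 38, 41, 1, 1000]


def iqr_outliers(values):
    """Return the values lying outside the 1.5 * IQR fences (Tukey's rule)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [v for v in values if v < low or v > high]


mean_all = statistics.mean(ages)      # pulled far upward by the 1000
median_all = statistics.median(ages)  # barely affected by the 1000
flagged = iqr_outliers(ages)
cleaned = [v for v in ages if v not in flagged]
mean_cleaned = statistics.mean(cleaned)

print("mean with outliers:", round(mean_all, 2))
print("median with outliers:", median_all)
print("flagged values:", flagged)
print("mean after removing flagged values:", round(mean_cleaned, 2))
```

On this sample the rule flags the age of 1 as well as the age of 1,000. Whether the age of 1 should be treated as an error is a judgment call that depends on what the table is meant to record.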
2. Construct a histogram for the age and income data given in the employee records table.

1. Use Jupyter Notebook to construct the graph of the age and income columns.

    import pandas as pd
    import matplotlib.pyplot as plt

    # Load the employee records
    df = pd.read_csv("Data.csv")

    plt.figure(figsize=(10, 4))

    # Left panel: Age histogram
    plt.subplot(1, 2, 1)
    plt.hist(df["Age"], bins=30, color="blue", alpha=0.7)
    plt.title("Age Histogram")
    plt.xlabel("Age")
    plt.ylabel("Frequency")

    # Right panel: Income histogram
    plt.subplot(1, 2, 2)
    plt.hist(df["Income"], bins=8, color="green", alpha=0.7)
    plt.title("Income Histogram")
    plt.xlabel("Income")
    plt.ylabel("Frequency")

    plt.tight_layout()
    plt.show()
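To see why the Age histogram is hard to read while the 1,000 value is present, it may help to count the bins by hand. The sketch below is a plain-Python approximation of equal-width binning over the data's minimum and maximum, which is roughly what a histogram with an integer `bins` argument does; it is not matplotlib's own implementation, and values that fall exactly on a bin edge may be placed differently. The function name `equal_width_counts` and the sample ages are hypothetical, not taken from Data.csv.

```python
def equal_width_counts(values, bins):
    """Count values per equal-width bin spanning [min(values), max(values)]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("all values are identical; bin width would be zero")
    width = (hi - lo) / bins
    counts = [0] * bins
    for v in values:
        # The maximum value is assigned to the last bin.
        idx = min(int((v - lo) / width), bins - 1)
        counts[idx] += 1
    return counts


# HYPOTHETICAL ages matching the description in the answer, not the real data.
ages = [25, 28, 30, 32, 35, 38, 41, 1, 1000]

# With the outlier, each of the 30 bins is about 33 years wide, so nearly
# every adult age lands in the first two bins.
with_outlier = equal_width_counts(ages, bins=30)

# Dropping only the 1000 value spreads the same ages across 8 bins.
without_outlier = equal_width_counts([a for a in ages if a != 1000], bins=8)

print(with_outlier)
print(without_outlier)
```

With the outlier included, almost all of the counts collapse into the leftmost bins and the rest of the axis is empty, which matches the skew described in the answer to Question 1.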