DASC_6025_Assignment_2.pdf
School: East Carolina University
Course: 6905
Subject: Computer Science
Date: Dec 6, 2023
Pages: 7
Uploaded by SuperHumanWorld12870
Data Cleaning CSCI6905 & DASC6025
Homework/Assignment 2
1. Construct a histogram for the age and income data given in the employee
records table. Compare the two histogram charts you have created and give an
explanation. You can use Excel to draw histogram charts. (20 points)
Most values in the Age column are typical adult ages, ranging from 25 to 41.
There is also a single data point with a very young age of 1. This distribution
suggests that most of the data comes from adults, with one data point from an
infant or toddler.
However, the age of 1,000 is an evident anomaly. No human lives to 1,000, so
the value is most likely a data-entry error or a placeholder. This outlier
would significantly distort the mean and any other statistic computed on the
Age column. It should be corrected (if the true value can be recovered) or
excluded from the affected analyses.
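To make the effect of the anomaly concrete, the following is a minimal sketch of flagging implausible ages before computing statistics. The `ages` list is made-up sample data that only mirrors the description above (it is not the actual employee table), and the cutoff `MAX_PLAUSIBLE_AGE = 120` and the helper `split_plausible` are assumptions introduced for illustration.

```python
from statistics import mean, median

# Hypothetical sample, NOT the real employee records: adult ages,
# one very young value, and the anomalous 1000.
ages = [25, 28, 30, 33, 35, 38, 41, 1, 1000]

MAX_PLAUSIBLE_AGE = 120  # assumed upper bound for a human age


def split_plausible(values, upper=MAX_PLAUSIBLE_AGE):
    """Return (kept, flagged): values within [0, upper] and everything else."""
    kept = [v for v in values if 0 <= v <= upper]
    flagged = [v for v in values if not 0 <= v <= upper]
    return kept, flagged


kept_ages, flagged_ages = split_plausible(ages)

print("flagged:", flagged_ages)
print("mean with outlier:", round(mean(ages), 2))
print("mean without outlier:", round(mean(kept_ages), 2))
print("median with outlier:", median(ages))
```

Comparing the two means shows how strongly a single extreme value pulls the average, while the median is barely affected. Whether to drop or correct the flagged value depends on what the source data allows.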
The income values appear to fall in a middle-class range, although this depends
on the currency and the country's economic standards. Most incomes are closely
clustered, with no drastic outliers. Incomes of 80 and 120 are the most common:
each appears twice. The range of incomes shows some variability but no
significant disparity among the data points. The range is 60 units (from 70 to
130).
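The range and the most common values can be checked with a few lines of Python. This is a sketch under stated assumptions: the `incomes` list is hypothetical data chosen only to match the description above (minimum 70, maximum 130, with 80 and 120 each appearing twice), not the actual table.

```python
from statistics import multimode

# Hypothetical incomes consistent with the description, NOT the real data.
incomes = [70, 80, 80, 90, 100, 110, 120, 120, 130]

# Range: difference between the largest and smallest value.
income_range = max(incomes) - min(incomes)

# multimode returns every value tied for the highest frequency.
most_common = multimode(incomes)

print("range:", income_range)
print("most common incomes:", most_common)
```

`statistics.multimode` is used rather than `statistics.mode` because the description mentions two values tied for most frequent.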
Overall, the Age data contains a prominent outlier of 1,000 years, which skews
the histogram and makes it hard to interpret, while the Income data is spread
fairly evenly between 70 and 130 with no extreme outliers, indicating a
relatively consistent income range among the data points.
Kindly refer to Employees. All the results and calculations are there.
2. Construct a histogram for the age and income data given in the employee
records table 1. Use Jupyter Notebook to construct the graph of the age
and income columns.
import pandas as pd
import matplotlib.pyplot as plt

# Load the employee records
df = pd.read_csv("Data.csv")

plt.figure(figsize=(10, 4))

# Left panel: Age histogram
plt.subplot(1, 2, 1)
plt.hist(df["Age"], bins=30, color="blue", alpha=0.7)
plt.title("Age Histogram")
plt.xlabel("Age")
plt.ylabel("Frequency")

# Right panel: Income histogram
plt.subplot(1, 2, 2)
plt.hist(df["Income"], bins=8, color="green", alpha=0.7)
plt.title("Income Histogram")
plt.xlabel("Income")
plt.ylabel("Frequency")

plt.tight_layout()
plt.show()
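With equal-width bins, a single extreme value stretches the axis so that almost all other values land in the first one or two bins. The sketch below illustrates this with a small pure-Python binning helper. The `ages` list is hypothetical sample data (not the contents of `Data.csv`), the helper `equal_width_counts` and the cutoff of 120 are assumptions for illustration, and the helper follows the same convention as `numpy.histogram` (which `plt.hist` relies on) of including the maximum in the last bin.

```python
def equal_width_counts(values, bins):
    """Count values in `bins` equal-width bins spanning min(values)..max(values).

    The maximum value is placed in the last bin.
    """
    lo, hi = min(values), max(values)
    counts = [0] * bins
    width = (hi - lo) / bins
    if width == 0:
        # All values identical: put everything in the first bin.
        counts[0] = len(values)
        return counts
    for v in values:
        idx = min(int((v - lo) / width), bins - 1)
        counts[idx] += 1
    return counts


# Hypothetical sample, NOT the real employee records.
ages = [25, 28, 30, 33, 35, 38, 41, 1, 1000]

# 30 bins over the full range, as in the Age histogram call.
with_outlier = equal_width_counts(ages, bins=30)

# Same data restricted to an assumed plausible range, with 8 bins.
plausible = [a for a in ages if a <= 120]
without_outlier = equal_width_counts(plausible, bins=8)

print("with outlier:   ", with_outlier)
print("without outlier:", without_outlier)
```

In the first result nearly every bin is empty, because each bin is about 33 years wide. Filtering or correcting the anomalous value before plotting lets the histogram show the shape of the adult ages.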