Assignment_4_Cleaning_Data_Set_Instructions

.docx

School

Indiana University, Bloomington *

*We aren’t endorsed by this school

Course

BUSH352

Subject

Computer Science

Date

Feb 20, 2024

Type

docx

Pages

4

Uploaded by emilyguse09

Report
Assignment 4: Cleaning a Data Set Task: Identify and Correct Errors in a Data Set From Canvas, download and save: BtownOrders(Jan2019)_assignment_data.xlsx and Assignment4_Cleaning_Data_Set_Answersheet.docx Scenario: You have done such a good job with forecasting and cleaning data from Btown Interactive that they have asked you to do some further cleaning of their data. The sales group is suspicious that there might be errors in the data for January. You will be working with a new set of 3,296 orders with 5,182 line items from January 2019. The data is in a file called “ BtownOrders(Jan2019)_assignment_data.xlsx .” A “line item” is just an order for a specified number of a particular product – there can be multiple line items per order. You’ll be looking for errors in the data in several places: 1. Errors in the product names . 2. Errors in the promotional codes . 3. Errors in the total_product_price . You will find, document, and correct the errors in the Excel workbook. Deliverables: Complete the Assignment4_Cleaning_Data_Set_Answersheet.docx associated with this assignment on Canvas. Submit the completed answer sheet with the “cleaned” Excel file on Canvas by the due date. Evaluation: You will be graded based on the number of correct answers. There are 10 questions overall. Be sure to insert rows in the answer sheet where appropriate to answer the questions completely. 1 Make sure you complete the in-class exercise “Locating Bad Data in Excel” before attempting this assignment! See the W6C2 in-class assignment on Canvas for guidance.
Part 1: Errors in Product Names Verify that the product names (Column J) are correct using the master list in the Lookups tab and correct any errors. You can assume the information in the Lookups tab is always right. So if there is a mismatch, the error is in your data set. To do this, you will use the MATCH function (use W6C2 in-class exercise as a guide). Place your MATCH calculation in column N of the “Btown Orders (Jan)” worksheet. Title of the column “ ProdMatch ” (in cell N1) and start your MATCH formulas in cell N2. HINT: Using the Sort and Filter features in Excel can also help you. You’ve used both of those features in the exercises we’ve done so far in this course. ANOTHER HINT: Remember, there is a list of correct product names in the Lookups worksheet. Using the Assignment4_Cleaning_Data_Set_Answersheet.docx provided with this assignment, answer the following questions: 1) How many line items (rows) had incorrect product names? 2) List the products names with errors, listing the incorrect name, the corrected name, and how many rows of data had the error. The product names are only off by a couple of letters and should be easily identified in the Lookups list. Correct all invalid product names in the data. (Try sorting by product_name . You only need to list each incorrect product name once.) Now fix the incorrect product names in the “ Btown Orders (Jan) ” worksheet. Use caution when sorting!!! Refer to item #4 from the Taber reading . HINT: Use “Find and Replace” to speed up fixing the errors. Part 2: Errors in Promotional Codes Verify that the promotional codes (Column E) are correct using the master list in the Lookups tab and correct any errors. Use the MATCH function and place your function in Column O of the Btown Orders (Jan) ” worksheet. Make the title of the column “ PromMatch ” (in cell O1) and start your MATCH formulas in cell O2. Answer the following questions: 1) How many line items (rows) had incorrect promotional codes? The value in the promo_code field should match the Lookups data sheet exactly. 2) List the promotional codes with errors, listing the incorrect codes, the corrected 2
Your preview ends here
Eager to read complete document? Join bartleby learn and gain access to the full version
  • Access to all documents
  • Unlimited textbook solutions
  • 24/7 expert homework help