Dataset Exploration Project Part 2
BDAT1005-24W – Mathematics for Data Analytics
Dataset Exploration Project Part 2 (DE pt2) – Performing Univariate Analysis
Assignment Date: Week 3
Due Date: End of day Sunday after Week 7
Evaluation Weight: 10% of course
Project Description – Please read the entire document before proceeding!
This is the second part of a four-part cumulative project you will work on all term. In each step, you will add new material and update your previous work as appropriate.
If possible, you should use the same data set for all parts – otherwise, you will have to re-do previous work. Each step must be inclusive – your professor will not consider previous submissions in grading this one.
In DE pt1 you described your dataset, looked for missing & invalid values, and developed initial FINER research questions. In this step you will look at some or all of the variables one by one, to help you develop an understanding of your data. You must continue to use MS Excel for data manipulation, analysis and visualization, and your report (including discussion and appropriate visualizations) must be in MS Word format.
Here are the steps of DE pt2:
1) Continue to use the same appropriate data source if possible: the population or sample must have a minimum size of 1,000 records across at least 8 variables, with a good mix of qualitative and quantitative data types. If your dataset was deemed inappropriate in feedback from part 1, you won’t be able to resubmit it, but you will need to go back and re-do that work on a new dataset in order to proceed.
2) Update your DE pt1 materials (dataset description, data dictionary, data rearrangement, missing and invalid data analysis, assumptions, and FINER questions) as suggested by feedback from part 1 and/or according to your own further work.
3) Perform univariate descriptive statistics on at least eight suitable variables. Include a good mix of quantitative and qualitative data types, including appropriate visualizations (tables, charts, graphs, etc.).
4) Categorize (‘code’) one or more variables in ways that you think may be helpful for your analysis. For full marks in this area you must do more than just one-for-one coding of categorical variables. (You may do more coding or categorization later on, depending on how you proceed with your analysis.) Perform univariate descriptive statistics on the result(s), including appropriate visualizations.
5) Check quantitative variables for suggested outliers. NOTE that the methods we use only suggest outliers – it’s up to you to decide if a value needs to be excluded from some analyses or from your entire modified dataset.
6) Clean your data set – decide how to deal with missing or invalid data, and whether and how to remove outliers, and provide explanations for your choices in each case.
7) Reconsider your FINER questions, your assumptions, and your need for extra data according to your further analysis. Your research questions should work together to help you build a coherent and cohesive analysis that examines relationships among the variables in your dataset. Extra data may be needed to support your analysis. NOTE that you should not be trying to answer these questions in DE part 1 or 2 – that comes later.
8) Continue to track your analysis – record your thoughts about what to study; what data you found and where; what questions you’re asking and what assumptions you’re making, and why. Tracking is not about recording times or dates, or who said or did what – it’s about enabling other analysts to replicate your work.
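The project itself must be done in MS Excel, so the following Python sketch is only an illustration of the ideas behind steps 3 to 5, not a substitute for the spreadsheet work. It shows univariate statistics for one quantitative variable, a grouped (more than one-for-one) coding of that variable, a frequency table for the coded result, and a check for suggested outliers. The sample ages, the bin edges, the labels, and all function names are hypothetical. The 1.5 × IQR rule is assumed here as one common way to suggest outliers; this handout does not name the method used in the course. The "inclusive" quartile method is chosen because it should correspond to Excel's QUARTILE.INC.

```python
import statistics
from collections import Counter


def describe_quantitative(values):
    """Basic univariate statistics for a list of numbers."""
    q1, q2, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "median": q2,
        "stdev": statistics.stdev(values),  # sample standard deviation
        "min": min(values),
        "max": max(values),
        "q1": q1,
        "q3": q3,
    }


def describe_qualitative(values):
    """Frequency table: category -> (count, relative frequency)."""
    counts = Counter(values)
    total = len(values)
    return {cat: (n, n / total) for cat, n in counts.most_common()}


def code_into_bins(value, edges, labels):
    """Code a number into a labelled group.

    edges holds the inclusive upper bound of every bin except the last,
    so labels must have exactly one more entry than edges.
    """
    for edge, label in zip(edges, labels):
        if value <= edge:
            return label
    return labels[-1]


def suggest_outliers(values, k=1.5):
    """Values outside [Q1 - k*IQR, Q3 + k*IQR]; suggestions only."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


# Hypothetical sample: ten ages from an imaginary dataset.
ages = [23, 25, 31, 35, 38, 41, 44, 52, 58, 97]

print(describe_quantitative(ages))

# Grouped coding: several raw values map to each category.
age_groups = [code_into_bins(a, [29, 49], ["under 30", "30-49", "50+"]) for a in ages]
print(describe_qualitative(age_groups))

print(suggest_outliers(ages))  # [97]
```

Whether the flagged value 97 is excluded remains the analyst's decision, as step 5 notes; the function only suggests candidates, and the reasoning for keeping or removing each one belongs in the report.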
As you build your report, try to write from the perspective of a data analyst sharing with colleagues or other stakeholders (management, customers, other industry professionals), rather than that of a student submitting to a professor. Your MS Word report must stand on its own as a complete and thorough description of your dataset, including relevant results and visualizations.
You will submit your Excel spreadsheet as well, but only so your instructor can examine how you organized your work, obtained your results and validated your conclusions – if your professor has to look in your spreadsheet to find your results etc., your report is not complete!
Be sure to include appropriate citations and references! Every chart, graph, table or other representation of data in your submitted document must also be found in your submitted spreadsheet or specifically cited and referenced as someone else’s work. If you quote, copy or otherwise borrow material without citation, you may be subject to penalties up to and including Academic Misconduct.
Evaluation
Part 2 is marked out of 55, and worth 10% of the course.
Appropriate Data Set & Description (10 marks) – Does your dataset follow the prescriptions? Did you arrange and describe it well, and provide a data dictionary and assumptions and/or extra data requirements as needed?
Univariate Descriptive Statistics (12 marks) – Do your univariate descriptive statistics thoroughly and appropriately describe at least eight variables and a mix of qualitative and quantitative data types? Did you present them well (including charts and tables as appropriate)?
Outliers & Cleaning (8 marks) – Did you identify suggested outliers or other unusable information in your dataset? Did you clearly explain appropriate decisions about whether and how to ‘clean’ them in your modified data?
Coding & Categorization (8 marks) – Did you do an effective job of creating and describing codes and/or categories for at least one variable (or more as appropriate), to help with analyzing your data? Did you do more than just one-for-one coding of categoricals? Do the codes and/or categories help you to accurately depict the distributions of the variables?
FINER Questions (9 marks) – Did you list at least three research questions that can be usefully and suitably investigated later on with your dataset? Are they clearly and explicitly stated? Do they work together to examine relationships among variables that will help to build a coherent and cohesive analysis? Did you describe how you developed them? Have you updated them as suggested in part 1 feedback and/or according to your own further work?
Demonstrated Tracking (8 marks) – Did you track your activities from parts 1 & 2 and record them along with what worked well and what didn’t, so that you or another analyst could repeat and/or improve on this analysis in the future? Do your assumptions and discussion cover potential issues?