DAT 430 MOD1 Journal (1)
School
Southern New Hampshire University
Course
430
Subject
Industrial Engineering
Date
Dec 6, 2023
Nathan Cumbo
DAT 430
November 1, 2023
Module 1 Journal
In this class, we are exploring seven approaches to preprocessing data and considering whether
and how each might be useful for the course projects: aggregation, data sampling, dimensionality
reduction, feature subset selection, feature creation, discretization and binarization, and
variable transformation. As an analyst, the questions to ask are: What is the target data source
for Projects One and Two? Which of these preprocessing approaches are in scope for meeting the
needs of these projects?
Data aggregation is the process of collecting data and presenting it in summary form, so that
statistical analysis can help company executives make informed decisions about marketing
strategy, pricing, operations, and more. Companies use data aggregation primarily to improve
marketing and sales. Data aggregation is relevant to Project One, where our goal is to analyze
HR attrition data and present it visually to show why employees are leaving their jobs at the
rate they are. It applies in collecting the attrition information and summarizing it, including
the causes of attrition and visualizations of any recurring patterns found in the data.
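As a rough illustration of aggregation, the sketch below rolls row-level records up into one attrition rate per department. The records, the department names, and the function name `attrition_rate_by_department` are all hypothetical; they are not taken from the course data set.

```python
from collections import defaultdict

# Hypothetical HR records: (department, left_company). Not the course data set.
records = [
    ("Sales", True),
    ("Sales", False),
    ("Sales", True),
    ("Research", False),
    ("Research", False),
    ("Research", True),
    ("HR", False),
    ("HR", False),
]


def attrition_rate_by_department(rows):
    """Aggregate row-level records into one attrition rate per department."""
    headcount = defaultdict(int)
    leavers = defaultdict(int)
    for department, left in rows:
        headcount[department] += 1
        if left:
            leavers[department] += 1
    return {dept: leavers[dept] / headcount[dept] for dept in headcount}


rates = attrition_rate_by_department(records)
for dept, rate in sorted(rates.items()):
    print(f"{dept}: {rate:.0%} attrition")
```

The summary table that comes out of a step like this is what would feed a bar chart or similar visualization.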
In data analytics, data sampling is the practice of analyzing a small subset drawn from a larger
data set, discovering patterns and trends in that subset, and generalizing the findings to the
complete data set. The main benefit of data sampling is that it allows analysts to save time and
quickly produce reasonably accurate findings in statistical analysis (2022). Data sampling likely
won’t be necessary for the projects in this class.
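To make the idea concrete, here is a minimal sketch of simple random sampling. The population is synthetic (generated from an assumed normal distribution with a fixed seed), and the 5% sample size is an arbitrary choice for illustration.

```python
import random
import statistics

# Hypothetical population: monthly incomes for 10,000 employees (synthetic values).
rng = random.Random(42)  # fixed seed so the sketch is reproducible
population = [rng.gauss(5000, 1200) for _ in range(10_000)]

# Simple random sample without replacement: 5% of the population.
sample = rng.sample(population, k=500)

population_mean = statistics.mean(population)
sample_mean = statistics.mean(sample)

print(f"population mean: {population_mean:.2f}")
print(f"sample mean:     {sample_mean:.2f}")
print(f"difference:      {abs(population_mean - sample_mean):.2f}")
```

The sample mean is an estimate of the population mean, not an exact match; the trade-off is speed in exchange for a small amount of sampling error.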
Dimensionality reduction is the process of transforming high-dimensional data into
low-dimensional data. This makes it easier for analysts to work with raw data that has many
dimensions by combining or removing some of them. The technique is common in fields such as
speech recognition, language dialect relations, signal processing, neuroinformatics, and
bioinformatics. Feature creation and feature subset selection go hand in hand with
dimensionality reduction. Feature subset selection is itself a way of reducing dimensionality,
because it keeps only the most relevant variables; for Projects One and Two, I believe that both
dimensionality reduction and feature subset selection will play roles in the decision-making
process. This project calls for the analyst to sort through HR attrition data and create metrics
that help explain why employees are leaving their jobs. There are many reasons for leaving a
job: relocation, pay, family emergencies, dislike of coworkers or bosses, benefits, and so on.
Dimensionality reduction will allow us to focus on the variables that matter most and draw more
precise and accurate conclusions.
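One simple way to shrink the number of variables is a filter-style feature subset selection: score each column against the outcome and keep the strongest few. The sketch below does this with absolute Pearson correlation. The column names, the values, and the choice of correlation as the score are assumptions for illustration, and this is only one of several possible selection methods.

```python
import math

# Hypothetical employee features (columns) and an attrition flag; values are made up.
features = {
    "monthly_income": [3000, 3200, 5000, 5200, 7000, 7100],
    "distance_to_work": [25, 30, 10, 12, 5, 4],
    "employee_count": [1, 1, 1, 1, 1, 1],  # constant, carries no information
    "badge_number": [2, 5, 4, 3, 1, 6],    # arbitrary identifier
}
left_company = [1, 1, 0, 1, 0, 0]


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either column has no variance."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def select_features(columns, target, k):
    """Keep the k columns with the strongest absolute correlation to the target."""
    scores = {name: abs(pearson(values, target)) for name, values in columns.items()}
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:k], scores


selected, scores = select_features(features, left_company, k=2)
print("kept:", selected)
```

With only six made-up rows the scores mean little on their own; the point is the mechanism of ranking columns and dropping the weakest ones.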
Discretization is the process of grouping the values of a continuous variable into a smaller
number of categories. A good example is classifying age groups. If we were given the ages of 50
participants and asked to group them, we could avoid discretization by sorting participants by
decade and still listing each exact age. With discretization, we instead replace the ages with
labeled categories such as ‘Infant’, ‘Young’, ‘Adult’, and ‘Senior’. This helps when working
with a large pool of data by reducing the workload while keeping information loss to a minimum.
While dimensionality reduction will likely be present, I don’t think discretization will be a
factor in the class project.
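The age-group example can be sketched in a few lines. The labels come from the example above, but the cut points (2, 18, and 65) and the list of ages are assumptions chosen only to make the sketch runnable.

```python
from bisect import bisect_right
from collections import Counter

# Assumed cut points; the labels are from the example, the boundaries are not.
BOUNDARIES = [2, 18, 65]  # ages below 2, 2-17, 18-64, and 65 or older
LABELS = ["Infant", "Young", "Adult", "Senior"]


def age_group(age):
    """Map a numeric age to a category label."""
    return LABELS[bisect_right(BOUNDARIES, age)]


# Hypothetical participant ages.
ages = [1, 7, 16, 25, 34, 41, 58, 66, 72]
groups = [age_group(a) for a in ages]
counts = Counter(groups)

for label in LABELS:
    print(f"{label}: {counts[label]}")
```

After this step each participant carries a category instead of an exact age, which is where the small loss of information comes from.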
Finally, variable transformation is a way to make the data work better for our model.
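As one possible illustration of a variable transformation, the sketch below applies a log transform to a right-skewed column and then rescales it to the 0-1 range. The income values and the choice of these two particular transformations are assumptions; other transformations may suit a given model better.

```python
import math

# Hypothetical right-skewed monthly incomes (one very large value).
incomes = [2500, 3000, 3200, 4000, 4500, 6000, 19000]


def log_transform(values):
    """Compress large values with log(1 + x) to reduce right skew."""
    return [math.log1p(v) for v in values]


def min_max_scale(values):
    """Rescale values linearly so the smallest is 0.0 and the largest is 1.0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


logged = log_transform(incomes)
scaled = min_max_scale(logged)

for raw, new in zip(incomes, scaled):
    print(f"{raw:>6} -> {new:.3f}")
```

Both steps preserve the ordering of the values; they change only the spacing between them.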