DAT 430 MOD1 Journal (1)

School

Southern New Hampshire University

Course

430

Subject

Industrial Engineering

Date

Dec 6, 2023

Nathan Cumbo
DAT 430
November 1, 2023
Module 1 Journal

In this class, we are exploring seven approaches to preprocessing data and considering whether and how each might be useful for the projects in this course: aggregation, data sampling, dimensionality reduction, feature subset selection, feature creation, discretization and binarization, and variable transformation. As an analyst, the questions to ask are: What is the target data source for Projects One and Two? Which of these preprocessing approaches are in scope for meeting the needs of the projects?

Data aggregation is the process of collecting data and presenting it in summary form so that it can be analyzed statistically, helping company executives make informed decisions about marketing strategy, pricing, operations, and more. Companies often use aggregation to improve marketing and sales. It is relevant to Project One, where our goal is to analyze HR attrition data and present it visually to show why employees are leaving their jobs at the rate they are. Aggregation applies in collecting and summarizing the attrition information, including the causes of attrition, and in visualizing any recurring patterns found in the data.

In data analytics, data sampling is the practice of analyzing a small subset drawn from a larger data set, discovering patterns and trends in that subset, and generalizing the findings to the complete data set. The main benefit of data sampling is that it saves analysts time: when the sample is representative, it quickly yields findings that closely approximate those from the full data set (2022). Data sampling likely won’t be necessary for the projects in this class.

Dimensionality reduction is the process of transforming high-dimensional data into low-dimensional data. It makes raw data with many dimensions easier to work with by removing or combining dimensions. The technique is common in fields such as speech recognition, language dialect relations, signal processing, neuroinformatics, and bioinformatics. Feature creation and feature subset selection go hand in hand with dimensionality reduction. Feature subset selection in particular is one way of reducing dimensionality, since it keeps only the most relevant of the original features. For Projects One and Two, I believe that both dimensionality reduction and feature subset selection will play roles in the decision-making process. The project calls for the analyst to sort through HR attrition data and create metrics that help explain why employees are leaving their jobs. There are many reasons for leaving a job: relocation, pay, family emergencies, dislike of coworkers or bosses, benefits, and so on. Dimensionality reduction will allow us to narrow this data down to the factors that matter most and draw more precise conclusions.

Discretization is the process of regrouping data values into a smaller number of categories. A good example is classifying age groups. Suppose we were given the ages of 50 participants and asked to group them. Without discretization, we would work with each participant’s exact age, or at most list participants by decade along with their ages. With discretization, we can assign each participant to a labeled group such as ‘Infant’, ‘Young’, ‘Adult’, or ‘Senior’.
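The age-group example can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the four labels come from the journal, but the cut points (2, 18, and 65 years), the helper name `discretize_age`, and the sample ages are hypothetical choices made for the sketch.

```python
from bisect import bisect_right
from collections import Counter

# Hypothetical cut points in years; the journal names the labels
# but does not say where one group ends and the next begins.
AGE_CUTOFFS = [2, 18, 65]
AGE_LABELS = ["Infant", "Young", "Adult", "Senior"]


def discretize_age(age):
    """Map an exact age to one of the labeled groups."""
    if age < 0:
        raise ValueError("age must be non-negative")
    # bisect_right counts how many cut points are <= age,
    # which is exactly the index of the matching label.
    return AGE_LABELS[bisect_right(AGE_CUTOFFS, age)]


# Made-up sample ages, for illustration only.
ages = [1, 7, 16, 23, 34, 45, 52, 67, 71, 88]
groups = [discretize_age(a) for a in ages]

# Summarize how many participants fall into each group.
group_counts = Counter(groups)
```

Ten distinct ages collapse into four categories here, which is the reduction the paragraph describes. The exact ages are lost in the process, so the cut points should be chosen to suit the question being asked.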
Discretization helps when working with a large pool of data by reducing the workload with minimal loss of information. While dimensionality reduction will likely be present, I don’t think discretization will be a factor in the class project.

Finally, variable transformation is a way to make the data work better for our model.
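As a rough illustration of variable transformation, the sketch below applies two common transformations, a log transform and min-max scaling, to a small right-skewed variable. The journal does not say which transformations the projects will use, so the choice of techniques, the function names, and the income values are all assumptions made for this example.

```python
import math

# Made-up, right-skewed monthly incomes; not taken from the course data set.
monthly_income = [2100, 2500, 3200, 4100, 5200, 19500]


def log_transform(values):
    """Apply log(1 + x) to each value; every x must be greater than -1."""
    return [math.log1p(v) for v in values]


def min_max_scale(values):
    """Rescale values linearly onto the range [0, 1]."""
    low, high = min(values), max(values)
    if high == low:
        # All values are identical, so there is no spread to rescale.
        return [0.0 for _ in values]
    return [(v - low) / (high - low) for v in values]


# The log transform pulls the one large income closer to the rest;
# scaling then puts the result on a common 0-to-1 range.
logged = log_transform(monthly_income)
scaled = min_max_scale(logged)
```

Both transformations preserve the ordering of the values. Only the spacing between them changes, which is often what helps a model that is sensitive to extreme values or to differences in scale.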