Purdue University
Checkpoint #2: Exploratory Data Analysis
Valerie Shao
STAT301, Fall 2023

I. Data Set Exploration

The human resources dataset contains 312 rows and 36 columns. It was created by Dr. Rich Huebner and Dr. Carla Patalano, published on Kaggle, and is used in graduate courses to teach students. The dataset describes a fictional company and contains personal and employment information about its employees. In terms of structure, there are 13 categorical variables and 22 numerical variables. The numerical variables mainly measure employee performance and pay rates, and they also include numerically coded attributes such as marital status. The categorical variables group employees by characteristics such as race and gender, which simplifies analysis of the company's workforce.

Several variables can help us assess the company's diversity and identify any pay disparities. Personal details such as race, gender, and whether an employee was recruited through a diversity job fair reveal how diverse the workforce is. Comparing salaries across gender or racial groups can show whether these characteristics are associated with unequal pay for certain groups. However, other factors must be accounted for in this analysis, because an employee's performance also affects their salary. Additionally, the recruitment-source data can show which platforms have attracted the strongest candidates and are therefore worth prioritizing in future job advertising.

Regarding data preprocessing, no missing values or duplicate entries were found in this dataset.

II. Data Visualizations

Figure 1. Categorical Visualization: Example #1 [Pie Chart]
Figure 2. Categorical Visualization: Example #2 [Bar Chart]

Figure 3. Numeric Visualization: Example #3 [Scatter Plot]

Categorical Example #1 Insights: The pie chart gives a simple, clear view of the company's ethnic composition. Most employees are White, the largest slice at 61%. Black employees form the next-largest group at 26%, and Asian employees make up 9%. Smaller shares belong to employees of two or more races (3%) and to those classified as Other (American Indian, Alaska Native, Hispanic), who make up the remaining 1%. The chart shows at a glance that White and Black employees dominate the workforce, which suggests room to strengthen diversity and inclusion efforts, especially for Asian employees and the groups classified as Other.
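Each slice of the pie chart is a group's count divided by the total number of employees. A minimal sketch of that calculation, assuming plain Python with only the standard library; the function name `category_shares` and the ten-employee toy column are hypothetical illustrations, not values from the HR dataset:

```python
from collections import Counter


def category_shares(labels):
    """Return {category: percentage of all rows}, largest share first."""
    counts = Counter(labels)
    total = sum(counts.values())
    # most_common() orders categories from largest to smallest count,
    # which matches how the slices of a pie chart are usually read.
    return {cat: 100 * n / total for cat, n in counts.most_common()}


# Hypothetical toy column, NOT the real dataset: 10 made-up employees.
race = ["White"] * 6 + ["Black"] * 3 + ["Asian"]

for cat, pct in category_shares(race).items():
    print(f"{cat}: {pct:.0f}%")
```

Applied to the real race column, this should reproduce the chart's percentages; note that rounding each share to a whole percent can make the displayed values sum to 99% or 101%.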
Categorical Example #2 Insights: The bar chart shows the sum of salaries by gender. It has two bars, one for female and one for male employees, and each bar's height is the total salary paid to that group. The female bar is taller, so female employees collectively earn more in total salary than male employees. Because this is a total rather than an average, the difference could arise simply because the company employs more women, or because women earn more on average; the chart alone cannot distinguish these explanations. The result raises follow-up questions, such as whether certain jobs or departments employ more women, or whether there are pay gaps to address. In short, the chart highlights the need to examine gender and pay within the company more closely.

Numeric Example #3 Insights: The scatter plot of engagement survey results (x-axis) against salary (y-axis) shows widely dispersed points with no strong, obvious relationship. The points do not form a shape that suggests a clear correlation, which implies that survey scores and salaries are not consistently related in a straightforward way. In practical terms, higher engagement scores do not reliably go with higher salaries, and vice versa. Other variables, such as job role, experience, or company policy, likely play a larger role in determining salaries within the organization. A deeper analysis is therefore needed to uncover any subtler relationship between employee engagement and compensation.

III. Preliminary Hypotheses

Preliminary Hypothesis #1: I want to see whether being married or having a family affects how motivated employees are at work, and whether it makes them more likely to give higher scores on the engagement survey, which measures how dedicated and enthusiastic they are about their jobs and employer. I am interested in finding out whether marital status is a significant factor. I will also look at other factors, such as how long employees have been with the company and what type of job they hold, to understand what influences their survey responses.

Preliminary Hypothesis #2: It is worth taking a closer look at the data to understand whether the company is fully committed to DEI (Diversity, Equity, and Inclusion). To do this, we can compare engagement scores across employees from different racial backgrounds, relying on averages rather than totals, since the racial groups differ greatly in size.
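Both hypotheses compare engagement survey scores across groups (marital status, race), and Hypothesis #1 also relates them to numeric factors such as tenure, the same kind of question the Example #3 scatter plot raises about engagement and salary. A minimal sketch of the two summaries involved, assuming plain Python with only the standard library: per-group count, total, and mean, and Pearson's correlation coefficient r. The function names `group_summary` and `pearson_r` and all toy values are hypothetical, not taken from the dataset.

```python
import math
from collections import defaultdict


def group_summary(groups, values):
    """Return {group: (count, total, mean)} for paired group labels and values."""
    buckets = defaultdict(list)
    for g, v in zip(groups, values):
        buckets[g].append(v)
    # Reporting count and mean next to the total shows whether a large
    # total comes from a large group or from genuinely higher values.
    return {g: (len(vs), sum(vs), sum(vs) / len(vs)) for g, vs in buckets.items()}


def pearson_r(xs, ys):
    """Pearson correlation of two equal-length numeric sequences.

    Raises ZeroDivisionError if either sequence is constant.
    """
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Hypothetical toy data, NOT from the HR dataset.
race = ["A", "A", "A", "B"]
engagement = [4.0, 4.0, 4.0, 5.0]
print(group_summary(race, engagement))

tenure_years = [1, 2, 3, 4]
print(pearson_r(tenure_years, [2, 4, 6, 8]))
```

In the toy data, group A has the larger total but the smaller mean, which is why totals alone can mislead when group sizes differ, as with the salary-by-gender bar chart. An r near 0 indicates no linear association but does not rule out a nonlinear one.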