2-1 Report: Improving Data Quality
Cody Manley
Southern New Hampshire University
QSO-511-X4347 Business Analytics 24TW4
Professor Daniel Letort
May 26, 2024
Importance of Quality Data
Financial Charm Bank is a large retail bank operating in the United States. Senior management wants to compile data for three Florida branches to predict the needs of existing customers and their interest in term deposits. The bank's Vice President (V.P.) has identified an ideal data set from the Florida branches that contains customer demographics, banking history, and term deposit holdings for the business analysis team to analyze. Unfortunately, the initial analysis of the data set has been deemed questionable. The V.P. has tasked the senior members with creating a comprehensive report highlighting the impact that the data set’s errors, gaps, and anomalies will have on the organization if they are not corrected.
Errors, Gaps, and Anomalies
Several errors, gaps, and anomalies have been found within the data set. The first inconsistency that needs to be addressed is the lack of clarity in the column names. Below is a screenshot of the column names in the data set, color-coded to highlight the issues that could confuse readers and cause errors when interpreting the data. Columns A-D, in orange, are the only columns that do not need to be changed because they are self-explanatory. Columns E-H, in yellow, have unclear meanings. The last section, in blue, marks Columns L-P, which have undefined labels and use abbreviated titles that create more questions than answers. The biggest issue in the blue section is the abbreviation ‘P’ used before the label. Since Column O is labeled ‘previous’ and sits between N and P, one can only assume that ‘P’ stands for ‘previous,’ but this needs to be clarified. The numerical data contains inconsistencies that need to be addressed and adjusted to produce an accurate analysis (Michaloudis, 2024). In Column L, ‘duration,’ several numbers have decimal points; it is unclear whether ‘duration’ should contain decimal values, so further information is needed. The last inconsistencies that need to be addressed are the blank cells and data anomalies. Overall, only 74 blanks were found in Columns A, C, and P, a relatively small number compared to the size of the data set. The identified gaps need to be corrected; however, more information is needed to complete the data, which may require additional analysis. Columns K and B need to be addressed because they contain misspellings and abbreviations, anomalies that can lead to data quality issues when the data is analyzed. Columns I-J can be seen in the screenshot above and have little meaning. Column I, ‘contact,’ shows ‘telephone,’ ‘unknown,’ or ‘N/A,’ but it is unclear how that is relevant or could be used.
Data Quality Issues
Analyzing the data revealed that the data set has quality issues, missing values, and gaps in data. The missing values identified were 74 gaps in three columns, along with an unclear label of ‘pday’ on Column N, which shows negative numbers in several places. Columns P and I use the values ‘unknown’ and ‘N/A,’ which is confusing because both have the same meaning. The data quality issues identified are in Columns I-J, which are shown in the screenshot above and have little meaning. Column I, ‘contact,’ shows either ‘unknown’ or ‘N/A,’ but it is unclear whether this means no phone number is on file or the customer has not been contacted. This column also lists ‘telephone,’ but it is unclear whether this is a landline or a work line. If unclean data is used for organizational prediction, it will provide the organization with inaccurate analytics, resulting in poor customer relations and bad decisions that will harm overall business performance (Foote, 2023). Using the existing data set with quality issues could result in additional time delays in deciphering what disparities the
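The checks described in this section (counting blanks, treating ‘unknown’ and ‘N/A’ as one missing marker, and flagging decimal ‘duration’ values and negative ‘pday’ values) could be automated. The following is a minimal Python sketch under stated assumptions: the column names ‘contact,’ ‘duration,’ and ‘pday’ come from the report, while the ‘age’ column, the sample rows, and the helper names `is_missing` and `audit` are hypothetical and used only for illustration. It is not the analysis team's actual method.

```python
# Hypothetical sample rows. 'contact', 'duration', and 'pday' are column names
# mentioned in the report; 'age' and every value below are invented.
MISSING_TOKENS = {"", "unknown", "n/a"}


def is_missing(value):
    """Treat blanks, 'unknown', and 'N/A' as the same missing marker."""
    return value is None or str(value).strip().lower() in MISSING_TOKENS


def audit(rows):
    """Count missing values per column and flag suspicious numeric entries."""
    missing = {}
    decimal_durations = []  # row indexes where 'duration' is not a whole number
    negative_pday = []      # row indexes where 'pday' is below zero
    for index, row in enumerate(rows):
        for column, value in row.items():
            if is_missing(value):
                missing[column] = missing.get(column, 0) + 1
        duration = row.get("duration")
        if not is_missing(duration) and float(duration) != int(float(duration)):
            decimal_durations.append(index)
        pday = row.get("pday")
        if not is_missing(pday) and float(pday) < 0:
            negative_pday.append(index)
    return {
        "missing": missing,
        "decimal_durations": decimal_durations,
        "negative_pday": negative_pday,
    }


rows = [
    {"age": "41", "contact": "telephone", "duration": "120", "pday": "-1"},
    {"age": "", "contact": "Unknown", "duration": "95.5", "pday": "3"},
    {"age": "29", "contact": "N/A", "duration": "60", "pday": "7"},
]
report = audit(rows)
print(report)
```

The sketch only flags questionable rows rather than correcting them, since the report notes that more information is needed before the gaps and anomalies can be resolved.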