Bartleby Sitemap - Textbook Solutions

All Textbook Solutions for Modern Business Statistics with Microsoft Office Excel (with XLSTAT Education Edition Printed Access Card) (MindTap Course List)

22E
23. Because of staffing decisions, managers of the Gibson-Marimont Hotel are interested in the variability in the number of rooms occupied per day during a particular season of the year. A sample of 20 days of operation shows a sample mean of 290 rooms occupied per day and a sample standard deviation of 30 rooms. What is the point estimate of the population variance? Provide a 90% confidence interval estimate of the population variance. Provide a 90% confidence interval estimate of the population standard deviation.
24SE
Business Travel Costs. According to the 2017 Corporate Travel Index compiled by Business Travel News, the average daily cost for business travel in the United States rose to $321 per day (Executive Travel website, https://executivetravel.com/new-business-travel-study-says-average-per-diem-is-now-321day/). The file Travel contains sample data for an analogous study on the estimated daily living costs for an executive traveling to various international cities. The estimates include a single room at a four-star hotel, beverages, breakfast, taxi fares, and incidental costs. Compute the sample mean. Compute the sample standard deviation. Compute a 95% confidence interval for the population standard deviation.
Manufacture of Ball Bearings. Ball bearing manufacturing is a highly precise business in which minimal part variability is critical. Large variances in the size of the ball bearings cause bearing failure and rapid wearout. Production standards call for a maximum variance of .0001 inches². Gerry Liddy has gathered a sample of 15 bearings that shows a sample standard deviation of .014 inches. Use α = .10 to determine whether the sample indicates that the maximum acceptable variance is being exceeded. Compute the 90% confidence interval estimate of the variance of the ball bearings in the population.
The filling variance for boxes of cereal is designed to be .02 or less. A sample of 41 boxes of cereal shows a sample standard deviation of .16 ounces.
Use α = .05 to determine whether the variance in the cereal box fillings is exceeding the design specification.
City Trucking, Inc., claims consistent delivery times for its routine customer deliveries. A sample of 22 truck deliveries shows a sample variance of 1.5. Test to determine whether H0: can be rejected. Use α = .10.
Daily Patient Volume at Dental Clinic. A sample of 9 days over the past six months showed that Philip Sherman, DDS, treated the following numbers of patients at his dental clinic: 22, 25, 20, 18, 15, 22, 24, 19, and 26. If the number of patients seen per day is normally distributed, would an analysis of these sample data reject the hypothesis that the variance in the number of patients seen per day is equal to 10? Use a .10 level of significance. What is your conclusion?
30SE
Golf Scores. Is there any difference in the variability in golf scores for players on the LPGA Tour (the women’s professional golf tour) and players on the PGA Tour (the men’s professional golf tour)? A sample of 20 tournament scores from LPGA events showed a standard deviation of 2.4623 strokes, and a sample of 30 tournament scores from PGA events showed a standard deviation of 2.2118. Conduct a hypothesis test for equal population variances to determine if there is any statistically significant difference in the variability of golf scores for male and female professional golfers. Use α = .10. What is your conclusion?
Grade Point Average Comparison. The grade point averages of 352 students who completed a college course in financial accounting have a standard deviation of .940. The grade point averages of 73 students who dropped out of the same course have a standard deviation of .797. Do the data indicate a difference between the variances of grade point averages for students who completed a financial accounting course and students who dropped out? Use a .05 level of significance. Note: F.025 with 351 and 72 degrees of freedom is 1.466.
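As an illustration of the two-tailed F test that the Grade Point Average Comparison exercise asks for, the sketch below computes the test statistic from the two sample standard deviations quoted in the exercise and compares it with the critical value given in the exercise note. It is a minimal sketch, not the textbook's solution: the helper name `f_statistic` and the variable names are assumptions made for illustration, and the rejection rule assumes the usual convention of putting the larger sample variance in the numerator.

```python
# Sketch of the two-tailed F test for equal population variances,
# using the figures quoted in the Grade Point Average Comparison exercise.
# The helper and variable names are illustrative, not from the source.


def f_statistic(s_a: float, s_b: float) -> float:
    """Return the F ratio with the larger sample variance in the numerator."""
    var_a, var_b = s_a ** 2, s_b ** 2
    return max(var_a, var_b) / min(var_a, var_b)


# Sample standard deviations and sample sizes from the exercise.
s_completed, n_completed = 0.940, 352
s_dropped, n_dropped = 0.797, 73

# Critical value F.025 with 351 and 72 degrees of freedom, as stated in the
# exercise note (alpha = .05 split across two tails).
f_critical = 1.466

f_value = f_statistic(s_completed, s_dropped)

# The completed-course group has the larger variance, so it supplies the
# numerator degrees of freedom.
df_num, df_den = n_completed - 1, n_dropped - 1

reject_h0 = f_value >= f_critical

print(f"F = {f_value:.3f} with {df_num} and {df_den} degrees of freedom")
print("Reject H0" if reject_h0 else "Do not reject H0")
```

Under these assumptions the ratio falls below the stated critical value, so the sketch does not reject the hypothesis of equal variances at the .05 level.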
33SE
Two new assembly methods are tested and the variances in assembly times are reported. Use α = .10 and test for equality of the two population variances.
An Air Force introductory course in electronics uses a personalized system of instruction whereby each student views a videotaped lecture and then is given a programmed instruction text. The students work independently with the text until they have completed the training and passed a test. Of concern is the varying pace at which the students complete this portion of their training program. Some students are able to cover the programmed instruction text relatively quickly, whereas other students work much longer with the text and require additional time to complete the course. The fast students wait until the slow students complete the introductory course before the entire group proceeds together with other aspects of their training. A proposed alternative system involves use of computer-assisted instruction. In this method, all students view the same videotaped lecture and then each is assigned to a computer terminal for further instruction. The computer guides the student, working independently, through the self-training portion of the course. To compare the proposed and current methods of instruction, an entering class of 122 students was assigned randomly to one of the two methods. One group of 61 students used the current programmed-text method and the other group of 61 students used the proposed computer-assisted method. The time in hours was recorded for each student in the study. The following data are provided in the data set Training.
Managerial Report
Use appropriate descriptive statistics to summarize the training time data for each method. What similarities or differences do you observe from the sample data? Conduct a hypothesis test on the difference between the population means for the two methods. Discuss your findings. Compute the standard deviation and variance for each training method.
Conduct a hypothesis test about the equality of population variances for the two training methods. Discuss your findings. What conclusion can you reach about any differences between the two methods? What is your recommendation? Explain. Can you suggest other data or testing that might be desirable before making a final decision on the training program to be used in the future? Meticulous Drill & Reamer (MD&R) specializes in drilling and boring precise holes in hard metals (e.g., steel alloys, tungsten carbide, and titanium). The company recently contracted to drill holes with 3-centimeter diameters in large carbon-steel alloy disks, and it will have to purchase a special drill to complete this job. MD&R has eliminated all but two of the drills it has been considering: Davis Drills’ T2005 and Worth Industrial Tools’ AZ100. These producers have each agreed to allow MD&R to use a T2005 and an AZ100 for one week to determine which drill it will purchase. During the one-week trial, MD&R uses each of these drills to drill 31 holes with a target diameter of 3 centimeters in one large carbon-steel alloy disk, then measures the diameter of each hole and records the results. MD&R’s results are provided in the table that follows and are available in the DATAfile named MeticulousDrills. MD&R wants to consider both the accuracy (closeness of the diameter to 3 centimeters) and the precision (the variance of the diameter) of the holes drilled by the T2005 and the AZ100 when deciding which model to purchase. Managerial Report In making this assessment for MD&R, consider the following four questions: Are the holes drilled by the T2005 or the AZ100 more accurate? That is, which model of drill produces holes with a mean diameter closer to 3 centimeters? Are the holes drilled by the T2005 or the AZ100 more precise? That is, which model of drill produces holes with a smaller variance? 
Conduct a test of the hypothesis that the T2005 and the AZ100 are equally precise (that is, have equal variances) at α = .05. Discuss your findings. Which drill do you recommend to MD&R? Why?
Test the following hypotheses by using the χ² goodness of fit test.
H0: pA = .40, pB = .40, and pC = .20
Ha: The population proportions are not pA = .40, pB = .40, and pC = .20
A sample of size 200 yielded 60 in category A, 120 in category B, and 20 in category C. Use α = .01 and test to see whether the proportions are as stated in H0. Use the p-value approach. Repeat the test using the critical value approach.
Suppose we have a multinomial population with four categories: A, B, C, and D. The null hypothesis is that the proportion of items is the same in every category. The null hypothesis is H0: pA = pB = pC = pD = .25. A sample of size 300 yielded the following results. A: 85, B: 95, C: 50, D: 70. Use α = .05 to determine whether H0 should be rejected. What is the p-value?
Television Audiences Across Networks. During the first 13 weeks of the television season, the Saturday evening 8 p.m. to 9 p.m. audience proportions were recorded as ABC 29%, CBS 28%, NBC 25%, and independents 18%. A sample of 300 homes two weeks after a Saturday night schedule revision yielded the following viewing audience data: ABC 95 homes, CBS 70 homes, NBC 89 homes, and independents 46 homes. Test with α = .05 to determine whether the viewing audience proportions changed.
M&M Candy Colors. Mars, Inc. manufactures M&M’s, one of the most popular candy treats in the world. The milk chocolate candies come in a variety of colors including blue, brown, green, orange, red, and yellow. The overall proportions for the colors are .24 blue, .13 brown, .20 green, .16 orange, .13 red, and .14 yellow. In a sampling study, several bags of M&M milk chocolates were opened and the following color counts were obtained.
Use a .05 level of significance and the sample data to test the hypothesis that the overall proportions for the colors are as stated above. What is your conclusion?
America’s Favorite Sports. The Harris Poll tracks the favorite sport of Americans who follow at least one sport. Results of the poll show that professional football is the favorite sport of 33% of Americans who follow at least one sport, followed by baseball at 15%, men’s college football at 10%, auto racing at 6%, men’s professional basketball at 5%, and ice hockey at 5%, with other sports at 26%. Consider a survey in which 344 college undergraduates who follow at least one sport were asked to identify their favorite sport, with the following results. Do college undergraduate students differ from the general public with regard to their favorite sports? Use α = .05.
Traffic Accidents by Day of Week. The National Highway Traffic Safety Administration reported the percentage of traffic accidents occurring each day of the week. Assume that a sample of 420 accidents provided the following data. Conduct a hypothesis test to determine if the proportion of traffic accidents is the same for each day of the week. What is the p-value? Using a .05 level of significance, what is your conclusion? Compute the percentage of traffic accidents occurring on each day of the week. What day has the highest percentage of traffic accidents? Does this seem reasonable? Discuss.
The following table contains observed frequencies for a sample of 200. Test for independence of the row and column variables using α = .05.
8E
Airline Ticket Purchases for Domestic and International Flights. A Bloomberg Businessweek subscriber study asked, “In the past 12 months, when traveling for business, what type of airline ticket did you purchase most often?” A second question asked if the type of airline ticket purchased most often was for domestic or international travel. Sample data obtained are shown in the following table.
                     Type of Flight
Type of Ticket       Domestic    International
First class             29            22
Business class          95           121
Economy class          518           135
Using a .05 level of significance, is the type of ticket purchased independent of the type of flight? What is your conclusion? Discuss any dependence that exists between the type of ticket and type of flight.
Hiring and Firing Plans at Private and Public Companies. A Deloitte employment survey asked a sample of human resource executives how their company planned to change its workforce over the next 12 months. A categorical response variable showed three options: The company plans to hire and add to the number of employees, the company plans no change in the number of employees, or the company plans to lay off and reduce the number of employees. Another categorical variable indicated if the company was private or public. Sample data for 180 companies are summarized as follows. Conduct a test of independence to determine if the employment plan for the next 12 months is independent of the type of company. At a .05 level of significance, what is your conclusion? Discuss any differences in the employment plans for private and public companies over the next 12 months.
Health insurance benefits vary by the size of the company (the Henry J. Kaiser Family Foundation website, June 23, 2016). The sample data below show the number of companies providing health insurance for small, medium, and large companies. For purposes of this study, small companies are companies that have fewer than 100 employees. Medium-sized companies have 100 to 999 employees, and large companies have 1000 or more employees. The questionnaire sent to 225 employees asked whether or not the employee had health insurance and then asked the employee to indicate the size of the company. Conduct a test of independence to determine whether health insurance coverage is independent of the size of the company. What is the p-value? Using a .05 level of significance, what is your conclusion?
A newspaper article indicated employees of small companies are more likely to lack health insurance coverage. Use percentages based on the above data to support this conclusion.
Vehicle Quality Ratings. A J. D. Power and Associates vehicle quality survey asked new owners a variety of questions about their recently purchased automobile. One question asked for the owner’s rating of the vehicle using categorical responses of average, outstanding, and exceptional. Another question asked for the owner’s education level with the categorical responses some high school, high school graduate, some college, and college graduate. Assume the sample data below are for 500 owners who had recently purchased an automobile. Use a .05 level of significance and a test of independence to determine if a new owner’s vehicle quality rating is independent of the owner’s education. What is the p-value and what is your conclusion? Use the overall percentage of average, outstanding, and exceptional ratings to comment upon how new owners rate the quality of their recently purchased automobiles.
Company Reputation and Management Quality Survey. The Wall Street Journal Annual Corporate Perceptions Study surveyed readers and asked how they rated the quality of management and the reputation of the company for more than 250 worldwide corporations. Both the quality of management and the reputation of the company were rated on a categorical scale of excellent, good, and fair. Assume the sample data for 200 respondents below apply to this study. Use a .05 level of significance and test for independence of the quality of management and the reputation of the company. What is the p-value and what is your conclusion? If there is a dependence or association between the two ratings, discuss and use probabilities to justify your answer.
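The independence exercises in this list all follow the same computation: expected frequencies from row and column totals, then a chi-square statistic with (r - 1)(c - 1) degrees of freedom. The sketch below shows that computation on the one complete table that survives in this listing, the airline ticket counts (first class 29 and 22, business class 95 and 121, economy class 518 and 135). The function name `chi_square_independence` is an assumption made for illustration, and the p-value line relies on the fact that a chi-square distribution with exactly 2 degrees of freedom has upper-tail probability exp(-x/2); other table sizes would need a statistics library.

```python
import math


def chi_square_independence(observed):
    """Return (chi-square statistic, degrees of freedom) for a two-way table.

    `observed` is a list of rows, each a list of observed counts.
    """
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)

    chi2 = 0.0
    for i, row in enumerate(observed):
        for j, obs in enumerate(row):
            # Expected count under independence: (row total * column total) / n
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (obs - expected) ** 2 / expected

    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return chi2, df


# Rows: first class, business class, economy class.
# Columns: domestic, international.
observed = [
    [29, 22],
    [95, 121],
    [518, 135],
]

chi2, df = chi_square_independence(observed)

# Closed form that holds only for df == 2: P(X >= x) = exp(-x / 2).
p_value = math.exp(-chi2 / 2) if df == 2 else None

print(f"chi-square = {chi2:.2f}, df = {df}, p-value = {p_value:.3g}")
```

For this table the statistic comes out at roughly 100.4 on 2 degrees of freedom, so the p-value is far below .05 and the sketch would reject independence of ticket type and flight type.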
14E
The Carnegie Classification of Institutes of Higher Education categorizes colleges and universities on the basis of their research and degree-granting activities. Universities that grant doctoral degrees are placed into one of three classifications: highest research activity, higher research activity, or moderate research activity. The Carnegie classifications for public and not-for-profit private doctoral degree-granting universities follow.
16E
Use the sample data below to test the hypotheses
H0: p1 = p2 = p3
Ha: Not all population proportions are equal
where pi is the population proportion of Yes responses for population i. Using a .05 level of significance, what is the p-value and what is your conclusion?
2. Reconsider the observed frequencies in exercise 1. a. Compute the sample proportion for each population. b. Use the multiple comparison procedure to determine which population proportions differ significantly. Use a .05 level of significance.
               Populations
Response       1      2      3
Yes          150    150     96
No           100    150    104
Late Flight Comparison Across Airlines. The sample data below represent the number of late and on time flights for Delta, United, and US Airways. a. Formulate the hypotheses for a test that will determine if the population proportion of late flights is the same for all three airlines. b. Conduct the hypothesis test with a .05 level of significance. What is the p-value and what is your conclusion? c. Compute the sample proportion of late flights for each airline. What is the overall proportion of late flights for the three airlines?
Electronic Component Supplier Quality Comparison. Benson Manufacturing is considering ordering electronic components from three different suppliers. The suppliers may differ in terms of quality in that the proportion or percentage of defective components may differ among the suppliers. To evaluate the proportion of defective components for the suppliers, Benson has requested a sample shipment of 500 components from each supplier.
The number of defective components and the number of good components found in each shipment are as follows. a. Formulate the hypotheses that can be used to test for equal proportions of defective components provided by the three suppliers. b. Using a .05 level of significance, conduct the hypothesis test. What is the p-value and what is your conclusion? c. Conduct a multiple comparison test to determine if there is an overall best supplier or if one supplier can be eliminated because of poor quality. Kate Sanders, a researcher in the department of biology at IPFW University, studied the effect of agriculture contaminants on the stream fish population in northeastern Indiana (April 2012). Specially designed traps collected samples of fish at each of four stream locations. A research question was, Did the differences in agricultural contaminants found at the four locations alter the proportion of the fish population by gender? Observed frequencies were as follows. Focusing on the proportion of male fish at each location, test the hypothesis that the population proportions are equal for all four locations. Use a .05 level of significance. What is the p-value and what is your conclusion? Does it appear that differences in agricultural contaminants found at the four locations altered the fish population by gender? Error Rates in Tax Preparation. A tax preparation firm is interested in comparing the quality of work at two of its regional offices. The observed frequencies showing the number of sampled returns with errors and the number of sampled returns that were correct are as follows. What are the sample proportions of returns with errors at the two offices? Use the chi-square test procedure to see if there is a significant difference between the population proportion of error rates for the two offices. Test the null hypothesis H0: p1 = p2 with a .10 level of significance. What is the p-value and what is your conclusion? 
Note: We generally use the chi-square test of equal proportions when there are three or more populations, but this example shows that the same chi-square test can be used for testing equal proportions with two populations. In Section 10.2, a z test was used to conduct the above test. Either a χ² test statistic or a z test statistic may be used to test the hypothesis. However, when we want to make inferences about the proportions for two populations, we generally prefer the z test statistic procedure. Refer to the Notes and Comments at the end of this section and comment on why the z test statistic provides the user with more options for inferences about the proportions of two populations.
Social networking is becoming more and more popular around the world. Pew Research Center used a survey of adults in several countries to determine the percentage of adults who use social networking sites (USA Today, February 8, 2012). Assume that the results for surveys in Great Britain, Israel, Russia, and United States are as follows. Conduct a hypothesis test to determine whether the proportion of adults using social networking sites is equal for all four countries. What is the p-value? Using a .05 level of significance, what is your conclusion? What are the sample proportions for each of the four countries? Which country has the largest proportion of adults using social networking sites? Using a .05 level of significance, conduct multiple pairwise comparison tests among the four countries. What is your conclusion?
Supplier Quality: Three Inspection Outcomes. The Ertl Company is well known for its high-quality die-cast metal alloy toy replicas of tractors and other farm equipment. As part of a periodic procurement evaluation, Ertl is considering purchasing parts for a toy tractor line from three different suppliers. The parts received from the suppliers are classified as having a minor defect, having a major defect, or being good.
Test results from samples of parts received from each of the three suppliers are shown below. Note that any test with these data is no longer a test of proportions for the three supplier populations because the categorical response variable has three outcomes: minor defect, major defect, and good. Using the data above, conduct a hypothesis test to determine if the distribution of defects is the same for the three suppliers. Use the chi-square test calculations as presented in this section with the exception that a table with r rows and c columns results in a chi-square test statistic with (r – 1)(c – 1) degrees of freedom. Using a .05 level of significance, what is the p-value and what is your conclusion? In 2011, the industries with the most complaints to the Better Business Bureau were banks, cable and satellite television companies, collection agencies, cellular phone providers, and new car dealerships (USA Today, April 16, 2012). The results for a sample of 200 complaints are contained in the DATAfile named BBB. Construct a frequency distribution for the number of complaints by industry. Using α = .01, conduct a hypothesis test to determine whether the probability of a complaint is the same for the five industries. What is your conclusion? Drop the industry with the most complaints. Using α = .05, conduct a hypothesis test to determine whether the probability of a complaint is the same for the remaining four industries. Bistro 65 is a chain of Italian restaurants with locations in Ohio and Kentucky. The Bistro 65 menu has four categories of entrees: Pasta, Steak & Chops, Seafood, and Other (e.g., pizza, sandwiches, etc.). Historical data for the chain show that the probability a customer will order an entrée from one of the four categories is .4 for Pasta, .1 for Steak & Chops, .2 for Seafood, and .3 for Other. A new Bistro 65 restaurant has just opened in Dayton, Ohio, and the following purchase frequencies have been observed for the first 200 customers. 
Conduct a hypothesis test to determine whether the order pattern for the new restaurant in Dayton is the same as the historical pattern for the established Bistro 65 restaurants. Use α = .05. If the difference in part (a) is significant, prepare a bar chart to show where the differences occur. Comment on any differences observed. Based on sales over a six-month period, the five top-selling compact cars are Chevy Cruze, Ford Focus, Hyundai Elantra, Honda Civic, and Toyota Corolla (Motor Trend, November 2, 2011). Based on total sales, the market shares for these five compact cars were Chevy Cruze 24%, Ford Focus 21%, Hyundai Elantra 20%, Honda Civic 18%, and Toyota Corolla 17%. A sample of 400 compact car sales in Chicago showed the following number of vehicles sold. Use a goodness of fit test to determine whether the sample data indicate that the market shares for the five compact cars in Chicago are different than the market shares reported by Motor Trend. Using a .05 level of significance, what is the p-value and what is your conclusion? What market share differences, if any, exist in Chicago? Pace-of-Life Preference By Gender. A Pew Research Center survey asked respondents if they would rather live in a place with a slower pace of life or a place with a faster pace of life. The survey also asked the respondent’s gender. Consider the following sample data. Is the preferred pace of life independent of gender? Using a .05 level of significance, what is the p-value and what is your conclusion? Discuss any differences between the preferences of men and women. Church Attendance by Age Group. The Barna Group conducted a survey about church attendance. The survey respondents were asked about their church attendance and asked to indicate their age. Use the sample data to determine whether church attendance is independent of age. Using a .05 level of significance, what is the p-value and what is your conclusion? 
What conclusion can you draw about church attendance as individuals grow older?
Ambulance Calls by Day of Week. An ambulance service responds to emergency calls for two counties in Virginia. One county is an urban county and the other is a rural county. A sample of 471 ambulance calls over the past two years showed the county and the day of the week for each emergency call. Data are as follows. Test for independence of the county and the day of the week. Using a .05 level of significance, what is the p-value and what is your conclusion?
31SE
Phoenix Marketing International identified Bridgeport, Connecticut; Los Alamos, New Mexico; Naples, Florida; and Washington, DC, as the four U.S. cities with the highest percentage of millionaires. Data consistent with that study show the following number of millionaires for samples of individuals from each of the four cities. What is the estimate of the percentage of millionaires in each of these cities? Using a .05 level of significance, test for the equality of the population proportion of millionaires for these four cities. What is the p-value and what is your conclusion?
The five most popular art museums in the world are Musée du Louvre, the Metropolitan Museum of Art, British Museum, National Gallery, and Tate Modern (The Art Newspaper, April 2012). Which of these five museums would visitors most frequently rate as spectacular? Samples of recent visitors of each of these museums were taken, and the results of these samples follow. Use the sample data to calculate the point estimate of the population proportion of visitors who rated each of these museums as spectacular. Conduct a hypothesis test to determine if the population proportion of visitors who rated the museum as spectacular is equal for these five museums. Using a .05 level of significance, what is the p-value and what is your conclusion?
In a study conducted by Zogby International for the Democrat and Chronicle, more than 700 New Yorkers were polled to determine whether the New York state government works. Respondents surveyed were asked questions involving pay cuts for state legislators, restrictions on lobbyists, term limits for legislators, and whether state citizens should be able to put matters directly on the state ballot for a vote. The results regarding several proposed reforms had broad support, crossing all demographic and political lines. Suppose that a follow-up survey of 100 individuals who live in the western region of New York was conducted. The party affiliation (Democrat, Independent, Republican) of each individual surveyed was recorded, as well as their responses to the following three questions. Should legislative pay be cut for every day the state budget is late? Yes ____ No ____ Should there be more restrictions on lobbyists? Yes ____ No ____ Should there be term limits requiring that legislators serve a fixed number of years? Yes ____ No ____ The responses were coded using 1 for a Yes response and 2 for a No response. The complete data set is available in the file NYReform. Managerial Report Use descriptive statistics to summarize the data from this study. What are your preliminary conclusions about the independence of the response (Yes or No) and party affiliation for each of the three questions in the survey? With regard to question 1, test for the independence of the response (Yes and No) and party affiliation. Use α = .05. With regard to question 2, test for the independence of the response (Yes and No) and party affiliation. Use α = .05. With regard to question 3, test for the independence of the response (Yes and No) and party affiliation. Use α = .05. Does it appear that there is broad support for change across all political lines? Explain. Six months ago, Fuentes Salty Snacks, Inc., added a new flavor to its line of potato chips. 
The new flavor, candied bacon, was introduced through a nationwide rollout supported by an extensive promotional campaign. Fuentes’ management is convinced that quick penetration into grocery stores is a key to the successful introduction of a new salty snack product, and management now wants to determine whether availability of Fuentes’ Candied Bacon Potato Chips is consistent in grocery stores across regions of the United States. The marketing department has selected random samples of 40 grocery stores in each of its eight U.S. sales regions:
New England (Connecticut, Maine, Massachusetts, New Hampshire, Rhode Island, and Vermont)
Mid-Atlantic (New Jersey, New York, and Pennsylvania)
Midwest (Illinois, Indiana, Michigan, Ohio, and Wisconsin)
Great Plains (Iowa, Kansas, Minnesota, Missouri, Nebraska, North Dakota, Oklahoma, and South Dakota)
South Atlantic (Delaware, Florida, Georgia, Maryland, North Carolina, South Carolina, Virginia, West Virginia, and Washington, D.C.)
Deep South (Alabama, Arkansas, Kentucky, Louisiana, Mississippi, Tennessee, and Texas)
Mountain (Arizona, Colorado, Idaho, Montana, Nevada, New Mexico, Utah, and Wyoming)
Pacific (Alaska, California, Hawaii, Oregon, and Washington)
The stores in each sample were then contacted, and the manager of each store was asked whether the store currently carries Fuentes’ Candied Bacon Potato Chips. The complete data set is available in the file FuentesChips. Fuentes’ senior management now wants to use these data to assess whether penetration of Fuentes’ Candied Bacon Potato Chips in grocery stores is consistent across its eight U.S. sales regions. If penetration of Fuentes’ Candied Bacon Potato Chips in grocery stores differs across its eight U.S. sales regions, Fuentes’ management would also like to identify sales regions in which penetration of Fuentes’ Candied Bacon Potato Chips is lower or higher than expected.
Managerial Report
Prepare a managerial report that addresses the following issues.
Use descriptive statistics to summarize the data from Fuentes’ study. Based on your descriptive statistics, what are your preliminary conclusions about the penetration of Fuentes’ Candied Bacon Potato Chips in grocery stores across its eight U.S. sales regions? Use the data from Fuentes’ study to test the hypothesis that the proportion of grocery stores that currently carries Fuentes’ Candied Bacon Potato Chips is equal across its eight U.S. sales regions. Use α = .05. Do the results of your hypothesis test provide evidence that Fuentes’ Candied Bacon Potato Chips have penetrated grocery stores across its eight U.S. sales regions? In which sales region(s) is penetration of Fuentes’ Candied Bacon Potato Chips lower or higher than expected? Use the Marascuilo pairwise comparison procedure at α = .05 to test for differences between regions. Fresno Board Games manufactures and sells several different board games online and through department stores nationwide. Fresno’s most popular game, ¡Cabestrillo Cinco!, is played with 5 six-sided dice. Fresno has purchased dice for this game from Box Cars, Ltd., for twenty-five years, but the company is now considering a move to Big Boss Gaming, Inc. (BBG), a new supplier that has offered to sell dice to Fresno at a substantially lower price. Fresno management is intrigued by the potential savings offered by BBG, but is also concerned about the quality of the dice produced by the new supplier. Fresno has a reputation for high integrity, and its management feels that it is imperative that the dice included with ¡Cabestrillo Cinco! are fair. To alleviate concerns about the quality of the dice it produces, BBG allows Fresno’s manager of product quality to randomly sample five dice from its most recent production run. While being observed by several members of the BBG management team, Fresno’s manager of product quality rolls each of these five randomly selected dice 500 times and records each outcome. 
The results for each of these five randomly selected dice are available in the file BBG. Fresno management now wants to use these data to assess whether any of these five six-sided dice is not fair; that is, does one outcome occur more frequently or less frequently than the other outcomes? Managerial Report Prepare a managerial report that addresses the following issues. Use descriptive statistics to summarize the data collected by Fresno’s manager of product quality for each of the five randomly selected dice. Based on these descriptive statistics, what are your preliminary conclusions about the fairness of the five selected dice? Use the data collected by Fresno’s manager of product quality to test the hypothesis that the first of the five randomly selected dice is fair, i.e., the distribution of outcomes for the first of the five randomly selected dice is multinomial with p1 = p2 = p3 = p4 = p5 = p6 = 1/6. Repeat this process for each of the other four randomly selected dice. Use α = .01. Do the results of your hypothesis tests provide evidence that BBG is producing unfair dice? 1. The following data are from a completely randomized design. Treatment A B C 162 142 126 142 156 122 165 124 138 145 142 140 148 136 150 174 152 128 Sample mean 156 142 134 Sample variance 164.4 131.2 110.4 Compute the sum of squares between treatments. Compute the mean square between treatments. Compute the sum of squares due to error. Compute the mean square due to error. Set up the ANOVA table for this problem. At the α = .05 level of significance, test whether the means for the three treatments are equal. 2. In a completely randomized design, seven experimental units were used for each of the five levels of the factor. Complete the following ANOVA table. Source of Variation Sum of Squares Degrees of Freedom Mean Square F p-value Treatments 300 Error Total 460 3. Refer to exercise 2. what hypotheses are implied in this problem? 
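The sums of squares requested in exercise 1 above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the textbook's worked solution: the three lists are transcribed from the flattened Treatment A/B/C table, and the variable names (`samples`, `sstr`, `sse`, and so on) are chosen here for illustration.

```python
from statistics import mean, variance

# Observations transcribed from exercise 1 (completely randomized design).
samples = {
    "A": [162, 142, 165, 145, 148, 174],
    "B": [142, 156, 124, 142, 136, 152],
    "C": [126, 122, 138, 140, 150, 128],
}

k = len(samples)                                  # number of treatments
n_total = sum(len(v) for v in samples.values())   # total observations
grand_mean = mean(x for v in samples.values() for x in v)

# Sum of squares between treatments and sum of squares due to error.
sstr = sum(len(v) * (mean(v) - grand_mean) ** 2 for v in samples.values())
sse = sum((len(v) - 1) * variance(v) for v in samples.values())

# Mean squares and the F statistic for the ANOVA table.
mstr = sstr / (k - 1)
mse = sse / (n_total - k)
f_stat = mstr / mse

print(f"SSTR={sstr:.1f}  MSTR={mstr:.1f}  SSE={sse:.1f}  MSE={mse:.2f}  F={f_stat:.2f}")
```

The F statistic would then be compared with the critical value of the F distribution with k − 1 and n − k degrees of freedom (or converted to a p-value) to carry out the test at α = .05.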
At the α = .05 level of significance, can we reject the null hypothesis in part (a)? Explain. 4E5. In a completely randomized design, 12 experimental units were used for the first treatment, 15 for the second treatment, and 20 for the third treatment. Complete the following analysis of variance. At a .05 level of significance, is there a significant difference between the treatments? Source of Variation Sum of Squares Degrees of Freedom Mean Square F p-value Treatments 1200 Error Total 1800 6E7. Three different methods for assembling a product were proposed by an industrial engineer. To investigate the number of units assembled correctly with each method, 30 employees were randomly selected and randomly assigned to the three proposed methods in such a way that each method was used by 10 workers. The number of units assembled correctly was recorded, and the analysis of variance procedure was applied to the resulting data set. The following results were obtained: SST = 10,800; SSTR = 4560. Set up the ANOVA table for this problem. Use α = .05 to test for any significant difference in the means for the three assembly methods. 8E9. To study the effect of temperature on yield in a chemical process, five batches were produced at each of three temperature levels. The results follow. Construct an analysis of variance table. Use a .05 level of significance to test whether the temperature level has an effect on the mean yield of the process. Temperature 50°C 60°C 70°C 34 30 23 24 31 28 36 34 28 39 23 30 32 27 31 10EFour different paints are advertised as having the same drying time. To check the manufacturer’s claims, five samples were tested for each of the paints. The time in minutes until the paint was dry enough for a second coat to be applied was recorded. The following data were obtained. At the α = .05 level of significance, test to see whether the mean drying time is the same for each type of paint. Restaurant Satisfaction. 
The Consumer Reports Restaurant Customer Satisfaction Survey is based upon 148,599 visits to full-service restaurant chains (Consumer Reports website, https://www.consumerreports.org/cro/restaurants/buying-guide/index.htm). One of the variables in the study is meal price, the average amount paid per person for dinner and drinks, minus the tip. Suppose a reporter for the Sun Coast Times thought that it would be of interest to her readers to conduct a similar study for restaurants located on the Grand Strand section in Myrtle Beach, South Carolina. The reporter selected a sample of 8 seafood restaurants, 8 Italian restaurants, and 8 steakhouses. The following data show the meal prices ($) obtained for the 24 restaurants sampled. Use α = .05 to test whether there is a significant difference among the mean meal price for the three types of restaurants. The following data are from a completely randomized design. At the α = .05 level of significance, can we reject the null hypothesis that the means of the three treatments are equal? Use Fisher’s LSD procedure to test whether there is a significant difference between the means for treatments A and B, treatments A and C, and treatments B and C. Use α = .05. Use Fisher’s LSD procedure to develop a 95% confidence interval estimate of the difference between the means of treatments A and B. 14ETesting Chemical Processes. To test whether the mean time needed to mix a batch of material is the same for machines produced by three manufacturers, the Jacobs Chemical Company obtained the following data on the time (in minutes) needed to mix the material. Use these data to test whether the population mean times for mixing a batch of material differ for the three manufacturers. Use α = .05. At the α = .05 level of significance, use Fisher’s LSD procedure to test for the equality of the means for manufacturers 1 and 3. What conclusion can you draw after carrying out this test? 16EMarketing Ethics. 
In the digital age of marketing, special care must be taken to make sure that programmatic ads appearing on websites align with a company’s strategy, culture and ethics. For example, in 2017, Nordstrom, Amazon and Whole Foods each faced boycotts from social media users when automated ads for these companies showed up on the Breitbart website (ChiefMarketer.com). It is important for marketing professionals to understand a company’s values and culture. The following data are from an experiment designed to investigate the perception of corporate ethical values among individuals specializing in marketing (higher scores indicate higher ethical values). Use α = .05 to test for significant differences in perception among the three groups. At the α = .05 level of significance, we can conclude that there are differences in the perceptions for marketing managers, marketing research specialists, and advertising specialists. Use the procedures in this section to determine where the differences occur. Use α = .05. 18E 19E Minor League Baseball Attendance. The International League of Triple-A minor league baseball consists of 14 teams organized into three divisions: North, South, and West. The following data show the average attendance for the 14 teams in the International League. Also shown are the teams’ records; W denotes the number of games won, L denotes the number of games lost, and PCT is the proportion of games played that were won. Use α = .05 to test for any difference in the mean attendance for the three divisions. Use Fisher’s LSD procedure to determine where the differences occur. Use α = .05. Consider the experimental results for the following randomized block design. Make the calculations necessary to set up the analysis of variance table. Use α = .05 to test for any significant differences. 22E 23. An experiment has been conducted for four treatments with eight blocks. Complete the following analysis of variance table. 
Source of Variation, Sum of Squares, Degrees of Freedom, Mean Square: Treatments, 900; Blocks, 400; Error; Total, 1800 (remaining cells blank). Use α = .05 to test for any significant differences. 24E The price drivers pay for gasoline often varies a great deal across regions throughout the United States. The following data show the price per gallon for regular gasoline for a random sample of gasoline service stations for three major brands of gasoline (Shell, BP, and Marathon) located in 11 metropolitan areas across the upper Midwest region (OhioGasPrices.com website, March 18, 2012). Use α = .05 to test for any significant difference in the mean price of gasoline for the three brands. SAT Performance. The Scholastic Aptitude Test (SAT) contains three areas: critical reading, mathematics, and writing. Each area is scored on an 800-point scale. A sample of SAT scores for six students follows. a. Using a .05 level of significance, do students perform differently on the three areas of the SAT? b. Which area of the test seems to give the students the most trouble? Explain. A study reported in the Journal of the American Medical Association investigated the cardiac demands of heavy snow shoveling. Ten healthy men underwent exercise testing with a treadmill and a cycle ergometer modified for arm cranking. The men then cleared two tracts of heavy, wet snow by using a lightweight plastic snow shovel and an electric snow thrower. Each subject’s heart rate, blood pressure, oxygen uptake, and perceived exertion during snow removal were compared with the values obtained during treadmill and arm-crank ergometer testing. Suppose the following table gives the heart rates in beats per minute for each of the 10 subjects. At the .05 level of significance, test for any significant differences. A factorial experiment involving two levels of factor A and three levels of factor B resulted in the following data. Test for any significant main effects and any interaction. Use α = .05. 
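For the randomized block exercise above (four treatments, eight blocks, with sums of squares 900, 400, and 1800 given), the blank cells of the ANOVA table follow from simple arithmetic. The sketch below is one way to fill them in; the function name `complete_block_anova` and its parameter names are hypothetical and chosen only for this illustration.

```python
def complete_block_anova(sstr, ssbl, sst, k, b):
    """Fill in a randomized block ANOVA table.

    sstr, ssbl, sst: treatment, block, and total sums of squares.
    k: number of treatments; b: number of blocks (illustrative names).
    """
    sse = sst - sstr - ssbl          # error sum of squares by subtraction
    df_tr = k - 1                    # treatments degrees of freedom
    df_bl = b - 1                    # blocks degrees of freedom
    df_err = df_tr * df_bl           # error degrees of freedom
    mstr = sstr / df_tr
    msbl = ssbl / df_bl
    mse = sse / df_err
    return {
        "Treatments": {"SS": sstr, "df": df_tr, "MS": mstr, "F": mstr / mse},
        "Blocks": {"SS": ssbl, "df": df_bl, "MS": msbl},
        "Error": {"SS": sse, "df": df_err, "MS": mse},
        "Total": {"SS": sst, "df": k * b - 1},
    }


# Values taken from the exercise statement: 4 treatments, 8 blocks.
table = complete_block_anova(sstr=900, ssbl=400, sst=1800, k=4, b=8)
for source, row in table.items():
    print(source, row)
```

The treatment F statistic is then compared with the F distribution on (k − 1) and (k − 1)(b − 1) degrees of freedom at α = .05.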
29E 30E Amusement Park Queues. An amusement park studied methods for decreasing the waiting time (minutes) for rides by loading and unloading riders more efficiently. Two alternative loading/unloading methods have been proposed. To account for potential differences due to the type of ride and the possible interaction between the method of loading and unloading and the type of ride, a factorial experiment was designed. Use the following data to test for any significant effect due to the loading and unloading method, the type of ride, and interaction. Use α = .05. 32E 33E In a completely randomized experimental design, three brands of paper towels were tested for their ability to absorb water. Equal-size towels were used, with four sections of towels tested per brand. The absorbency rating data follow. At a .05 level of significance, does there appear to be a difference in the ability of the brands to absorb water? 35SE 36SE 37SE Assembly Methods. Three different assembly methods have been proposed for a new product. A completely randomized experimental design was chosen to determine which assembly method results in the greatest number of parts produced per hour, and 30 workers were randomly selected and assigned to use one of the proposed methods. The number of units produced by each worker follows. Use these data and test to see whether the mean number of parts produced is the same with each method. Use α = .05. In a study conducted to investigate browsing activity by shoppers, each shopper was initially classified as a nonbrowser, light browser, or heavy browser. For each shopper, the study obtained a measure to determine how comfortable the shopper was in a store. Higher scores indicated greater comfort. Suppose the following data were collected. a. Use α = .05 to test for differences among comfort levels for the three types of browsers. b. Use Fisher’s LSD procedure to compare the comfort levels of nonbrowsers and light browsers. Use α = .05. What is your conclusion? 
Fuel Efficiency of Gasoline Brands. A research firm tests the miles-per-gallon characteristics of three brands of gasoline. Because of different gasoline performance characteristics in different brands of automobiles, five brands of automobiles are selected and treated as blocks in the experiment: that is, each brand of automobile is tested with each type of gasoline. The results of the experiment (in miles per gallon) follow. At α = .05, is there a significant difference in the mean miles-per-gallon characteristics of the three brands of gasoline? Analyze the experimental data using the ANOVA procedure for completely randomized designs. Compare your findings with those obtained in part (a). What is the advantage of attempting to remove the block effect? Late-Night Talk Show Viewership. Jimmy Kimmel Live! on ABC, The Tonight Show Starring Jimmy Fallon on NBC, and The Late Show with Stephen Colbert on CBS are three popular late-night talk shows. The following table shows the number of viewers in millions for a 10-week period during the spring for each of these shows (TV by the Numbers website, https://tvbythenumbers.zap2it.com/). At the .05 level of significance, test for a difference in the mean number of viewers per week for the three late-night talk shows. 42SE 43SE 44SE CASE PROBLEM 1: WENTWORTH MEDICAL CENTER As part of a long-term study of individuals 65 years of age or older, sociologists and physicians at the Wentworth Medical Center in upstate New York investigated the relationship between geographic location and depression. A sample of 60 individuals, all in reasonably good health, was selected; 20 individuals were residents of Florida, 20 were residents of New York, and 20 were residents of North Carolina. Each of the individuals sampled was given a standardized test to measure depression. The data collected follow; higher test scores indicate higher levels of depression. These data are contained in the file Medical1. 
A second part of the study considered the relationship between geographic location and depression for individuals 65 years of age or older who had a chronic health condition such as arthritis, hypertension, and/or heart ailment. A sample of 60 individuals with such conditions was identified. Again, 20 were residents of Florida, 20 were residents of New York, and 20 were residents of North Carolina. The levels of depression recorded for this study follow. These data are contained in the file named Medical2. Managerial Report 1. Use descriptive statistics to summarize the data from the two studies. What are your preliminary observations about the depression scores? 2. Use analysis of variance on both data sets. State the hypotheses being tested in each case. What are your conclusions? 3. Use inferences about individual treatment means where appropriate. What are your conclusions? CASE PROBLEM 2: COMPENSATION FOR SALES PROFESSIONALS Suppose that a local chapter of sales professionals in the greater San Francisco area conducted a survey of its membership to study the relationship, if any, between the years of experience and salary for individuals employed in inside and outside sales positions. On the survey, respondents were asked to specify one of three levels of years of experience: low (1–10 years), medium (11–20 years), and high (21 or more years). A portion of the data obtained follows. The complete data set, consisting of 120 observations, is contained in the file named SalesSalary. Managerial Report Use descriptive statistics to summarize the data. Develop a 95% confidence interval estimate of the mean annual salary for all salespersons, regardless of years of experience and type of position. Develop a 95% confidence interval estimate of the mean salary for inside salespersons. Develop a 95% confidence interval estimate of the mean salary for outside salespersons. Use analysis of variance to test for any significant differences due to position. 
Use a .05 level of significance, and for now, ignore the effect of years of experience. Use analysis of variance to test for any significant differences due to years of experience. Use a .05 level of significance, and for now, ignore the effect of position. At the .05 level of significance, test for any significant differences due to position, years of experience, and interaction. TOURISTOPIA TRAVEL TourisTopia Travel (Triple T) is an online travel agency that specializes in trips to exotic locations around the world for groups of ten or more travelers. Triple T’s marketing manager has been working on a major revision of the homepage of Triple T’s website. The content for the homepage has been selected and the only remaining decisions involve the selection of the background color (white, green, or pink) and the type of font (Arial, Calibri, or Tahoma). Triple T’s IT group has designed prototype homepages featuring every combination of these background colors and fonts, and it has implemented computer code that will randomly direct each Triple T website visitor to one of these prototype homepages. For three weeks, the prototype homepage to which each visitor was directed and the amount of time in seconds spent at Triple T’s website during each visit were recorded. Ten visitors to each of the prototype homepages were then selected randomly; the complete data set for these visitors is available in the DATAfile named TourisTopia. Triple T wants to use these data to determine if the time spent by visitors to Triple T’s website differs by background color or font. It would also like to know if the time spent by visitors to the Triple T website differs by different combinations of background color and font. Managerial Report Prepare a managerial report that addresses the following issues. 1. Use descriptive statistics to summarize the data from Triple T’s study. 
Based on descriptive statistics, what are your preliminary conclusions about whether the time spent by visitors to the Triple T website differs by background color or font? What are your preliminary conclusions about whether time spent by visitors to the Triple T website differs by different combinations of background color and font? 2. Has Triple T used an observational study or a controlled experiment? Explain. 3. Use the data from Triple T’s study to test the hypothesis that the time spent by visitors to the Triple T website is equal for the three background colors. Include both factors and their interaction in the ANOVA model, and use α = .05. 4. Use the data from Triple T’s study to test the hypothesis that the time spent by visitors to the Triple T website is equal for the three fonts. Include both factors and their interaction in the ANOVA model, and use α = .05. 5. Use the data from Triple T’s study to test the hypothesis that time spent by visitors to the Triple T website is equal for the nine combinations of background color and font. Include both factors and their interaction in the ANOVA model, and use α = .05. 6. Do the results of your analysis of the data provide evidence that the time spent by visitors to the Triple T website differs by background color, font, or combination of background color and font? What is your recommendation? Given are five observations for two variables, x and y. Develop a scatter diagram for these data. What does the scatter diagram developed in part (a) indicate about the relationship between the two variables? Try to approximate the relationship between x and y by drawing a straight line through the data. Develop the estimated regression equation by computing the values of b0 and b1 using equations (14.6) and (14.7). Use the estimated regression equation to predict the value of y when x = 4. Given are five observations for two variables, x and y. Develop a scatter diagram for these data. 
What does the scatter diagram developed in part (a) indicate about the relationship between the two variables? Try to approximate the relationship between x and y by drawing a straight line through the data. Develop the estimated regression equation by computing the values of b0 and b1 using equations (14.6) and (14.7). Use the estimated regression equation to predict the value of y when x = 10. Given are five observations collected in a regression study on two variables. Develop a scatter diagram for these data. Develop the estimated regression equation for these data. Use the estimated regression equation to predict the value of y when x = 6. Retail and Trade: Female Managers. The following data give the percentage of women working in five companies in the retail and trade industry. The percentage of management jobs held by women in each company is also shown. Develop a scatter diagram for these data with the percentage of women working in the company as the independent variable. What does the scatter diagram developed in part (a) indicate about the relationship between the two variables? Try to approximate the relationship between the percentage of women working in the company and the percentage of management jobs held by women in that company. Develop the estimated regression equation by computing the values of b0 and b1. Predict the percentage of management jobs held by women in a company that has 60% women employees. Production Line Speed and Quality Control. Brawdy Plastics, Inc., produces plastic seat belt retainers for General Motors at the Brawdy Plastics plant in Buffalo, New York. After final assembly and painting, the parts are placed on a conveyor belt that moves the parts past a final inspection station. How fast the parts move past the final inspection station depends upon the line speed of the conveyor belt (feet per minute). 
Although faster line speeds are desirable, management is concerned that increasing the line speed too much may not provide enough time for inspectors to identify which parts are actually defective. To test this theory, Brawdy Plastics conducted an experiment in which the same batch of parts, with a known number of defective parts, was inspected using a variety of line speeds. The following data were collected. Develop a scatter diagram with the line speed as the independent variable. What does the scatter diagram developed in part (a) indicate about the relationship between the two variables? Use the least squares method to develop the estimated regression equation. Predict the number of defective parts found for a line speed of 25 feet per minute. The National Football League (NFL) records a variety of performance data for individuals and teams. To investigate the importance of passing on the percentage of games won by a team, the following data show the average number of passing yards per attempt (Yds/Att) and the percentage of games won (WinPct) in a season for a random sample of 10 NFL teams. Develop a scatter diagram with the number of passing yards per attempt on the horizontal axis and the percentage of games won on the vertical axis. What does the scatter diagram developed in part (a) indicate about the relationship between the two variables? Develop the estimated regression equation that could be used to predict the percentage of games won given the average number of passing yards per attempt. Provide an interpretation for the slope of the estimated regression equation. For the 2011 season, the average number of passing yards per attempt for the Kansas City Chiefs was 6.2. Use the estimated regression equation developed in part (c) to predict the percentage of games won by the Kansas City Chiefs. (Note: For the 2011 season the Kansas City Chiefs record was 7 wins and 9 losses.) 
Compare your prediction to the actual percentage of games won by the Kansas City Chiefs. Sales Experience and Performance. A sales manager collected the following data on annual sales for new customer accounts and the number of years of experience for a sample of 10 salespersons. Develop a scatter diagram for these data with years of experience as the independent variable. Develop an estimated regression equation that can be used to predict annual sales given the years of experience. Use the estimated regression equation to predict annual sales for a salesperson with 9 years of experience. Broker Satisfaction. The American Association of Individual Investors (AAII) On-Line Discount Broker Survey polls members on their experiences with discount brokers. As part of the survey, members were asked to rate the quality of the speed of execution with their broker as well as provide an overall satisfaction rating for electronic trades. Possible responses (scores) were no opinion (0), unsatisfied (1), somewhat satisfied (2), satisfied (3), and very satisfied (4). For each broker summary scores were computed by calculating a weighted average of the scores provided by each respondent. A portion of the survey results follow (AAII website). Develop a scatter diagram for these data with the speed of execution as the independent variable. What does the scatter diagram developed in part (a) indicate about the relationship between the two variables? Develop the least squares estimated regression equation. Provide an interpretation for the slope of the estimated regression equation. Suppose Zecco.com developed new software to increase their speed of execution rating. If the new software is able to increase their speed of execution rating from the current value of 2.5 to the average speed of execution rating for the other 10 brokerage firms that were surveyed, what value would you predict for the overall satisfaction rating? Companies in the U.S. 
car rental market vary greatly in terms of the size of the fleet, the number of locations, and annual revenue. In 2011 Hertz had 320,000 cars in service and annual revenue of approximately $4.2 billion. The following data show the number of cars in service (1000s) and the annual revenue ($millions) for six smaller car rental companies (Auto Rental News website, August 7, 2012). Develop a scatter diagram with the number of cars in service as the independent variable. What does the scatter diagram developed in part (a) indicate about the relationship between the two variables? Use the least squares method to develop the estimated regression equation. For every additional car placed in service, estimate how much annual revenue will change. Fox Rent A Car has 11,000 cars in service. Use the estimated regression equation developed in part (c) to predict annual revenue for Fox Rent A Car. Age and the Price of Wine. For a particular red wine, the following data show the auction price for a 750-milliliter bottle and the age of the wine in June of 2016 (WineX website). Develop a scatter diagram for these data with age as the independent variable. What does the scatter diagram developed in part (a) indicate about the relationship between age and price? Develop the least squares estimated regression equation. Provide an interpretation for the slope of the estimated equation. Laptop Ratings. To help consumers in purchasing a laptop computer, Consumer Reports calculates an overall test score for each computer tested based upon rating factors such as ergonomics, portability, performance, display, and battery life. Higher overall scores indicate better test results. The following data show the average retail price and the overall score for ten 13-inch models (Consumer Reports website). Develop a scatter diagram with price as the independent variable. What does the scatter diagram developed in part (a) indicate about the relationship between the two variables? 
Use the least squares method to develop the estimated regression equation. Provide an interpretation of the slope of the estimated regression equation. Another laptop that Consumer Reports tested is the Acer Aspire S3-951-6646 Ultrabook; the price for this laptop was $700. Predict the overall score for this laptop using the estimated regression equation developed in part (c). Stock Beta. In June of 2016, Yahoo Finance reported the beta value for Coca-Cola was .82 (Yahoo Finance website). Betas for individual stocks are determined by simple linear regression. The dependent variable is the total return for the stock, and the independent variable is the total return for the stock market, such as the return of the S&P 500. The slope of this regression equation is referred to as the stock’s beta. Many financial analysts prefer to measure the risk of a stock by computing the stock’s beta value. The data contained in the DATAfile named CocaCola show the monthly percentage returns for the S&P 500 and the Coca-Cola Company for August 2015 to May 2016. Develop a scatter diagram with the S&P % Return as the independent variable. What does the scatter diagram developed in part (a) indicate about the relationship between the returns of the S&P 500 and those of the Coca-Cola Company? Develop the least squares estimated regression equation. Provide an interpretation for the slope of the estimated equation (that is, the beta). Is your beta estimate close to .82? If not, why might your estimate be different? Distance and Absenteeism. A large city hospital conducted a study to investigate the relationship between the number of unauthorized days that employees are absent per year and the distance (miles) between home and work for the employees. A sample of 10 employees was selected and the following data were collected. Develop a scatter diagram for these data. Does a linear relationship appear reasonable? Explain. 
Develop the least squares estimated regression equation that relates the distance to work to the number of days absent. Predict the number of days absent for an employee who lives 5 miles from the hospital. Using a global-positioning-system (GPS)-based navigator for your car, you enter a destination and the system will plot a route, give spoken turn-by-turn directions, and show your progress along the route. Today, even budget units include features previously available only on more expensive models. Consumer Reports conducted extensive tests of GPS-based navigators and developed an overall rating based on factors such as ease of use, driver information, display, and battery life. The following data show the price and rating for a sample of 20 GPS units with a 4.3-inch screen that Consumer Reports tested (Consumer Reports website, April 17, 2012). Develop a scatter diagram with price as the independent variable. What does the scatter diagram developed in part (a) indicate about the relationship between the two variables? Use the least squares method to develop the estimated regression equation. Predict the rating for a GPS system with a 4.3-inch screen that has a price of $200. 15. The data from exercise 1 follow. xi 1 2 3 4 5 yi 3 7 5 11 14 The estimated regression equation for these data is ŷ = .20 + 2.60x. Compute SSE, SST, and SSR using equations (14.8), (14.9), and (14.10). Compute the coefficient of determination r2. Comment on the goodness of fit. Compute the sample correlation coefficient. The data from exercise 2 follow. The estimated regression equation for these data is . Compute SSE, SST, and SSR. Compute the coefficient of determination r2. Comment on the goodness of fit. Compute the sample correlation coefficient. 17EPrice and Quality of Headphones. The following data show the brand, price ($), and the overall score for six stereo headphones that were tested by Consumer Reports (Consumer Reports website). 
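The quantities asked for in exercise 15 above (SSE, SST, SSR, r2, and the sample correlation coefficient) can be checked with a short script. This is a sketch that assumes the standard least squares definitions, SSE = Σ(y − ŷ)², SST = Σ(y − ȳ)², and SSR = Σ(ŷ − ȳ)²; it does not reproduce the textbook's numbered equations, and the variable names are illustrative.

```python
from math import sqrt

# Data from exercise 15 (the same five observations as exercise 1).
x = [1, 2, 3, 4, 5]
y = [3, 7, 5, 11, 14]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Least squares slope and intercept; should match y-hat = .20 + 2.60x.
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
sxx = sum((xi - x_bar) ** 2 for xi in x)
b1 = sxy / sxx
b0 = y_bar - b1 * x_bar

y_hat = [b0 + b1 * xi for xi in x]

sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))   # unexplained variation
sst = sum((yi - y_bar) ** 2 for yi in y)                # total variation
ssr = sum((yh - y_bar) ** 2 for yh in y_hat)            # explained variation

r2 = ssr / sst
# The correlation coefficient takes the sign of the slope.
r = sqrt(r2) if b1 >= 0 else -sqrt(r2)

print(f"b0={b0:.2f} b1={b1:.2f} SSE={sse:.1f} SST={sst:.1f} SSR={ssr:.1f} r2={r2:.3f} r={r:.4f}")
```

Because SST = SSR + SSE for a least squares fit with an intercept, any one of the three can also be obtained from the other two.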
The overall score is based on sound quality and effectiveness of ambient noise reduction. Scores range from 0 (lowest) to 100 (highest). The estimated regression equation for these data is , where x = price ($) and y = overall score. Compute SST, SSR, and SSE. Compute the coefficient of determination r2. Comment on the goodness of fit. What is the value of the sample correlation coefficient? Sales Experience and Sales Performance. In exercise 7 a sales manager collected the following data on x = years of experience and y = annual sales. The estimated regression equation for these data is . Compute SST, SSR, and SSE. Compute the coefficient of determination r2. Comment on the goodness of fit. What is the value of the sample correlation coefficient? Price and Weight of Bicycles. Bicycling, the world’s leading cycling magazine, reviews hundreds of bicycles throughout the year. Their “Road-Race” category contains reviews of bikes used by riders primarily interested in racing. One of the most important factors in selecting a bike for racing is the weight of the bike. The following data show the weight (pounds) and price ($) for 10 racing bikes reviewed by the magazine (Bicycling website). Use the data to develop an estimated regression equation that could be used to estimate the price for a bike given the weight. Compute r2. Did the estimated regression equation provide a good fit? Predict the price for a bike that weighs 15 pounds. Cost Estimation. An important application of regression analysis in accounting is in the estimation of cost. By collecting data on volume and cost and using the least squares method to develop an estimated regression equation relating volume and cost, an accountant can estimate the cost associated with a particular manufacturing volume. Consider the following sample of production volumes and total cost data for a manufacturing operation. 
Use these data to develop an estimated regression equation that could be used to predict the total cost for a given production volume. What is the variable cost per unit produced? Compute the coefficient of determination. What percentage of the variation in total cost can be explained by production volume? The company’s production schedule shows 500 units must be produced next month. Predict the total cost for this operation. 22. Refer to exercise 9, where the following data were used to investigate the relationship between the number of cars in service (1000s) and the annual revenue ($ millions) for six smaller car rental companies (Auto Rental News website, August 7, 2012).
Company                            Cars (1000s)   Revenue ($ millions)
U-Save Auto Rental System, Inc.    11.5           118
Payless Car Rental System, Inc.    10.0           135
ACE Rent A Car                     9.0            100
Rent-A-Wreck of America            5.5            37
Triangle Rent-A-Car                4.2            40
Affordable/Sensible                3.3            32
With x = cars in service (1000s) and y = annual revenue ($ millions), the estimated regression equation is ŷ = −17.005 + 12.966x. For these data SSE = 1043.03. Compute the coefficient of determination r². Did the estimated regression equation provide a good fit? Explain. What is the value of the sample correlation coefficient? Does it reflect a strong or weak relationship between the number of cars in service and the annual revenue? The data from exercise 1 follow. Compute the mean square error using equation (14.15). Compute the standard error of the estimate using equation (14.16). Compute the estimated standard deviation of b1 using equation (14.18). Use the t test to test the following hypotheses (α = .05): H0: β1 = 0, Ha: β1 ≠ 0. Use the F test to test the hypotheses in part (d) at a .05 level of significance. Present the results in the analysis of variance table format. The data from exercise 2 follow. Compute the mean square error using equation (14.15). Compute the standard error of the estimate using equation (14.16). 
Compute the estimated standard deviation of b1 using equation (14.18). Use the t test to test the following hypotheses (α = .05): H0: β1 = 0, Ha: β1 ≠ 0. Use the F test to test the hypotheses in part (d) at a .05 level of significance. Present the results in the analysis of variance table format. The data from exercise 3 follow. What is the value of the standard error of the estimate? Test for a significant relationship by using the t test. Use α = .05. Use the F test to test for a significant relationship. Use α = .05. What is your conclusion? 26E To identify high-paying jobs for people who do not like stress, the following data were collected showing the average annual salary ($1000s) and the stress tolerance for a variety of occupations (Business Insider website, November 8, 2013). The stress tolerance for each job is rated on a scale from 0 to 100, where a lower rating indicates less stress. Develop a scatter diagram for these data with average annual salary as the independent variable. What does the scatter diagram indicate about the relationship between the two variables? Use these data to develop an estimated regression equation that can be used to predict stress tolerance given the average annual salary. At the .05 level of significance, does there appear to be a significant statistical relationship between the two variables? Would you feel comfortable in predicting the stress tolerance for a different occupation given the average annual salary for the occupation? Explain. Does the relationship between average annual salary and stress tolerance for these data seem reasonable to you? Explain. Broker Satisfaction Conclusion. In exercise 8, ratings data on x = the quality of the speed of execution and y = overall satisfaction with electronic trades provided the estimated regression equation . At the .05 level of significance, test whether speed of execution and overall satisfaction are related. Show the ANOVA table. What is your conclusion? 8. Broker Satisfaction. 
The American Association of Individual Investors (AAII) On-Line Discount Broker Survey polls members on their experiences with discount brokers. As part of the survey, members were asked to rate the quality of the speed of execution with their broker as well as provide an overall satisfaction rating for electronic trades. Possible responses (scores) were no opinion (0), unsatisfied (1), somewhat satisfied (2), satisfied (3), and very satisfied (4). For each broker, summary scores were computed by calculating a weighted average of the scores provided by each respondent. A portion of the survey results follows (AAII website). Develop a scatter diagram for these data with the speed of execution as the independent variable. What does the scatter diagram developed in part (a) indicate about the relationship between the two variables? Develop the least squares estimated regression equation. Provide an interpretation for the slope of the estimated regression equation. Suppose Zecco.com developed new software to increase their speed of execution rating. If the new software is able to increase their speed of execution rating from the current value of 2.5 to the average speed of execution rating for the other 10 brokerage firms that were surveyed, what value would you predict for the overall satisfaction rating? Cost Estimation Conclusion. Refer to exercise 21, where data on production volume and cost were used to develop an estimated regression equation relating production volume and cost for a particular manufacturing operation. Use α = .05 to test whether the production volume is significantly related to the total cost. Show the ANOVA table. What is your conclusion? 21. Cost Estimation. An important application of regression analysis in accounting is in the estimation of cost. 
By collecting data on volume and cost and using the least squares method to develop an estimated regression equation relating volume and cost, an accountant can estimate the cost associated with a particular manufacturing volume. Consider the following sample of production volumes and total cost data for a manufacturing operation. Use these data to develop an estimated regression equation that could be used to predict the total cost for a given production volume. What is the variable cost per unit produced? Compute the coefficient of determination. What percentage of the variation in total cost can be explained by production volume? The company’s production schedule shows 500 units must be produced next month. Predict the total cost for this operation. Significance of Fleet Size on Rental Car Revenue. Companies in the U.S. car rental market vary greatly in terms of the size of the fleet, the number of locations, and annual revenue. The following data were used to investigate the relationship between the number of cars in service (1000s) and the annual revenue ($ millions) for six smaller car rental companies (Auto Rental News website). With x = cars in service (1000s) and y = annual revenue ($ millions), the estimated regression equation is ŷ= −17.005 + 12.966x. For these data SSE = 1043.03 and SST = 10,568. Do these results indicate a significant relationship between the number of cars in service and the annual revenue? Significance of Racing Bike Weight on Price. In exercise 20, data on x = weight (pounds) and y = price ($) for 10 road-racing bikes provided the estimated regression equation . (Bicycling website). For these data SSE = 7,102,922.54 and SST = 52,120,800. Use the F test to determine whether the weight for a bike and the price are related at the .05 level of significance. 20. Price and Weight of Bicycles. Bicycling, the world’s leading cycling magazine, reviews hundreds of bicycles throughout the year. 
Their “Road-Race” category contains reviews of bikes used by riders primarily interested in racing. One of the most important factors in selecting a bike for racing is the weight of the bike. The following data show the weight (pounds) and price ($) for 10 racing bikes reviewed by the magazine (Bicycling website). Use the data to develop an estimated regression equation that could be used to estimate the price for a bike given the weight. Compute r². Did the estimated regression equation provide a good fit? Predict the price for a bike that weighs 15 pounds. 32. The data from exercise 1 follow. xi: 1, 2, 3, 4, 5; yi: 3, 7, 5, 11, 14. Use equation (14.23) to estimate the standard deviation of ŷ* when x = 4. Use expression (14.24) to develop a 95% confidence interval for the expected value of y when x = 4. Use equation (14.26) to estimate the standard deviation of an individual value of y when x = 4. Use expression (14.27) to develop a 95% prediction interval for y when x = 4. 33. The data from exercise 2 follow. xi: 3, 12, 6, 20, 14; yi: 55, 40, 55, 10, 15. Estimate the standard deviation of ŷ* when x = 8. Develop a 95% confidence interval for the expected value of y when x = 8. Estimate the standard deviation of an individual value of y when x = 8. Develop a 95% prediction interval for y when x = 8. 34E 35. The following data are the monthly salaries y and the grade point averages x for students who obtained a bachelor’s degree in business administration. GPA and monthly salary ($): 2.6, 3600; 3.4, 3900; 3.6, 4300; 3.2, 3800; 3.5, 4200; 2.9, 3900. The estimated regression equation for these data is ŷ = 2090.5 + 581.1x and MSE = 21,284. Develop a point estimate of the starting salary for a student with a GPA of 3.0. Develop a 95% confidence interval for the mean starting salary for all students with a 3.0 GPA. Develop a 95% prediction interval for Ryan Dailey, a student with a GPA of 3.0. Discuss the differences in your answers to parts (b) and (c). 36. 
In exercise 7, the data on y = annual sales ($1000s) for new customer accounts and x = number of years of experience for a sample of 10 salespersons provided the estimated regression equation ŷ = 80 + 4x. For these data , and s = 4.6098. Develop a 95% confidence interval for the mean annual sales for all salespersons with nine years of experience. The company is considering hiring Tom Smart, a salesperson with nine years of experience. Develop a 95% prediction interval of annual sales for Tom Smart. Discuss the differences in your answers to parts (a) and (b). In exercise 5, the following data on x = the number of defective parts found and y = the line speed (feet per minute) for a production process at Brawdy Plastics provided the estimated regression equation . For these data SSE = 16. Develop a 95% confidence interval for the mean number of defective parts for a line speed of 25 feet per minute. 38E 39. In exercise 12, the following data on x = average daily hotel room rate and y = amount spent on entertainment (The Wall Street Journal, August 18, 2011) lead to the estimated regression equation ŷ = 17.49 + 1.0334x. For these data SSE = 1541.4.
City            Room Rate ($)   Entertainment ($)
Boston          148             161
Denver          96              105
Nashville       91              101
New Orleans     110             142
Phoenix         90              100
San Diego       102             120
San Francisco   136             167
San Jose        90              140
Tampa           82              98
Predict the amount spent on entertainment for a particular city that has a daily room rate of $89. Develop a 95% confidence interval for the mean amount spent on entertainment for all cities that have a daily room rate of $89. The average room rate in Chicago is $128. Develop a 95% prediction interval for the amount spent on entertainment in Chicago. The commercial division of a real estate firm conducted a study to determine the extent of the relationship between annual gross rents ($1000s) and the selling price ($1000s) for apartment buildings. 
Data were collected on several properties sold, and Excel’s Regression tool was used to develop an estimated regression equation. A portion of the regression output follows. How many apartment buildings were in the sample? Write the estimated regression equation. Use the t test to determine whether the selling price is related to annual gross rents. Use α = .05. Use the F test to determine whether the selling price is related to annual gross rents. Use α = .05. Predict the selling price of an apartment building with gross annual rents of $50,000. Following is a portion of the regression output for an application relating maintenance expense (dollars per month) to usage (hours per week) for a particular brand of computer terminal. Write the estimated regression equation. Use a t test to determine whether monthly maintenance expense is related to usage at the .05 level of significance. Did the estimated regression equation provide a good fit? Explain. 43E Auto Racing Helmet. Automobile racing, high-performance driving schools, and driver education programs run by automobile clubs continue to grow in popularity. All these activities require the participant to wear a helmet that is certified by the Snell Memorial Foundation, a not-for-profit organization dedicated to research, education, testing, and development of helmet safety standards. Snell “SA” (Sports Application)-rated professional helmets are designed for auto racing and provide extreme impact resistance and high fire protection. One of the key factors in selecting a helmet is weight, since lower-weight helmets tend to place less stress on the neck. Consider the following data showing the weight and price for 18 SA helmets. Develop a scatter diagram with weight as the independent variable. Does there appear to be any relationship between these two variables? Develop the estimated regression equation that could be used to predict the price given the weight. 
Test for the significance of the relationship at the .05 level of significance. Did the estimated regression equation provide a good fit? Explain. 45E 46E 47E 48E 49E Consider the following data for two variables, x and y. Compute the standardized residuals for these data. Do the data include any outliers? Explain. Plot the standardized residuals against ŷ. Does this plot reveal any outliers? Develop a scatter diagram for these data. Does the scatter diagram indicate any outliers in the data? In general, what implications does this finding have for simple linear regression? 51E Predicting Charity Expenses. Charity Navigator is America’s leading independent charity evaluator. The following data show the total expenses ($), the percentage of the total budget spent on administrative expenses, the percentage spent on fundraising, and the percentage spent on program expenses for 10 supersized charities (Charity Navigator website). Administrative expenses include overhead, administrative staff and associated costs, and organizational meetings. Fundraising expenses are what a charity spends to raise money, and program expenses are what the charity spends on the programs and services it exists to deliver. The sum of the three percentages does not add to 100% because of rounding. Source: Charity Navigator website, (https://www.charilynavigator.org/) Develop a scatter diagram with fundraising expenses (%) on the horizontal axis and program expenses (%) on the vertical axis. Looking at the data, do there appear to be any outliers and/or influential observations? Develop an estimated regression equation that could be used to predict program expenses (%) given fundraising expenses (%). Does the value for the slope of the estimated regression equation make sense in the context of this problem situation? Use residual analysis to determine whether any outliers and/or influential observations are present. Briefly summarize your findings and conclusions. 
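The residual-analysis exercises above do not reproduce their data tables in this listing, so the following is only a minimal sketch of the calculation they ask for. It uses the five observations that the listing gives for exercise 1 (x = 1, 2, 3, 4, 5 and y = 3, 7, 5, 11, 14), for which the stated fit is ŷ = .20 + 2.60x. The function names are hypothetical, and flagging observations whose standardized residual lies outside ±2 is a common rule of thumb assumed here, not a procedure quoted from the exercises.

```python
import math

def fit_line(x, y):
    """Return (b0, b1), the least squares intercept and slope."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1

def standardized_residuals(x, y):
    """Residual i divided by s * sqrt(1 - h_i), where h_i is the leverage."""
    n = len(x)
    b0, b1 = fit_line(x, y)
    residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    sse = sum(e ** 2 for e in residuals)
    s = math.sqrt(sse / (n - 2))  # standard error of the estimate
    x_bar = sum(x) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    leverage = [1 / n + (xi - x_bar) ** 2 / sxx for xi in x]
    return [e / (s * math.sqrt(1 - h)) for e, h in zip(residuals, leverage)]

# Exercise 1 data as listed in this document.
x = [1, 2, 3, 4, 5]
y = [3, 7, 5, 11, 14]

std_res = standardized_residuals(x, y)
# Assumed rule of thumb: |standardized residual| > 2 suggests an outlier.
flagged = [(xi, yi) for xi, yi, r in zip(x, y, std_res) if abs(r) > 2]

for xi, r in zip(x, std_res):
    print(f"x = {xi}: standardized residual = {r:.4f}")
print("Flagged observations:", flagged)
```

Plotting `std_res` against the fitted values would give the standardized residual plot that several of these exercises request.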
Many countries, especially those in Europe, have significant gold holdings. But many of these countries also have massive debts. The following data show the total value of gold holdings in billions of U.S. dollars and the debt as a percentage of the gross domestic product for nine countries (WordPress and Trading Economics websites, February 24, 2012). Develop a scatter diagram for the total value of a country’s gold holdings ($ billions) as the independent variable. What does the scatter diagram developed in part (a) indicate about the relationship between the two variables? Does there appear to be any outliers and/or influential observations? Explain. Using the entire data set, develop the estimated regression equation that can be used to predict the debt of a country given the total value of its gold holdings. Suppose that after looking at the scatter diagram in part (a) that you were able to visually identify what appears to be an influential observation. Drop this observation from the data set and fit an estimated regression equation to the remaining data. Compare the estimated slope for the new estimated regression equation to the estimated slope obtained in part (c). Does this approach confirm the conclusion you reached in part (d)? Explain. Valuation of a Major League Baseball Team. The following data show the annual revenue ($ millions) and the estimated team value ($ millions) for 30 Major League Baseball teams (Forbes website). Develop a scatter diagram with Revenue on the horizontal axis and Value on the vertical axis. Looking at the scatter diagram, does it appear that there are any outliers and/or influential observations in the data? Develop the estimated regression equation that can be used to predict team value given the annual revenue. Use residual analysis to determine whether any outliers and/or influential observations are present. Briefly summarize your findings and conclusions. 
The Dow Jones Industrial Average (DJIA) and the Standard & Poor’s 500 (S&P 500) indexes are used as measures of overall movement in the stock market. The DJIA is based on the price movements of 30 large companies; the S&P 500 is an index composed of 500 stocks. Some say the S&P 500 is a better measure of stock market performance because it is broader based. The closing price for the DJIA and the S&P 500 for 15 weeks, beginning with January 6, 2012, follow (Barron’s website, April 17, 2012). Develop a scatter diagram with DJIA as the independent variable. Develop the estimated regression equation. Test for a significant relationship. Use α = .05. Did the estimated regression equation provide a good fit? Explain. Suppose that the closing price for the DJIA is 13,500. Predict the closing price for the S&P 500. Should we be concerned that the DJIA value of 13,500 used to predict the S&P 500 value in part (e) is beyond the range of the data used to develop the estimated regression equation? Home Size and Price. Is the number of square feet of living space a good predictor of a house’s selling price? The following data, collected in April 2015, show the square footage and selling price for fifteen houses in Winston Salem, North Carolina (Zillow.com). Develop a scatter diagram with square feet of living space as the independent variable and selling price as the dependent variable. What does the scatter diagram indicate about the relationship between the size of a house and the selling price? Develop the estimated regression equation that could be used to predict the selling price given the number of square feet of living space. At the .05 level, is there a significant relationship between the two variables? Use the estimated regression equation to predict the selling price of a 2000-square-foot house in Winston Salem, North Carolina. 
Do you believe the estimated regression equation developed in part (b) will provide a good prediction of selling price of a particular house in Winston Salem, North Carolina? Explain. Would you be comfortable using the estimated regression equation developed in part (b) to predict the selling price of a particular house in Seattle, Washington? Why or why not? Online Education. One of the biggest changes in higher education in recent years has been the growth of online universities. The Online Education Database is an independent organization whose mission is to build a comprehensive list of the top accredited online colleges. The following table shows the retention rate (%) and the graduation rate (%) for 29 online colleges. Develop a scatter diagram with retention rate as the independent variable. What does the scatter diagram indicate about the relationship between the two variables? Develop the estimated regression equation. Test for a significant relationship. Use α = .05. Did the estimated regression equation provide a good fit? Machine Maintenance. Jensen Tire & Auto is in the process of deciding whether to purchase a maintenance contract for its new computer wheel alignment and balancing machine. Managers feel that maintenance expense should be related to usage, and they collected the following information on weekly usage (hours) and annual maintenance expense (in hundreds of dollars). Develop the estimated regression equation that relates annual maintenance expense to weekly usage. Test the significance of the relationship in part (a) at a .05 level of significance. Jensen expects to use the new machine 30 hours per week. Develop a 95% prediction interval for the company’s annual maintenance expense. If the maintenance contract costs $3000 per year, would you recommend purchasing it? Why or why not? Bus Maintenance. 
The regional transit authority for a major metropolitan area wants to determine whether there is any relationship between the age of a bus and the annual maintenance cost. A sample of 10 buses resulted in the following data. Develop the least squares estimated regression equation. Test to see whether the two variables are significantly related with α = .05. Did the least squares line provide a good fit to the observed data? Explain. Develop a 95% prediction interval for the maintenance cost for a specific bus that is 4 years old. Studying and Grades. A marketing professor at Givens College is interested in the relationship between hours spent studying and total points earned in a course. Data collected on 10 students who took the course last quarter follow. Develop an estimated regression equation showing how total points earned is related to hours spent studying. Test the significance of the model with α = .05. Predict the total points earned by Mark Sweeney. He spent 95 hours studying. Develop a 95% prediction interval for the total points earned by Mark Sweeney. Used Car Mileage and Price. The Toyota Camry is one of the best-selling cars in North America. The cost of a previously owned Camry depends upon many factors, including the model year, mileage, and condition. To investigate the relationship between the car’s mileage and the sales price for a 2007 model year Camry, the following data show the mileage and sale price for 19 sales (PriceHub website). Develop a scatter diagram with the car mileage on the horizontal axis and the price on the vertical axis. What does the scatter diagram developed in part (a) indicate about the relationship between the two variables? Develop the estimated regression equation that could be used to predict the price ($1000s) given the miles (1000s). Test for a significant relationship at the .05 level of significance. Did the estimated regression equation provide a good fit? Explain. 
Provide an interpretation for the slope of the estimated regression equation. Suppose that you are considering purchasing a previously owned 2007 Camry that has been driven 60,000 miles. Using the estimated regression equation developed in part (c), predict the price for this car. Is this the price you would offer the seller? One measure of the risk or volatility of an individual stock is the standard deviation of the total return (capital appreciation plus dividends) over several periods of time. Although the standard deviation is easy to compute, it does not take into account the extent to which the price of a given stock varies as a function of a standard market index, such as the S&P 500. As a result, many financial analysts prefer to use another measure of risk referred to as beta. Betas for individual stocks are determined by simple linear regression. The dependent variable is the total return for the stock and the independent variable is the total return for the stock market. For this case problem we will use the S&P 500 index as the measure of the total return for the stock market, and an estimated regression equation will be developed using monthly data. The beta for the stock is the slope of the estimated regression equation (b1). The data contained in the file named Beta provides the total return (capital appreciation plus dividends) over 36 months for eight widely traded common stocks and the S&P 500. The value of beta for the stock market will always be 1; thus, stocks that tend to rise and fall with the stock market will also have a beta close to 1. Betas greater than 1 indicate that the stock is more volatile than the market, and betas less than 1 indicate that the stock is less volatile than the market. For instance, if a stock has a beta of 1.4, it is 40% more volatile than the market, and if a stock has a beta of .4, it is 60% less volatile than the market. You have been assigned to analyze the risk characteristics of these stocks. 
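As a rough illustration of the beta calculation described above, the sketch below computes the least squares slope of stock returns on market returns. The function name `beta` is hypothetical, and the return series are made-up placeholder numbers, not values from the Beta file, which is not reproduced in this listing.

```python
def beta(stock_returns, market_returns):
    """Slope b1 from regressing stock returns (y) on market returns (x)."""
    n = len(market_returns)
    m_bar = sum(market_returns) / n
    s_bar = sum(stock_returns) / n
    sxx = sum((m - m_bar) ** 2 for m in market_returns)
    sxy = sum((m - m_bar) * (s - s_bar)
              for m, s in zip(market_returns, stock_returns))
    return sxy / sxx

# Made-up monthly market returns, for illustration only.
market = [0.02, -0.01, 0.03, 0.00, -0.02, 0.01]

# A synthetic stock that moves 1.4 times as much as the market, plus a constant.
stock = [0.001 + 1.4 * m for m in market]

print(round(beta(stock, market), 4))   # slope recovered for the synthetic stock
print(round(beta(market, market), 4))  # the market regressed on itself
```

Regressing the market on itself returns a slope of 1, which matches the statement that the beta for the stock market is always 1. The synthetic stock recovers a beta of 1.4, the "40% more volatile" case mentioned in the text.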
Prepare a report that includes but is not limited to the following items. Compute descriptive statistics for each stock and the S&P 500. Comment on your results. Which stocks are the most volatile? Compute the value of beta for each stock. Which of these stocks would you expect to perform best in an up market? Which would you expect to hold their value best in a down market? Comment on how much of the return for the individual stocks is explained by the market. As part of a study on transportation safety, the U.S. Department of Transportation collected data on the number of fatal accidents per 1000 licenses and the percentage of licensed drivers under the age of 21 in a sample of 42 cities. Data collected over a one-year period follow. These data are contained in the file named Safety. Develop numerical and graphical summaries of the data. Use regression analysis to investigate the relationship between the number of fatal accidents and the percentage of drivers under the age of 21. Discuss your findings. What conclusion and recommendations can you derive from your analysis? Consumer Reports tested 166 different point-and-shoot digital cameras. Based upon factors such as the number of megapixels, weight (oz.), image quality, and ease of use, they developed an overall score for each camera tested. The overall score ranges from 0 to 100, with higher scores indicating better overall test results. Selecting a camera with many options can be a difficult process, and price is certainly a key issue for most consumers. By spending more, will a consumer really get a superior camera? And, do cameras that have more megapixels, a factor often considered to be a good measure of picture quality, cost more than cameras with fewer megapixels? Table 14.13 shows the brand, average retail price ($), number of megapixels, weight (oz.), and the overall score for 13 Canon and 15 Nikon subcompact cameras tested by Consumer Reports (Consumer Reports website). 
Develop numerical summaries of the data. Using overall score as the dependent variable, develop three scatter diagrams, one using price as the independent variable, one using the number of megapixels as the independent variable, and one using weight as the independent variable. Which of the three independent variables appears to be the best predictor of overall score? Using simple linear regression, develop an estimated regression equation that could be used to predict the overall score given the price of the camera. For this estimated regression equation, perform an analysis of the residuals and discuss your findings and conclusions. Analyze the data using only the observations for the Canon cameras. Discuss the appropriateness of using simple linear regression and make any recommendations regarding the prediction of overall score using just the price of the camera. When trying to decide what car to buy, real value is not necessarily determined by how much you spend on the initial purchase. Instead, cars that are reliable and don’t cost much to own often represent the best values. But, no matter how reliable or inexpensive a car may cost to own, it must also perform well. To measure value, Consumer Reports developed a statistic referred to as a value score. The value score is based upon five-year owner costs, overall road-test scores, and predicted reliability ratings. Five-year owner costs are based on the expenses incurred in the first five years of ownership, including depreciation, fuel, maintenance and repairs, and so on. Using a national average of 12,000 miles per year, an average cost per mile driven is used as the measure of five-year owner costs. Road-test scores are the results of more than 50 tests and evaluations and are based upon a 100-point scale, with higher scores indicating better performance, comfort, convenience, and fuel economy. The highest road-test score obtained in the tests conducted by Consumer Reports was a 99 for a Lexus LS 460L. 
Predicted-reliability ratings (1 = Poor, 2 = Fair, 3 = Good, 4 = Very Good, and 5 = Excellent) are based on data from Consumer Reports’ Annual Auto Survey. A car with a value score of 1.0 is considered to be “average-value.” A car with a value score of 2.0 is considered to be twice as good a value as a car with a value score of 1.0; a car with a value score of .5 is considered half as good as average; and so on. The data for 20 family sedans, including the price ($) of each car tested, follow. Develop numerical summaries of the data. Use regression analysis to develop an estimated regression equation that could be used to predict the value score given the price of the car. Use regression analysis to develop an estimated regression equation that could be used to predict the value score given the five-year owner costs (cost/mile). Use regression analysis to develop an estimated regression equation that could be used to predict the value score given the road-test score. Use regression analysis to develop an estimated regression equation that could be used to predict the value score given the predicted-reliability. What conclusions can you derive from your analysis? Buckeye Creek Amusement Park is open from the beginning of May to the end of October. Buckeye Creek relies heavily on the sale of season passes. The sale of season passes brings in significant revenue prior to the park opening each season, and season pass holders contribute a substantial portion of the food, beverage, and novelty sales in the park. Greg Ross, director of marketing at Buckeye Creek, has been asked to develop a targeted marketing campaign to increase season pass sales. Greg has data for last season that show the number of season pass holders for each zip code within 50 miles of Buckeye Creek. He has also obtained the total population of each zip code from the U.S. Census bureau website. 
Greg thinks it may be possible to use regression analysis to predict the number of season pass holders in a zip code given the total population of a zip code. If this is possible, he could then conduct a direct mail campaign that would target zip codes that have fewer than the expected number of season pass holders. Compute descriptive statistics and construct a scatter diagram for the data. Discuss your findings. Using simple linear regression, develop an estimated regression equation that could be used to predict the number of season pass holders in a zip code given the total population of the zip code. Test for a significant relationship at the .05 level of significance. Did the estimated regression equation provide a good fit? Use residual analysis to determine whether the assumed regression model is appropriate. Discuss if/how the estimated regression equation should be used to guide the marketing campaign. What other data might be useful to predict the number of season pass holders in a zip code? 1. The estimated regression equation for a model involving two independent variables and 10 observations follows. ŷ = 29.1270 + .5906x1 + .4980x2. Interpret b1 and b2 in this estimated regression equation. Predict y when x1 = 180 and x2 = 310. Consider the following data for a dependent variable y and two independent variables, x1 and x2. Develop an estimated regression equation relating y to x1. Predict y if x1 = 45. Develop an estimated regression equation relating y to x2. Predict y if x2 = 15. Develop an estimated regression equation relating y to x1 and x2. Predict y if x1 = 45 and x2 = 15. 3E 4. A shoe store developed the following estimated regression equation relating sales to inventory investment and advertising expenditures. ŷ = 25 + 10x1 + 8x2, where x1 = inventory investment ($1000s), x2 = advertising expenditures ($1000s), and y = sales ($1000s). Predict the sales resulting from a $15,000 investment in inventory and an advertising budget of $10,000. 
Interpret b1 and b2 in this estimated regression equation. The owner of Showtime Movie Theaters, Inc. would like to predict weekly gross revenue as a function of advertising expenditures. Historical data for a sample of eight weeks follow. Develop an estimated regression equation with the amount of television advertising as the independent variable. Develop an estimated regression equation with both television advertising and newspaper advertising as the independent variables. Is the estimated regression equation coefficient for television advertising expenditures the same in part (a) and in part (b)? Interpret the coefficient in each case. Predict weekly gross revenue for a week when $3500 is spent on television advertising and $1800 is spent on newspaper advertising? NFL Winning Percentage. The National Football League (NFL) records a variety of performance data for individuals and teams. To investigate the importance of passing on the percentage of games won by a team, the following data show the conference (Conf), average number of passing yards per attempt (Yds/Att), the number of interceptions thrown per attempt (Int/Att), and the percentage of games won (Win%) for a random sample of 16 NFL teams for one full season. Develop the estimated regression equation that could be used to predict the percentage of games won given the average number of passing yards per attempt. Develop the estimated regression equation that could be used to predict the percentage of games won given the number of interceptions thrown per attempt. Develop the estimated regression equation that could be used to predict the percentage of games won given the average number of passing yards per attempt and the number of interceptions thrown per attempt. The average number of passing yards per attempt for the Kansas City Chiefs was 6.2 and the number of interceptions thrown per attempt was .036. 
Use the estimated regression equation developed in part (c) to predict the percentage of games won by the Kansas City Chiefs. (Note: For this season the Kansas City Chiefs’ record was 7 wins and 9 losses.) Compare your prediction to the actual percentage of games won by the Kansas City Chiefs. 7E Scoring Cruise Ships. The Condé Nast Traveler Gold List provides ratings for the top 20 small cruise ships. The data shown below are the scores each ship received based upon the results from Condé Nast Traveler’s annual Readers’ Choice Survey. Each score represents the percentage of respondents who rated a ship as excellent or very good on several criteria, including Shore Excursions and Food/Dining. An overall score was also reported and used to rank the ships. The highest ranked ship, the Seabourn Odyssey, has an overall score of 94.4, the highest component of which is 97.8 for Food/Dining. Source: Condé Nast Traveler, (https://www.cntraveler.com/galleries/2014-10-20/top-cruise-lines-readers-choice-awards-2014) Determine an estimated regression equation that can be used to predict the overall score given the score for Shore Excursions. Consider the addition of the independent variable Food/Dining. Develop the estimated regression equation that can be used to predict the overall score given the scores for Shore Excursions and Food/Dining. Predict the overall score for a cruise ship with a Shore Excursions score of 80 and a Food/Dining Score of 90. The Professional Golfers Association (PGA) maintains data on performance and earnings for members of the PGA Tour. For the 2012 season Bubba Watson led all players in total driving distance, with an average of 309.2 yards per drive. Some of the factors thought to influence driving distance are club head speed, ball speed, and launch angle. For the 2012 season Bubba Watson had an average club head speed of 124.69 miles per hour, an average ball speed of 184.98 miles per hour, and an average launch angle of 8.79 degrees. 
The DATAfile named PGADrivingDist contains data on total driving distance and the factors related to driving distance for 190 members of the PGA Tour (PGA Tour website, November 1, 2012). Descriptions for the variables in the data set follow. Club Head Speed: Speed at which the club impacts the ball (mph). Ball Speed: Peak speed of the golf ball at launch (mph). Launch Angle: Vertical launch angle of the ball immediately after leaving the club (degrees). Total Distance: The average number of yards per drive. Develop an estimated regression equation that can be used to predict the average number of yards per drive given the club head speed. Develop an estimated regression equation that can be used to predict the average number of yards per drive given the ball speed. A recommendation has been made to develop an estimated regression equation that uses both club head speed and ball speed to predict the average number of yards per drive. Do you agree with this? Explain. Develop an estimated regression equation that can be used to predict the average number of yards per drive given the ball speed and the launch angle. Suppose a new member of the PGA Tour for 2013 has a ball speed of 170 miles per hour and a launch angle of 11 degrees. Use the estimated regression equation in part (d) to predict the average number of yards per drive for this player. Baseball Pitcher Performance. Major League Baseball (MLB) consists of teams that play in the American League and the National League. MLB collects a wide variety of team and player statistics. Some of the statistics often used to evaluate pitching performance are as follows: ERA: The average number of earned runs given up by the pitcher per nine innings. An earned run is any run that the opponent scores off a particular pitcher except for runs scored as a result of errors. SO/IP: The average number of strikeouts per inning pitched. HR/IP: The average number of home runs per inning pitched. 
R/IP: The number of runs given up per inning pitched. The following data show values for these statistics for a random sample of 20 pitchers from the American League for a full season. Develop an estimated regression equation that can be used to predict the average number of runs given up per inning given the average number of strikeouts per inning pitched. Develop an estimated regression equation that can be used to predict the average number of runs given up per inning given the average number of home runs per inning pitched. Develop an estimated regression equation that can be used to predict the average number of runs given up per inning given the average number of strikeouts per inning pitched and the average number of home runs per inning pitched. A. J. Burnett, a pitcher for the New York Yankees, had an average number of strikeouts per inning pitched of .91 and an average number of home runs per inning of .16. Use the estimated regression equation developed in part (c) to predict the average number of runs given up per inning for A. J. Burnett. (Note: The actual value for R/IP was .6.) Suppose a suggestion was made to also use the earned run average as another independent variable in part (c). What do you think of this suggestion? 11. In exercise 1, the following estimated regression equation based on 10 observations was presented. ŷ = 29.1270 + .5906x1 + .4980x2 The values of SST and SSR are 6724.125 and 6216.375, respectively. Find SSE. Compute R2. Compute the adjusted R2. Comment on the goodness of fit. 12. In exercise 2, 10 observations were provided for a dependent variable y and two independent variables x1 and x2; for these data SST = 15,182.9, and SSR = 14,052.2. Compute R2. Compute the adjusted R2. Does the estimated regression equation explain a large amount of the variability in the data? Explain. 13E 14E 15. 
In exercise 5, the owner of Showtime Movie Theaters, Inc., used multiple regression analysis to predict gross revenue (y) as a function of television advertising (x1) and newspaper advertising (x2). The estimated regression equation was ŷ = 83.2 + 2.29x1 + 1.30x2 The computer solution provided SST = 25.5 and SSR = 23.435. Compute and interpret R2 and the adjusted R2. When television advertising was the only independent variable, R2 = .653 and . Do you prefer the multiple regression results? Explain. 16E In part (d) of exercise 9, data contained in the DATAfile named PGADrivingDist (PGA Tour website, November 1, 2012) was used to develop an estimated regression equation to predict the average number of yards per drive given the ball speed and the launch angle. Does the estimated regression equation provide a good fit to the data? Explain. In part (b) of exercise 9, an estimated regression equation was developed using only ball speed to predict the average number of yards per drive. Compare the fit obtained using just ball speed to the fit obtained using ball speed and the launch angle. 18E In exercise 1, the following estimated regression equation based on 10 observations was presented. ŷ = 29.1270 + .5906x1 + .4980x2 Here SST = 6724.125, SSR = 6216.375, , and . Compute MSR and MSE. Compute F and perform the appropriate F test. Use α = .05. Perform a t test for the significance of β1. Use α = .05. Perform a t test for the significance of β2. Use α = .05. The estimated regression equation for a model involving two independent variables and 10 observations follows. ŷ = 29.1270 + .5906x1 + .4980x2 Interpret b1 and b2 in this estimated regression equation. Predict y when x1 = 180 and x2 = 310. 20E 21E 22E Testing Significance in Theater Revenue. Refer to exercise 5. Use α = .01 to test the hypotheses H0: β1 = β2 = 0 versus Ha: β1 and/or β2 is not equal to zero for the model y = β0 + β1x1 + β2x2 + ϵ, where x1 = television advertising ($1000s) and x2 = newspaper advertising ($1000s). Use α = .05 to test the significance of β1. Should x1 be dropped from the model? Use α = .05 to test the significance of β2. Should x2 be dropped from the model? 5. Theater Revenue. 
The owner of Showtime Movie Theaters, Inc., would like to predict weekly gross revenue as a function of advertising expenditures. Historical data for a sample of eight weeks follow. Develop an estimated regression equation with the amount of television advertising as the independent variable. Develop an estimated regression equation with both television advertising and newspaper advertising as the independent variables. Is the estimated regression equation coefficient for television advertising expenditures the same in part (a) and in part (b)? Interpret the coefficient in each case. Predict weekly gross revenue for a week when $3500 is spent on television advertising and $2300 is spent on newspaper advertising. Testing Significance in Predicting NFL Wins. The National Football League (NFL) records a variety of performance data for individuals and teams. A portion of the data showing the average number of passing yards obtained per game on offense (OffPassYds/G), the average number of yards given up per game on defense (DefYds/G), and the percentage of games won (Win%), for one full season follows. Develop an estimated regression equation that can be used to predict the percentage of games won given the average number of passing yards obtained per game on offense and the average number of yards given up per game on defense. Use the F test to determine the overall significance of the relationship. What is your conclusion at the .05 level of significance? Use the t test to determine the significance of each independent variable. What is your conclusion at the .05 level of significance? The Condé Nast Traveler Gold List provides ratings for the top 20 small cruise ships. The following data shown are the scores each ship received based upon the results from Condé Nast Traveler’s annual Readers’ Choice Survey. 
Each score represents the percentage of respondents who rated a ship as excellent or very good on several criteria, including Itineraries/Schedule, Shore Excursions, and Food/Dining. An overall score is also reported and used to rank the ships. The highest ranked ship, the Seabourn Odyssey, has an overall score of 94.4, the highest component of which is 97.8 for Food/Dining. Determine the estimated regression equation that can be used to predict the overall score given the scores for Itineraries/Schedule, Shore Excursions, and Food/Dining. Use the F test to determine the overall significance of the relationship. What is your conclusion at the .05 level of significance? Use the t test to determine the significance of each independent variable. What is your conclusion at the .05 level of significance? Remove any independent variable that is not significant from the estimated regression equation. What is your recommended estimated regression equation? 26E 27E 32. Consider a regression study involving a dependent variable y, a quantitative independent variable x1, and a categorical independent variable with two levels (level 1 and level 2). Write a multiple regression equation relating x1 and the categorical variable to y. What is the expected value of y corresponding to level 1 of the categorical variable? What is the expected value of y corresponding to level 2 of the categorical variable? Interpret the parameters in your regression equation. 33E 34. Management proposed the following regression model to predict sales at a fast-food outlet. y = β0 + β1x1 + β2x2 + β3x3 + ε where x1 = number of competitors within one mile x2 = population within one mile (1000s) x3 = 1 if a drive-up window is present, 0 otherwise y = sales ($1000s) The following estimated regression equation was developed after 20 outlets were surveyed. ŷ = 10.1 − 4.2x1 + 6.8x2 + 15.3x3 What is the expected amount of sales attributable to the drive-up window? 
Predict sales for a store with two competitors, a population of 8000 within one mile, and no drive-up window. Predict sales for a store with one competitor, a population of 3000 within one mile, and a drive-up window. Repair Time. Refer to the Johnson Filtration problem introduced in this section. Suppose that in addition to information on the number of months since the machine was serviced and whether a mechanical or an electrical repair was necessary, the managers obtained a list showing which repairperson performed the service. The revised data follow. Ignore for now the months since the last maintenance service (x1) and the repairperson who performed the service. Develop the estimated simple linear regression equation to predict the repair time (y) given the type of repair (x2). Recall that x2 = 0 if the type of repair is mechanical and 1 if the type of repair is electrical. Does the equation that you developed in part (a) provide a good fit for the observed data? Explain. Ignore for now the months since the last maintenance service and the type of repair associated with the machine. Develop the estimated simple linear regression equation to predict the repair time given the repairperson who performed the service. Let x3 = 0 if Bob Jones performed the service and x3 = 1 if Dave Newton performed the service. Does the equation that you developed in part (c) provide a good fit for the observed data? Explain. 36E 37E 40E Exercise 5 gave the following data on weekly gross revenue ($1000s), television advertising expenditures ($1000s), and newspaper advertising expenditures ($1000s) for Showtime Movie Theaters. Find an estimated regression equation relating weekly gross revenue to television advertising expenditures and newspaper advertising expenditures. Plot the standardized residuals against ŷ. Does the residual plot support the assumptions about ϵ? Explain. Check for any outliers in these data. What are your conclusions? 
The following table reports the price, horsepower, and ¼-mile speed for 16 popular sports and GT cars. Find the estimated regression equation, which uses price and horsepower to predict ¼-mile speed. Plot the standardized residuals against ŷ. Does the residual plot support the assumption about ϵ? Explain. Check for any outliers. What are your conclusions? 49. The admissions officer for Clearwater College developed the following estimated regression equation relating the final college GPA to the student’s SAT mathematics score and high-school GPA. ŷ = −1.41 + .0235x1 + .00486x2 where x1 = high-school grade point average x2 = SAT mathematics score y = final college grade point average Interpret the coefficients in this estimated regression equation. Predict the final college GPA for a student who has a high-school average of 84 and a score of 540 on the SAT mathematics test. The personnel director for Electronics Associates developed the following estimated regression equation relating an employee’s score on a job satisfaction test to his or her length of service and wage rate. where x1 = length of service (years) x2 = wage rate (dollars) y = job satisfaction test score (higher scores indicate greater job satisfaction) Interpret the coefficients in this estimated regression equation. Predict the job satisfaction test score for an employee who has four years of service and makes $6.50 per hour. 46SE Recall that in exercise 44, the admissions officer for Clearwater College developed the following estimated regression equation relating final college GPA to the student’s SAT mathematics score and high-school GPA. ŷ = −1.41 + .0235x1 + .00486x2 where x1 = high-school grade point average x2 = SAT mathematics score y = final college grade point average A portion of the Excel Regression tool output follows. Complete the missing entries in this output. Using α = .05, test for overall significance. Did the estimated regression equation provide a good fit to the data? Explain. 
Use the t test and α = .05 to test H0: β1 = 0 and H0: β2 = 0. 44. The admissions officer for Clearwater College developed the following estimated regression equation relating the final college GPA to the student’s SAT mathematics score and high-school GPA. ŷ = −1.41 + .0235x1 + .00486x2 where x1 = high-school grade point average x2 = SAT mathematics score y = final college grade point average Interpret the coefficients in this estimated regression equation. Predict the final college GPA for a student who has a high-school average of 84 and a score of 540 on the SAT mathematics test. Recall that in exercise 45 the personnel director for Electronics Associates developed the following estimated regression equation relating an employee’s score on a job satisfaction test to length of service and wage rate. where x1 = length of service (years) x2 = wage rate (dollars) y = job satisfaction test score (higher scores indicate greater job satisfaction) A portion of the Excel Regression tool output follows. Complete the missing entries in this output. Using α = .05, test for overall significance. Did the estimated regression equation provide a good fit to the data? Explain. Use the t test and α = .05 to test H0: β1 = 0 and H0: β2 = 0. 45. The personnel director for Electronics Associates developed the following estimated regression equation relating an employee’s score on a job satisfaction test to his or her length of service and wage rate. where x1 = length of service (years) x2 = wage rate (dollars) y = job satisfaction test score (higher scores indicate greater job satisfaction) Interpret the coefficients in this estimated regression equation. Predict the job satisfaction test score for an employee who has four years of service and makes $6.50 per hour. Fortune magazine publishes an annual list of the 100 best companies to work for. 
The data in the DATAfile named FortuneBest shows a portion of the data for a random sample of 30 of the companies that made the top 100 list for 2012 (Fortune, February 6, 2012). The column labeled Rank shows the rank of the company in the Fortune 100 list; the column labeled Size indicates whether the company is a small, midsize, or large company; the column labeled Salaried ($1000s) shows the average annual salary for salaried employees rounded to the nearest $1000; and the column labeled Hourly ($1000s) shows the average annual salary for hourly employees rounded to the nearest $1000. Fortune defines large companies as having more than 10,000 employees, midsize companies as having between 2500 and 10,000 employees, and small companies as having fewer than 2500 employees. Use these data to develop an estimated regression equation that could be used to predict the average annual salary for salaried employees given the average annual salary for hourly employees. Use α = .05 to test for overall significance. To incorporate the effect of size, a categorical variable with three levels, we used two dummy variables: Size-Midsize and Size-Small. The value of Size-Midsize = 1 if the company is a midsize company and 0 otherwise. And the value of Size-Small = 1 if the company is a small company and 0 otherwise. Develop an estimated regression equation that could be used to predict the average annual salary for salaried employees given the average annual salary for hourly employees and the size of the company. For the estimated regression equation developed in part (c), use the t test to determine the significance of the independent variables. Use α = .05. Based upon your findings in part (d), develop an estimated regression equation that can be used to predict the average annual salary for salaried employees given the average annual salary for hourly employees and the size of the company. The Department of Energy and the U.S. 
Environmental Protection Agency provides fuel efficiency data for cars and trucks. The DATAfile named FuelEcon provides a portion of the data for 309 cars. The column labeled Manufacturer shows the name of the company that manufactured the car; the column labeled Displacement shows the engine’s displacement in liters; the column labeled Fuel shows the required or recommended type of fuel (regular or premium gasoline); the column labeled Drive identifies the type of drive (F for front wheel, R for rear wheel, and A for all wheel); and the column labeled Hwy MPG shows the fuel efficiency rating for highway driving in terms of miles per gallon. Develop an estimated regression equation that can be used to predict the fuel efficiency for highway driving given the engine’s displacement. Test for significance using α = .05. Consider the addition of the dummy variable FuelPremium, where the value of FuelPremium is 1 if the required or recommended type of fuel is premium gasoline and 0 if the type of fuel is regular gasoline. Develop the estimated regression equation that can be used to predict the fuel efficiency for highway driving given the engine’s displacement and the dummy variable FuelPremium. Use α = .05 to determine whether the dummy variable added in part (b) is significant. Consider the addition of the dummy variables FrontWheel and RearWheel. The value of FrontWheel is 1 if the car has front wheel drive and 0 otherwise; the value of RearWheel is 1 if the car has rear wheel drive and 0 otherwise. Thus, for a car that has all-wheel drive, the value of FrontWheel and the value of RearWheel is 0. Develop the estimated regression equation that can be used to predict the fuel efficiency for highway driving given the engine’s displacement, the dummy variable FuelPremium, and the dummy variables FrontWheel and RearWheel. For the estimated regression equation developed in part (d), test for overall significance and individual significance using α = .05. 
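The dummy-variable coding described above can be sketched as follows. This is a minimal illustration, not the textbook's procedure: the function name `encode_car` and the fuel labels `"premium"` and `"regular"` are assumptions, while the drive codes F, R, and A follow the description of the Drive column.

```python
def encode_car(fuel, drive):
    """Return (FuelPremium, FrontWheel, RearWheel) as 0/1 dummy variables.

    fuel  : "premium" or "regular" (labels assumed for this sketch)
    drive : "F", "R", or "A", as in the Drive column described above
    """
    if fuel not in ("premium", "regular"):
        raise ValueError(f"unknown fuel type: {fuel!r}")
    if drive not in ("F", "R", "A"):
        raise ValueError(f"unknown drive type: {drive!r}")

    fuel_premium = 1 if fuel == "premium" else 0
    front_wheel = 1 if drive == "F" else 0
    rear_wheel = 1 if drive == "R" else 0
    # All-wheel drive is the baseline level: both drive dummies are 0.
    return fuel_premium, front_wheel, rear_wheel


print(encode_car("premium", "A"))  # (1, 0, 0)
print(encode_car("regular", "F"))  # (0, 1, 0)
```

A categorical variable with k levels needs only k − 1 dummy variables, which is why the three drive types are represented by two columns and all-wheel drive serves as the reference level.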
The Tire Rack, an online distributor of tires and wheels, conducts extensive testing to provide customers with products that are right for their vehicle, driving style, and driving conditions. In addition, The Tire Rack maintains an independent consumer survey to help drivers help each other by sharing their long-term tire experiences (The Tire Rack website, August 1, 2016). The following data show survey ratings (1 to 10 scale with 10 the highest rating) for 18 high-performance all-season tires. The variable Tread Wear rates quickness of wear based on the driver’s expectations, the variable Dry Traction rates the grip of a tire on a dry road, the variable Steering rates the tire’s steering responsiveness, and the variable Buy Again rates the driver’s desire to purchase the same tire again. Develop an estimated simple linear regression equation that can be used to predict the Buy Again rating given the Tread Wear rating. At the .01 level of significance, test for a significant relationship. Does this estimated regression equation provide a good fit to the data? Explain. Develop an estimated multiple regression equation that can be used to predict the Buy Again rating given the Tread Wear rating and the Dry Traction rating. Is the addition of the Dry Traction independent variable significant at α = .01? Explain. Develop an estimated multiple regression equation that can be used to predict the Buy Again rating given the Tread Wear rating, the Dry Traction rating, and the Steering rating. Is the addition of the Steering independent variable significant at α = .01? Explain. The National Basketball Association (NBA) records a variety of statistics for each team. 
Six of these statistics are the percentage of games won (Win%), the percentage of field goals made (FG%), the percentage of three-point shots made (3P%), the percentage of free throws made (FT%), the average number of offensive rebounds per game (RBOff), and the average number of defensive rebounds per game (RBDef). The data contained in the DATAfile named NBAStats show the values of these statistics for the 30 teams in the NBA for one full season. A portion of the data follows. Develop an estimated regression equation that can be used to predict the percentage of games won given the percentage of field goals made. At the .05 level of significance, test for a significant relationship. Provide an interpretation for the slope of the estimated regression equation developed in part (a). Develop an estimated regression equation that can be used to predict the percentage of games won given the percentage of field goals made, the percentage of three-point shots made, the percentage of free throws made, the average number of offensive rebounds per game, and the average number of defensive rebounds per game (RBDef). For the estimated regression equation developed in part (c), remove any independent variables that are not significant at the .05 level of significance and develop a new estimated regression equation using the remaining independent variables. Assuming the estimated regression equation developed in part (d) can be used for the 2012–2013 season, predict the percentage of games won for a team with the following values for the four independent variables: FG% = 45, 3P% = 35, RBOff = 12, and RBDef = 30. Consumer Research, Inc., is an independent agency that conducts research on consumer attitudes and behaviors for a variety of firms. In one study, a client asked for an investigation of consumer characteristics that can be used to predict the amount charged by credit card users. 
Data were collected on annual income, household size, and annual credit card charges for a sample of 50 consumers. The following data are contained in the file Consumer. Source: Consumer Research, Inc. (https://www.bbb.org/us/ny/rochester/profile/secret-shopper/consumer-research-inc-0041-45625697) Managerial Report Use methods of descriptive statistics to summarize the data. Comment on the findings. Develop estimated regression equations, first using annual income as the independent variable and then using household size as the independent variable. Which variable is the better predictor of annual credit card charges? Discuss your findings. Develop an estimated regression equation with annual income and household size as the independent variables. Discuss your findings. What is the predicted annual credit card charge for a three-person household with an annual income of $40,000? Discuss the need for other independent variables that could be added to the model. What additional variables might be helpful? Matt Kenseth won the 2012 Daytona 500, the most important race of the NASCAR season. His win was no surprise because for the 2011 season he finished fourth in the point standings with 2330 points, behind Tony Stewart (2403 points), Carl Edwards (2403 points), and Kevin Harvick (2345 points). In 2011 he earned $6,183,580 by winning three Poles (fastest driver in qualifying), winning three races, finishing in the top five 12 times, and finishing in the top ten 20 times. NASCAR’s point system in 2011 allocated 43 points to the driver who finished first, 42 points to the driver who finished second, and so on down to 1 point for the driver who finished in the 43rd position. In addition any driver who led a lap received 1 bonus point, the driver who led the most laps received an additional bonus point, and the race winner was awarded 3 bonus points. But, the maximum number of points a driver could earn in any race was 48. 
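The 2011 point system just described can be sketched as a small function. This is an illustrative reading of the rules as stated in the text, not an official NASCAR implementation; the name `race_points` and its parameters are assumptions made for this example.

```python
FIELD_SIZE = 43   # finishing positions that earn points, per the text
MAX_POINTS = 48   # stated cap on points earned in a single race


def race_points(position, led_a_lap=False, led_most_laps=False):
    """Points for one race under the 2011 rules described above."""
    if not 1 <= position <= FIELD_SIZE:
        raise ValueError("position must be between 1 and 43")

    # A driver who led the most laps necessarily led at least one lap.
    led_a_lap = led_a_lap or led_most_laps

    points = FIELD_SIZE + 1 - position  # 43 for first, down to 1 for 43rd
    if position == 1:
        points += 3                     # race winner bonus
    if led_a_lap:
        points += 1                     # bonus for leading any lap
    if led_most_laps:
        points += 1                     # additional bonus for most laps led
    return min(points, MAX_POINTS)


print(race_points(1, led_most_laps=True))  # 48
print(race_points(2, led_a_lap=True))      # 43
print(race_points(43))                     # 1
```

Under these rules the cap of 48 is reached exactly by a winner who also led the most laps: 43 + 3 + 1 + 1 = 48.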
Table 15.13 shows data for the 2011 season for the top 35 drivers (NASCAR website). Managerial Report Suppose you wanted to predict Winnings ($) using only the number of poles won (Poles), the number of wins (Wins), the number of top five finishes (Top 5), or the number of top ten finishes (Top 10). Which of these four variables provides the best single predictor of winnings? Develop an estimated regression equation that can be used to predict Winnings ($) given the number of poles won (Poles), the number of wins (Wins), the number of top five finishes (Top 5), and the number of top ten (Top 10) finishes. Test for individual significance and discuss your findings and conclusions. Create two new independent variables: Top 2–5 and Top 6–10. Top 2–5 represents the number of times the driver finished between second and fifth place and Top 6–10 represents the number of times the driver finished between sixth and tenth place. Develop an estimated regression equation that can be used to predict Winnings ($) using Poles, Wins, Top 2–5, and Top 6–10. Test for individual significance and discuss your findings and conclusions. TABLE 15.13 Nascar Results for the 2011 Season Source: NASCAR website, February 28, 2011. (https://www.nascar.com/) Based upon the results of your analysis, what estimated regression equation would you recommend using to predict Winnings ($)? Provide an interpretation of the estimated regression coefficients for this equation. When trying to decide what car to buy, real value is not necessarily determined by how much you spend on the initial purchase. Instead, cars that are reliable and don’t cost much to own often represent the best values. But no matter how reliable or inexpensive a car may cost to own, it must also perform well. To measure value, Consumer Reports developed a statistic referred to as a value score. The value score is based upon five-year owner costs, overall road-test scores, and predicted-reliability ratings. 
Five-year owner costs are based upon the expenses incurred in the first five years of ownership, including depreciation, fuel, maintenance and repairs, and so on. Using a national average of 12,000 miles per year, an average cost per mile driven is used as the measure of five-year owner costs. Road-test scores are the results of more than 50 tests and evaluations and are based on a 100-point scale, with higher scores indicating better performance, comfort, convenience, and fuel economy. The highest road-test score obtained in the tests conducted by Consumer Reports was a 99 for a Lexus LS 460L. Predicted-reliability ratings (1 = Poor, 2 = Fair, 3 = Good, 4 = Very Good, and 5 = Excellent) are based upon data from Consumer Reports’ Annual Auto Survey. A car with a value score of 1.0 is considered to be an “average-value” car. A car with a value score of 2.0 is considered to be twice as good a value as a car with a value score of 1.0; a car with a value score of .5 is considered half as good as average; and so on. The data for three sizes of cars (13 small sedans, 20 family sedans, and 21 upscale sedans), including the price ($) of each car tested, are contained in the file CarValues (Consumer Reports website). To incorporate the effect of size of car, a categorical variable with three values (small sedan, family sedan, and upscale sedan), use the following dummy variables: Family-Sedan = 1 if the car is a family sedan, 0 otherwise; Upscale-Sedan = 1 if the car is an upscale sedan, 0 otherwise. Managerial Report Treating Cost/Mile as the dependent variable, develop an estimated regression equation with Family-Sedan and Upscale-Sedan as the independent variables. Discuss your findings. Treating Value Score as the dependent variable, develop an estimated regression equation using Cost/Mile, Road-Test Score, Predicted Reliability, Family-Sedan, and Upscale-Sedan as the independent variables. Delete any independent variables that are not significant from the estimated regression equation developed in part 2 using a .05 level of significance. 
After deleting any independent variables that are not significant, develop a new estimated regression equation. Suppose someone claims that “smaller cars provide better values than larger cars.” For the data in this case, the Small Sedans represent the smallest type of car and the Upscale Sedans represent the largest type of car. Does your analysis support this claim? Use regression analysis to develop an estimated regression equation that could be used to predict the value score given the value of the Road-Test Score. Use regression analysis to develop an estimated regression equation that could be used to predict the value score given the Predicted Reliability. What conclusions can you derive from your analysis? Consider the following data for two variables, x and y. Develop an estimated regression equation for the data of the form $\hat{y} = b_0 + b_1 x$. Using the results from part (a), test for a significant relationship between x and y; use α = .05. Develop a scatter diagram for the data. Does the scatter diagram suggest an estimated regression equation of the form $\hat{y} = b_0 + b_1 x + b_2 x^2$? Explain. Develop an estimated regression equation for the data of the form $\hat{y} = b_0 + b_1 x + b_2 x^2$. Refer to part (d). Is the relationship between x, $x^2$, and y significant? Use α = .05. Predict the value of y when x = 25. Consider the following data for two variables, x and y. Develop an estimated regression equation for the data of the form . Comment on the adequacy of this equation for predicting y. Develop an estimated regression equation for the data of the form . Comment on the adequacy of this equation for predicting y. Predict the value of y when x = 20. 3E A highway department is studying the relationship between traffic flow and speed. The following model has been hypothesized. where y = traffic flow in vehicles per hour x = vehicle speed in miles per hour The following data were collected during rush hour for six highways leading out of the city. a. Develop an estimated regression equation for the data. b. 
Using α = .01, test for a significant relationship. In working further with the problem of exercise 4, statisticians suggested the use of the following curvilinear estimated regression equation. Use the data of exercise 4 to compute the coefficients of this estimated regression equation. Using α = .01, test for a significant relationship. Estimate the traffic flow in vehicles per hour at a speed of 38 miles per hour. 4. A highway department is studying the relationship between traffic flow and speed. The following model has been hypothesized. where y = traffic flow in vehicles per hour x = vehicle speed in miles per hour The following data were collected during rush hour for six highways leading out of the city. Develop an estimated regression equation for the data. Using α = .01, test for a significant relationship. A study of emergency service facilities investigated the relationship between the number of facilities and the average distance traveled to provide the emergency service. The following table gives the data collected. Develop a scatter diagram for these data, treating average distance traveled as the dependent variable. Does a simple linear model appear to be appropriate? Explain. Develop an estimated regression equation for the data that you believe will best explain the relationship between these two variables. Home Depot, a nationwide home improvement retailer, sells several brands of washing machines. A sample of 24 models of full-size washing machines sold by Home Depot and the corresponding capacity (Cu Ft) and list price follow (Home Depot website, September 5, 2016). Develop a scatter diagram for these data, treating cubic feet as the independent variable. Does a simple linear regression model appear to be appropriate? Use a simple linear regression model to develop an estimated regression equation to predict the list price given the cubic feet. Construct a standardized residual plot. 
Based upon the standardized residual plot, does a simple linear regression model appear to be appropriate? Using a second-order model, develop an estimated regression equation to predict the list price given the cubic feet. Do you prefer the estimated regression equation developed in part (a) or part (c)? Explain. Are there other factors that should be considered for inclusion as independent variables in this regression? Corvette, Ferrari, and Jaguar produced a variety of classic cars that continue to increase in value. The following data, based upon the Martin Rating System for Collectible Cars, show the rarity rating (1–20) and the high price ($1000) for 15 classic cars. Develop a scatter diagram of the data using the rarity rating as the independent variable and price as the dependent variable. Does a simple linear regression model appear to be appropriate? Develop an estimated multiple regression equation with x = rarity rating and $x^2$ as the two independent variables. Consider the nonlinear relationship shown by equation (16.7). Use logarithms to develop an estimated regression equation for this model. Do you prefer the estimated regression equation developed in part (b) or part (c)? Explain. The film Suicide Squad has an average rating of 3.7 out of 5 based on 117,323 user ratings (Rotten Tomatoes website, September 4, 2016). How are the user ratings of Suicide Squad related to the age of the user and the rating the user gave The Secret Life of Pets? Suppose we have a sample of users’ ages and their ratings for the movies Suicide Squad and The Secret Life of Pets as included in the DATAfile RottenTomatoes. Develop a scatter diagram for these data with the users’ ages as the independent variable and their ratings of Suicide Squad as the dependent variable. Does a simple linear regression model appear to be appropriate? 
Use the data provided to develop the regression equation for estimating the user ratings of Suicide Squad that is suggested by the scatter diagram in part (a). Include the user rating of The Secret Life of Pets as an independent variable in the regression model developed in part (b). Interpret the regression coefficient for the user rating of The Secret Life of Pets. Is the regression equation developed in part (b) or the regression equation developed in part (c) superior? Explain. Suppose a 31-year-old user gave The Secret Life of Pets a rating of 4. Use the model you selected in part (d) to predict that user’s rating of Suicide Squad. In a regression analysis involving 27 observations, the following estimated regression equation was developed. For this estimated regression equation SST = 1550 and SSE = 520. a. At α = .05, test whether x1 is significant. Suppose that variables x2 and x3 are added to the model and the following regression equation is obtained. For this estimated regression equation SST = 1550 and SSE = 100. b. Use an F test and a .05 level of significance to determine whether x2 and x3 together contribute significantly to the model. 11E The Professional Golfers’ Association of America (PGA) collects a wide variety of performance data for members of the PGA Tour. A portion of the 2012 year-end data for 191 PGA Tour players are contained in the DATAfile named PGATour (CBS Sports website, 2013). For each player the variable named Scoring Avg is the average number of strokes per completed round; GIR is the percentage of time the player is able to hit a green in regulation; DriveDist is the average number of yards per measured drive; PPR is the average number of putts the player takes per round; and BPR is the average number of birdies per round. Develop an estimated regression equation that can be used to predict Scoring Average given the percentage of time a player is able to hit a green in regulation. 
Develop an estimated regression equation that can be used to predict Scoring Average based upon the percentage of time a player is able to hit a green in regulation, the average driving distance, and the average number of putts per round. At the .05 level of significance, test whether the two independent variables added in part (b), the average driving distance and the average number of putts per round, contribute to the estimated regression equation developed in part (a). Explain. Refer to exercise 12. Develop an estimated regression equation that can be used to predict the average number of birdies per round based upon the percentage of time the player is able to hit a green in regulation. Develop an estimated regression equation that can be used to predict the average number of birdies per round based upon the percentage of time a player is able to hit a green in regulation, the average driving distance, and the average number of putts per round. At the .05 level of significance, test whether the two independent variables added in part (b), the average driving distance and the average number of putts per round, contribute significantly to the estimated regression equation developed in part (a). Explain. The Professional Golfers’ Association of America (PGA) collects a wide variety of performance data for members of the PGA Tour. A portion of the 2012 year-end data for 191 PGA Tour players are contained in the DATAfile named PGATour (CBS Sports website, 2013). For each player the variable named Scoring Avg is the average number of strokes per completed round; GIR is the percentage of time the player is able to hit a green in regulation; DriveDist is the average number of yards per measured drive; PPR is the average number of putts the player takes per round; and BPR is the average number of birdies per round. 
Develop an estimated regression equation that can be used to predict Scoring Average given the percentage of time a player is able to hit a green in regulation. Develop an estimated regression equation that can be used to predict Scoring Average based upon the percentage of time a player is able to hit a green in regulation, the average driving distance, and the average number of putts per round. At the .05 level of significance, test whether the two independent variables added in part (b), the average driving distance and the average number of putts per round, contribute to the estimated regression equation developed in part (a). Explain. A 10-year study conducted by the American Heart Association provided data on how age, systolic blood pressure, and smoking relate to the risk of strokes. Data from a portion of this study follow. Risk is interpreted as the probability (times 100) that a person will have a stroke over the next 10-year period. For the smoker variable, 1 indicates a smoker and 0 indicates a nonsmoker. Develop an estimated regression equation that can be used to predict the risk of stroke given the age and blood-pressure level. Consider adding two independent variables to the model developed in part (a), one for the interaction between age and blood-pressure level and the other for whether the person is a smoker. Develop an estimated regression equation using these four independent variables. At a .05 level of significance, test to see whether the addition of the interaction term and the smoker variable contribute significantly to the estimated regression equation developed in part (a). The average monthly residential gas bill for Black Hills Energy customers in Cheyenne, Wyoming is $67.95 (Wyoming Public Service Commission website, August 21, 2016). How is the average monthly gas bill for a Cheyenne residence related to the square footage, number of rooms, and age of the residence? 
The following data show the average monthly gas bill for last year, square footage, number of rooms, and age for 20 typical Cheyenne residences. Develop an estimated regression equation that can be used to predict a residence’s average monthly gas bill for last year given its age. Develop an estimated regression equation that can be used to predict a residence’s average monthly gas bill for last year given its age, square footage, and number of rooms. At the .05 level of significance, test whether the two independent variables added in part (b), the square footage and the number of rooms, contribute significantly to the estimated regression equation developed in part (a). 16E 17E 18E 19E 20E 21E 22E 23E The following data show the daily closing prices (in dollars per share) for a stock. Define the independent variable Period, where Period = 1 corresponds to the data for November 3, Period = 2 corresponds to the data for November 4, and so on. Develop the estimated regression equation that can be used to predict the closing price given the value of Period. At the .05 level of significance, test for any positive autocorrelation in the data. Refer to the Cravens data set in Table 16.5. In Section 16.3 we showed that the estimated regression equation involving Accounts, AdvExp, Poten, and Share had an adjusted coefficient of determination of 88.1%. Use the .05 level of significance and apply the Durbin–Watson test to determine whether positive autocorrelation is present. TABLE 16.5 THE CRAVENS DATA A sample containing years to maturity and yield (%) for 40 corporate bonds is contained in the data file named CorporateBonds (Barron’s, April 2, 2012). Develop a scatter diagram of the data using x = years to maturity as the independent variable. Does a simple linear regression model appear to be appropriate? Develop an estimated regression equation with x = years to maturity and $x^2$ as the independent variables. 
As an alternative to fitting a second-order model, fit a model using the natural logarithm of x as the independent variable; that is, $\hat{y} = b_0 + b_1 \ln(x)$. Does the estimated regression using the natural logarithm of x provide a better fit than the estimated regression developed in part (b)? Explain. Consumer Reports tested 19 different brands and models of road, fitness, and comfort bikes. Road bikes are designed for long road trips; fitness bikes are designed for regular workouts or daily commutes; and comfort bikes are designed for leisure rides on typically flat roads. The following data show the type, weight (lb.), and price ($) for the 19 bicycles tested (Consumer Reports website, February 2009). Develop a scatter diagram with weight as the independent variable and price as the dependent variable. Does a simple linear regression model appear to be appropriate? Develop an estimated multiple regression equation with x = weight and $x^2$ as the two independent variables. Use the following dummy variables to develop an estimated regression equation that can be used to predict the price given the type of bike: Type_Fitness = 1 if the bike is a fitness bike, 0 otherwise; and Type_Comfort = 1 if the bike is a comfort bike, 0 otherwise. Compare the results obtained to the results obtained in part (b). To account for possible interaction between the type of bike and the weight of the bike, develop a new estimated regression equation that can be used to predict the price of the bike given the type, the weight of the bike, and any interaction between weight and each of the dummy variables defined in part (c). What estimated regression equation appears to be the best predictor of price? Explain. A study investigated the relationship between audit delay (Delay), the length of time from a company’s fiscal year-end to the date of the auditor’s report, and variables that describe the client and the auditor. Some of the independent variables that were included in this study follow. 
A sample of 40 companies provided the following data. Develop the estimated regression equation using all of the independent variables. Did the estimated regression equation developed in part (a) provide a good fit? Explain. Develop a scatter diagram showing Delay as a function of Finished. What does this scatter diagram indicate about the relationship between Delay and Finished? On the basis of your observations about the relationship between Delay and Finished, develop an alternative estimated regression equation to the one developed in (a) to explain as much of the variability in Delay as possible. Refer to the data in exercise 28. Consider a model in which only Industry is used to predict Delay. At a .01 level of significance, test for any positive autocorrelation in the data. Refer to the data in exercise 28. Develop an estimated regression equation that can be used to predict Delay by using Industry and Quality. At the .05 level of significance, test for any positive autocorrelation in the data. 31SE The Ladies Professional Golf Association (LPGA) maintains data on performance for members of the LPGA Tour. Scoring average is generally considered the most important statistic in terms of a player’s success. To investigate the relationship between scoring average and variables such as driving distance, driving accuracy, greens in regulation, sand saves, and average putts per round, year-end performance data for 140 players on the LPGA Tour for 2012 are contained in the DATAfile named LPGATour2012 (LPGA website, May 2013). Each row of the data set corresponds to an LPGA Tour player. Descriptions for the variables in the data set follow. Managerial Report Suppose that you have been hired by the commissioner of the LPGA to analyze the data for a presentation to be made at the annual LPGA Tour meeting. 
The commissioner has asked whether it would be possible to use these data to determine the performance measures that are the best predictors of a player’s average score. Use the methods presented in this and previous chapters to analyze the data. Prepare a report that summarizes your analysis, including key statistical results, conclusions, and recommendations. Wine Spectator magazine contains articles and reviews on every aspect of the wine industry, including ratings of wine from around the world. In a recent issue they reviewed and scored 475 wines from the Piedmont region of Italy using a 100-point scale. The following table shows how the Wine Spectator score each wine received is used to rate each wine as being classic, outstanding, very good, good, mediocre, or not recommended. A key question for most consumers is whether paying more for a bottle of wine will result in a better wine. To investigate this question for wines from the Piedmont region we selected a random sample of 100 of the 475 wines that Wine Spectator reviewed. The data, contained in the DATAfile named WineRatings, shows the price ($), the Wine Spectator score, and the rating for each wine. Managerial Report Develop a table that shows the number of wines that were classified as classic, outstanding, very good, good, mediocre, and not recommended and the average price. Does there appear to be any relationship between the price of the wine and the Wine Spectator rating? Are there any other aspects of your initial summary of the data that stand out? Develop a scatter diagram with price on the horizontal axis and the Wine Spectator score on the vertical axis. Does the relationship between price and score appear to be linear? Using linear regression, develop an estimated regression equation that can be used to predict the score given the price of the wine. Using a second-order model, develop an estimated regression equation that can be used to predict the score given the price of the wine. 
Compare the results from fitting a linear model and fitting a second-order model. As an alternative to fitting a second-order model, fit a model using the natural logarithm of price as the independent variable. Compare the results with the second-order model. Based upon your analysis, would you say that spending more for a bottle of wine will provide a better wine? Suppose that you want to spend a maximum of $30 for a bottle of wine. In this case, will spending closer to your upper limit for price result in a better wine than a much lower price? 1. Consider the following time series data. Week: 1, 2, 3, 4, 5, 6. Value: 18, 13, 16, 11, 17, 14. Using the naive method (most recent value) as the forecast for the next week, compute the following measures of forecast accuracy. Mean absolute error. Mean squared error. Mean absolute percentage error. What is the forecast for week 7? 2. Refer to the time series data in exercise 1. Using the average of all the historical data as a forecast for the next period, compute the following measures of forecast accuracy. Mean absolute error. Mean squared error. Mean absolute percentage error. What is the forecast for week 7? 3E 4. Consider the following time series data. Month: 1, 2, 3, 4, 5, 6, 7. Value: 24, 13, 20, 12, 19, 23, 15. Compute MSE using the most recent value as the forecast for the next period. What is the forecast for month 8? Compute MSE using the average of all the data available as the forecast for the next period. What is the forecast for month 8? Which method appears to provide the better forecast? Consider the following time series data. Construct a time series plot. What type of pattern exists in the data? Develop the three-week moving average forecasts for this time series. Compute MSE and a forecast for week 7. Use α = .2 to compute the exponential smoothing forecasts for the time series. Compute MSE and a forecast for week 7. Compare the three-week moving average approach with the exponential smoothing approach using α = .2. 
Which appears to provide more accurate forecasts based on MSE? Explain. Use a smoothing constant of α = .4 to compute the exponential smoothing forecasts. Does a smoothing constant of .2 or .4 appear to provide more accurate forecasts based on MSE? Explain. Consider the following time series data. Construct a time series plot. What type of pattern exists in the data? Develop the three-week moving average forecasts for this time series. Compute MSE and a forecast for week 8. Use α = .2 to compute the exponential smoothing forecasts for the time series. Compute MSE and a forecast for week 8. Compare the three-week moving average approach with the exponential smoothing approach using α = .2. Which appears to provide more accurate forecasts based on MSE? Use a smoothing constant of α = .4 to compute the exponential smoothing forecasts. Does a smoothing constant of .2 or .4 appear to provide more accurate forecasts based on MSE? Explain. Refer to the gasoline sales time series data in Table 17.1. Compute four-week and five-week moving averages for the time series. Compute the MSE for the four-week and five-week moving average forecasts. What appears to be the best number of weeks of past data (three, four, or five) to use in the moving average computation? Recall that MSE for the three-week moving average is 10.22. 8E 9. With the gasoline time series data from Table 17.1, show the exponential smoothing forecasts using α = .1. Applying the MSE measure of forecast accuracy, would you prefer a smoothing constant of α = .1 or α = .2 for the gasoline sales time series? Are the results the same if you apply MAE as the measure of accuracy? What are the results if MAPE is used? 10. With a smoothing constant of α = .2, equation (17.2) shows that the forecast for week 13 of the gasoline sales data from Table 17.1 is given by F13 = .2Y12 + .8F12. However, the forecast for week 12 is given by F12 = .2Y11 + .8F11. 
Thus, we could combine these two results to show that the forecast for week 13 can be written F13 = .2Y12 + .8(.2Y11 + .8F11) = .2Y12 + .16Y11 + .64F11. Making use of the fact that F11 = .2Y10 + .8F10 (and similarly for F10 and F9), continue to expand the expression for F13 until it is written in terms of the past data values Y12, Y11, Y10, Y9, Y8, and the forecast for period 8. Refer to the coefficients or weights for the past values Y12, Y11, Y10, Y9, Y8. What observation can you make about how exponential smoothing weights past data values in arriving at new forecasts? Compare this weighting pattern with the weighting pattern of the moving averages method. For the Hawkins Company, the monthly percentages of all shipments received on time over the past 12 months are 80, 82, 84, 83, 83, 84, 85, 84, 82, 83, 84, and 83. Construct a time series plot. What type of pattern exists in the data? Compare the three-month moving average approach with the exponential smoothing approach for α = .2. Which provides more accurate forecasts using MSE as the measure of forecast accuracy? What is the forecast for next month? Corporate triple-A bond interest rates for 12 consecutive months follow. Construct a time series plot. What type of pattern exists in the data? Develop three-month and four-month moving averages for this time series. Does the three-month or four-month moving average provide more accurate forecasts based on MSE? Explain. What is the moving average forecast for the next month? The values of Alabama building contracts (in $ millions) for a 12-month period follow. Construct a time series plot. What type of pattern exists in the data? Compare the three-month moving average approach with the exponential smoothing forecast using α = .2. Which provides more accurate forecasts based on MSE? What is the forecast for the next month? The following time series shows the sales of a particular product over the past 12 months. Construct a time series plot. What type of pattern exists in the data? 
Use α = .3 to compute the exponential smoothing forecasts for the time series. Use a smoothing constant of α = .5 to compute the exponential smoothing forecasts. Does a smoothing constant of .3 or .5 appear to provide more accurate forecasts based on MSE? Ten weeks of data on the Commodity Futures Index are 7.35, 7.40, 7.55, 7.56, 7.60, 7.52, 7.52, 7.70, 7.62, and 7.55. Construct a time series plot. What type of pattern exists in the data? Compute the exponential smoothing forecasts for α = .2. Compute the exponential smoothing forecasts for α = .3. Which exponential smoothing constant provides more accurate forecasts based on MSE? Forecast week 11. 16E Consider the following time series data. Construct a time series plot. What type of pattern exists in the data? Develop the linear trend equation for this time series. What is the forecast for t = 6? 18E 19E 20E 21E 22E The president of a small manufacturing firm is concerned about the continual increase in manufacturing costs over the past several years. The following figures provide a time series of the cost per unit for the firm’s leading product over the past eight years. Construct a time series plot. What type of pattern exists in the data? Develop the linear trend equation for this time series. What is the average cost increase that the firm has been realizing per year? Compute an estimate of the cost/unit for next year. The following data show the average interest rate (%) for a 30-year fixed-rate mortgage over a ten-year period (FreddieMac website, July 30, 2016). Construct a time series plot. Do you think a linear trend or a quadratic trend will provide a better fit for this time series? Why? Develop the linear trend equation for this time series. Using this linear trend equation, forecast the average interest rate for 2016. Develop the quadratic trend equation for this time series. Using this quadratic trend equation, forecast the average interest rate for 2016. Compare your answers to parts (b) and (c). 
Which model would you recommend? Why? Quarterly revenue ($ millions) for Twitter for the first quarter of 2012 through the first quarter of 2014 are shown below (adexchange.com, April 2015): Construct a time series plot. What type of pattern exists in the data? Using Excel or Minitab, develop a linear trend equation for this time series. Using Excel or Minitab, develop a quadratic trend equation for this time series. Compare the MSE for each model. Which model appears better according to MSE? Use the models developed in parts (b) and (c) to forecast revenue for the tenth quarter. Which of the two forecasts in part (e) would you use? Explain. Giovanni Food Products produces and sells frozen pizzas to public schools throughout the eastern United States. Using a very aggressive marketing strategy they have been able to increase their annual revenue by approximately $10 million over the past 10 years. But increased competition has slowed their growth rate in the past few years. The annual revenue, in millions of dollars, for the previous 10 years is shown. Construct a time series plot. Comment on the appropriateness of a linear trend. Develop a quadratic trend equation that can be used to forecast revenue. Using the trend equation developed in part (b), forecast revenue in year 11. The number of users of Facebook from 2004 through 2011 follows (Facebook website, April 16, 2012). Construct a time series plot. What type of pattern exists? Develop a quadratic trend equation. Consider the following time series. Construct a time series plot. What type of pattern exists in the data? Use the following dummy variables to develop an estimated regression equation to account for seasonal effects in the data: Qtr1 = 1 if Quarter 1, 0 otherwise; Qtr2 = 1 if Quarter 2, 0 otherwise; Qtr3 = 1 if Quarter 3, 0 otherwise. Compute the quarterly forecasts for next year. Consider the following time series data. Construct a time series plot. What type of pattern exists in the data? 
Use the following dummy variables to develop an estimated regression equation to account for any seasonal and linear trend effects in the data: Qtr1 = 1 if Quarter 1, 0 otherwise; Qtr2 = 1 if Quarter 2, 0 otherwise; Qtr3 = 1 if Quarter 3, 0 otherwise. Compute the quarterly forecasts for next year. The quarterly sales data (number of copies sold) for a college textbook over the past three years follow. Construct a time series plot. What type of pattern exists in the data? Use the following dummy variables to develop an estimated regression equation to account for any seasonal effects in the data: Qtr1 = 1 if Quarter 1, 0 otherwise; Qtr2 = 1 if Quarter 2, 0 otherwise; Qtr3 = 1 if Quarter 3, 0 otherwise. Compute the quarterly forecasts for next year. Let t = 1 to refer to the observation in quarter 1 of year 1; t = 2 to refer to the observation in quarter 2 of year 1; … and t = 12 to refer to the observation in quarter 4 of year 3. Using the dummy variables defined in part (b) and t, develop an estimated regression equation to account for seasonal effects and any linear trend in the time series. Based upon the seasonal effects in the data and linear trend, compute the quarterly forecasts for next year. Air pollution control specialists in southern California monitor the amount of ozone, carbon dioxide, and nitrogen dioxide in the air on an hourly basis. The hourly time series data exhibit seasonality, with the levels of pollutants showing patterns that vary over the hours in the day. On July 15, 16, and 17, the following levels of nitrogen dioxide were observed for the 12 hours from 6:00 a.m. to 6:00 p.m. Construct a time series plot. What type of pattern exists in the data? Use the following dummy variables to develop an estimated regression equation to account for the seasonal effects in the data. Hour1 = 1 if the reading was made between 6:00 a.m. and 7:00 a.m.; 0 otherwise Hour2 = 1 if the reading was made between 7:00 a.m. and 8:00 a.m.; 0 otherwise . . . 
Hour11 = 1 if the reading was made between 4:00 p.m. and 5:00 p.m., 0 otherwise. Note that when the values of the 11 dummy variables are equal to 0, the observation corresponds to the 5:00 p.m. to 6:00 p.m. hour. Using the estimated regression equation developed in part (a), compute estimates of the levels of nitrogen dioxide for July 18. Let t = 1 to refer to the observation in hour 1 on July 15; t = 2 to refer to the observation in hour 2 of July 15; … and t = 36 to refer to the observation in hour 12 of July 17. Using the dummy variables defined in part (b) and t, develop an estimated regression equation to account for seasonal effects and any linear trend in the time series. Based upon the seasonal effects in the data and linear trend, compute estimates of the levels of nitrogen dioxide for July 18. South Shore Construction builds permanent docks and seawalls along the southern shore of Long Island, New York. Although the firm has been in business only five years, revenue has increased from $308,000 in the first year of operation to $1,084,000 in the most recent year. The following data show the quarterly sales revenue in thousands of dollars. Construct a time series plot. What type of pattern exists in the data? Use the following dummy variables to develop an estimated regression equation to account for seasonal effects in the data. Qtr1 = 1 if Quarter 1, 0 otherwise; Qtr2 = 1 if Quarter 2, 0 otherwise; Qtr3 = 1 if Quarter 3, 0 otherwise. Based only on the seasonal effects in the data, compute estimates of quarterly sales for year 6. Let Period = 1 to refer to the observation in quarter 1 of year 1; Period = 2 to refer to the observation in quarter 2 of year 1; … and Period = 20 to refer to the observation in quarter 4 of year 5. Using the dummy variables defined in part (b) and Period, develop an estimated regression equation to account for seasonal effects and any linear trend in the time series. 
Based upon the seasonal effects in the data and linear trend, compute estimates of quarterly sales for year 6. 33E 34E Consider the following time series data. Construct a time series plot. What type of pattern exists in the data? Show the four-quarter and centered moving average values for this time series. Compute seasonal indexes and adjusted seasonal indexes for the four quarters. Refer to exercise 35. Deseasonalize the time series using the adjusted seasonal indexes computed in part (c) of exercise 35. Compute the linear trend regression equation for the deseasonalized data. Compute the deseasonalized quarterly trend forecast for Year 4. Use the seasonal indexes to adjust the deseasonalized trend forecasts computed in part (c). 35. Consider the following time series data. Construct a time series plot. What type of pattern exists in the data? Show the four-quarter and centered moving average values for this time series. Compute seasonal indexes and adjusted seasonal indexes for the four quarters. The quarterly sales data (number of copies sold) for a college textbook over the past three years follow. Construct a time series plot. What type of pattern exists in the data? Show the four-quarter and centered moving average values for this time series. Compute the seasonal and adjusted seasonal indexes for the four quarters. When does the publisher have the largest seasonal index? Does this result appear reasonable? Explain. Deseasonalize the time series. Compute the linear trend equation for the deseasonalized data and forecast sales using the linear trend equation. Adjust the linear trend forecasts using the adjusted seasonal indexes computed in part (c). Three years of monthly lawn-maintenance expenses ($) for a six-unit apartment house in southern Florida follow. Construct a time series plot. What type of pattern exists in the data? 
Identify the monthly seasonal indexes for the three years of lawn-maintenance expenses for the apartment house in southern Florida as given here. Use a 12-month moving average calculation. Deseasonalize the time series. Compute the linear trend equation for the deseasonalized data. Compute the deseasonalized trend forecasts and then adjust the trend forecasts using the seasonal indexes to provide a forecast for monthly expenses in Year 4. Air pollution control specialists in southern California monitor the amount of ozone, carbon dioxide, and nitrogen dioxide in the air on an hourly basis. The hourly time series data exhibit seasonality, with the levels of pollutants showing patterns over the hours in the day. On July 15, 16, and 17, the following levels of nitrogen dioxide were observed in the downtown area for the 12 hours from 6:00 a.m. to 6:00 p.m. Construct a time series plot. What type of pattern exists in the data? Identify the hourly seasonal indexes for the 12 readings each day. Deseasonalize the time series. Compute the linear trend equation for the deseasonalized data. Compute the deseasonalized trend forecasts for the 12 hours for July 18 and then adjust the trend forecasts using the seasonal indexes computed in part (b). Electric power consumption is measured in kilowatt-hours (kWh). The local utility company offers an interrupt program whereby commercial customers that participate receive favorable rates but must agree to cut back consumption if the utility requests them to do so. Timko Products cut back consumption at 12:00 noon Thursday. To assess the savings, the utility must estimate Timko’s usage without the interrupt. The period of interrupted service was from noon to 8:00 p.m. Data on electric power consumption for the previous 72 hours are available. Is there a seasonal effect over the 24-hour period? Compute seasonal indexes for the six 4-hour periods. 
Use trend adjusted for seasonal indexes to estimate Timko’s normal usage over the period of interrupted service. The weekly demand (in cases) for a particular brand of automatic dishwasher detergent for a chain of grocery stores located in Columbus, Ohio, follows. Construct a time series plot. What type of pattern exists in the data? Use a three-week moving average to develop a forecast for week 11. Use exponential smoothing with a smoothing constant of α = .2 to develop a forecast for week 11. Which of the two methods do you prefer? Why? The following table reports the percentage of stocks in a portfolio for nine quarters from 2011 to 2013. Construct a time series plot. What type of pattern exists in the data? Use exponential smoothing to forecast this time series. Consider smoothing constants of α = .2, .3, and .4. What value of the smoothing constant provides the most accurate forecasts? What is the forecast of the percentage of stocks in a typical portfolio for the second quarter of 2013? United Dairies, Inc., supplies milk to several independent grocers throughout Dade County, Florida. Managers at United Dairies want to develop a forecast of the number of half-gallons of milk sold per week. Sales data for the past 12 weeks follow. Construct a time series plot. What type of pattern exists in the data? Use exponential smoothing with α = .4 to develop a forecast of demand for week 13. Annual retail store revenues for Apple from 2007 to 2014 are shown below (source: apple.com). Construct a time series plot. What type of pattern exists in the data? Using Minitab or Excel, develop a linear trend equation for this time series. Use the trend equation developed in part (b) to forecast retail store revenue for 2015. The Mayfair Department Store in Davenport, Iowa, is trying to determine the amount of sales lost while it was shut down during July and August because of damage caused by the Mississippi River flood. Sales data for January through June follow. 
Use exponential smoothing, with α = .4, to develop a forecast for July and August. (Hint: Use the forecast for July as the actual sales in July in developing the August forecast.) Comment on the use of exponential smoothing for forecasts more than one period into the future. Use trend projection to forecast sales for July and August. Mayfair’s insurance company proposed a settlement based on lost sales of $240,000 in July and August. Is this amount fair? If not, what amount would you recommend as a counteroffer? 47SE The Costello Music Company has been in business for five years. During that time, sales of pianos increased from 12 units in the first year to 76 units in the most recent year. Fred Costello, the firm’s owner, wants to develop a forecast of piano sales for the coming year. The historical data follow. Construct a time series plot. What type of pattern exists in the data? Develop the linear trend equation for the time series. What is the average increase in sales that the firm has been realizing per year? Forecast sales for years 6 and 7. Consider the Costello Music Company problem in exercise 48. The quarterly sales data follow. 50SE Refer to the Costello Music Company time series in exercise 49. Deseasonalize the data and use the deseasonalized time series to identify the trend. Use the results of part (a) to develop a quarterly forecast for next year based on trend. Use the seasonal indexes developed in exercise 50 to adjust the forecasts developed in part (b) to account for the effect of season. 49. Consider the Costello Music Company problem in exercise 48. The quarterly sales data follow. Use the following dummy variables to develop an estimated regression equation to account for any seasonal and linear trend effects in the data: Qtr1 = 1 if Quarter 1, 0 otherwise; Qtr2 = 1 if Quarter 2, 0 otherwise; and Qtr3 = 1 if Quarter 3, 0 otherwise. Compute the quarterly forecasts for next year. 52SE Refer to the Hudson Marine problem in exercise 52. 
Suppose the quarterly sales values for the seven years of historical data are as follows. Use the following dummy variables to develop an estimated regression equation to account for any season and linear trend effects in the data: Qtr1 = 1 if Quarter 1, 0 otherwise; Qtr2 = 1 if Quarter 2, 0 otherwise; and Qtr3 = 1 if Quarter 3, 0 otherwise. Compute the quarterly forecasts for next year. 52. Hudson Marine has been an authorized dealer for C&D marine radios for the past seven years. The following table reports the number of radios sold each year. Construct a time series plot. Does a linear trend appear to be present? Compute the linear trend equation for this time series. Use the linear trend equation developed in part (b) to develop a forecast for annual sales in year 8. Refer to the Hudson Marine problem in exercise 53. Compute the centered moving average values for this time series. Construct a time series plot that also shows the centered moving average and original time series on the same graph. Discuss the differences between the original time series plot and the centered moving average time series. Compute the seasonal indexes for the four quarters. When does Hudson Marine experience the largest seasonal effect? Does this result seem reasonable? Explain. 53. Refer to the Hudson Marine problem in exercise 52. Suppose the quarterly sales values for the seven years of historical data are as follows. Use the following dummy variables to develop an estimated regression equation to account for any season and linear trend effects in the data: Qtr1 = 1 if Quarter 1, 0 otherwise; Qtr2 = 1 if Quarter 2, 0 otherwise; and Qtr3 = 1 if Quarter 3, 0 otherwise. Compute the quarterly forecasts for next year. Refer to the Hudson Marine data in exercise 53. Deseasonalize the data and use the deseasonalized time series to identify the trend. Use the results of part (a) to develop a quarterly forecast for next year based on trend. 
Use the seasonal indexes developed in exercise 54 to adjust the forecasts developed in part (b) to account for the effect of season. 53. Refer to the Hudson Marine problem in exercise 52. Suppose the quarterly sales values for the seven years of historical data are as follows. Use the following dummy variables to develop an estimated regression equation to account for any season and linear trend effects in the data: Qtr1 = 1 if Quarter 1, 0 otherwise; Qtr2 = 1 if Quarter 2, 0 otherwise; and Qtr3 = 1 if Quarter 3, 0 otherwise. Compute the quarterly forecasts for next year. Forecasting Food and Beverage Sales. The Vintage Restaurant, on Captiva Island near Fort Myers, Florida, is owned and operated by Karen Payne. The restaurant just completed its third year of operation. Since opening her restaurant, Karen has sought to establish a reputation for the Vintage as a high-quality dining establishment that specializes in fresh seafood. Through the efforts of Karen and her staff, her restaurant has become one of the best and fastest growing restaurants on the island. To better plan for future growth of the restaurant, Karen needs to develop a system that will enable her to forecast food and beverage sales by month for up to one year in advance. Table 17.25 shows the value of food and beverage sales ($1000s) for the first three years of operation. Managerial Report. Perform an analysis of the sales data for the Vintage Restaurant. Prepare a report for Karen that summarizes your findings, forecasts, and recommendations. Include the following: A time series plot. Comment on the underlying pattern in the time series. An analysis of the seasonality of the data. Indicate the seasonal indexes for each month, and comment on the high and low seasonal sales months. Do the seasonal indexes make intuitive sense? Discuss. TABLE 17.25 Food and Beverage Sales for the Vintage Restaurant ($1000s). Deseasonalize the time series. 
Does there appear to be any trend in the deseasonalized time series? Using the time series decomposition method, forecast sales for January through December of the fourth year. Using the dummy variable regression approach, forecast sales for January through December of the fourth year. Provide summary tables of your calculations and any graphs in the appendix of your report. Assume that January sales for the fourth year turn out to be $295,000. What was your forecast error? If this error is large, Karen may be puzzled about the difference between your forecast and the actual sales value. What can you do to resolve her uncertainty in the forecasting procedure? The Carlson Department Store suffered heavy damage when a hurricane struck on August 31. The store was closed for four months (September through December), and Carlson is now involved in a dispute with its insurance company about the amount of lost sales during the time the store was closed. Two key issues must be resolved: (1) the amount of sales Carlson would have made if the hurricane had not struck and (2) whether Carlson is entitled to any compensation for excess sales due to increased business activity after the storm. More than $8 billion in federal disaster relief and insurance money came into the county, resulting in increased sales at department stores and numerous other businesses. Table 17.26 gives Carlson’s sales data for the 48 months preceding the storm. Table 17.27 reports total sales for the 48 months preceding the storm for all department stores in the county, as well as the total sales in the county for the four months the Carlson Department Store was closed. Carlson’s managers asked you to analyze these data and develop estimates of the lost sales at the Carlson Department Store for the months of September through December. They also asked you to determine whether a case can be made for excess storm-related sales during the same period. 
If such a case can be made, Carlson is entitled to compensation for excess sales it would have earned in addition to ordinary sales. TABLE 17.26 SALES FOR CARLSON DEPARTMENT STORE ($MILLIONS) Managerial Report Prepare a report for the managers of the Carlson Department Store that summarizes your findings, forecasts, and recommendations. Include the following: An estimate of sales for Carlson Department Store had there been no hurricane. An estimate of countywide department store sales had there been no hurricane. An estimate of lost sales for the Carlson Department Store for September through December. TABLE 17.27 DEPARTMENT STORE SALES FOR THE COUNTY ($MILLIONS)