Bartleby Sitemap - Textbook Solutions

All Textbook Solutions for Mind on Statistics

Refer to the data and five-number summaries given in Case Study 1.1. Give a numerical value for each of the following. The fastest speed driven by anyone in the class. The slowest of the “fastest speeds” driven by a male. The speed for which one-fourth of the women had driven at that speed or faster. The proportion of females who had driven 89 mph or faster. The number of females who had driven 89 mph or faster.

A five-number summary for the heights in inches of the women who participated in the survey in Case Study 1.1 is as shown: What is the median height for these women? What is the range of heights, that is, the difference in heights between the shortest woman and the tallest woman? What is the interval of heights containing the shortest one-fourth of the women? What is the interval of heights containing the middle one-half of the women?

In recent years, Vietnamese American women have had the highest rate of cervical cancer in the country. Suppose that among 200,000 Vietnamese American women, 86 developed cervical cancer in the past year. Calculate the rate of cervical cancer for these women. What is the estimated risk of developing cervical cancer for Vietnamese American women in the next year? Explain the conceptual difference between the rate and the risk, in the context of this example.

The risk of getting lung cancer at some point in one’s life for men who have never smoked is about 13 in 1000. The risk for men who smoke is just over 13 times the risk for nonsmokers. (Source: Villenueve and Lau, 1994) a. What is the base rate for lung cancer in men over a lifetime? b. What is the approximate lifetime risk of getting lung cancer for men who smoke?

Refer to Case Study 1.3, in which teens were asked about their dating behavior. a. What population is represented by the random sample of 602 teens? b.
What population is represented by the 496 teens in the sample that had dated?

Using Case Study 1.6 as an example, explain the difference between a population and a sample.

A CBS News poll taken in December 2009 asked a random sample of 1048 adults in the United States, “In general, do you think the education most children are getting today in public schools is better, is about the same, or is worse than the education you received?” About 34% said “Better,” 24% said “About the same,” and 38% said “Worse.” (The remaining 4% were unsure.) What is the population for this survey? What is the approximate margin of error for this survey? Provide an interval that is 95% certain to cover the true percentage of U.S. adults in December 2009 who would have answered “Better” to this question if asked.

A telephone survey of 2000 Canadians conducted March 20-30, 2001, found that “Overall, about half of Canadians in the poll say the right number of immigrants is coming into the country and that immigration has a positive effect on Canadian communities. Only 16 percent view it as a negative impact while one third said it had no impact at all” (The Ottawa Citizen, August 17, 2001, p. A6). What is the population for this survey? How many people were in the sample used for this survey? What is the approximate margin of error for this survey? Provide an interval of numbers that is 95% certain to cover the true percentage of Canadians who view immigration as having a negative impact.

In Case Study 1.3, the margin of error for the sample of 496 teenagers was about 4.5%. How many teenagers should be in the sample to produce an approximate margin of error of .05 or 5%?

About how many people would need to be in a random sample from a large population to produce an approximate margin of error of .30 or 30%?

1.11E 1.12E 1.13E

For each of the studies described, explain whether the study was an observational study or a randomized experiment.
A group of 100 students was randomly divided, with 50 assigned to receive vitamin C and the remaining 50 to receive a placebo, to determine whether or not vitamin C helps to prevent colds. A random sample of patients who received a hip transplant operation at Stanford University Hospital during 2000 to 2010 will be followed for 10 years after their operation to determine the success (or failure) of the transplant. Volunteers with high blood pressure were randomly divided into two groups. One group was taught to practice meditation and the other group was given a low-fat diet. After 8 weeks, reduction in blood pressure was compared for the two groups.

1.15E

Suppose that an observational study showed that students who got at least 7 hours of sleep performed better on exams than students who got less than 7 hours of sleep. Which of the following are possible confounding variables, and which are not? Explain why in each case. Number of courses the student took that term. Weight of the student. Number of hours the student spent partying in a typical week.

A randomized experiment was done in which overweight men were randomly assigned to either exercise or go on a diet for a year. At the end of the study there was a statistically significant difference in average weight loss for the two groups. What additional information would you need in order to determine if the difference in average weight loss had practical importance?

Explain the distinction between statistical significance and practical significance. Can the result of a study be statistically significant but not practically significant?

A (hypothetical) study of what people do in their spare time found that people born under the astrological sign of Aries were significantly more likely to be regular swimmers than people born under other signs.
What additional information would you want to know to help you determine if this result is a false positive?

1.20E

Refer to Case Study 1.6, in which the relationship between aspirin and heart attack rates was examined. Using the results of this experiment, what do you think is the base rate of heart attacks for men like the ones in this study? Explain.

Students in a statistics class at Penn State were asked, “About how many minutes do you typically exercise in a week?” Responses from the women in the class were 60, 240, 0, 360, 450, 200, 100, 70, 240, 0, 60, 360, 180, 300, 0, 270. Responses from the men in the class were 180, 300, 60, 480, 0, 90, 300, 14, 600, 360, 120, 0, 240. Compare the women to the men using a dotplot. What does your plot show you about the difference between the men and the women? For each sex, determine the median response. Do you think there’s a “significant” difference between the weekly amount that men and women exercise? Explain.

1.23E 1.24E

An article in the magazine Science (Service, 1994) discussed a study comparing the health of 6000 vegetarians and a similar number of their friends and relatives who were not vegetarians. The vegetarians had a 28% lower death rate from heart attacks and a 39% lower death rate from cancer, even after the researchers accounted for differences in smoking, weight, and social class. In other words, the reported percentages were the remaining differences after adjusting for differences in death rates due to those factors. Is this an observational study or a randomized experiment? Explain. On the basis of this information, can we conclude that a vegetarian diet causes lower death rates from heart attacks and cancer? Explain.
Give an example of a potential confounding variable and explain what it means to say that it is a confounding variable.

1.26E 1.27E 1.28E

Refer to the study in Exercise 1.28, in which there was a statistically significant difference in the percentage of smokers who quit using a nicotine patch and a placebo patch. Now read the two cautions in the “moral of the story” for Case Study 1.7. Discuss each of them in the context of this study.

1.30E 1.31E 1.32E 1.33E 1.34E

Refer to Exercise 1.33. The Roper Organization selected a random sample of adults in the United States for this poll. Suppose listeners to a late-night radio talk show were asked to call and report whether or not they had ever seen a ghost. What is this type of sample called? Do you think the proportion reporting that they had seen a ghost for the radio poll would be higher or lower than the proportion for the Roper poll? Explain.

1.36E 1.37E 1.38E 1.39E 1.40E 1.41E

Suppose you were to read the following news story: “Researchers compared a new drug to a placebo for treating high blood pressure, and it seemed to work. But the researchers were concerned because they found that significantly more people got headaches when taking the new drug than when taking the placebo. Headaches were the only problem out of the 20 possible side effects the researchers tested.” Do you think the research used an observational study or a randomized experiment? Explain. Do you think the researchers are justified in thinking the new drug would cause more headaches in the population than the placebo would? Explain.

Refer to Case Study 1.5. Explain what mistakes were made in the implementation of steps 4 and 5 of “The Discovery of Knowledge” when USA Today reported the results of this study.

Refer to Case Study 1.6.
Go through the five steps listed under “The Discovery of Knowledge” in Section 1.3, and show how each step was addressed in this study.

A sociologist assembles a dataset consisting of the poverty rate, per capita income, serious crime rate, and teen birthrate for the 50 states of the United States. How many variables are in this dataset? What is an observational unit in this dataset? What is the sample size for the dataset?

Suppose that in a national survey of 620 randomly selected adults, each person is asked how important religion is to him or her (very, fairly, not very), and whether the person favors or opposes stricter regulation of what can be broadcast on network television. a. How many variables are measured in this survey? b. What is an observational unit in this study? c. What is the sample size for this survey?

In each situation, explain whether it would be more appropriate to treat the observed data as a sample from a larger population or as data from the whole population. An instructor surveys all the students in her class to determine whether students would prefer a take-home exam or an in-class exam. The Gallup Organization polls 1000 individuals to estimate the percent of American adults who approve of the President’s job performance.

In each situation, explain whether it would be more appropriate to treat the observed dataset as sample data or as population data. A historian summarizes the ages at death for all deceased past presidents of the United States. A nutritionist wants to determine which of two weight loss programs is more effective. He assigns 25 volunteers to each program and records each participant’s weight loss after 2 months.

For each of the following statistical summaries, explain whether it is a population parameter or a sample statistic. In the 2000 census of the United States, it was determined that the average household size was 2.59 persons per household (http://www.census.gov).
In an ABC News poll completed in June 2004, 36% of n = 500 persons surveyed said that they supported replacing the portrait of Alexander Hamilton on the U.S. $10 bill with a portrait of Ronald Reagan (www.pollingreport.com/news.htm). To estimate average normal body temperature of all adults, a doctor measures the temperatures of 100 healthy adults. The average temperature for that group is 98.2 degrees Fahrenheit.

2.6E 2.7E

Read Case Study 1.5 (p. 4) about prayer and blood pressure. What was the sample size for the observational study conducted by the National Institutes of Health? Describe the observational units in this study. Describe two variables that the researchers related to each other in Case Study 1.5. Explain whether you think the researchers treated the observed data as sample data or as population data.

2.9E

Read Case Study 1.6 (p. 5) about aspirin and heart attack rates. What two variables are measured on each individual in Case Study 1.6? Describe the observational units in this study. What was the sample size for the study? Explain whether you think the researchers treated the observed data as sample data or as population data.

2.11E

For each of the following characteristics of an individual, indicate whether the variable is categorical or quantitative. Length of forearm from elbow to wrist (in centimeters). Whether or not the person has ever been the victim of a crime. Number of songs on his or her digital music player. Feeling about own weight (overweight, about right, underweight).

2.13E 2.14E 2.15E 2.16E 2.17E

For each pair of variables, specify which variable is the explanatory variable and which is the response variable in the relationship between them. Amount a person walks or runs per day and performance on a test of lung function. Feeling about importance of religion and age of respondent.

2.19E 2.20E 2.21E 2.22E 2.23E 2.24E 2.25E 2.26E

Table 2.1 (p.
20) summarized frequency of seatbelt use while driving for twelfth-grade participants in the 2003 Youth Risk Behavior Surveillance System (YRBSS) survey. In the 2001 YRBSS survey, students were asked the same question. For the 2001 survey, a summary of responses given by 2530 students in the twelfth grade who said that they drive follows. a. What percent of the twelfth-grade students who drive said that they always wear a seatbelt when driving? b. What percent of the twelfth-grade students who drive said that they do not always wear a seatbelt when they drive? c. Find the percentage in each of the five response categories. d. Draw a bar graph of the percentages found in part (c).

2.28E 2.29E

Refer to Exercise 2.27. Students also were asked what grades they usually get in school. For twelfth-grade students who responded to this question and the question about how often they wear seatbelts when driving, a summary of frequency counts for combinations of responses to the two questions is as follows: a. The total number of students in the table is 2470. What percentage of these 2470 students said that they usually get A’s and B’s in school? b. What percentage of the 1700 students who said that they usually get A’s and B’s said that they always wear a seatbelt when driving? c. What percentage of the 657 students who said that they usually get C’s said that they always wear a seatbelt when driving? d. What percentage of the 113 students who said that they usually get D’s and F’s said that they always wear a seatbelt when driving?

2.31E 2.32E 2.33E 2.34E 2.35E 2.36E

This is the same as Exercise 1.1. The five-number summaries of the fastest speed ever driven data given in Case Study 1.1 (page 2) were as follows: Give a numerical value for each of the following: a. The fastest speed driven by anyone in the class. b. The slowest of the “fastest speeds” driven by a male. c. The speed for which one-fourth of the women had driven at that speed or faster. d.
The proportion of females who had driven 89 mph or faster. e. The number of females who had driven 89 mph or faster.

2.38E 2.39E 2.40E 2.41E 2.42E 2.43E

Hand et al. (1994, p. 148) provide data on the number of words in each of 600 randomly selected sentences from the book Shorter History of England by G. K. Chesterton. They summarized the data as follows: Create a histogram for the number of words in the 600 randomly selected sentences. Provide a summary of the dataset based on your histogram. Explain why you could not create a stem-and-leaf plot for this dataset. Count the number of words in the first 20 sentences in Chapter 1 of this book (not including headings), and create a histogram of sentence lengths. Compare the sentence lengths to those in the Shorter History of England.

The following stem-and-leaf plot is for the mean August temperatures (Fahrenheit) in 20 U.S. cities. The “stem” (row label) gives the first digit of a temperature, while the “leaf” gives the second digit (Data source: temperature dataset on the companion website). a. Describe the shape of the dataset. Is it highly skewed or is it roughly symmetric? b. What is the highest temperature in the dataset? c. What is the lowest temperature in the dataset? d. What percent of the 20 cities have a mean August temperature in the 80s?

2.46E 2.47E 2.48E 2.49E 2.50E 2.51E 2.52E 2.53E 2.54E 2.55E 2.56E 2.57E 2.58E 2.59E 2.60E 2.61E

Students in a statistics class wrote as many letters of the alphabet as they could in 15 seconds using their nondominant hand. The figure for this exercise is a boxplot that compares the number of letters written by males and females in the sample (Data source: letters dataset on the companion website). a. What is the median number of letters written by females? b. What is the median for males? c. Explain whether the interquartile range is larger for males or for females. d. Find the value of the range for males. e.
Find the value of the range for females.

2.63E

The following cholesterol levels for n = 20 individuals were given in Exercise 2.48: 196, 212, 200, 242, 206, 178, 184, 198, 160, 182, 198, 182, 222, 198, 188, 166, 204, 178, 164, 230. a. Create a five-number summary for these data. b. Draw a boxplot of the cholesterol levels.

The weights (in pounds) for nine men on the Cambridge crew team were as follows (The Independent, March 31, 1992; also Hand et al., 1994, p. 337): 188.5, 183.0, 194.5, 185.0, 214.0, 203.5, 186.0, 178.5, 109.0. The nine men consist of eight rowers and a coxswain, a person who does not row but gives orders to the rowers about the rowing tempo. Find a five-number summary for these data. Identify whether or not any data points would qualify to be marked as an outlier on a boxplot. If there are outliers, specify the values. Which individual do you think is the coxswain?

A set of eight systolic blood pressures follows: 110, 123, 132, 150, 127, 118, 102, 122. Find the median value for the dataset. Find the values of the lower and upper quartiles. Find the value of the interquartile range (IQR). Identify any outliers in the dataset. Use the criterion that a value is an outlier if it is either more than 1.5 × IQR above Q3 or more than 1.5 × IQR below Q1. Draw a boxplot of the dataset.

2.67E 2.68E 2.69E 2.70E 2.71E 2.72E 2.73E

The football team at the school of one of the authors won 4 of 11 games it played during the 2004 college football season. Point differences between teams in the 11 games were +38, -14, -24, -13, -9, -7, -2, -11, -7, +4, +24. A positive difference indicates that the author’s school won the game, and a negative difference indicates that the author’s school lost. Find the value of the mean point difference and the value of the median point difference for the 11 games.
Explain which of the two summary values found in part (a) is a better summary of the team’s season.

2.75E 2.76E 2.77E 2.78E 2.79E 2.80E 2.81E 2.82E 2.83E 2.84E 2.85E 2.86E 2.87E

Suppose that the distribution of speeds at an interstate highway location is bell-shaped with a mean of 71 mph and a standard deviation of 5 mph. Use the Empirical Rule to complete each sentence: About 68% of vehicles at this location travel between ____ and ____ mph. About 95% of vehicles at this location travel between ____ and ____ mph. About 99.7% of vehicles at this location travel between ____ and ____ mph.

2.89E 2.90E 2.91E 2.92E 2.93E

The data for Exercise 2.66 was this set of systolic blood pressures: 110, 123, 132, 150, 127, 118, 102, 122. a. Find the mean and standard deviation for these data. b. What is the variance for these data?

2.95E

If you learn that your score on an exam was 80 and the mean was 70, would you be more satisfied if the standard deviation was 5 or if it was 15? Explain.

The scores on the final exam in a course with a large number of students have approximately a bell-shaped distribution. The mean score was 70, the highest score was 98, and the lowest score was 41. a. Find the value of the range for the exam scores. b. Refer to part (a). Use the value of the range to estimate the value of the standard deviation.

2.98E 2.99E 2.100E

Head circumferences of adult males have a bell-shaped distribution with a mean of 56 cm and a standard deviation of 2 cm. Explain whether or not it would be unusual for an adult male to have a 52-cm head circumference. Explain whether or not it would be unusual for an adult male to have a 62-cm head circumference.

2.102E

Suppose verbal SAT scores for students admitted to a university are bell-shaped with a mean of 540 and a standard deviation of 50. Draw a picture of the distribution of these verbal SAT scores, indicating the cutoff points for the middle 68%, 95%, and 99.7% of the scores.
What is the variance of verbal SAT scores for students admitted to the university?

2.104E 2.105E 2.106E 2.107E

Remember that a resistant statistic is a numerical summary whose value is not unduly influenced by an outlier of any magnitude. Is the standard deviation a resistant statistic? Justify your answer by giving an example of a small dataset, and then adding a very large outlier and noting how the standard deviation is affected.

2.109E 2.110E 2.111E 2.112E 2.113E 2.114E 2.115E 2.116E 2.117E 2.118E 2.119E 2.120E 2.121E 2.122E

The data for 103 women’s right handspans are shown in Figures 2.7 to 2.9 (p. 29), and a five-number summary is given in Example 2.5 (p. 25). Examine Figures 2.7 to 2.9 and comment on whether or not the Empirical Rule should hold. The mean and standard deviation for these measurements are 20.0 cm and 1.8 cm, respectively. Determine whether or not the range of the data (found from the five-number summary) is about what would be expected using the Empirical Rule.

2.124E 2.125E 2.126E 2.127E 2.128E 2.129E 2.130E 2.131E 2.132E 2.133E 2.134E 2.135E

Explain why women’s heights are likely to have a bell shape but their ages at marriage do not.

2.137E

Use the pennstate 1 dataset on the companion website for this exercise. Draw a histogram of the height variable. What is the shape of this histogram? Why do you think it is not a bell shape? Draw a boxplot of the height variable. Which graph, the histogram or the boxplot, is more informative about this dataset? Briefly explain.

For each of the following pairs of variables, is there likely to be a positive association, a negative association, or no association? Briefly explain your reasoning. Amount of alcohol consumed and performance on a test of coordination, where a high score represents better coordination. Height and grade point average for college students.
Weight of a car and average number of miles it can go on a gallon of gas.

For each of the following pairs of variables, is there likely to be a positive association, a negative association, or no association? Briefly explain your reasoning. Miles of running per week and time for a 5-kilometer run. Forearm length and foot length. Grade level and height for children in grades 1 through 10.

The figure for this exercise is a scatter plot of y = average math SAT score in 1998 versus x = percent of graduating seniors who took the test that year for the 50 states and the District of Columbia. The data are from the sats98 dataset on the companion website. Does the plot show a positive association, a negative association, or no association between the two variables? Explain. Explain whether you think the pattern of the plot is linear or curvilinear. About what was the highest average math SAT for the 50 states and the District of Columbia? Approximately what percent of graduates took the test in that state? About what was the lowest average math SAT for the 50 states and the District of Columbia? Approximately what percent of graduates took the test in that state?

3.4E 3.5E 3.6E 3.7E 3.8E

The data in the following table are the geographic latitudes and the average August and January temperatures (Fahrenheit) for 20 cities in the United States. The cities are listed in geographic order from south to north. (These data are part of the temperature dataset on the companion website.) Draw a scatterplot of y = August temperature versus x = latitude. Is the pattern linear or curvilinear? What is the direction of the association? Are there any cities that appear to be outliers because they don’t fit the pattern of the rest of the data? If so, which city or cities are they?

Refer to the latitude and temperature data in the table presented in Exercise 3.9, which also appear in the temperature dataset on the companion website. Draw a scatterplot of y = January temperature versus x = latitude.
Is the pattern linear or curved? Is the direction of the association positive or negative? Is this direction what you would expect for these data? Explain. Are there any cities that appear to be outliers because they don’t fit the pattern of the rest of the data? If so, which city or cities are they?

3.11E

The following table shows sex, height (inches), and mid-parent height (inches) for a sample of 18 college students. The variable mid-parent height is the average of mother’s height and father’s height. (These data are in the dataset UCDchap3 on the companion website; they are sampled from the larger dataset UCDavis2.) In the relationship between height and mid-parent height, which variable is the response variable (y) and which is the explanatory variable (x)? Draw a scatter plot of the data for the y and x variables defined in part (a). Use different symbols for males and females. Briefly interpret the scatter plot. Does the association appear to be linear? What are the differences between the males and females? Which points, if any, are outliers? Calculate the difference between height and mid-parent height for each student, and draw a scatter plot of y = difference versus x = mid-parent height. Use different symbols for males and females. What does this graph reveal about the connection between height and mid-parent height?

3.13E

Refer to Exercise 3.13 in which a regression equation is given that relates average weight and height for men in the 18- to 29-year-old age group. Suppose a man in this age group is 72 inches tall. Use the regression equation given in the previous exercise to predict the weight of this man. Suppose this man, who is 72 inches tall, weighs 190 pounds. Calculate the residual (prediction error) for this individual.

3.15E 3.16E

The equation for converting a temperature from x = degrees Celsius to y = degrees Fahrenheit is y = 32 + 1.8x. Does this equation describe a statistical relationship or a deterministic relationship?
Briefly explain your answer.

The average August temperatures (y) and geographic latitudes (x) of 20 cities in the United States were given in the table for Exercise 3.9. (The data are part of the temperature dataset on the companion website.) The regression equation for these data is y = 113.6 - 1.01x. What is the slope of the line? Interpret the slope in terms of how the mean August temperature is affected by a change in latitude. Estimate the mean August temperature for a city with latitude of 32.

A regression equation for y = handspan (cm) and x = height (inches) was discussed in Section 3.2. If the roles of the variables are reversed and only women are considered, the regression equation is Average height = 51.1 + 0.7 (Handspan). Interpret the slope of 0.7 in terms of how height changes as handspan increases. What is the estimated average height of women with a handspan of 20 cm? Molly has a handspan of 20 cm and is 66.5 inches tall. What is the prediction error (residual) for Molly?

Imagine a regression line that relates y = average systolic blood pressure to x = age. The average blood pressure for people 30 years old is 120, while for those 50 years old the average is 130. What is the slope of the regression line? What is the estimated average systolic blood pressure for people who are 34 years old?

3.21E

The figure for Exercise 3.8 is a scatterplot of pulse rate after marching in place for 1 minute (y) versus resting pulse rate measured before marching (x) for n = 63 individuals. (The data are in the pulse march dataset on the companion website.) The regression equation for these data is Pulse after marching = 17.8 + 0.894 (Resting pulse). What is the slope of this equation? Write a sentence that interprets this slope in the context of this situation. Predict the pulse rate after marching for somebody with a resting pulse rate of 50 beats per minute. Predict the pulse rate after marching for somebody with a resting pulse rate of 90 beats per minute.
Use the results of parts (b) and (c) to draw the regression line. Clearly label the axes of your graph.

Refer to Exercise 3.22. Predict the pulse rate after marching for somebody with a resting pulse rate of 70. Suppose the pulse rate after marching is 76 for somebody whose resting pulse rate is 70. What is the residual (prediction error) for this individual?

The average January temperatures (y) and geographic latitudes (x) of 20 cities in the United States were given in the table for Exercise 3.9. (The data are part of the temperature dataset on the companion website.) The regression equation for these data is y = 126 - 2.34x. What is the slope of the line? Interpret the slope in terms of how mean January temperature is related to change in latitude. Pittsburgh, Pennsylvania, has a latitude of 40, and Boston, Massachusetts, has a latitude of 42. Use the slope to predict the difference in expected average January temperatures for these two cities. Compare your answer to the actual difference in average January temperature for these two cities using the data shown in the table for Exercise 3.9. Predict the average January temperature for a city with latitude 33. Refer to part (c). Identify the two cities in the table that have latitude of 33 and compute the residual (prediction error) for each of these cities. Discuss the meaning of these two residuals in the context of this example, identifying whether each city is warmer or cooler than predicted.

3.25E 3.26E 3.27E

Remember that r² can be expressed as a proportion or as a percent. (When written as a percent, the percent sign will always be included.) Explain which of the following could not be a value for r²: 0, -0.25, 0.3, 10, 1.7, 25%, -50%, 200%. Refer to the values in part (a). Which one of the legitimate values for r² represents the strongest relationship between x and y?

3.29E 3.30E 3.31E 3.32E 3.33E

Explain how two variables can have a perfect curved relationship yet have zero correlation.
Draw a picture of a set of data meeting those criteria.

3.35E 3.36E

The figure for this exercise (below) shows four graphs. Assume that all four graphs have the same numerical scales for the two axes. Which graph shows the strongest relationship between the two variables? Which graph shows the weakest?

Refer to the figure for the previous exercise. In scrambled order, correlation values for these four graphs are -0.9, 0, +0.3, and +0.6. Match these correlation values to the graphs.

3.39E 3.40E 3.41E 3.42E 3.43E

The correlation between latitude and average August temperature (in degrees Fahrenheit) is -0.78 for the 20 cities shown in the table for Exercise 3.9. (The data also are in the dataset temperature on the companion website.) Calculate r² and write a sentence that interprets it in the context of this situation. If temperature were to be converted to Centigrade (without rounding off), what would be the value of the correlation between latitude and temperature?

3.45E 3.46E

In a regression analysis, the total sum of squares (SSTO) is 800, and the error sum of squares (SSE) is 200. What is the value of r²?

3.48E

Suppose you know that the slope of a regression line is B1 = +3.5. Based on this value, explain what you know and do not know about the strength and direction of the relationship between the two variables.

3.50E 3.51E 3.53E 3.54E

Refer back to Exercise 3.7 about stopping distance and vehicle speed. The least squares line for these data is Average distance = -44.2 + 5.7 (Speed). Use this equation to estimate the average stopping distance when the speed is 80 miles per hour. Do you think this is an accurate estimate? Explain. Draw a scatterplot of the data, as instructed in Exercise 3.7(b). Use the scatterplot to estimate the average stopping distance for a speed of 80 mph. Do you think the data on stopping distance and vehicle speed shown in Exercise 3.7 describe the relationship between these two variables for all situations?
What are some other variables that should be considered when the relationship between stopping distance and vehicle speed is analyzed?
3.56E 3.57E 3.58E 3.59E 3.60E 3.61E 3.62E 3.63E 3.64E 3.65E 3.66E 3.67E 3.68E 3.69E 3.70E 3.71E
Drivers of red cars are given more tickets for traffic violations than drivers of any other car color. Does this mean that if you drive a red car rather than a car of some other color, it will cause you to get more tickets for traffic violations? Explain.
3.73E 3.74E 3.75E 3.76E 3.77E 3.78E 3.79E
The heights (inches) and foot lengths (cm) of 33 college men are shown in the following table. (These data are in the dataset height foot on the companion website.) In the relationship between height and mid-parent height, which variable is the response variable (y) and which is the explanatory variable (x)? Draw a scatterplot of the data for the y and x variables defined in part (a). Use different symbols for males and females. Briefly interpret the scatterplot. Does the association appear to be linear? What are the differences between the males and females? Which points, if any, are outliers? Calculate the difference between height and mid-parent height for each student, and draw a scatterplot of y = difference versus x = mid-parent height. Use different symbols for males and females. What does this graph reveal about the connection between height and mid-parent height?
3.81E
The winning time in the Olympic men’s 500-meter speed skating race over the years 1924 to 2006 can be described by the following regression equation: Winning time = 272.63 - 0.1184 (Year). Note: Beginning with the 1998 Olympics, each competitor skated twice and the average of the two times defined the winner. In this analysis, the data used for the relevant years is the average of the two times for the winner (Source: http://www.infoplease.com/ipsa/A0758122.html). Is the correlation between winning time and year positive or negative? Explain how you know, and explain what that means in the context of this situation.
In 2010, the actual winning time for the gold medal was 34.91 seconds. Use the regression equation to predict the winning time for 2010, and compare the prediction to what actually happened. Explain what the slope of -0.1184 indicates in terms of how winning times change from one set of Olympic games to the next. Olympic games occur every 4 years. Why should we not use this regression equation to predict the winning time for the men’s 500-meter speed skating race in the 2080 Winter Olympics?
3.83E 3.84E 3.86E 3.87E 3.88E 3.89E
Use the dataset ceodata0t on the companion website for this exercise, which gives the ages (Age) and salaries (Salary) for the 50 highest-paid CEOs on the Fortune 500 list of top companies in the United States (Data source: http://www.forbes.com/lists/2009/12/best-boss-09_CEO-Compensation_CompTotDisp.html). In the relationship between age and salary, which is the response variable and which is the explanatory variable? Plot Salary versus Age. Are there any obvious outliers in the plot? Use your plot from part (b) to discuss whether linear regression is appropriate for predicting CEO salaries from age for the top Fortune 500 companies.
3.91E 3.92E 3.93E 3.94E 3.95E 3.96E 3.97E 3.98E 3.99E 3.100E 4.1E 4.2E
Each fall, auditions for the band and orchestra are held at a large university. Last fall, the numbers of males and females in each class who auditioned were as follows: a. Calculate the row percentage for freshman females and explain what it means. b. Calculate the column percentage for freshman females and explain what it means. c. Which class had the highest percentage of female applicants? Support your answer with numbers. d. Which sex had a higher percentage of sophomore applicants? Support your answer with numbers.
4.4E
For each pair of variables, indicate whether or not a two-way table would be appropriate for summarizing the relationship. In each case, briefly explain why or why not. a. Political party (Republican, Democrat, etc.)
and opinion about a new gun control law. b. Weight (pounds) and height (inches).
For each pair of variables, indicate whether or not a two-way table would be appropriate for summarizing the relationship. In each case, briefly explain why or why not. Age group (under 20, 21-29, etc.) and rating of a song on a 1 to 5 scale (1 = hate it, 5 = love it). Gender and opinion about capital punishment. Head circumference (centimeters) and gender.
Suppose a study on the relationship between gender and political party included 200 men and 200 women and found 180 Democrats and 220 Republicans. Is that information sufficient for you to construct a contingency table for the study? If so, construct the table. If not, explain why not.
4.8E 4.9E
Do grumpy old men have a greater risk of having coronary heart disease than men who aren’t so grumpy? Harvard Medical School researchers examined this question in a prospective observational study reported in the November 1994 issue of Circulation (Kawachi et al., 1994). For 7 years, the researchers studied men between the ages of 46 and 90. All study participants completed a survey of anger symptoms at the beginning of the study period. Among 199 men who had no anger symptoms, there were 8 cases of coronary heart disease. Among 559 men who had the most anger symptoms, there were 59 cases of coronary heart disease. a. Construct a contingency table for the relationship between degree of anger and the incidence of heart disease. b. Among those with no anger symptoms, what percentage had coronary heart disease? c. Among those with the most anger symptoms, what percentage had coronary heart disease? d. Draw a bar graph of these data. Based on this graph, does there appear to be an association between anger and the risk of coronary heart disease? Explain.
4.11E 4.12E 4.13E 4.14E 4.15E 4.16E
For Exercises 4.17 and 4.18: A study is done to compare side effects for two different drugs used to treat a medical condition. One hundred people are given each drug.
Results are as shown in the following table: 4.17 For the headache side effect, compute each of the following. a. The risk of experiencing a headache for each drug (separately). b. The relative risk of a headache for Drug 1 compared to Drug 2. c. The percent increase in the risk of a headache for Drug 1 compared to Drug 2. d. The odds ratio for comparing the odds of a headache for Drug 1 compared to Drug 2.
For Exercises 4.17 and 4.18: A study is done to compare side effects for two different drugs used to treat a medical condition. One hundred people are given each drug. Results are as shown in the following table: 4.18 For nausea, compute each of the following. a. The risk of experiencing nausea for each drug (separately). b. The relative risk of nausea for Drug 1 compared to Drug 2. c. The percent increase in the risk of nausea for Drug 1 compared to Drug 2. d. The odds ratio for comparing the odds of nausea for Drug 1 compared to Drug 2.
4.19E
a. For a relative risk of 2.1, what is the percent increase in risk? b. For a percent increase in risk of 40%, what is the relative risk?
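The quantities these exercises ask for (risk, relative risk, percent increase in risk, and odds ratio) follow from the standard definitions, which can be sketched in Python as below. The side-effect table is not reproduced in this listing, so the counts used here (30 and 20 cases out of 100 people per drug) are hypothetical placeholders, and the function names are illustrative only.

```python
def risk(cases, total):
    """Risk = number with the outcome / number in the group."""
    return cases / total


def relative_risk(risk_1, risk_2):
    """Relative risk of group 1 compared to group 2."""
    return risk_1 / risk_2


def percent_increase_in_risk(rr):
    """Percent increase in risk = (relative risk - 1) * 100."""
    return (rr - 1) * 100


def relative_risk_from_percent_increase(pct):
    """Inverse of the above: relative risk = 1 + percent increase / 100."""
    return 1 + pct / 100


def odds_ratio(cases_1, total_1, cases_2, total_2):
    """Ratio of the odds (cases / non-cases) in group 1 to those in group 2."""
    odds_1 = cases_1 / (total_1 - cases_1)
    odds_2 = cases_2 / (total_2 - cases_2)
    return odds_1 / odds_2


# Hypothetical counts, NOT taken from the textbook table.
drug1_cases, drug2_cases, n_per_drug = 30, 20, 100

r1 = risk(drug1_cases, n_per_drug)
r2 = risk(drug2_cases, n_per_drug)
rr = relative_risk(r1, r2)

print(f"Risk, Drug 1: {r1:.2f}")
print(f"Risk, Drug 2: {r2:.2f}")
print(f"Relative risk: {rr:.2f}")
print(f"Percent increase in risk: {percent_increase_in_risk(rr):.1f}%")
print(f"Odds ratio: {odds_ratio(drug1_cases, n_per_drug, drug2_cases, n_per_drug):.3f}")
```

The two conversion helpers are inverses of each other, so either a relative risk or a percent increase can be supplied and the other recovered.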