Bartleby Sitemap - Textbook Solutions

All Textbook Solutions for Understanding Basic Statistics

Terminology If a numerical measure describes an aspect of a sample, is it a statistic or a parameter?

Terminology If a variable describes an individual by placing the individual in a category or group, is the variable quantitative or qualitative?

3CR

Terminology If it makes sense to say that one data measurement in a data set is twice that of another measurement in the set, what is the highest level of measurement for the data: nominal, ordinal, interval, or ratio?

Terminology Consider a sample of size n. If every sample of size n has an equal chance of being selected, what is the type of sample: stratified, systematic, simple random, or cluster?

Terminology If a treatment is applied to subjects or objects in a study in order to observe a possible change in the variable of interest, is the study an observational study or an experiment?

Critical Thinking Sudoku is a puzzle consisting of squares arranged in 9 rows and 9 columns. The 81 squares are further divided into nine 3 × 3 square boxes. The object is to fill in the squares with numerals 1 through 9 so that each column, row, and box contains all nine numbers. However, there is a requirement that each number appear only once in any row, column, or box. Each puzzle already has numbers in some of the squares. Would it be appropriate to use a random-number table to select a digit for each blank square? Explain.

Critical Thinking Alisha wants to do a statistical study to determine how long it takes people to complete a Sudoku puzzle (see Problem 7 for a description of the puzzle). Her plan is as follows: Download 10 different puzzles from the Internet. Find 10 friends willing to participate. Ask each friend to complete one of the puzzles and time him- or herself. Gather the completion times from each friend. Describe some of the problems with Alisha's plan for the study. (Note: Puzzles differ in difficulty, ranging from beginner to very difficult.)
Are the results from Alisha's study anecdotal, or do they apply to the general population?

Statistical Literacy You are conducting a study of students doing work-study jobs on your campus. Among the questions on the survey instrument are A. How many hours are you scheduled to work each week? Answer to the nearest hour. B. How applicable is this work experience to your future employment goals? Respond using the following scale: 1 = not at all, 2 = somewhat, 3 = very. (a) Suppose you take random samples from the following groups: freshmen, sophomores, juniors, and seniors. What kind of sampling technique are you using (simple random, stratified, systematic, cluster, multistage, convenience)? (b) Describe the individuals of this study. (c) What is the variable for question A? Classify the variable as qualitative or quantitative. What is the level of the measurement? (d) What is the variable for question B? Classify the variable as qualitative or quantitative. What is the level of the measurement? (e) Is the proportion of responses "3 = very" to question B a statistic or a parameter? (f) Suppose only 40% of the students you selected for the sample respond. What is the nonresponse rate? Do you think the nonresponse rate might introduce bias into the study? Explain. (g) Would it be appropriate to generalize the results of your study to all work-study students in the nation? Explain.

Radio Talk Show: Sample Bias A radio talk show host asked listeners to respond either yes or no to the question, "Is the candidate who spends the most on a campaign the most likely to win?" Fifteen people called in, and nine said yes. What is the implied population? What is the variable? Can you detect any bias in the selection of the sample?

11CR

General: Type of Sampling Categorize the type of sampling (simple random, stratified, systematic, cluster, or convenience) used in each of the following situations.
(a) To conduct a preelection opinion poll on a proposed amendment to the state constitution, a random sample of 10 telephone prefixes (first three digits of the phone number) was selected, and all households from the phone prefixes selected were called. (b) To conduct a study on depression among the elderly, a sample of 30 patients in one nursing home was used. (c) To maintain quality control in a brewery, every 20th bottle of beer coming off the production line was opened and tested. (d) Subscribers to a new smartphone app that streams songs were assigned numbers. Then a sample of 30 subscribers was selected by using a random-number table. The subscribers in the sample were invited to rate the process for selecting the songs in the playlist. (e) To judge the appeal of a proposed television sitcom, a random sample of 10 people from each of three different age categories was selected and those chosen were asked to rate a pilot show.

General: Gathering Data Which technique for gathering data (observational study or experiment) do you think was used in the following studies? Explain. (a) The U.S. Census Bureau tracks population age. In 1900, the percentage of the population that was 19 years old or younger was 44.4%. In 1930, the percentage was 38.8%; in 1970, the percentage was 37.9%; and in 2000, the percentage in that age group was down to 28.5% (Reference: The First Measured Century, T. Caplow, L. Hicks, and B. J. Wattenberg). (b) After receiving the same lessons, a class of 100 students was randomly divided into two groups of 50 each. One group was given a multiple-choice exam covering the material in the lessons. The other group was given an essay exam. The average test scores for the two groups were then compared.

General: Experiment How would you use a completely randomized experiment in each of the following settings? Is a placebo being used or not? Be specific and give details. (a) A charitable nonprofit organization wants to test two methods of fund-raising.
From a list of 1000 past donors, half will be sent literature about the successful activities of the charity and asked to make another donation. The other 500 donors will be contacted by phone and asked to make another donation. The percentage of people from each group who make a new donation will be compared. (b) A tooth-whitening gel is to be tested for effectiveness. A group of 85 adults have volunteered to participate in the study. Of these, 43 are to be given a gel that contains the tooth-whitening chemicals. The remaining 42 are to be given a similar-looking package of gel that does not contain the tooth-whitening chemicals. A standard method will be used to evaluate the whiteness of teeth for all participants. Then the results for the two groups will be compared. How could this experiment be designed to be double-blind? (c) Consider the experiment described in part (a). Describe how you would use a randomized block experiment with blocks based on age. Use three blocks: donors younger than 30 years old, donors 30 to 59 years old, donors 60 and older.

Student Life: Data Collection Project Make a statistical profile of your own statistics class. Items of interest might be the following: (a) Height, age, gender, pulse, number of siblings, marital status (b) Number of college credit hours completed (as of beginning of term); grade point average (c) Major; number of credit hours enrolled in this term (d) Number of scheduled work hours per week (e) Distance from residence to first class; time it takes to travel from residence to first class (f) Year, make, and color of car usually driven What directions would you give to people answering these questions? For instance, how accurate should the measurements be? Should age be recorded as of last birthday?

Focus Problem: Fireflies Suppose you are conducting a study to compare firefly populations exposed to normal daylight/darkness conditions with firefly populations exposed to continuous light (24 hours a day).
You set up two firefly colonies in a laboratory environment. The two colonies are identical except that one colony is exposed to normal daylight/darkness conditions and the other is exposed to continuous light. Each colony is populated with the same number of mature fireflies. After 72 hours, you count the number of living fireflies in each colony. (a) Is this an experiment or an observational study? Explain. (b) Is there a control group? Is there a treatment group? (c) What is the variable in this study? (d) What is the level of measurement (nominal, interval, ordinal, or ratio) of the variable?

Use a random-number table or random-number generator to simulate tossing a fair coin 10 times. Generate 20 such simulations of 10 coin tosses. Compare the simulations. Are there any strings of 10 heads? Of 4 heads? Does it seem that in most of the simulations, half the outcomes are heads? Half are tails? In Chapter 6, we will study the probabilities of getting from 0 to 10 heads in such a simulation.

Use a random-number table or random-number generator to generate a random sample of 30 distinct values from the set of integers 1 to 100. Instructions for doing this using the TI-84Plus/TI-83Plus/TI-Nspire (with TI-84Plus keypad), Excel 2013, Minitab, Minitab Express, or SPSS are given in Using Technology at the end of this chapter. Generate five such samples. How many of the samples include the number 1? The number 100? Comment about the differences among the samples. How well do the samples seem to represent the numbers between 1 and 100?

What does it mean to say that we are going to use a sample to draw an inference about a population? Why is a random sample so important for this process? If we wanted a random sample of students in the cafeteria, why couldn't we just choose the students who order Diet Pepsi with their lunch? Comment on the statement "A random sample is like a miniature population, whereas samples that are not random are likely to be biased."
Why would the students who order Diet Pepsi with lunch not be a random sample of students in the cafeteria?

In your own words, explain the differences among the following sampling techniques: simple random sample, stratified sample, systematic sample, cluster sample, multistage sample, and convenience sample. Describe situations in which each type might be useful.

Simulate the results of tossing a fair die 18 times. Repeat the simulation. Are the results the same? Did you expect them to be the same? Why or why not? Do there appear to be equal numbers of outcomes 1 through 6 in each simulation? In Chapter 5, we will encounter the law of large numbers, which tells us that we would expect equal numbers of outcomes only when the simulation is very large.

2UTA

Statistical Literacy In a statistical study, what is the difference between an individual and a variable?

Statistical Literacy Are data at the nominal level of measurement quantitative or qualitative?

Statistical Literacy What is the difference between a parameter and a statistic?

Statistical Literacy For a set population, does a parameter ever change? If there are three different samples of the same size from a set population, is it possible to get three different values for the same statistic?

Critical Thinking Numbers are often assigned to data that are categorical in nature. (a) Consider these number assignments for category items describing electronic ways of expressing personal opinions: 1 = Twitter; 2 = e-mail; 3 = text message; 4 = Facebook; 5 = blog. Are these numerical assignments at the ordinal data level or higher? Explain. (b) Consider these number assignments for category items describing usefulness of customer service: 1 = not helpful; 2 = somewhat helpful; 3 = very helpful; 4 = extremely helpful. Are these numerical assignments at the ordinal data level? Explain. What about at the interval level or higher?
Explain.

Interpretation Lucy conducted a survey asking some of her friends to specify their favorite type of TV entertainment from the following list of choices: sitcom; reality; documentary; drama; cartoon; other. Do Lucy's observations apply to all adults? Explain. From the description of the survey group, can we draw any conclusions regarding age of participants, gender of participants, or education level of participants?

Marketing: Fast Food A national survey asked 1261 U.S. adult fast-food customers which meal (breakfast, lunch, dinner, snack) they ordered. (a) Identify the variable. (b) Is the variable quantitative or qualitative? (c) What is the implied population?

Advertising: Auto Mileage What is the average miles per gallon (mpg) for all new hybrid small cars? Using Consumer Reports, a random sample of such vehicles gave an average of 35.7 mpg. (a) Identify the variable. (b) Is the variable quantitative or qualitative? (c) What is the implied population?

Ecology: Wetlands Government agencies carefully monitor water quality and its effect on wetlands (Reference: Environmental Protection Agency Wetland Report EPA 832-R-93-005). Of particular concern is the concentration of nitrogen in water draining from fertilized lands. Too much nitrogen can kill fish and wildlife. Twenty-eight samples of water were taken at random from a lake. The nitrogen concentration (milligrams of nitrogen per liter of water) was determined for each sample. (a) Identify the variable. (b) Is the variable quantitative or qualitative? (c) What is the implied population?

Archaeology: Ireland The archaeological site of Tara is more than 4000 years old. Tradition states that Tara was the seat of the high kings of Ireland. Because of its archaeological importance, Tara has received extensive study (Reference: Tara: An Archaeological Survey by Conor Newman, Royal Irish Academy, Dublin). Suppose an archaeologist wants to estimate the density of ferromagnetic artifacts in the Tara region.
For this purpose, a random sample of 55 plots, each of size 100 square meters, is used. The number of ferromagnetic artifacts for each plot is determined. (a) Identify the variable. (b) Is the variable quantitative or qualitative? (c) What is the implied population?

Student Life: Levels of Measurement Categorize these measurements associated with student life according to level: nominal, ordinal, interval, or ratio. (a) Length of time to complete an exam (b) Time of first class (c) Major field of study (d) Course evaluation scale: poor, acceptable, good (e) Score on last exam (based on 100 possible points) (f) Age of student

Business: Levels of Measurement Categorize these measurements associated with a robotics company according to level: nominal, ordinal, interval, or ratio. (a) Salesperson's performance: below average, average, above average (b) Price of company's stock (c) Names of new products (d) Temperature (°F) in CEO's private office (e) Gross income for each of the past 5 years (f) Color of product packaging

Fishing: Levels of Measurement Categorize these measurements associated with fishing according to level: nominal, ordinal, interval, or ratio. (a) Species of fish caught: perch, bass, pike, trout (b) Cost of rod and reel (c) Time of return home (d) Guidebook rating of fishing area: poor, fair, good (e) Number of fish caught (f) Temperature of water

Education: Teacher Evaluation If you were going to apply statistical methods to analyze teacher evaluations, which question form, A or B, would be better? Form A: In your own words, tell how this teacher compares with other teachers you have had. Form B: Use the following scale to rank your teacher as compared with other teachers you have had. 1 = worst, 2 = below average, 3 = average, 4 = above average, 5 = best

Critical Thinking You are interested in the weights of backpacks students carry to class and decide to conduct a study using the backpacks carried by 30 students. (a) Give some instructions for weighing the backpacks.
Include unit of measure, accuracy of measure, and type of scale. (b) Do you think each student asked will allow you to weigh his or her backpack? (c) Do you think telling students ahead of time that you are going to weigh their backpacks will make a difference in the weights?

Statistical Literacy Explain the difference between a stratified sample and a cluster sample.

Statistical Literacy Explain the difference between a simple random sample and a systematic sample.

Statistical Literacy Marcie conducted a study of the cost of breakfast cereal. She recorded the costs of several boxes of cereal. However, she neglected to take into account the number of servings in each box. Someone told her not to worry because she just had some sampling error. Comment on that advice.

Statistical Literacy A random sample of students who use the college recreation center were asked if they approved increasing student fees for all students in order to add a climbing wall to the recreation center. Describe the sample frame. Does the sample frame include all students enrolled in the college? Explain.

Interpretation In a random sample of 50 students from a large university, all the students were between 18 and 20 years old. Can we conclude that the entire population of students at the university is between 18 and 20 years old? Explain.

Interpretation A campus performance series features plays, music groups, dance troupes, and stand-up comedy. The committee responsible for selecting the performance groups includes three students chosen at random from a pool of volunteers. This year, the 30 volunteers came from a variety of majors. However, the three students for the committee were all music majors. Does this fact indicate there was bias in the selection process and that the selection process was not random? Explain.

Critical Thinking Greg took a random sample of size 100 from the population of current season ticket holders to State College men's basketball games.
Then he took a random sample of size 100 from the population of current season ticket holders to State College women's basketball games. (a) What sampling technique (stratified, systematic, cluster, multistage, convenience, random) did Greg use to sample from the population of current season ticket holders to all State College basketball games played by either men or women? (b) Is it appropriate to pool the samples and claim to have a random sample of size 200 from the population of current season ticket holders to all State College home basketball games played by either men or women? Explain.

Critical Thinking Consider the students in your statistics class as the population and suppose they are seated in four rows of 10 students each. To select a sample, you toss a coin. If it comes up heads, you use the 20 students sitting in the first two rows as your sample. If it comes up tails, you use the 20 students sitting in the last two rows as your sample. (a) Does every student have an equal chance of being selected for the sample? Explain. (b) Is it possible to include students sitting in row 3 with students sitting in row 2 in your sample? Is your sample a simple random sample? Explain. (c) Describe a process you could use to get a simple random sample of size 20 from a class of size 40.

Critical Thinking Suppose you are assigned the number 1, and the other students in your statistics class call out consecutive numbers until each person in the class has his or her own number. Explain how you could get a random sample of four students from your statistics class. (a) Explain why the first four students walking into the classroom would not necessarily form a random sample. (b) Explain why four students coming in late would not necessarily form a random sample. (c) Explain why four students sitting in the back row would not necessarily form a random sample.
(d) Explain why the four tallest students would not necessarily form a random sample.

Critical Thinking In each of the following situations, the sampling frame does not match the population, resulting in undercoverage. Give examples of population members that might have been omitted. (a) The population consists of all 250 students in your large statistics class. You plan to obtain a simple random sample of 30 students by using the sampling frame of students present next Monday. (b) The population consists of all 15-year-olds living in the attendance district of a local high school. You plan to obtain a simple random sample of 200 such residents by using the student roster of the high school as the sampling frame.

Sampling: Random Use a random-number table to generate a list of 10 random numbers between 1 and 99. Explain your work.

Sampling: Random Use a random-number table to generate a list of eight random numbers from 1 to 976. Explain your work.

Sampling: Random Use a random-number table to generate a list of six random numbers from 1 to 8615. Explain your work.

14P

Computer Simulation: Roll of a Die A die is a cube with dots on each face. The faces have 1, 2, 3, 4, 5, or 6 dots. The table below is a computer simulation (from the software package Minitab) of the results of rolling a fair die 20 times.
DATA DISPLAY
ROW  C1  C2  C3  C4  C5  C6  C7  C8  C9  C10
1     5   2   2   2   5   3   2   3   1    4
2     3   2   4   5   4   5   3   5   3    4
(a) Assume that each number in the table corresponds to the number of dots on the upward face of the die. Is it appropriate that the same number appears more than once? Why? What is the outcome of the fourth roll? (b) If we simulate more rolls of the die, do you expect to get the same sequence of outcomes? Why or why not?

16P

Education: Test Construction Professor Gill is designing a multiple-choice test. There are to be 10 questions. Each question is to have five choices for answers. The choices are to be designated by the letters a, b, c, d, and e.
Professor Gill wishes to use a random-number table to determine which letter choice should correspond to the correct answer for a question. Using the number correspondence 1 for a, 2 for b, 3 for c, 4 for d, and 5 for e, use a random-number table to determine the letter choice for the correct answer for each of the 10 questions.

Education: Test Construction Professor Gill uses true-false questions. She wishes to place 20 such questions on the next test. To decide whether to place a true statement or a false statement in each of the 20 questions, she uses a random-number table. She selects 20 digits from the table. An even digit tells her to use a true statement. An odd digit tells her to use a false statement. Use a random-number table to pick a sequence of 20 digits, and describe the corresponding sequence of 20 true-false questions. What would the test key for your sequence look like?

Sampling Methods: Benefits Package An important part of employee compensation is a benefits package, which might include health insurance, life insurance, child care, vacation days, retirement plan, parental leave, bonuses, etc. Suppose you want to conduct a survey of benefits packages available in private businesses in Hawaii. You want a sample size of 100. Some sampling techniques are described below. Categorize each technique as simple random sample, stratified sample, systematic sample, cluster sample, or convenience sample. (a) Assign each business in the Island Business Directory a number, and then use a random-number table to select the businesses to be included in the sample. (b) Use postal ZIP Codes to divide the state into regions. Pick a random sample of 10 ZIP Code areas, and then include all the businesses in each selected ZIP Code area. (c) Send a team of five research assistants to Bishop Street in downtown Honolulu. Let each assistant select a block or building and interview an employee from each business found.
Each researcher can have the rest of the day off after getting responses from 20 different businesses. (d) Use the Island Business Directory. Number all the businesses. Select a starting place at random, and then use every 50th business listed until you have 100 businesses. (e) Group the businesses according to type: medical, shipping, retail, manufacturing, financial, construction, restaurant, hotel, tourism, other. Then select a random sample of 10 businesses from each business type.

Sampling Methods: Health Care Modern Managed Hospitals (MMH) is a national for-profit chain of hospitals. Management wants to survey patients discharged this past year to obtain patient satisfaction profiles. It wishes to use a sample of such patients. Several sampling techniques are described below. Categorize each technique as simple random sample, stratified sample, systematic sample, cluster sample, or convenience sample. (a) Obtain a list of patients discharged from all MMH facilities. Divide the patients according to length of hospital stay (2 days or less, 3-7 days, 8-14 days, more than 14 days). Draw simple random samples from each group. (b) Obtain lists of patients discharged from all MMH facilities. Number these patients, and then use a random-number table to obtain the sample. (c) Randomly select some MMH facilities from each of five geographic regions, and then include all the patients on the discharge lists of the selected hospitals. (d) At the beginning of the year, instruct each MMH facility to survey every 500th patient discharged. (e) Instruct each MMH facility to survey 10 discharged patients this week and send in the results.

Statistical Literacy A study of college graduates involves three variables: income level, job satisfaction, and one-way commute times to work.
List some ways the variables might be confounded.

Statistical Literacy Consider a completely randomized experiment in which a control group is given a placebo for congestion relief and a treatment group is given a new drug for congestion relief. Describe a double-blind procedure for this experiment and discuss some benefits of such a procedure.

Critical Thinking A brief survey regarding opinions about recycling was carefully designed so that the wording of the questions would not influence the responses. Jill administered the survey at a farmers market. She approached adults and asked if they would fill out the survey, explaining that the results might be used to set trash collection and recycling policy in the city. She stood by silently while the form was filled out. Jill was wearing a green T-shirt with the slogan "fight global warming." Are the respondents a random sample of people in the community? Are there any concerns that Jill might have influenced the respondents?

Critical Thinking A randomized block design was used to study the amount of grants awarded to students at a large university. One block consisted of undergraduate students and the other block consisted of graduate students. Samples of size 30 were taken from each block. Could the combined sample of 60 be considered a simple random sample from the population of all students, undergraduate and graduate, at the university? Explain.

Interpretation Zane is examining two studies involving how different generations classify specified items as either luxuries or necessities. In the first study, the Echo generation is defined to be people ages 18 to 29. The second study defined the Echo generation to be people ages 20 to 31. Zane notices that the first study was conducted in 2006 while the second one was conducted in 2008. (a) Are the two studies inconsistent in their description of the Echo generation?
(b) What are the birth years of the Echo generation?

Interpretation Suppose you are looking at the 2006 results of how the Echo generation classified specified items as either luxuries or necessities. Do you expect the results to reflect how the Echo generation would classify items in 2020? Explain.

Ecology: Gathering Data Which technique for gathering data (observational study or experiment) do you think was used in the following studies? (a) The Colorado Division of Wildlife netted and released 774 fish at Quincy Reservoir. There were 219 perch, 315 bluegill, 83 pike, and 157 rainbow trout. (b) The Colorado Division of Wildlife caught 41 bighorn sheep on Mt. Evans and gave each one an injection to prevent heartworm. A year later, 38 of these sheep did not have heartworm, while the other three did. (c) The Colorado Division of Wildlife imposed special fishing regulations on the Deckers section of the South Platte River. All trout under 15 inches had to be released. A study of trout before and after the regulation went into effect showed that the average length of a trout increased by 4.2 inches after the new regulation. (d) An ecology class used binoculars to watch 23 turtles at Lowell Ponds. It was found that 18 were box turtles and 5 were snapping turtles.

General: Gathering Data Which technique for gathering data (sampling, experiment, simulation, or census) do you think was used in the following studies? (a) An analysis of a sample of 31,000 patients from New York hospitals suggests that the poor and the elderly sue for malpractice at one-fifth the rate of wealthier patients (Journal of the American Medical Association). (b) The effects of wind shear on airplanes during both landing and takeoff were studied by using complex computer programs that mimic actual flight.
(c) A study of all league football scores attained through touchdowns and field goals was conducted by the National Football League to determine whether field goals account for more scoring events than touchdowns (USA Today). (d) An Australian study included 588 men and women who already had some precancerous skin lesions. Half got a skin cream containing a sunscreen with a sun protection factor of 17; half got an inactive cream. After 7 months, those using the sunscreen with the sun protection had fewer new precancerous skin lesions (New England Journal of Medicine).

General: Completely Randomized Experiment How would you use a completely randomized experiment in each of the following settings? Is a placebo being used? Be specific and give details. (a) A veterinarian wants to test a strain of antibiotic on calves to determine their resistance to common infection. In a pasture are 22 newborn calves. There is enough vaccine for 10 calves. However, blood tests to determine resistance to infection can be done on all calves. (b) The Denver Police Department wants to improve its image with teenagers. A uniformed officer is sent to a school 1 day a week for 10 weeks. Each day the officer visits with students, eats lunch with students, attends pep rallies, and so on. There are 18 schools, but the police department can visit only half of these schools this semester. A survey regarding how teenagers view police is sent to all 18 schools at the end of the semester. (c) A skin patch contains a new drug to help people quit smoking. A group of 75 cigarette smokers have volunteered as subjects to test the new skin patch. For 1 month, 40 of the volunteers receive skin patches with the new drug. The other volunteers receive skin patches with no drugs. At the end of 2 months, each subject is surveyed regarding his or her current smoking habits.

Survey: Manipulation The New York Times did a special report on polling that was carried in papers across the nation.
The article pointed out how readily the results of a survey can be manipulated. Some features that can influence the results of a poll include the following: the number of possible responses, the phrasing of the questions, the sampling techniques used (voluntary response or sample designed to be representative), the fact that words may mean different things to different people, the questions that precede the question of interest, and finally, the fact that respondents can offer opinions on issues they know nothing about. (a) Consider the expression "over the last few years." Do you think that this expression means the same time span to everyone? What would be a more precise phrase? (b) Consider this question: "Do you think fines for running stop signs should be doubled?" Do you think the response would be different if the question "Have you ever run a stop sign?" preceded the question about fines? (c) Consider this question: "Do you watch too much television?" What do you think the responses would be if the only responses possible were yes or no? What do you think the responses would be if the possible responses were "rarely," "sometimes," or "frequently"?

Critical Thinking An agricultural study is comparing the harvest volume of two types of barley. The site for the experiment is bordered by a river. The field is divided into eight plots of approximately the same size. The experiment calls for the plots to be blocked into four plots per block. Then, two plots of each block will be randomly assigned to one of the two barley types. Two blocking schemes are shown below, with one block indicated by the white region and the other by the gray region. Which blocking scheme, A or B, would be better? Explain.

Terminology Consider the following graph types: histogram, relative-frequency graph, ogive.
Match each type to the appropriate description: (i) Shows cumulative frequency (or percent of data) falling at or below each upper class boundary in a frequency table (ii) Shows number of data falling within each distinct class of a frequency table (iii) Shows the relative frequency (or percent) of all data falling within each class of a frequency table

Terminology Which type(s) of data can be shown in a histogram: quantitative, qualitative (also known as category), or both?

Terminology Which type(s) of data can be shown in a bar graph: quantitative, qualitative (also known as category), or both?

Terminology Which graphical display shows each data value (or truncated data value) in order from smallest to largest: histogram, pie chart, stem-and-leaf?

Terminology If a histogram is skewed left, more of the data falls on which side: right or left?

Terminology How are data plotted in a time-series graph: by frequency, in order from smallest to largest, or at regular intervals over time?

Critical Thinking Consider these types of graphs: histogram, bar graph, Pareto chart, pie chart, stem-and-leaf display. (a) Which are suitable for qualitative data? (b) Which are suitable for quantitative data?

Critical Thinking A consumer interest group is tracking the percentage of household income spent on gasoline over the past 30 years. Which graphical display would be more useful, a histogram or a time-series graph? Why?

Critical Thinking Describe how data outliers might be revealed in histograms and stem-and-leaf plots.

Expand Your Knowledge How are dotplots and stem-and-leaf displays similar? How are they different?

Focus Problem: Fuel Economy Solve the focus problem at the beginning of this chapter.

Criminal Justice: Prisoners The time plot in Figure 2-18 gives the number of state and federal prisoners per 100,000 population (Source: Statistical Abstract of the United States, 120th edition). (a) Estimate the number of prisoners per 100,000 people for 1980 and for 1997.
(b) Interpretation During the period shown, there was increased prosecution of drug offenses, longer sentences for common crimes, and reduced access to parole. What does the time-series graph say about the prison population change per 100,000 people? (c) In 1997, the U.S. population was approximately 266,574,000 people. At the rate of 444 prisoners per 100,000 population, about how many prisoners were in the system? The projected U.S. population for the year 2020 is 323,724,000. If the rate of prisoners per 100,000 stays the same as in 1997, about how many prisoners do you expect will be in the system in 2020? To obtain the most recent information, visit the Census Bureau web site.

IRS: Tax Returns Almost everyone files (or will someday file) a federal income tax return. A research poll for TurboTax (a computer software package to aid in tax-return preparation) asked what aspect of filing a return people thought to be the most difficult. The results showed that 43% of the respondents said understanding the IRS jargon, 28% said knowing deductions, 10% said getting the right form, 8% said calculating the numbers, and 10% didn't know. Make a circle graph to display this information. Note: Percentages will not total 100% because of rounding.

Law Enforcement: DUI Driving under the influence of alcohol (DUI) is a serious offense. The following data give the ages of a random sample of 50 drivers arrested while driving under the influence of alcohol. This distribution is based on the age distribution of DUI arrests given in the Statistical Abstract of the United States (112th edition). 46 16 41 26 22 33 30 22 36 34 63 21 26 18 27 24 31 38 26 55 31 47 27 43 35 22 64 40 58 20 49 37 53 25 29 32 23 49 39 40 24 56 30 51 21 45 27 34 47 35 (a) Make a stem-and-leaf display of the age distribution. (b) Make a frequency table with seven classes showing class limits, class boundaries, midpoints, frequencies, and relative frequencies. (c) Draw a histogram.
(d) Draw a relative-frequency histogram. (e) Identify the shape of the distribution. (f) Draw an ogive. (g) Interpretation Discuss how these data might be used to price auto insurance for different age groups.

Agriculture: Apple Trees The following data represent trunk circumferences (in mm) for a random sample of 60 four-year-old apple trees at East Malling Agriculture Research Station in England (Reference: S. C. Pearce, University of Kent at Canterbury). Note: These data are also available for download at the Companion Sites for this text. 108 99 106 102 115 120 120 117 122 142 106 111 119 109 125 108 116 105 117 123 103 114 101 99 112 120 108 91 115 109 114 105 99 122 106 113 114 75 96 124 91 102 108 110 83 90 69 117 84 142 122 113 105 112 117 122 129 100 138 117 (a) Make a stem-and-leaf display of the tree circumference data. (b) Make a frequency table with seven classes showing class limits, class boundaries, midpoints, frequencies, and relative frequencies. (c) Draw a histogram. (d) Draw a relative-frequency histogram. (e) Identify the shape of the distribution. (f) Draw an ogive. (g) Interpretation How are the low frequencies shown in the histogram reflected in the steepness of the lines in the ogive?

Law: Corporation Lawsuits Many people say the civil justice system is overburdened. Many cases center on suits involving businesses. The following data are based on a Wall Street Journal report. Researchers conducted a study of lawsuits involving 1908 businesses ranked in the Fortune 1000 over a 20-year period. They found the following distribution of civil justice caseloads brought before the federal courts involving the businesses:
Case Type: Number of Filings (in thousands)
Contracts: 107
General torts (personal injury): 191
Asbestos liability: 49
Other product liability: 38
All other: 21
Note: Contracts cases involve disputes over contracts between businesses. (a) Make a Pareto chart of the caseloads. Which type of case occurs most frequently?
(b) Make a pie chart showing the percentage of cases of each type.

Archaeology: Tree-Ring Data The Sand Canyon Archaeological Project, edited by W. D. Lipe and published by Crow Canyon Archaeological Center, contains the stem-and-leaf diagram shown in Figure 2-19. The study uses tree rings to accurately determine the year in which a tree was cut. The figure gives the tree-ring-cutting dates for samples of timbers found in the architectural units at Sand Canyon Pueblo. The text referring to the figure says, "The three-digit numbers in the left column represent centuries and decades a.d. The numbers to the right represent individual years, with each number derived from an individual sample. Thus, 124|2 2 2 represents three samples dated to a.d. 1242." Use Figure 2-19 and the verbal description to answer the following questions: (a) Which decade contained the most samples? (b) How many samples had a tree-ring-cutting date between 100 a.d. and 1239 a.d., inclusive? (c) What are the dates of the longest interval during which no tree-cutting samples occurred? What might this indicate about new construction or renovation of the pueblo structures during this period?

Interpretation A Harris Poll surveyed 2085 U.S. adults regarding use of cell phones while driving. All the adults were asked their opinion regarding how dangerous it is for a driver to use a cell phone while driving. Graph (a) shows the percentage responding "very dangerous" by age group. Only the adults who drive and who have a cell phone were asked how often they talk on the cell phone while driving. Graph (b) shows the percentage responding "never" by age group. Cell Phone Use and Driving (a) What trend does this survey portray regarding age group and the opinion that using a cell phone while driving is very dangerous? (b) How does the behavior of never using a cell phone while driving compare to the opinion that using a cell phone while driving is dangerous?
Do you think that some of the differences in the behavior (never use a cell phone while driving) and the opinion (using a cell phone while driving is very dangerous) can be attributed to the differences in the survey population? Explain.

Examine Figure 2-20, Everyone Agrees: Slobs Make Worst Roommates. This is a clustered bar graph because two percentages are given for each response category: responses from men and responses from women. Comment about how the artistic rendition has slightly changed the format of a bar graph. Do the bars seem to have lengths that accurately reflect the relative percentages of the responses? In your own opinion, does the artistic rendition enhance or confuse the information? Explain. Which characteristic of worst roommates does the graphic seem to illustrate? Can this graph be considered a Pareto chart for men? For women? Why or why not? From the information given in the figure, do you think the survey just listed the four given annoying characteristics? Do you think a respondent could choose more than one characteristic? Explain your answer in terms of the percentages given and in terms of the explanation given in the graphic. Could this information also be displayed in one circle graph for men and another for women? Explain.

Examine Figure 2-21, Global Teen Worries. How many countries were contained in the sample? The graph contains bars and a circle. Which bar is the longest? Which bar represents the greatest percentage? Is this a bar graph or not? If not, what changes would need to be made to put the information in a bar graph? Could the graph be made into a Pareto chart? Could it be made into a circle graph? Explain.

In your own words, explain the differences among histograms, relative-frequency histograms, bar graphs, circle graphs, time-series graphs, Pareto charts, and stem-and-leaf displays. If you have nominal data, which graphic displays might be useful?
What if you have ordinal, interval, or ratio data?

What do we mean when we say a histogram is skewed to the left? To the right? What is a bimodal histogram? Discuss the following statement: "A bimodal histogram usually results if we draw a sample from two populations at once." Suppose you took a sample of weights of college football players and with this sample you included weights of cheerleaders. Do you think a histogram made from the combined weights would be bimodal? Explain.

Discuss the statement that stem-and-leaf displays are quick and easy to construct. How can we use a stem-and-leaf display to make the construction of a frequency table easier? How does a stem-and-leaf display help you spot extreme values quickly?

The following tables show the first-round winning scores of the NCAA men's and women's basketball teams. TABLE 2-17 Men's Winning First-Round NCAA Tournament Scores 95 70 79 99 83 72 79 101 69 82 86 70 79 69 69 70 95 70 77 61 69 68 69 72 89 66 84 77 50 83 63 58 TABLE 2-18 Women's Winning First-Round NCAA Tournament Scores 80 68 51 80 83 75 77 100 96 68 89 80 67 84 76 70 98 81 79 89 98 83 72 100 101 83 66 76 77 84 71 77 Use the software or method of your choice to construct separate histograms for the men's and women's winning scores. Try 5, 7, and 10 classes for each. Which number of classes seems to be the best choice? Why?

The following tables show the first-round winning scores of the NCAA men's and women's basketball teams. TABLE 2-17 Men's Winning First-Round NCAA Tournament Scores 95 70 79 99 83 72 79 101 69 82 86 70 79 69 69 70 95 70 77 61 69 68 69 72 89 66 84 77 50 83 63 58 TABLE 2-18 Women's Winning First-Round NCAA Tournament Scores 80 68 51 80 83 75 77 100 96 68 89 80 67 84 76 70 98 81 79 89 98 83 72 100 101 83 66 76 77 84 71 77 Use the same class boundaries for histograms of men's and of women's scores. How do the scores for the two groups compare?
What general shape do the histograms follow?

The following tables show the first-round winning scores of the NCAA men's and women's basketball teams. TABLE 2-17 Men's Winning First-Round NCAA Tournament Scores 95 70 79 99 83 72 79 101 69 82 86 70 79 69 69 70 95 70 77 61 69 68 69 72 89 66 84 77 50 83 63 58 TABLE 2-18 Women's Winning First-Round NCAA Tournament Scores 80 68 51 80 83 75 77 100 96 68 89 80 67 84 76 70 98 81 79 89 98 83 72 100 101 83 66 76 77 84 71 77 Use the software or method of your choice to make stem-and-leaf displays for each set of scores. If your software does not make stem-and-leaf displays, sort the data first and then make a back-to-back display by hand. Do there seem to be any extreme values in either set? How do the data sets compare?

Statistical Literacy What is the difference between a class boundary and a class limit?

Statistical Literacy A data set has values ranging from a low of 10 to a high of 52. What's wrong with using the class limits 10-19, 20-29, 30-39, 40-49 for a frequency table?

Statistical Literacy A data set has values ranging from a low of 10 to a high of 50. What's wrong with using the class limits 10-20, 20-30, 30-40, 40-50 for a frequency table?

Statistical Literacy A data set has values ranging from a low of 10 to a high of 50. The class width is to be 10. What's wrong with using the class limits 10-20, 21-31, 32-42, 43-53 for a frequency table with a class width of 10?

Basic Computation: Class Limits A data set with whole numbers has a low value of 20 and a high value of 82. Find the class width and class limits for a frequency table with 7 classes.

Basic Computation: Class Limits A data set with whole numbers has a low value of 10 and a high value of 120.
Find the class width and class limits for a frequency table with 5 classes.

Interpretation You are manager of a specialty coffee shop and collect data throughout a full day regarding waiting time for customers from the time they enter the shop until the time they pick up their order. (a) What type of distribution do you think would be most desirable for the waiting times: skewed right, skewed left, mound-shaped symmetric? Explain. (b) What if the distribution for waiting times were bimodal? What might be some explanations?

Critical Thinking A web site rated 100 colleges and ranked the colleges from 1 to 100, with a rank of 1 being the best. Each college was ranked, and there were no ties. If the ranks were displayed in a histogram, what would be the shape of the histogram: skewed, uniform, mound-shaped?

Critical Thinking Look at the histogram in Figure 2-10(a), which shows mileage, in miles per gallon (mpg), for a random selection of older passenger cars (Reference: Consumer Reports). (a) Is the shape of the histogram essentially bimodal? (b) Jose looked at the raw data and discovered that the 54 data values included both the city and the highway mileages for 27 cars. He used the city mileages for the 27 cars to make the histogram in Figure 2-10(b). Using this information and Figure 2-10, parts (a) and (b), construct a histogram for the highway mileages of the same cars. Use class boundaries 16.5, 20.5, 24.5, 28.5, 32.5, 36.5, and 40.5.

Critical Thinking The following data represent annual salaries, in thousands of dollars, for employees of a small company. Notice that the data have been sorted in increasing order. 54 55 55 57 57 59 60 65 65 65 66 68 68 69 69 70 70 70 75 75 75 75 77 82 82 82 88 89 89 91 91 97 98 98 98 280 (a) Make a histogram using the class boundaries 53.5, 99.5, 145.5, 191.5, 237.5, 283.5. (b) Look at the last data value. Does it appear to be an outlier? Could this be the owner's salary? (c) Eliminate the high salary of 280 thousand dollars.
Make a new histogram using the class boundaries 53.5, 62.5, 71.5, 80.5, 89.5, 98.5. Does this histogram reflect the salary distribution of most of the employees better than the histogram in part (a)?

Interpretation Histograms of random sample data are often used as an indication of the shape of the underlying population distribution. The histograms on the next page are based on random samples of size 30, 50, and 100 from the same population. (a) Using the midpoint labels of the three histograms, what would you say about the estimated range of the population data from smallest to largest? Does the bulk of the data seem to be between 8 and 12 in all three histograms? (b) The population distribution from which the samples were drawn is symmetric and mound-shaped, with the top of the mound at 10, 95% of the data between 8 and 12, and 99.7% of data between 7 and 13. How well does each histogram reflect these characteristics?

Interpretation The following histograms are based on different random samples of size 100 drawn from the same population. (a) Identify the midpoint of the class with the highest frequency in each of the three histograms. (b) Using the class midpoint, what is the range of data shown in each histogram? (c) Based on your study of random samples in Chapter 1, is it surprising to see the variations in the samples as displayed in the histograms? The original population from which the samples were drawn is skewed right with a high frequency near 4. Do all three random samples seem to reflect these properties equally well?

Interpretation The ogives shown are based on U.S. Census data and show the average annual personal income per capita for each of the 50 states. The data are rounded to the nearest thousand dollars. (a) How were the percentages shown in graph (ii) computed? (b) How many states have average per capita income less than 37.5 thousand dollars? (c) How many states have average per capita income between 42.5 and 52.5 thousand dollars?
(d) What percentage of the states have average per capita income more than 47.5 thousand dollars?

Critical Thinking The following ogives come from different distributions of 50 whole numbers between 1 and 60. Labels on each point give the cumulative frequency and the cumulative percentage of data. (a) In which distribution does the most data fall below 20.5? (b) In which distribution does the most data fall below 40.5? (c) In which distribution does the amount of data below 20.5 most closely match that above 30.5? (d) Which distribution seems to be skewed right? Skewed left? Mound-shaped?

For Problems 15-20, use the specified number of classes to do the following. (a) Find the class width. (b) Make a frequency table showing class limits, class boundaries, midpoints, frequencies, relative frequencies, and cumulative frequencies. (c) Draw a histogram. (d) Draw a relative-frequency histogram. (e) Categorize the basic distribution shape as uniform, mound-shaped symmetric, bimodal, skewed left, or skewed right. (f) Draw an ogive. (g) Interpretation Discuss some of the features about the data that the graphs reveal. Consider items such as data range, location of the middle half of the data, unusual values, outliers, etc. Sports: Dogsled Racing How long does it take to finish the 1161-mile Iditarod Sled Dog Race from Anchorage to Nome, Alaska (see Viewpoint)? Finish times (to the nearest hour) for 57 dogsled teams are shown below. 261 271 236 244 279 296 284 299 288 288 247 256 338 360 341 333 261 266 247 296 313 311 307 307 209 306 277 283 304 305 288 290 288 289 297 299 332 330 309 328 307 328 245 291 295 298 306 315 310 318 318 320 333 321 323 324 327 Use five classes.

For Problems 15-20, use the specified number of classes to do the following. (a) Find the class width. (b) Make a frequency table showing class limits, class boundaries, midpoints, frequencies, relative frequencies, and cumulative frequencies. (c) Draw a histogram. (d) Draw a relative-frequency histogram.
(e) Categorize the basic distribution shape as uniform, mound-shaped symmetric, bimodal, skewed left, or skewed right. (f) Draw an ogive. (g) Interpretation Discuss some of the features about the data that the graphs reveal. Consider items such as data range, location of the middle half of the data, unusual values, outliers, etc. Medical: Glucose Testing The following data represent glucose blood levels (mg/100ml) after a 12-hour fast for a random sample of 70 women (Reference: American Journal of Clinical Nutrition, Vol. 19, pp. 345-351). Note: These data are also available for download at the Companion Sites for this text. 45 66 83 71 76 64 59 59 76 82 80 81 85 77 82 90 87 72 79 69 83 71 87 69 81 76 96 83 67 94 101 94 89 94 73 99 93 85 83 80 78 80 85 83 84 74 81 70 65 89 70 80 84 77 65 46 80 70 75 45 101 71 109 73 73 80 72 81 63 74 Use six classes.

For Problems 15-20, use the specified number of classes to do the following. (a) Find the class width. (b) Make a frequency table showing class limits, class boundaries, midpoints, frequencies, relative frequencies, and cumulative frequencies. (c) Draw a histogram. (d) Draw a relative-frequency histogram. (e) Categorize the basic distribution shape as uniform, mound-shaped symmetric, bimodal, skewed left, or skewed right. (f) Draw an ogive. (g) Interpretation Discuss some of the features about the data that the graphs reveal. Consider items such as data range, location of the middle half of the data, unusual values, outliers, etc. Medical: Tumor Recurrence Certain kinds of tumors tend to recur. The following data represent the lengths of time, in months, for a tumor to recur after chemotherapy (Reference: D. P. Byar, Journal of Urology, Vol. 10, pp. 556-561). Note: These data are also available for download at the Companion Sites for this text.
19 18 17 1 21 22 54 46 25 49 50 1 59 39 43 39 5 9 38 18 14 45 54 59 46 50 29 12 19 36 38 40 43 41 10 50 41 25 19 39 27 20 Use five classes.

For Problems 15-20, use the specified number of classes to do the following. (a) Find the class width. (b) Make a frequency table showing class limits, class boundaries, midpoints, frequencies, relative frequencies, and cumulative frequencies. (c) Draw a histogram. (d) Draw a relative-frequency histogram. (e) Categorize the basic distribution shape as uniform, mound-shaped symmetric, bimodal, skewed left, or skewed right. (f) Draw an ogive. (g) Interpretation Discuss some of the features about the data that the graphs reveal. Consider items such as data range, location of the middle half of the data, unusual values, outliers, etc. Archaeology: New Mexico The Wind Mountain excavation site in New Mexico is an important archaeological location of the ancient Native American Anasazi culture. The following data represent depths (in cm) below surface grade at which significant artifacts were discovered at this site (Reference: A. I. Woosley and A. J. McIntyre, Mimbres Mogollon Archaeology, University of New Mexico Press). Note: These data are also available for download at the Companion Sites for this text. 85 45 75 60 90 90 115 30 55 58 78 120 80 65 65 140 65 56 30 125 75 137 80 120 15 45 70 65 50 45 95 70 70 28 40 125 105 75 80 70 90 68 73 75 55 70 95 65 200 75 15 90 46 33 100 65 60 55 85 50 10 68 99 145 45 75 45 95 85 65 65 52 82 Use seven classes.

For Problems 15-20, use the specified number of classes to do the following. (a) Find the class width. (b) Make a frequency table showing class limits, class boundaries, midpoints, frequencies, relative frequencies, and cumulative frequencies. (c) Draw a histogram. (d) Draw a relative-frequency histogram. (e) Categorize the basic distribution shape as uniform, mound-shaped symmetric, bimodal, skewed left, or skewed right. (f) Draw an ogive.
(g) Interpretation Discuss some of the features about the data that the graphs reveal. Consider items such as data range, location of the middle half of the data, unusual values, outliers, etc. Education: College Enrollment What percent of undergraduate enrollment in coed colleges and universities in the United States is male? A random sample of 50 such institutions gives the following data (Source: USA Today College Guide). Percent Males Enrolled in Coed Universities and Colleges 31 39 53 47 40 49 53 47 45 26 39 79 45 50 36 49 45 49 43 48 54 50 43 42 42 35 49 45 42 58 42 55 45 71 50 57 49 50 45 46 53 48 53 37 56 63 41 41 51 48 Use five classes.

For Problems 15-20, use the specified number of classes to do the following. (a) Find the class width. (b) Make a frequency table showing class limits, class boundaries, midpoints, frequencies, relative frequencies, and cumulative frequencies. (c) Draw a histogram. (d) Draw a relative-frequency histogram. (e) Categorize the basic distribution shape as uniform, mound-shaped symmetric, bimodal, skewed left, or skewed right. (f) Draw an ogive. (g) Interpretation Discuss some of the features about the data that the graphs reveal. Consider items such as data range, location of the middle half of the data, unusual values, outliers, etc. Advertising: Readability "Readability Levels of Magazine Ads," by F. K. Shuptrine and D. D. McVicker, is an article in the Journal of Advertising Research. (For more information, find the web site for DASL, the Carnegie Mellon University Data and Story Library. Look in Data Subjects under consumer and then Magazine Ads Readability file.) The following is a list of the number of three-syllable (or longer) words in advertising copy of randomly selected magazine advertisements.
34 21 37 31 10 24 39 10 17 18 32 17 3 10 6 5 6 6 13 22 25 3 5 2 9 3 0 4 29 26 5 5 24 15 3 8 16 9 10 3 12 10 10 10 11 12 13 1 9 43 13 14 32 24 15

Expand Your Knowledge: Decimal Data The following data represent tonnes of wheat harvested each year (1894-1925) from Plot 19 at the Rothamsted Agricultural Experiment Station, England. 2.71 1.62 2.60 1.64 2.20 2.02 1.67 1.99 2.34 1.26 1.31 1.80 2.82 2.15 2.07 1.62 1.47 2.19 0.59 1.48 0.77 1.04 1.32 0.89 1.35 0.95 0.94 1.39 1.19 1.18 0.46 0.70 (a) Multiply each data value by 100 to "clear" the decimals. (b) Use the standard procedures of this section to make a frequency table and histogram with your whole-number data. Use six classes. (c) Divide class limits, class boundaries, and class midpoints by 100 to get back to your original data values.

Decimal Data: Batting Averages The following data represent baseball batting averages for a random sample of National League players near the end of the baseball season. The data are from the baseball statistics section of the Denver Post. 0.194 0.258 0.190 0.291 0.158 0.295 0.261 0.250 0.181 0.125 0.107 0.260 0.309 0.309 0.276 0.287 0.317 0.252 0.215 0.250 0.246 0.260 0.265 0.182 0.113 0.200 (a) Multiply each data value by 1000 to clear the decimals. (b) Use the standard procedures of this section to make a frequency table and histogram with your whole-number data. Use five classes. (c) Divide class limits, class boundaries, and class midpoints by 1000 to get back to your original data.

Expand Your Knowledge: Dotplot Another display technique that is somewhat similar to a histogram is a dotplot. In a dotplot, the data values are displayed along the horizontal axis. A dot is then plotted over each data value in the data set. (a) From the dotplot, how many states have 600 or fewer licensed drivers per 1000 residents? (b) About what percentage of the states (out of 51) seem to have close to 800 licensed drivers per 1000 residents?
(c) Consider the intervals 550 to 650, 650 to 750, and 750 to 850 licensed drivers per 1000 residents. In which interval do most of the states fall?

Dotplot: Sled Dog Racing Make a dotplot for the data in Problem 15 regarding the finish time (number of hours) for the Iditarod Sled Dog Race. Compare the dotplot to the histogram of Problem 15.

Interpretation Consider graph (a) of Reasons People Like Texting on Cell Phones, based on a GfK Roper survey of 1000 adults. Reasons People Like Texting on Cell Phones Do you think respondents could select more than one response? Explain. Could the same information be displayed in a circle graph? Explain. Is graph (a) a Pareto chart?

Interpretation Look at graph (b) of Reasons People Like Texting on Cell Phones. Is this a proper bar graph? Explain.

Critical Thinking A personnel office is gathering data regarding working conditions. Employees are given a list of five conditions that they might want to see improved. They are asked to select the one item that is most critical to them. Which type of graph, circle graph or Pareto chart, would be most useful for displaying the results of the survey? Why?

Critical Thinking Your friend is thinking about buying shares of stock in a company. You have been tracking the closing prices of the stock shares for the past 90 trading days. Which type of graph for the data, histogram or time-series, would be best to show your friend? Why?

Education: Does College Pay Off? It is costly in both time and money to go to college. Does it pay off? According to the Bureau of the Census, the answer is yes. The average annual income (in thousands of dollars) of a household headed by a person with the stated education level is as follows: 24.3 if ninth grade is the highest level achieved, 41.4 for high school graduates, 59.7 for those holding associate degrees, 82.7 for those with bachelor's degrees, 100.8 for those with master's degrees, and 121.6 for those with doctoral degrees.
Make a bar graph showing household income for each education level.

Interpretation Consider the two graphs depicting the influence of advertisements on making large purchases for two different age groups, those 18 to 34 years old and those 45 to 54 years old (based on a Harris Poll of about 2500 adults aged 18 or older). Note: Other responses such as "not sure" and "not applicable" were also possible. Influence of Advertising on Most Recent Large Purchase (a) Taking a quick glance at the graphs, Jenna thought that there was very little difference (maybe less than 1%) in the percentage of the two age groups who said that ads were influential. How would you change the graphs so that Jenna would not be misled so easily? Hint: Look at the vertical scales of the two graphs. (b) Take the information from the two graphs and make a cluster bar graph showing the percentage by age group reporting to be influenced by ads and those reporting they were not influenced by ads.

Commercial Fishing: Gulf of Alaska It's not an easy life, but it's a good life! Suppose you decide to take the summer off and sign on as a deck hand for a commercial fishing boat in Alaska that specializes in deepwater fishing for groundfish. What kind of fish can you expect to catch? One way to answer this question is to examine government reports on groundfish caught in the Gulf of Alaska. The following list indicates the types of fish caught annually in thousands of metric tons (Source: Report on the Status of U.S. Living Marine Resources, National Oceanic and Atmospheric Administration): flatfish, 36.3; Pacific cod, 68.6; sablefish, 16.0; walleye pollock, 71.2; rockfish, 18.9. Make a Pareto chart showing the annual harvest for commercial fishing in the Gulf of Alaska.

Archaeology: Ireland Commercial dredging operations in ancient rivers occasionally uncover archaeological artifacts of great importance. One such artifact is Bronze Age spearheads recovered from ancient rivers in Ireland.
A recent study gave the following information regarding discoveries of ancient bronze spearheads in Irish rivers.
River: Bann, Blackwater, Erne, Shannon, Barrow
No. of spearheads: 19, 8, 15, 33, 14
(Based on information from Crossing the Rubicon, Bronze Age Studies 5, Lorraine Bourke, Department of Archaeology, National University of Ireland, Galway.) (a) Make a Pareto chart for these data. (b) Make a circle graph for these data.

Lifestyle: Hide the Mess! A survey of 1000 adults (reported in USA Today) uncovered some interesting housekeeping secrets. When unexpected company comes, where do we hide the mess? The survey showed that 68% of the respondents toss their mess into the closet, 23% shove things under the bed, 6% put things into the bathtub, and 3% put the mess into the freezer. Make a circle graph to display this information.

Education: College Professors' Time How do college professors spend their time? The National Education Association Almanac of Higher Education gives the following average distribution of professional time allocation: teaching, 51%; research, 16%; professional growth, 5%; community service, 11%; service to the college, 11%; and consulting outside the college, 6%. Make a pie chart showing the allocation of professional time for college professors.

FBI Report: Hawaii In the Aloha state, you are very unlikely to be murdered! However, it is considerably more likely that your house might be burgled, your car might be stolen, or you might be punched in the nose. That said, Hawaii is still a great place to vacation or, if you are very lucky, to live. The following numbers represent the crime rates per 100,000 population in Hawaii: murder, 2.6; rape, 35.4; robbery, 93.3; house burglary, 911.6; motor vehicle theft, 550.7; assault, 125.3 (Source: Crime in the United States, U.S. Department of Justice, Federal Bureau of Investigation). (a) Display this information in a Pareto chart, showing the crime rate for each category.
(b) Could the information as reported be displayed as a circle graph? Explain. Hint: Other forms of crime, such as arson, are not included in the information. In addition, some crimes might occur together.

Driving: Bad Habits Driving would be more pleasant if we didn't have to put up with the bad habits of other drivers. USA Today reported the results of a Valvoline Oil Company survey of 500 drivers, in which the drivers marked their complaints about other drivers. The top complaints turned out to be tailgating, marked by 22% of the respondents; not using turn signals, marked by 19%; being cut off, marked by 16%; other drivers driving too slowly, marked by 11%; and other drivers being inconsiderate, marked by 8%. Make a Pareto chart showing percentage of drivers listing each stated complaint. Could this information as reported be put in a circle graph? Why or why not?

Ecology: Lakes Pyramid Lake, Nevada, is described as the pride of the Paiute Indian Nation. It is a beautiful desert lake famous for very large trout. The elevation of the lake surface (feet above sea level) varies according to the annual flow of the Truckee River from Lake Tahoe. The U.S. Geological Survey provided the following data from equally spaced intervals of time over a 15-year period. Time period, elevation: 1, 3817; 2, 3815; 3, 3810; 4, 3812; 5, 3808; 6, 3803; 7, 3798; 8, 3797; 9, 3795; 10, 3797; 11, 3802; 12, 3807; 13, 3811; 14, 3816; 15, 3817. Make a time-series graph displaying the data. For more information, visit the web site for Pyramid Lake Fisheries.

Vital Statistics: Height How does average height for boys change as boys get older?
According to Physician's Handbook, the average heights at different ages are as follows. Age (years), height (inches): 0.5, 26; 1, 29; 2, 33; 3, 36; 4, 39; 5, 42; 6, 45; 7, 47; 8, 50; 9, 52; 10, 54; 11, 56; 12, 58; 13, 60; 14, 62. Make a time-series graph for average height for ages 0.5 through 14 years.

Expand Your Knowledge: Donut Pie Charts The book The Wall Street Journal Guide to Information Graphics by Dona M. Wong gives strategies for using graphs and charts to display information effectively. One popular graph discussed is the donut pie chart. The donut pie chart is simply a pie chart with the center removed. A recent Harris Poll asked adults about their opinions regarding whether books should be banned from libraries because of social, language, violent, sexual, or religious content. The responses by education level to the question, "Do you think that there are any books which should be banned completely?" are shown in the following donut pie charts. What feature of Keith's graph makes it difficult to visually compare the responses of those with some college to those shown in the other graphs? How would you change Keith's graph for easier comparison? Interpretation Compare graphs made by Ramon. At which of the two education levels is the "no" response more frequent?

Technology: Cars The following cluster bar graph shows responses from different age groups to questions regarding connectivity and tracking technology found in new cars. A recent Harris Poll asked respondents how much they agreed or disagreed with statements that they (1) worry that the technologies cause too much distraction and are dangerous; (2) worry about letting companies know too much about location and driving habits; (3) worry that insurance rates could increase because of knowledge of driving habits; (4) think the technologies make driving more enjoyable; (5) feel safer with the technologies; (6) feel it is important to stay connected when in vehicle.
The graph shows the percentage of respondents in each age category who agree strongly or somewhat agree to each of the six statements. (a) Interpretation Which statement has the highest rate of agreement for all four age groups? (b) Interpretation Which age group expresses the least worry about insurance companies raising their rates because of the driving habit information collected by the technologies? (c) Interpretation Which age group has the highest percentage of those who find the technologies make driving more enjoyable?

Cowboys: Longevity How long did real cowboys live? One answer may be found in the book The Last Cowboys by Connie Brooks (University of New Mexico Press). This delightful book presents a thoughtful sociological study of cowboys in west Texas and southeastern New Mexico around the year 1890. A sample of 32 cowboys gave the following years of longevity: 58 52 68 86 72 66 97 89 84 91 91 92 66 68 87 86 73 61 70 75 72 73 85 84 90 57 77 76 84 93 58 47 (a) Make a stem-and-leaf display for these data. (b) Interpretation Consider the following quote from Baron von Richthofen in his Cattle Raising on the Plains of North America: "Cowboys are to be found among the sons of the best families. The truth is probably that most were not a drunken, gambling lot, quick to draw and fire their pistols." Does the data distribution of longevity lend credence to this quote?

Ecology: Habitat Wetlands offer a diversity of benefits. They provide a habitat for wildlife, spawning grounds for U.S. commercial fish, and renewable timber resources. In the last 200 years, the United States has lost more than half its wetlands. Environmental Almanac gives the percentage of wetlands lost in each state in the last 200 years.
For the lower 48 states, the percentage loss of wetlands per state is as follows: 46 37 36 42 81 20 73 59 35 50 87 52 24 27 38 56 39 74 56 31 27 91 46 9 54 52 30 33 28 35 35 23 90 72 85 42 59 50 49 48 3ft 60 46 87 50 89 49 67 Make a stem-and-leaf display of these data. Be sure to indicate the scale. How are the percentages distributed? Is the distribution skewed? Are there any gaps?

Health Care: Hospitals The American Medical Association Center for Health Policy Research, in its publication State Health Care Data: Utilization, Spending, and Characteristics, included data, by state, on the number of community hospitals and the average patient stay (in days). The data are shown in the table. Make a stem-and-leaf display of the data for the average length of stay in days. Comment about the general shape of the distribution.

Health Care: Hospitals Using the number of hospitals per state listed in the table in Problem 3, make a stem-and-leaf display for the number of community hospitals per state. Which states have an unusually high number of hospitals?

Expand Your Knowledge: Split Stem The Boston Marathon is the oldest and best-known U.S. marathon. It covers a route from Hopkinton, Massachusetts, to downtown Boston. The distance is approximately 26 miles. The Boston Marathon web site has a wealth of information about the history of the race. In particular, the site gives the winning times for the Boston Marathon. They are all over 2 hours. The following data are the minutes over 2 hours for the winning male runners over two periods of 20 years each. Earlier Period: 23 23 18 19 16 17 15 22 13 10 18 15 16 13 9 20 14 10 9 12 Recent Period: 9 8 9 10 14 7 11 8 9 8 11 8 9 7 9 9 10 7 9 9 (a) Make a stem-and-leaf display for the minutes over 2 hours of the winning times for the earlier period. Use two lines per stem. (b) Make a stem-and-leaf display for the minutes over 2 hours of the winning times for the recent period. Use two lines per stem.
(c) Interpretation Compare the two distributions. How many times under 15 minutes are in each distribution?

Split Stem: Golf The U.S. Open Golf Tournament was played at Congressional Country Club, Bethesda, Maryland, with prizes ranging from $465,000 for first place to $5000. Par for the course was 70. The tournament consisted of four rounds played on different days. The scores for each round of the 32 players who placed in the money (more than $17,000) were given on a web site. For more information, visit the PGA web site. The scores for the first round were as follows: 71 65 67 73 74 73 71 71 74 73 71 70 75 71 72 71 75 75 71 71 74 75 66 75 75 75 71 72 72 73 71 67 The scores for the fourth round for these players were as follows: 69 69 73 74 72 72 70 71 71 70 72 73 73 72 71 71 71 69 70 71 72 73 74 72 71 68 69 70 69 71 73 74 (a) Make a stem-and-leaf display for the first-round scores. Use two lines per stem. (See Problem 5.) (b) Make a stem-and-leaf display for the fourth-round scores. Use two lines per stem. (c) Interpretation Compare the two distributions. How do the highest scores compare? How do the lowest scores compare?

Are cigarettes bad for people? Cigarette smoking involves tar, carbon monoxide, and nicotine. The first two are definitely not good for a person's health, and the last ingredient can cause addiction. Problems 7, 8, and 9 refer to Table 2-16, which was taken from the web site maintained by the Journal of Statistics Education. For more information, visit the web site of the Journal of Statistics Education. Follow the links to the cigarette data. Health: Cigarette Smoke Use the data in Table 2-16 to make a stem-and-leaf display for milligrams of tar per cigarette smoked. Are there any outliers? TABLE 2-16 Milligrams of Tar, Nicotine, and Carbon Monoxide (CO) per One Cigarette

Are cigarettes bad for people? Cigarette smoking involves tar, carbon monoxide, and nicotine.
The first two are definitely not good for a person's health, and the last ingredient can cause addiction. Problems 7, 8, and 9 refer to Table 2-16, which was taken from the web site maintained by the Journal of Statistics Education. For more information, visit the web site of the Journal of Statistics Education. Follow the links to the cigarette data. Health: Cigarette Smoke Use the data in Table 2-16 to make a stem-and-leaf display for milligrams of carbon monoxide per cigarette smoked. Are there any outliers?

Are cigarettes bad for people? Cigarette smoking involves tar, carbon monoxide, and nicotine. The first two are definitely not good for a person's health, and the last ingredient can cause addiction. Problems 7, 8, and 9 refer to Table 2-16, which was taken from the web site maintained by the Journal of Statistics Education. For more information, visit the web site of the Journal of Statistics Education. Follow the links to the cigarette data. Health: Cigarette Smoke Use the data in Table 2-16 to make a stem-and-leaf display for milligrams of nicotine per cigarette smoked. In this case, truncate the measurements at the tenths position and use two lines per stem (see Problem 5, part a).

Expand Your Knowledge: Back-to-Back Stem Plot In archaeology, the depth (below surface grade) at which artifacts are found is very important. Greater depths sometimes indicate older artifacts, perhaps from a different archaeological period. Figure 2-17 is a back-to-back stem plot showing the depths of artifact locations at two different archaeological sites. These sites are from similar geographic locations. Notice that the stems are in the center of the diagram. The leaves for Site I artifact depths are shown to the left of the stem, while the leaves for Site II are to the right of the stem (Reference: Mimbres Mogollon Archaeology by A. I. Woosley and A. J. McIntyre, University of New Mexico Press). (a) What are the least and greatest depths of artifact finds at Site I? at Site II?
(b) Describe the data distribution of depths of artifact finds at Site I and at Site II. (c) Interpretation At Site II, there is a gap in the depths at which artifacts were found. Does the Site II data distribution suggest that there might have been a period of no occupation?

Terminology Consider the following measures of central tendency: mean, median, mode. Match each type to the appropriate description: (i) the central value of a data set after it has been ordered from smallest to largest (ii) the data value occurring most frequently in a data set (iii) the sum of all the data values in a data set divided by the number of data values in the set

Terminology Consider the statement: For a 5% trimmed mean, we eliminate the bottom 5% of the data from an ordered data set and then compute the mean of the remaining data. Is the statement true or false? Explain.

Terminology When we compute a sample standard deviation of a data set, do we subtract the sample mean or the sample median from each of the data values?

Terminology Consider the following terms: outlier, coefficient of variation, range. Match each term to the appropriate description. (i) The difference between the largest and the smallest data value of a data set (ii) The spread of a data set measured by the standard deviation relative to the mean of the data set (iii) An unusually large or small data value in a data set

Terminology Consider the following symbols: s, σ, x̄, μ. Match each of the symbols to the appropriate name: (i) sample mean (ii) sample standard deviation (iii) population mean (iv) population standard deviation

Terminology How is the standard deviation related to the variance of a data set?

Terminology In a box-and-whisker plot, which measure of central tendency is displayed: mean, median, or mode?

Terminology Consider the following statement: If you answered 90% of the questions on a test correctly, then your score is in the 90th percentile. Is the statement true or false?
Explain.

Statistical Literacy (a) What measures of variation indicate spread about the mean? (b) Which graphic display shows the median and the data spread about the median?

Critical Thinking Look at the two histograms on page 136. Each involves the same number of data. The data are all whole numbers, so the height of each bar represents the number of values equal to the corresponding midpoint shown on the horizontal axis. Notice that both distributions are symmetric.

Critical Thinking Consider the following Minitab display of two data sets.
Variable N Mean SE Mean StDev Minimum Q1 Median Q3 Maximum
C1 20 20.00 1.62 7.26 7.00 15.00 20.00 25.00 31.00
C2 20 20.00 1.30 5.79 7.00 20.00 22.00 22.00 31.00
(a) What are the respective means? the respective ranges? (b) Which data set seems more symmetric? Why? (c) Compare the interquartile ranges of the two sets. How do the middle halves of the data sets compare?

Consumer: Radon Gas "Radon: The Problem No One Wants to Face" is the title of an article appearing in Consumer Reports. Radon is a gas emitted from the ground that can collect in houses and buildings. At certain levels it can cause lung cancer. Radon concentrations are measured in picocuries per liter (pCi/L). A radon level of 4 pCi/L is considered "acceptable." Radon levels in a house vary from week to week. In one house, a sample of 8 weeks had the following readings for radon level (in pCi/L): 1.9 2.8 5.7 4.2 1.9 8.6 3.9 7.2 (a) Find the mean, median, and mode. (b) Find the sample standard deviation, coefficient of variation, and range. (c) Interpretation Based on the data, would you recommend radon mitigation in this house? Explain.

Political Science: Georgia Democrats How Democratic is Georgia? County-by-county results are shown for a recent election. For your convenience, the data have been sorted in increasing order (Source: County and City Data Book, 12th edition, U.S. Census Bureau).
Percentage of Democratic Vote by Counties in Georgia 31 33 34 34 35 35 35 36 38 38 38 39 40 40 40 40 41 41 41 41 41 41 41 42 42 43 44 44 44 45 45 46 46 46 46 47 48 49 49 49 49 50 51 52 52 53 53 53 53 53 55 56 56 57 57 59 62 66 66 68 (a) Make a box-and-whisker plot of the data. Find the interquartile range. (b) Grouped Data Make a frequency table using five classes. Then estimate the mean and sample standard deviation using the frequency table. Compute a 75% Chebyshev interval centered about the mean. (c) If you have a statistical calculator or computer, use it to find the actual sample mean and sample standard deviation. Otherwise, use the values Σx = 2769 and Σx² = 132,179 to compute the sample mean and sample standard deviation.

Grade: Weighted Average Professor Cramer determines a final grade based on attendance, two papers, three major tests, and a final exam. Each of these activities has a total of 100 possible points. However, the activities carry different weights. Attendance is worth 5%, each paper is worth 8%, each test is worth 15%, and the final is worth 34%. (a) What is the average for a student with 92 on attendance, 73 on the first paper, 81 on the second paper, 85 on test 1, 87 on test 2, 83 on test 3, and 90 on the final exam? (b) Compute the average for a student with the above scores on the papers, tests, and final exam, but with a score of only 20 on attendance.

General: Average Weight An elevator is loaded with 16 people and is at its load limit of 2500 pounds. What is the mean weight of these people?

Agriculture: Harvest Weight of Maize The following data represent weights in kilograms of maize harvest from a random sample of 72 experimental plots on St. Vincent, an island in the Caribbean (Reference: B. G. F. Springer, Proceedings, Caribbean Food Crops Soc., Vol. 10, pp. 147-152). Note: These data are also available for download at the Companion Sites for this text. For convenience, the data are presented in increasing order.
7.8 9.1 9.5 10.0 10.2 10.5 11.1 11.5 11.7 11.8 12.2 12.2 12.5 13.1 13.5 13.7 13.7 14.0 14.4 14.5 14.6 15.2 15.5 16.0 16.0 16.1 16.5 17.2 17.8 18.2 19.0 19.1 19.3 19.8 20.0 20.2 20.3 20.5 20.9 21.1 21.4 21.8 22.0 22.0 22.4 22.5 22.5 22.8 22.8 23.1 23.1 23.2 23.7 23.8 23.8 23.8 23.8 24.0 24.1 24.1 24.5 27.1 24.5 29.5 24.9 25.1 25.2 25.5 26.1 26.4 26.5 26.7 (a) Compute the five-number summary. (b) Compute the interquartile range. (c) Make a box-and-whisker plot. (d) Interpretation Discuss the distribution. Does the lower half of the distribution show more data spread than the upper half?

Focus Problem: Water Solve the focus problem at the beginning of this chapter. The Yellowstone River starts in massive and beautiful Yellowstone Lake. Then it flows through prime trout fishing areas to the famous Yellowstone Falls. After it leaves the park, the river is an important source of water for wildlife, ranchers, farmers, and cities downstream. How much water does leave the park each year? The annual flow of the Yellowstone River (units 10 cubic meters) is shown here for 19 recent years (Reference: U.S. Department of Interior, U.S. Geological Survey: Integrated Geoscience Studies in the Greater Yellowstone Area). 25.9 32.4 33.1 19.1 17.5 24.9 27.1 29.1 25.6 31.8 21.0 45.1 30.8 34.3 25.9 18.6 23.7 24.1 23.9 (a) Is there a "guaranteed" amount of water farmers, ranchers, and cities will get from the Yellowstone River each year? (b) What is the expected annual flow from the Yellowstone snowmelt? Find the mean, the median, and the mode. (c) Find the range and standard deviation of annual flow. (d) Find a 75% Chebyshev interval around the mean. (e) Give a five-number summary of annual water flow from the Yellowstone River and make a box-and-whisker plot. Interpret the five-number summary and the box-and-whisker plot. Where does the middle portion of the data lie? What is the interquartile range? Can you find data outliers?
(f) The Madison River is a smaller but very important source of water flowing out of Yellowstone Park from a different drainage. Ten recent years of annual water flow data are shown below (units 10 cubic meters). 3.83 3.81 4.01 4.84 5.81 5.50 4.31 5.81 4.31 4.67 Although smaller, is the Madison more reliable? Use the coefficient of variation to make an estimate. (g) Interpretation Based on the data, would it be safe to allocate at least 27 units of Yellowstone River water each year for agricultural and domestic use? Why or why not?

Agriculture: Bell Peppers The pathogen Phytophthora capsici causes bell pepper plants to wilt and die. A research project was designed to study the effect of soil water content and the spread of the disease in fields of bell peppers (Source: Journal of Agricultural, Biological, and Environmental Statistics, Vol. 2, No. 2). It is thought that too much water helps spread the disease. The fields were divided into rows and quadrants. The soil water content (percent of water by volume of soil) was determined for each plot. An important first step in such a research project is to give a statistical description of the data. Soil Water Content for Bell Pepper Study 15 14 14 14 13 12 11 11 11 11 10 11 13 16 10 9 15 12 9 10 7 14 13 14 8 9 8 11 13 13 15 12 9 10 9 9 16 16 12 10 11 11 12 15 6 10 10 10 11 9 (a) Make a box-and-whisker plot of the data. Find the interquartile range. (b) Grouped Data Make a frequency table using four classes. Then estimate the mean and sample standard deviation using the frequency table. Compute a 75% Chebyshev interval centered about the mean. (c) If you have a statistical calculator or computer, use it to find the actual sample mean and sample standard deviation.

Performance Rating: Weighted Average A performance evaluation for new sales representatives at Office Automation Incorporated involves several ratings done on a scale of 1 to 10, with 10 the highest rating.
The activities rated include new contacts, successful contacts, total contacts, dollar volume of sales, and reports. Then an overall rating is determined by using a weighted average. The weights are 2 for new contacts, 3 for successful contacts, 3 for total contacts, 5 for dollar value of sales, and 3 for reports. What would the overall rating be for a sales representative with ratings of 5 for new contacts, 8 for successful contacts, 7 for total contacts, 9 for dollar volume of sales, and 7 for reports?

The Story of Old Faithful is a short book written by George Marler and published by the Yellowstone Association. Chapter 7 of this interesting book talks about the effect of the 1959 earthquake on eruption intervals for Old Faithful Geyser. Dr. John Rinehart (a senior research scientist with the National Oceanic and Atmospheric Administration) has done extensive studies of the eruption intervals before and after the 1959 earthquake. Examine Figure 3-11. Notice the general shape. Is the graph more or less symmetric? Does it have a single mode frequency? The mean interval between eruptions has remained steady at about 65 minutes for the past 100 years. Therefore, the 1959 earthquake did not significantly change the mean, but it did change the distribution of eruption intervals. Examine Figure 3-12. Would you say there are really two frequency modes, one shorter and the other longer? Explain. The overall mean is about the same for both graphs, but one graph has a much larger standard deviation (for eruption intervals) than the other. Do no calculations, just look at both graphs, and then explain which graph has the smaller and which has the larger standard deviation. Which distribution will have the larger coefficient of variation? In everyday terms, what would this mean if you were actually at Yellowstone waiting to see the next eruption of Old Faithful? Explain your answer.

Most academic advisors tell students to major in a field they really love.
After all, it is true that money cannot buy happiness! Nevertheless, it is interesting to at least look at some of the higher-paying fields of study. After all, a field like mathematics can be a lot of fun, once you get into it. We see that women's salaries tend to be less than men's salaries. However, women's salaries are rapidly catching up, and this benefits the entire workforce in different ways. Figure 3-13 shows the median incomes for college graduates with different majors. The employees in the sample are all at least 30 years old. Does it seem reasonable to assume that many of the employees are in jobs beyond the entry level? Explain. Compare the median incomes shown for all women aged 30 or older holding bachelor's degrees with the median incomes for men of similar age holding bachelor's degrees. Look at the particular majors listed. What percentage of men holding bachelor's degrees in mathematics make $52,316 or more? What percentage of women holding computer/information science degrees make $41,559 or more? How do median incomes for men and women holding engineering degrees compare? What about pharmacy degrees? Salaries change all the time and hopefully increase. Check the Bureau of Labor Statistics web site for the most current salaries.

An average is an attempt to summarize a collection of data into just one number. Discuss how the mean, median, and mode all represent averages in this context. Also discuss the differences among these averages. Why is the mean a balance point? Why is the median a midway point? Why is the mode the most common data point? List three areas of daily life in which you think the mean, median, or mode would be the best choice to describe an "average."

Why do we need to study the variation of a collection of data? Why isn't the average by itself adequate? We have studied three ways to measure variation. The range, the standard deviation, and, to a large extent, a box-and-whisker plot all indicate the variation within a data collection.
Discuss similarities and differences among these ways to measure data variation. Why would it seem reasonable to pair the median with a box-and-whisker plot and to pair the mean with the standard deviation? What are the advantages and disadvantages of each method of describing data spread? Comment on statements such as the following: (a) The range is easy to compute, but it doesn't give much information; (b) although the standard deviation is more complicated to compute, it has some significant applications; (c) the box-and-whisker plot is fairly easy to construct, and it gives a lot of information at a glance.

Why is the coefficient of variation important? What do we mean when we say that the coefficient of variation has no units? What advantage can there be in having no units? Why is relative size important? Consider robin eggs; the mean weight of a collection of robin eggs is 0.72 ounce, and the standard deviation is 0.12 ounce. Now consider elephants; the mean weight of elephants in the zoo is 6.42 tons, with a standard deviation of 1.07 tons. The units of measurement are different and there is a great deal of difference between the weight of an elephant and that of a robin's egg. Yet the coefficient of variation is about the same for both. Comment on this from the viewpoint of the size of the standard deviation relative to that of the mean.

What is Chebyshev's theorem? Suppose you have a friend who knows very little about statistics. Write a paragraph or two in which you describe Chebyshev's theorem for your friend. Keep the discussion as simple as possible, but be sure to get the main ideas across to your friend. Suppose he or she asks, "What is this stuff good for?" and suppose you respond (a little sarcastically) that Chebyshev's theorem applies to everything from butterflies to the orbits of the planets! Would you be correct? Explain.

Application Using the software or calculator available to you, do the following: 1.
Trade winds are one of the beautiful features of island life in Hawaii. The following data represent total air movement in miles per day over a weather station in Hawaii as determined by a continuous anemometer recorder. The period of observation is January 1 to February 15, 1971. 26 14 18 14 113 50 13 22 27 57 28 50 72 52 105 138 16 33 18 16 32 26 11 16 17 14 57 100 35 20 21 M 18 13 18 28 21 13 25 19 11 19 22 19 15 20 Source: United States Department of Commerce, National Oceanic and Atmospheric Administration, Environmental Data Service. Climatological Data, Annual Summary, Hawaii, Vol. 67, No. 13. Asheville: National Climatic Center, 1971, pp. 11, 24. (a) Use the computer to find the sample mean, median, and (if it exists) mode. Also, find the range, sample variance, and sample standard deviation. (b) Use the five-number summary provided by the computer to make a box-and-whisker plot of total air movement over the weather station. (c) Four data values are exceptionally high: 113, 105, 138, and 100. The strong winds of January 5 (113 reading) brought in a cold front that dropped snow on Haleakala National Park (at the 8000 ft elevation). The residents were so excited that they drove up to see the snow and caused such a massive traffic jam that the Park Service had to close the road. The winds of January 15, 16, and 28 (readings 105, 138, and 100) accompanied a storm with funnel clouds that did much damage. Eliminate these values (i.e., 100, 105, 113, and 138) from the data bank and redo parts (a) and (b). Compare your results with those previously obtained. Which average is most affected? What happens to the standard deviation? How do the two box-and-whisker plots compare?

Consider the following measures: mean, median, variance, standard deviation, percentile. (a) Which measures utilize relative position of the data values?
(b) Which measures utilize actual data values regardless of relative position?

Describe how the presence of possible outliers might be identified on (a) histograms; (b) dot plots; (c) stem-and-leaf displays; (d) box-and-whisker plots.

Consider two data sets, A and B. The sets are identical except that the high value of data set B is three times greater than the high value of data set A. (a) How do the medians of the two data sets compare? (b) How do the means of the two data sets compare? (c) How do the standard deviations of the two data sets compare? (d) How do the box-and-whisker plots of the two data sets compare?

You are examining two data sets involving test scores, set A and set B. The score 86 appears in both data sets. In which of the following data sets does 86 represent a higher score? Explain. (a) The percentile rank of 86 is higher in set A than in set B. (b) The mean is 80 in both data sets, but set A has a higher standard deviation.

In west Texas, water is extremely important. The following data represent pH levels in groundwater for a random sample of 102 west Texas wells. A pH less than 7 is acidic and a pH above 7 is alkaline. Scanning the data, you can see that water in this region tends to be hard (alkaline). Too high a pH means the water is unusable or needs expensive treatment to make it usable (Reference: C. E. Nichols and V. E. Kane, Union Carbide Technical Report K/UR-1).
x: pH of Ground Water in 102 West Texas Wells 7.0 7.0 7.0 7.0 7.0 7.0 7.0 7.0 7.1 7.1 7.1 7.1 7.1 7.1 7.1 7.1 7.1 7.1 7.2 7.2 7.2 7.2 7.2 7.2 7.2 7.2 7.2 7.2 7.3 7.3 7.3 7.3 7.3 7.3 7.3 7.3 7.3 7.3 7.3 7.4 7.4 7.4 7.4 7.4 7.4 7.4 7.4 7.4 7.5 7.5 7.5 7.5 7.5 7.5 7.5 7.5 7.6 7.6 7.6 7.6 7.6 7.6 7.6 7.6 7.6 7.7 7.7 7.7 7.7 7.7 7.7 7.8 7.8 7.8 7.8 7.8 7.9 7.9 7.9 7.9 7.9 8.0 8.1 8.1 8.1 8.1 8.1 8.1 8.1 8.2 8.2 8.2 8.2 8.2 8.2 8.2 8.4 8.5 8.6 8.7 8.8 8.8 Write a brief description in which you outline how you would obtain a random sample of 102 west Texas water wells. Explain how random numbers would be used in the selection process. These data are also available for download at the Companion Sites for this text. For convenience, the data are presented in increasing order.

In west Texas, water is extremely important. The following data represent pH levels in groundwater for a random sample of 102 west Texas wells. A pH less than 7 is acidic and a pH above 7 is alkaline. Scanning the data, you can see that water in this region tends to be hard (alkaline). Too high a pH means the water is unusable or needs expensive treatment to make it usable (Reference: C. E. Nichols and V. E. Kane, Union Carbide Technical Report K/UR-1). x: pH of Ground Water in 102 West Texas Wells 7.0 7.0 7.0 7.0 7.0 7.0 7.0 7.0 7.1 7.1 7.1 7.1 7.1 7.1 7.1 7.1 7.1 7.1 7.2 7.2 7.2 7.2 7.2 7.2 7.2 7.2 7.2 7.2 7.3 7.3 7.3 7.3 7.3 7.3 7.3 7.3 7.3 7.3 7.3 7.4 7.4 7.4 7.4 7.4 7.4 7.4 7.4 7.4 7.5 7.5 7.5 7.5 7.5 7.5 7.5 7.5 7.6 7.6 7.6 7.6 7.6 7.6 7.6 7.6 7.6 7.7 7.7 7.7 7.7 7.7 7.7 7.8 7.8 7.8 7.8 7.8 7.9 7.9 7.9 7.9 7.9 8.0 8.1 8.1 8.1 8.1 8.1 8.1 8.1 8.2 8.2 8.2 8.2 8.2 8.2 8.2 8.4 8.5 8.6 8.7 8.8 8.8 Is the given data nominal, ordinal, interval, or ratio? Explain. These data are also available for download at the Companion Sites for this text. For convenience, the data are presented in increasing order.

In west Texas, water is extremely important.
The following data represent pH levels in groundwater for a random sample of 102 west Texas wells. A pH less than 7 is acidic and a pH above 7 is alkaline. Scanning the data, you can see that water in this region tends to be hard (alkaline). Too high a pH means the water is unusable or needs expensive treatment to make it usable (Reference: C. E. Nichols and V. E. Kane, Union Carbide Technical Report K/UR-1). x: pH of Ground Water in 102 West Texas Wells 7.0 7.0 7.0 7.0 7.0 7.0 7.0 7.0 7.1 7.1 7.1 7.1 7.1 7.1 7.1 7.1 7.1 7.1 7.2 7.2 7.2 7.2 7.2 7.2 7.2 7.2 7.2 7.2 7.3 7.3 7.3 7.3 7.3 7.3 7.3 7.3 7.3 7.3 7.3 7.4 7.4 7.4 7.4 7.4 7.4 7.4 7.4 7.4 7.5 7.5 7.5 7.5 7.5 7.5 7.5 7.5 7.6 7.6 7.6 7.6 7.6 7.6 7.6 7.6 7.6 7.7 7.7 7.7 7.7 7.7 7.7 7.8 7.8 7.8 7.8 7.8 7.9 7.9 7.9 7.9 7.9 8.0 8.1 8.1 8.1 8.1 8.1 8.1 8.1 8.2 8.2 8.2 8.2 8.2 8.2 8.2 8.4 8.5 8.6 8.7 8.8 8.8 These data are also available for download at the Companion Sites for this text. For convenience, the data are presented in increasing order. Make a stem-and-leaf display. Use five lines per stem so that leaf values 0 and 1 are on one line, 2 and 3 are on the next line, 4 and 5 are on the next, 6 and 7 are on the next, and 8 and 9 are on the last line of the stem.

In west Texas, water is extremely important. The following data represent pH levels in groundwater for a random sample of 102 west Texas wells. A pH less than 7 is acidic and a pH above 7 is alkaline. Scanning the data, you can see that water in this region tends to be hard (alkaline). Too high a pH means the water is unusable or needs expensive treatment to make it usable (Reference: C. E. Nichols and V. E. Kane, Union Carbide Technical Report K/UR-1).
x: pH of Ground Water in 102 West Texas Wells 7.0 7.0 7.0 7.0 7.0 7.0 7.0 7.0 7.1 7.1 7.1 7.1 7.1 7.1 7.1 7.1 7.1 7.1 7.2 7.2 7.2 7.2 7.2 7.2 7.2 7.2 7.2 7.2 7.3 7.3 7.3 7.3 7.3 7.3 7.3 7.3 7.3 7.3 7.3 7.4 7.4 7.4 7.4 7.4 7.4 7.4 7.4 7.4 7.5 7.5 7.5 7.5 7.5 7.5 7.5 7.5 7.6 7.6 7.6 7.6 7.6 7.6 7.6 7.6 7.6 7.7 7.7 7.7 7.7 7.7 7.7 7.8 7.8 7.8 7.8 7.8 7.9 7.9 7.9 7.9 7.9 8.0 8.1 8.1 8.1 8.1 8.1 8.1 8.1 8.2 8.2 8.2 8.2 8.2 8.2 8.2 8.4 8.5 8.6 8.7 8.8 8.8 Make a frequency table, histogram, and relative-frequency histogram using five classes. Recall that for decimal data, we "clear the decimal" to determine classes for whole-number data and then reinsert the decimal to obtain the classes for the frequency table of the original data. These data are also available for download at the Companion Sites for this text. For convenience, the data are presented in increasing order.

In west Texas, water is extremely important. The following data represent pH levels in groundwater for a random sample of 102 west Texas wells. A pH less than 7 is acidic and a pH above 7 is alkaline. Scanning the data, you can see that water in this region tends to be hard (alkaline). Too high a pH means the water is unusable or needs expensive treatment to make it usable (Reference: C. E. Nichols and V. E. Kane, Union Carbide Technical Report K/UR-1). x: pH of Ground Water in 102 West Texas Wells 7.0 7.0 7.0 7.0 7.0 7.0 7.0 7.0 7.1 7.1 7.1 7.1 7.1 7.1 7.1 7.1 7.1 7.1 7.2 7.2 7.2 7.2 7.2 7.2 7.2 7.2 7.2 7.2 7.3 7.3 7.3 7.3 7.3 7.3 7.3 7.3 7.3 7.3 7.3 7.4 7.4 7.4 7.4 7.4 7.4 7.4 7.4 7.4 7.5 7.5 7.5 7.5 7.5 7.5 7.5 7.5 7.6 7.6 7.6 7.6 7.6 7.6 7.6 7.6 7.6 7.7 7.7 7.7 7.7 7.7 7.7 7.8 7.8 7.8 7.8 7.8 7.9 7.9 7.9 7.9 7.9 8.0 8.1 8.1 8.1 8.1 8.1 8.1 8.1 8.2 8.2 8.2 8.2 8.2 8.2 8.2 8.4 8.5 8.6 8.7 8.8 8.8 These data are also available for download at the Companion Sites for this text. For convenience, the data are presented in increasing order.
Make an ogive using five classes.

In west Texas, water is extremely important. The following data represent pH levels in groundwater for a random sample of 102 west Texas wells. A pH less than 7 is acidic and a pH above 7 is alkaline. Scanning the data, you can see that water in this region tends to be hard (alkaline). Too high a pH means the water is unusable or needs expensive treatment to make it usable (Reference: C. E. Nichols and V. E. Kane, Union Carbide Technical Report K/UR-1). x: pH of Ground Water in 102 West Texas Wells 7.0 7.0 7.0 7.0 7.0 7.0 7.0 7.0 7.1 7.1 7.1 7.1 7.1 7.1 7.1 7.1 7.1 7.1 7.2 7.2 7.2 7.2 7.2 7.2 7.2 7.2 7.2 7.2 7.3 7.3 7.3 7.3 7.3 7.3 7.3 7.3 7.3 7.3 7.3 7.4 7.4 7.4 7.4 7.4 7.4 7.4 7.4 7.4 7.5 7.5 7.5 7.5 7.5 7.5 7.5 7.5 7.6 7.6 7.6 7.6 7.6 7.6 7.6 7.6 7.6 7.7 7.7 7.7 7.7 7.7 7.7 7.8 7.8 7.8 7.8 7.8 7.9 7.9 7.9 7.9 7.9 8.0 8.1 8.1 8.1 8.1 8.1 8.1 8.1 8.2 8.2 8.2 8.2 8.2 8.2 8.2 8.4 8.5 8.6 8.7 8.8 8.8 These data are also available for download at the Companion Sites for this text. For convenience, the data are presented in increasing order. Compute the range, mean, median, and mode for the given data.

In west Texas, water is extremely important. The following data represent pH levels in groundwater for a random sample of 102 west Texas wells. A pH less than 7 is acidic and a pH above 7 is alkaline. Scanning the data, you can see that water in this region tends to be hard (alkaline). Too high a pH means the water is unusable or needs expensive treatment to make it usable (Reference: C. E. Nichols and V. E. Kane, Union Carbide Technical Report K/UR-1).
x: pH of Ground Water in 102 West Texas Wells 7.0 7.0 7.0 7.0 7.0 7.0 7.0 7.0 7.1 7.1 7.1 7.1 7.1 7.1 7.1 7.1 7.1 7.1 7.2 7.2 7.2 7.2 7.2 7.2 7.2 7.2 7.2 7.2 7.3 7.3 7.3 7.3 7.3 7.3 7.3 7.3 7.3 7.3 7.3 7.4 7.4 7.4 7.4 7.4 7.4 7.4 7.4 7.4 7.5 7.5 7.5 7.5 7.5 7.5 7.5 7.5 7.6 7.6 7.6 7.6 7.6 7.6 7.6 7.6 7.6 7.7 7.7 7.7 7.7 7.7 7.7 7.8 7.8 7.8 7.8 7.8 7.9 7.9 7.9 7.9 7.9 8.0 8.1 8.1 8.1 8.1 8.1 8.1 8.1 8.2 8.2 8.2 8.2 8.2 8.2 8.2 8.4 8.5 8.6 8.7 8.8 8.8 (a) Verify that $\sum x = 772.9$ and $\sum x^2 = 5876.6$. (b) Compute the sample variance, sample standard deviation, and coefficient of variation for the given data. Is the sample standard deviation small relative to the mean pH? These data are also available for download at the Companion Sites for this text. For convenience, the data are presented in increasing order.

In west Texas, water is extremely important. The following data represent pH levels in groundwater for a random sample of 102 west Texas wells. A pH less than 7 is acidic and a pH above 7 is alkaline. Scanning the data, you can see that water in this region tends to be hard (alkaline). Too high a pH means the water is unusable or needs expensive treatment to make it usable (Reference: C. E. Nichols and V. E. Kane, Union Carbide Technical Report K/UR-1). x: pH of Ground Water in 102 West Texas Wells 7.0 7.0 7.0 7.0 7.0 7.0 7.0 7.0 7.1 7.1 7.1 7.1 7.1 7.1 7.1 7.1 7.1 7.1 7.2 7.2 7.2 7.2 7.2 7.2 7.2 7.2 7.2 7.2 7.3 7.3 7.3 7.3 7.3 7.3 7.3 7.3 7.3 7.3 7.3 7.4 7.4 7.4 7.4 7.4 7.4 7.4 7.4 7.4 7.5 7.5 7.5 7.5 7.5 7.5 7.5 7.5 7.6 7.6 7.6 7.6 7.6 7.6 7.6 7.6 7.6 7.7 7.7 7.7 7.7 7.7 7.7 7.8 7.8 7.8 7.8 7.8 7.9 7.9 7.9 7.9 7.9 8.0 8.1 8.1 8.1 8.1 8.1 8.1 8.1 8.2 8.2 8.2 8.2 8.2 8.2 8.2 8.4 8.5 8.6 8.7 8.8 8.8 Compute a 75% Chebyshev interval centered on the mean. These data are also available for download at the Companion Sites for this text. For convenience, the data are presented in increasing order.

In west Texas, water is extremely important.
The following data represent pH levels in groundwater for a random sample of 102 west Texas wells. A pH less than 7 is acidic and a pH above 7 is alkaline. Scanning the data, you can see that water in this region tends to be hard (alkaline). Too high a pH means the water is unusable or needs expensive treatment to make it usable (Reference: C. E. Nichols and V. E. Kane, Union Carbide Technical Report K/UR-1). x: pH of Ground Water in 102 West Texas Wells 7.0 7.0 7.0 7.0 7.0 7.0 7.0 7.0 7.1 7.1 7.1 7.1 7.1 7.1 7.1 7.1 7.1 7.1 7.2 7.2 7.2 7.2 7.2 7.2 7.2 7.2 7.2 7.2 7.3 7.3 7.3 7.3 7.3 7.3 7.3 7.3 7.3 7.3 7.3 7.4 7.4 7.4 7.4 7.4 7.4 7.4 7.4 7.4 7.5 7.5 7.5 7.5 7.5 7.5 7.5 7.5 7.6 7.6 7.6 7.6 7.6 7.6 7.6 7.6 7.6 7.7 7.7 7.7 7.7 7.7 7.7 7.8 7.8 7.8 7.8 7.8 7.9 7.9 7.9 7.9 7.9 8.0 8.1 8.1 8.1 8.1 8.1 8.1 8.1 8.2 8.2 8.2 8.2 8.2 8.2 8.2 8.4 8.5 8.6 8.7 8.8 8.8 Make a box-and-whisker plot. Find the interquartile range. These data are also available for download at the Companion Sites for this text. For convenience, the data are presented in increasing order.

Interpretation Wow! In Problems 5-13 you constructed a lot of information regarding the pH of west Texas groundwater based on sample data. Let's continue the investigation. Look at the histogram. Is the pH distribution for these wells symmetric or skewed? Are lower or higher values more common?

Interpretation Wow! In Problems 5-13 you constructed a lot of information regarding the pH of west Texas groundwater based on sample data. Let's continue the investigation. Look at the ogive. What percent of the wells have a pH less than 8.15? Suppose a certain crop can tolerate irrigation water with a pH between 7.35 and 8.55. What percent of the wells could be used for such a crop?

Interpretation Wow! In Problems 5-13 you constructed a lot of information regarding the pH of west Texas groundwater based on sample data. Let's continue the investigation. Look at the stem-and-leaf plot.
Are there any unusually high or low pH levels in this sample of wells? How many wells are neutral (pH of 7)?

Interpretation Wow! In Problems 5-13 you constructed a lot of information regarding the pH of west Texas groundwater based on sample data. Let's continue the investigation. Use the box-and-whisker plot to describe how the data are spread about the median. Are the pH values above the median more spread out than those below? Is this observation consistent with the skew of the histogram?

Interpretation Wow! In Problems 5-13 you constructed a lot of information regarding the pH of west Texas groundwater based on sample data. Let's continue the investigation. Suppose you are working for the regional water commissioner. You have been asked to submit a brief report about the pH level in groundwater in the west Texas region. Write such a report and include appropriate graphs.

Statistical Literacy Consider the mode, median, and mean. Which average represents the middle value of a data distribution? Which average represents the most frequent value of a distribution? Which average takes all the specific values into account?

Statistical Literacy What symbol is used for the arithmetic mean when it is a sample statistic? What symbol is used when the arithmetic mean is a population parameter?

Statistical Literacy Look at the formula for the mean. List the two arithmetic procedures that are used to compute the mean.

Statistical Literacy In order to find the median of a data set, what do we do first with the data?

Basic Computation: Mean, Median, Mode Find the mean, median, and mode of the data set. 8 2 7 2 6

Basic Computation: Mean, Median, Mode Find the mean, median, and mode of the data set. 10 12 20 15 20

Basic Computation: Mean, Median, Mode Find the mean, median, and mode of the data set. 8 2 7 2 6 5

Critical Thinking Consider a data set with at least three data values. Suppose the highest value is increased by 10 and the lowest is decreased by 5. (a) Does the mean change? Explain.
(b) Does the median change? Explain. (c) Is it possible for the mode to change? Explain.

Critical Thinking Consider a data set with at least three data values. Suppose the highest value is increased by 10 and the lowest is decreased by 10. (a) Does the mean change? Explain. (b) Does the median change? Explain. (c) Is it possible for the mode to change? Explain.

Critical Thinking If a data set has an even number of data, is it true or false that the median is never equal to a value in the data set? Explain.

Critical Thinking When a distribution is mound-shaped symmetric, what is the general relationship among the values of the mean, median, and mode?

Critical Thinking Consider the following types of data that were obtained from a random sample of 49 credit card accounts. Identify all the averages (mean, median, or mode) that can be used to summarize the data. (a) Outstanding balance on each account (b) Name of credit card (e.g., MasterCard, Visa, American Express, etc.) (c) Dollar amount due on next payment

Critical Thinking Consider the numbers 2 3 4 5 5. (a) Compute the mode, median, and mean. (b) If the numbers represent codes for the colors of T-shirts ordered from a catalog, which average(s) would make sense? (c) If the numbers represent one-way mileages for trails to different lakes, which average(s) would make sense? (d) Suppose the numbers represent survey responses from 1 to 5, with 1 = disagree strongly, 2 = disagree, 3 = agree, 4 = agree strongly, and 5 = agree very strongly. Which averages make sense?

Critical Thinking Consider two data sets. Set A: $n = 5$; $\bar{x} = 10$. Set B: $n = 50$; $\bar{x} = 10$. (a) Suppose the number 20 is included as an additional data value in Set A. Compute $\bar{x}$ for the new data set. Hint: $\sum x = n\bar{x}$. To compute $\bar{x}$ for the new data set, add 20 to $\sum x$ of the original data set and divide by 6. (b) Suppose the number 20 is included as an additional data value in Set B. Compute $\bar{x}$ for the new data set.
(c) Why does the addition of the number 20 to each data set change the mean for Set A more than it does for Set B?

Interpretation A job-performance evaluation form has these categories: 1 = excellent; 2 = good; 3 = satisfactory; 4 = poor; 5 = unacceptable. Based on 15 client reviews, one employee had median rating of 4; mode rating of 1. The employee was pleased that most clients had rated her as excellent. The supervisor said improvement was needed because at least half the clients had rated the employee at the poor or unacceptable level. Comment on the different perspectives.

Critical Thinking: Data Transformation Using Addition In this problem, we explore the effect on the mean, median, and mode of adding the same number to each data value. Consider the data set 2, 2, 3, 6, 10. (a) Compute the mode, median, and mean. (b) Add 5 to each of the data values. Compute the mode, median, and mean. (c) Compare the results of parts (a) and (b). In general, how do you think the mode, median, and mean are affected when the same constant is added to each data value in a set?

Critical Thinking: Data Transformation Using Multiplication In this problem, we explore the effect on the mean, median, and mode of multiplying each data value by the same number. Consider the data set 2, 2, 3, 6, 10. (a) Compute the mode, median, and mean. (b) Multiply each data value by 5. Compute the mode, median, and mean. (c) Compare the results of parts (a) and (b). In general, how do you think the mode, median, and mean are affected when each data value in a set is multiplied by the same constant? (d) Suppose you have information about average heights of a random sample of airplane passengers. The mode is 70 inches, the median is 68 inches, and the mean is 71 inches. To convert the data into centimeters, multiply each data value by 2.54. What are the values of the mode, median, and mean in centimeters?

Critical Thinking Consider a data set of 15 distinct measurements with mean A and median B.
(a) If the highest number were increased, what would be the effect on the median and mean? Explain. (b) If the highest number were decreased to a value still larger than B, what would be the effect on the median and mean? (c) If the highest number were decreased to a value smaller than B, what would be the effect on the median and mean?

Environmental Studies: Death Valley How hot does it get in Death Valley? The following data are taken from a study conducted by the National Park System, of which Death Valley is a unit. The ground temperatures (°F) were taken from May to November in the vicinity of Furnace Creek. 146 152 168 174 180 178 179 180 178 178 168 165 152 144 Compute the mean, median, and mode for these ground temperatures.

Ecology: Wolf Packs How large is a wolf pack? The following information is from a random sample of winter wolf packs in regions of Alaska, Minnesota, Michigan, Wisconsin, Canada, and Finland (Source: The Wolf, by L. D. Mech, University of Minnesota Press). Winter pack size: 13 10 7 5 7 7 2 4 3 2 3 15 4 4 2 8 7 8 Compute the mean, median, and mode for the size of winter wolf packs.

Medical: Injuries The Grand Canyon and the Colorado River are beautiful, rugged, and sometimes dangerous. Thomas Myers is a physician at the park clinic in Grand Canyon Village. Dr. Myers has recorded (for a 5-year period) the number of visitor injuries at different landing points for commercial boat trips down the Colorado River in both the Upper and Lower Grand Canyon (Source: Fateful Journey by Myers, Becker, and Stevens).
Upper Canyon: Number of Injuries per Landing Point Between North Canyon and Phantom Ranch: 2 3 1 1 3 4 6 9 3 1 3
Lower Canyon: Number of Injuries per Landing Point Between Bright Angel and Lava Falls: 8 1 1 0 6 7 2 14 3 0 1 13 2 1
(a) Compute the mean, median, and mode for injuries per landing point in the Upper Canyon. (b) Compute the mean, median, and mode for injuries per landing point in the Lower Canyon. (c) Compare the results of parts (a) and (b).
(d) The Lower Canyon stretch had some extreme data values. Compute a 5% trimmed mean for this region, and compare this result to the mean for the Upper Canyon computed in part (a).

Football: Age of Professional Players How old are professional football players? The 11th edition of The Pro Football Encyclopedia gave the following information. Random sample of pro football player ages in years: 24 23 25 23 30 29 28 26 33 29 24 37 25 23 22 27 28 25 31 29 25 22 31 29 22 28 27 26 23 21 25 21 25 24 22 26 25 32 26 29 (a) Compute the mean, median, and mode of the ages. (b) Interpretation Compare the averages. Does one seem to represent the age of the pro football players most accurately? Explain.

Leisure: Maui Vacation How expensive is Maui? If you want a vacation rental condominium (up to four people), visit a Maui tourism web site. The Maui News gave the following costs in dollars per day for a random sample of condominiums located throughout the island of Maui. 89 50 68 60 375 55 500 71 40 350 60 50 250 45 45 125 235 65 60 130 (a) Compute the mean, median, and mode for the data. (b) Compute a 5% trimmed mean for the data, and compare it with the mean computed in part (a). Does the trimmed mean more accurately reflect the general level of the daily rental costs? (c) Interpretation If you were a travel agent and a client asked about the daily cost of renting a condominium on Maui, what average would you use? Explain. Is there any other information about the costs that you think might be useful, such as the spread of the costs?

Basic Computation: Weighted Average Find the weighted average of a data set where 10 has a weight of 5; 20 has a weight of 3; 30 has a weight of 2.

Basic Computation: Weighted Average Find the weighted average of a data set where 10 has a weight of 2; 20 has a weight of 3; 30 has a weight of 5.

Grades: Weighted Average In your biology class, your final grade is based on several things: a lab score, scores on two major tests, and your score on the final exam.
There are 100 points available for each score. However, the lab score is worth 25% of your total grade, each major test is worth 22.5%, and the final exam is worth 30%. Compute the weighted average for the following scores: 92 on the lab, 81 on the first major test, 93 on the second major test, and 85 on the final exam.

Merit Pay Scale: Weighted Average At General Hospital, nurses are given performance evaluations to determine eligibility for merit pay raises. The supervisor rates the nurses on a scale of 1 to 10 (10 being the highest rating) for several activities: promptness, record keeping, appearance, and bedside manner with patients. Then an average is determined by giving a weight of 2 for promptness, 3 for record keeping, 1 for appearance, and 4 for bedside manner with patients. What is the average rating for a nurse with ratings of 9 for promptness, 7 for record keeping, 6 for appearance, and 10 for bedside manner?

EPA: Wetlands Where does all the water go? According to the Environmental Protection Agency (EPA), in a typical wetland environment, 38% of the water is outflow; 47% is seepage; 7% evaporates; and 8% remains as water volume in the ecosystem (Reference: U.S. Environmental Protection Agency Case Studies Report 832-R-93-005). Chloride compounds as residuals from residential areas are a problem for wetlands. Suppose that in a particular wetland environment the following concentrations (mg/L) of chloride compounds were found: outflow, 64.1; seepage, 75.8; remaining due to evaporation, 23.9; in the water volume, 68.2. (a) Compute the weighted average of chlorine compound concentration (mg/L) for this ecological system. (b) Suppose the EPA has established an average chlorine compound concentration target of no more than 58 mg/L.
Comment on whether this wetlands system meets the target standard for chlorine compound concentration.

Expand Your Knowledge: Harmonic Mean When data consist of rates of change, such as speeds, the harmonic mean is an appropriate measure of central tendency. For n data values, Harmonic mean $= \dfrac{n}{\sum \frac{1}{x}}$, assuming no data value is 0. Suppose you drive 60 miles per hour for 100 miles, then 75 miles per hour for 100 miles. Use the harmonic mean to find your average speed.

Expand Your Knowledge: Geometric Mean When data consist of percentages, ratios, compounded growth rates, or other rates of change, the geometric mean is a useful measure of central tendency. For n data values, Geometric mean $= \sqrt[n]{\text{product of the } n \text{ data values}}$, assuming all data values are positive. To find the average growth factor over 5 years of an investment in a mutual fund with growth rates of 10% the first year, 12% the second year, 14.8% the third year, 3.8% the fourth year, and 6% the fifth year, take the geometric mean of 1.10, 1.12, 1.148, 1.038, and 1.06. Find the average growth factor of this investment.

Statistical Literacy Which average (mean, median, or mode) is associated with the standard deviation?

Statistical Literacy What is the relationship between the variance and the standard deviation for a sample data set?

Statistical Literacy When computing the standard deviation, does it matter whether the data are sample data or data comprising the entire population? Explain.

Statistical Literacy What symbol is used for the standard deviation when it is a sample statistic? What symbol is used for the standard deviation when it is a population parameter?

Basic Computation: Range, Standard Deviation Consider the data set 2 3 4 5 6. (a) Find the range. (b) Use the defining formula to compute the sample standard deviation s. (c) Use the defining formula to compute the population standard deviation $\sigma$.

Basic Computation: Range, Standard Deviation Consider the data set 1 2 3 4 5. (a) Find the range.
(b) Use the defining formula to compute the sample standard deviation s. (c) Use the defining formula to compute the population standard deviation $\sigma$.

Critical Thinking For a given data set in which not all data values are equal, which value is smaller, $s$ or $\sigma$? Explain.

Critical Thinking Consider two data sets with equal sample standard deviations. The first data set has 20 data values that are not all equal, and the second has 50 data values that are not all equal. For which data set is the difference between $s$ and $\sigma$ greater? Explain. Hint: Consider the relationship $\sigma = s\sqrt{(n-1)/n}$.

Critical Thinking Each of the following data sets has a mean of $\bar{x} = 10$. (i) 8 9 10 11 12 (ii) 7 9 10 11 13 (iii) 7 8 10 12 13 (a) Without doing any computations, order the data sets according to increasing value of standard deviations. (b) Why do you expect the difference in standard deviations between data sets (i) and (ii) to be greater than the difference in standard deviations between data sets (ii) and (iii)? Hint: Consider how much the data in the respective sets differ from the mean.

Critical Thinking: Data Transformation Using Addition In this problem, we explore the effect on the standard deviation of adding the same constant to each data value in a data set. Consider the data set 5, 9, 10, 11, 15. (a) Use the defining formula, the computation formula, or a calculator to compute s. (b) Add 5 to each data value to get the new data set 10, 14, 15, 16, 20. Compute s. (c) Compare the results of parts (a) and (b). In general, how do you think the standard deviation of a data set changes if the same constant is added to each data value?

Critical Thinking: Data Transformation Using Multiplication In this problem, we explore the effect on the standard deviation of multiplying each data value in a data set by the same constant. Consider the data set 5, 9, 10, 11, 15. (a) Use the defining formula, the computation formula, or a calculator to compute s.
(b) Multiply each data value by 5 to obtain the new data set 25, 45, 50, 55, 75. Compute s. (c) Compare the results of parts (a) and (b). In general, how does the standard deviation change if each data value is multiplied by a constant c? (d) You recorded the weekly distances you bicycled in miles and computed the standard deviation to be s = 3.1 miles. Your friend wants to know the standard deviation in kilometers. Do you need to redo all the calculations? Given 1 mile = 1.6 kilometers, what is the standard deviation in kilometers?

Critical Thinking: Outliers One indicator of an outlier is that an observation is more than 2.5 standard deviations from the mean. Consider the data value 80. (a) If a data set has mean 70 and standard deviation 5, is 80 a suspect outlier? (b) If a data set has mean 70 and standard deviation 3, is 80 a suspect outlier?

Basic Computation: Variance, Standard Deviation Given the sample data x: 23 17 15 30 25 (a) Find the range. (b) Verify that $\sum x = 110$ and $\sum x^2 = 2568$. (c) Use the results of part (b) and appropriate computation formulas to compute the sample variance $s^2$ and sample standard deviation $s$. (d) Use the defining formulas to compute the sample variance $s^2$ and sample standard deviation $s$. (e) Suppose the given data comprise the entire population of all x values. Compute the population variance $\sigma^2$ and population standard deviation $\sigma$.

Basic Computation: Coefficient of Variation, Chebyshev Interval Consider sample data with $\bar{x} = 15$ and $s = 3$. (a) Compute the coefficient of variation. (b) Compute a 75% Chebyshev interval around the sample mean.

Basic Computation: Coefficient of Variation, Chebyshev Interval Consider population data with $\mu = 20$ and $\sigma = 2$. (a) Compute the coefficient of variation. (b) Compute an 88.9% Chebyshev interval around the population mean.

Investing: Stocks and Bonds Do bonds reduce the overall risk of an investment portfolio? Let x be a random variable representing annual percent return for Vanguard Total Stock Index (all stocks).
Let y be a random variable representing annual return for Vanguard Balanced Index (60% stock and 40% bond). For the past several years, we have the following data (Reference: Morningstar Research Group, Chicago):
x: 11 0 36 21 31 23 24 -11 -11 -21
y: 10 -2 29 14 22 18 14 -2 -3 -10
(a) Compute $\sum x$, $\sum x^2$, $\sum y$, and $\sum y^2$. (b) Use the results of part (a) to compute the sample mean, variance, and standard deviation for x and for y. (c) Compute a 75% Chebyshev interval around the mean for x values and also for y values. Use the intervals to compare the two funds. (d) Interpretation: Compute the coefficient of variation for each fund. Use the coefficients of variation to compare the two funds. If $s$ represents risks and $\bar{x}$ represents expected return, then $s/\bar{x}$ can be thought of as a measure of risk per unit of expected return. In this case, why is a smaller CV better? Explain.

Space Shuttle: Epoxy Kevlar epoxy is a material used on the NASA space shuttles. Strands of this epoxy were tested at the 90% breaking strength. The following data represent time to failure (in hours) for a random sample of 50 epoxy strands (Reference: R. E. Barlow, University of California, Berkeley). Let x be a random variable representing time to failure (in hours) at 90% breaking strength. Note: These data are also available for download at the Companion Sites for this text. 0.54 1.80 1.52 2.05 1.03 1.18 0.80 1.33 1.29 1.11 3.34 1.54 0.08 0.12 0.60 0.72 0.92 1.05 1.43 3.03 1.81 2.17 0.63 0.56 0.03 0.09 0.18 0.34 1.51 1.45 1.52 0.19 1.55 0.02 0.07 0.65 0.40 0.24 1.51 1.45 1.60 1.80 4.69 0.08 7.89 1.58 1.64 0.03 0.23 0.72 (a) Find the range. (b) Use a calculator to verify that $\sum x = 62.11$ and $\sum x^2 = 164.23$. (c) Use the results of part (b) to compute the sample mean, variance, and standard deviation for the time to failure. (d) Interpretation Use the results of part (c) to compute the coefficient of variation. What does this number say about time to failure?
Why does a small CV indicate more consistent data, whereas a larger CV indicates less consistent data? Explain.

Archaeology: Ireland The Hill of Tara in Ireland is a place of great archaeological importance. This region has been occupied by people for more than 4000 years. Geomagnetic surveys detect subsurface anomalies in the earth's magnetic field. These surveys have led to many significant archaeological discoveries. After collecting data, the next step is to begin a statistical study. The following data measure magnetic susceptibility (centimeter-gram-second $\times 10^{-6}$) on two of the main grids of the Hill of Tara (Reference: Tara: An Archaeological Survey by Conor Newman, Royal Irish Academy, Dublin).
Grid E: x variable 13.20 5.60 19.80 15.05 21.40 17.25 27.45 16.95 23.90 32.40 40.75 5.10 17.75 28.35
Grid H: y variable 11.85 15.25 21.30 17.30 27.50 10.35 14.90 48.70 25.40 25.95 57.60 34.35 38.80 41.00 31.25
(a) Compute $\sum x$, $\sum x^2$, $\sum y$, and $\sum y^2$. (b) Use the results of part (a) to compute the sample mean, variance, and standard deviation for x and for y. (c) Compute a 75% Chebyshev interval around the mean for x values and also for y values. Use the intervals to compare the magnetic susceptibility on the two grids. Higher numbers indicate higher magnetic susceptibility. However, extreme values, high or low, could mean an anomaly and possible archaeological treasure. (d) Interpretation Compute the sample coefficient of variation for each grid. Use the CVs to compare the two grids. If $s$ represents variability in the signal (magnetic susceptibility) and $\bar{x}$ represents the expected level of the signal, then $s/\bar{x}$ can be thought of as a measure of the variability per unit of expected signal. Remember, a considerable variability in the signal (above or below average) might indicate buried artifacts. Why, in this case, would a large CV be better or at least more exciting?
Explain.

Wildlife: Mallard Ducks and Canada Geese For mallard ducks and Canada geese, what percentage of nests are successful (at least one offspring survives)? Studies in Montana, Illinois, Wyoming, Utah, and California gave the following percentages of successful nests (Reference: The Wildlife Society Press, Washington, D.C.).
x: Percentage success for mallard duck nests 56 85 52 13 39
y: Percentage success for Canada goose nests 24 53 60 69 18
(a) Use a calculator to verify that $\sum x = 245$, $\sum x^2 = 14{,}755$, $\sum y = 224$, and $\sum y^2 = 12{,}070$. (b) Use the results of part (a) to compute the sample mean, variance, and standard deviation for x, the percent of successful mallard nests. (c) Use the results of part (a) to compute the sample mean, variance, and standard deviation for y, the percent of successful Canada goose nests. (d) Interpretation Use the results of parts (b) and (c) to compute the coefficient of variation for successful mallard nests and Canada goose nests. Write a brief explanation of the meaning of these numbers. What do these results say about the nesting success rates for mallards compared to those of Canada geese? Would you say one group of data is more or less consistent than the other? Explain.

Investing: Socially Responsible Mutual Funds Pax World Balanced is a highly respected, socially responsible mutual fund of stocks and bonds (see Viewpoint). Vanguard Balanced Index is another highly regarded fund that represents the entire U.S. stock and bond market (an index fund). The mean and standard deviation of annualized percent returns are shown below. The annualized mean and standard deviation are for a recent 10-year period (Source: Fund Reports).
Pax World Balanced: $\bar{x} = 9.58$; $s = 14.05$
Vanguard Balanced Index: $\bar{x} = 9.02$; $s = 12.50$
(a) Interpretation Compute the coefficient of variation for each fund. If $\bar{x}$ represents return and $s$ represents risk, then explain why the coefficient of variation can be taken to represent risk per unit of return.
From this point of view, which fund appears to be better? Explain. (b) Interpretation Compute a 75% Chebyshev interval around the mean for each fund. Use the intervals to compare the two funds. As usual, past performance does not guarantee future performance.

Medical: Physician Visits In some reports, the mean and coefficient of variation are given. For instance, in Statistical Abstract of the United States, 116th edition, one report gives the average number of physician visits by males per year. The average reported is 2.2, and the reported coefficient of variation is 1.5%. Use this information to determine the standard deviation of the annual number of visits to physicians made by males.

Grouped Data: Anthropology What was the age distribution of prehistoric Native Americans? Extensive anthropologic studies in the southwestern United States gave the following information about a prehistoric extended family group of 80 members on what is now the Navajo Reservation in northwestern New Mexico (Source: Based on information taken from Prehistory in the Navajo Reservation District, by F. W. Eddy, Museum of New Mexico Press).
Age range (years): 1-10*, 11-20, 21-30, 31 and over
Number of individuals: 34, 18, 17, 11
*Includes infants
For this community, estimate the mean age expressed in years, the sample variance, and the sample standard deviation. For the class 31 and over, use 35.5 as the class midpoint.

Grouped Data: Shoplifting What is the age distribution of adult shoplifters (21 years of age or older) in supermarkets? The following is based on information taken from the National Retail Federation. A random sample of 895 incidents of shoplifting gave the following age distribution:
Age range (years): 21-30, 31-40, 41 and over
Number of shoplifters: 260, 348, 287
Estimate the mean age, sample variance, and sample standard deviation for the shoplifters.
For the class 41 and over, use 45.5 as the class midpoint.

Grouped Data: Hours of Sleep per Day Alexander Borbely is a professor at the University of Zurich Medical School, where he is director of the sleep laboratory. The histogram in Figure 3-2 is based on information from his book Secrets of Sleep. The histogram displays hours of sleep per day for a random sample of 200 subjects. Estimate the mean hours of sleep, the standard deviation of hours of sleep, and the coefficient of variation.

Grouped Data: Business Administration What are the big corporations doing with their wealth? One way to answer this question is to examine profits as percentage of assets. A random sample of 50 Fortune 500 companies gave the following information (Source: Based on information from Fortune 500, Vol. 135, No. 8).
Profit as percentage of assets: 8.6-12.5, 12.6-16.5, 16.6-20.5, 20.6-24.5, 24.6-28.5
Number of companies: 15, 20, 5, 7, 3
Estimate the sample mean, the sample variance, and the sample standard deviation for profit as percentage of assets.

Expand Your Knowledge: Moving Averages You do not need a lot of money to invest in a mutual fund. However, if you decide to put some money into an investment, you are usually advised to leave it in for (at least) several years. Why? Because good years tend to cancel out bad years, giving you a better overall return with less risk. To see what we mean, let's use a 3-year moving average on the Calvert Social Balanced Fund (a socially responsible fund).
Year: 1 2 3 4 5 6 7 8 9 10 11
% Return: 1.78 17.79 7.44 5.95 -4.74 25.85 9.03 18.92 17.49 6.80 -2.38
Source: Fund Reports
(a) Use a calculator with mean and standard deviation keys to verify that the mean annual return for all 11 years is approximately 9.45%, with standard deviation 9.57%. (b) To compute a 3-year moving average for year 3, we take the data values for year 3 and the prior 2 years and average them.
To compute a 3-year moving average for year 4, we take the data values for year 4 and the prior 2 years and average them. Verify that the following 3-year moving averages are correct.
Year: 3 4 5 6 7 8 9 10 11
3-year moving average: 9.01 10.40 2.89 9.02 10.05 17.93 15.15 14.40 7.30
(c) Use a calculator with mean and standard deviation keys to verify that for the 3-year moving average, the mean is 10.68% with sample standard deviation 4.53%. (d) Interpretation Compare the results of parts (a) and (c). Suppose we take the point of view that risk is measured by standard deviation. Is the risk (standard deviation) of the 3-year moving average considerably smaller? This is an example of a general phenomenon that will be studied in more detail in Chapter 6.

27P

Expand Your Knowledge: Stratified Sampling and Students in Sixth Grade This is a technique to break down the variation of a random variable into useful components (called strata) in order to decrease experimental variation and increase accuracy of results. It has been found that a more accurate estimate of population mean μ can often be obtained by taking measurements from naturally occurring subpopulations and combining the results using weighted averages. For example, suppose an accurate estimate of the mean weight of sixth-grade students is desired for a large school system. Suppose (for cost reasons) we can only take a random sample of m = 100 students. Instead of taking a simple random sample of 100 students from the entire population of all sixth-grade students, we use stratified sampling as follows. The school system under study consists of three large schools. School A has N1 = 310 sixth-grade students, School B has N2 = 420 sixth-grade students, and School C has N3 = 516 sixth-grade students. This is a total population of 1246 sixth-grade students in our study, and we have strata consisting of the 3 schools.
A preliminary study in each school with relatively small sample size has given estimates for the sample standard deviation s of sixth-grade student weights in each school. These are shown in the following table:
School A: N1 = 310, s1 = 3 lb
School B: N2 = 420, s2 = 12 lb
School C: N3 = 516, s3 = 6 lb
How many students should we randomly choose from each school for a best estimate for the population mean weight? A lot of mathematics goes into the answer. Fortunately, Bill Williams of Bell Laboratories wrote a book called A Sampler on Sampling (John Wiley and Sons, publisher), which provides an answer. Let n1 be the number of students randomly chosen from School A, n2 be the number chosen from School B, and n3 be the number chosen from School C. This means our total sample size will be m = n1 + n2 + n3. What is the formula for n_i? A popular and widely used technique is the following.
n_i = [N_i s_i / (N1 s1 + N2 s2 + N3 s3)] m
The n_i are usually not whole numbers, so we need to round to the nearest whole number. This formula allocates more students to schools that have a larger population of sixth graders and/or have larger sample standard deviations. Remember, this is a popular and widely used technique for stratified sampling. It is not an absolute rule. There are other methods of stratified sampling also in use. In general practice, according to Bill Williams, the use of naturally occurring strata seems to reduce overall variability in measurements by about 20% compared to simple random samples taken from the entire (unstratified) population. Now suppose you have taken a random sample of size n_i from each appropriate school and you got a sample mean weight x̄_i from each school. How do you get the best estimate for the population mean weight of all 1246 students? The answer is that we use a weighted average.
μ ≈ (n1/m) x̄1 + (n2/m) x̄2 + (n3/m) x̄3
COMMENT: This is an example with three strata. Applications with any number of strata can be solved in a similar way with obvious extensions of formulas.
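The allocation formula and the weighted average described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, and the stratum sizes, standard deviations, and sample means are made-up values, not the school data from this problem.

```python
def allocate_sample(stratum_sizes, stratum_sds, m):
    """Split a total sample of size m across strata in proportion to N_i * s_i."""
    weights = [n_pop * sd for n_pop, sd in zip(stratum_sizes, stratum_sds)]
    total = sum(weights)
    # Round each share to a whole number. Python's round() sends exact .5
    # ties to the nearest even integer, and the rounded sizes may not add
    # up to m, so check the sum and adjust by hand if needed.
    return [round(m * w / total) for w in weights]


def weighted_mean(sample_sizes, sample_means):
    """Combine stratum sample means using the weights n_i / m."""
    m = sum(sample_sizes)
    return sum(n * xbar for n, xbar in zip(sample_sizes, sample_means)) / m


# Hypothetical strata (assumed values, not the schools in the problem).
sizes = [200, 300, 500]          # N_i
sds = [2.0, 4.0, 6.0]            # s_i
n = allocate_sample(sizes, sds, m=50)
estimate = weighted_mean(n, [80.0, 100.0, 90.0])
print(n, sum(n), estimate)
```

With these made-up inputs the third stratum receives the largest share because it has both the largest population and the largest standard deviation, which matches the behavior the text describes.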
(a) Compute the size of the random samples n1, n2, n3 to be taken from each school. Round each sample size to the nearest whole number and make sure they add up to m = 100. (b) Suppose you took the appropriate random sample from each school and you got the following average student weights: x̄1 = 82 lb, x̄2 = 115 lb, x̄3 = 90 lb. Compute your best estimate for the population mean weight μ.

Expand Your Knowledge: Stratified Sampling and Politics Three local districts are "swing" districts for an upcoming election on a contentious political issue. A survey will be conducted in which voters will be asked to rate their opinion regarding this issue on a scale of 0 (strongly oppose) to 10 (strongly support). A small preliminary random sample from each district was used to estimate the sample standard deviations of responses for the district. The following table shows the number of voters N in each district and the sample standard deviations of strength of support in each district.
District 1: N1 = 1525, s1 = 2.2
District 2: N2 = 917, s2 = 1.4
District 3: N3 = 2890, s3 = 3.3
We have a total population of 5332 voters and 3 strata (districts). The group doing the survey has enough funding to obtain a random sample of m = 250 total responses from all the districts. (a) Compute the size of the random samples n1, n2, n3 to be taken from each district. Round each sample size to the nearest whole number and make sure they add up to m = 250. Hint: See Problem 28.

30P

Statistical Literacy Angela took a general aptitude test and scored in the 82nd percentile for aptitude in accounting. What percentage of the scores were at or below her score? What percentage were above?

Statistical Literacy One standard for admission to Redfield College is that the student rank in the upper quartile of his or her graduating high school class.
What is the minimal percentile rank of a successful applicant?

Critical Thinking The town of Butler, Nebraska, decided to give a teacher-competency exam and defined the passing scores to be those in the 70th percentile or higher. The raw test scores ranged from 0 to 100. Was a raw score of 82 necessarily a passing score? Explain.

Critical Thinking Clayton and Timothy took different sections of Introduction to Economics. Each section had a different final exam. Timothy scored 83 out of 100 and had a percentile rank in his class of 72. Clayton scored 85 out of 100, but his percentile rank in his class was 70. Who performed better with respect to the rest of the students in the class, Clayton or Timothy? Explain your answer.

Basic Computation: Five-Number Summary, Interquartile Range Consider the following ordered data:
2 5 5 6 7 7 8 9 10
(a) Find the low, Q1, median, Q3, high. (b) Find the interquartile range. (c) Make a box-and-whisker plot.

Basic Computation: Five-Number Summary, Interquartile Range Consider the following ordered data:
2 5 5 6 7 8 8 9 10 12
(a) Find the low, Q1, median, Q3, high. (b) Find the interquartile range. (c) Make a box-and-whisker plot.

Health Care: Nurses At Center Hospital there is some concern about the high turnover of nurses. A survey was done to determine how long (in months) nurses had been in their current positions. The responses (in months) of 20 nurses were
23 2 5 14 25 36 27 42 12 8 7 23 29 26 28 11 20 31 8 36
Make a box-and-whisker plot of the data. Find the interquartile range.

Health Care: Staff Another survey was done at Center Hospital to determine how long (in months) clerical staff had been in their current positions. The responses (in months) of 20 clerical staff members were
25 22 7 24 26 31 18 14 17 20 31 42 6 25 22 3 29 32 15 72
(a) Make a box-and-whisker plot. Find the interquartile range.
(b) Compare this plot with the one in Problem 7. Discuss the locations of the medians, the locations of the middle halves of the data banks, and the distances from Q1 and Q3 to the extreme values.

Sociology: College Graduates What percentage of the general U.S. population have bachelor's degrees? The Statistical Abstract of the United States, 120th edition, gives the percentage of bachelor's degrees by state. For convenience, the data are sorted in increasing order.
17 18 18 18 19 20 20 20 21 21 21 21 22 22 22 22 22 22 23 23 24 24 24 24 24 24 24 24 25 26 26 26 26 26 26 27 27 27 27 27 28 28 29 31 31 32 32 34 35 38
(a) Make a box-and-whisker plot and find the interquartile range. (b) Illinois has a bachelor's degree percentage rate of about 26%. Into what quartile does this rate fall?

Sociology: High School Dropouts What percentage of the general U.S. population are high school dropouts? The Statistical Abstract of the United States, 120th edition, gives the percentage of high school dropouts by state. For convenience, the data are sorted in increasing order.
5 6 7 7 7 7 8 8 8 8 8 9 9 9 9 9 9 9 10 10 10 10 10 10 10 10 11 11 11 11 11 11 11 11 12 12 12 12 13 13 13 13 13 13 14 14 14 14 14 15
(a) Make a box-and-whisker plot and find the interquartile range. (b) Wyoming has a dropout rate of about 7%. Into what quartile does this rate fall?

Auto Insurance: Interpret Graphs Consumer Reports rated automobile insurance companies and listed annual premiums for top-rated companies in several states. Figure 3-9 shows box-and-whisker plots for annual premiums for urban customers (married couple with one 17-year-old son) in three states. The box-and-whisker plots in Figure 3-9 were all drawn using the same scale on a TI-84Plus/TI-83Plus/TI-nspire calculator. (a) Which state has the lowest premium? The highest? (b) Which state has the highest median premium? (c) Which state has the smallest range of premiums? The smallest interquartile range?
(d) Figure 3-10 gives the five-number summaries generated on the TI-84Plus/TI-83Plus/TI-nspire calculators for the box-and-whisker plots of Figure 3-9. Match the five-number summaries to the appropriate box-and-whisker plots.

Expand Your Knowledge: Outliers Some data include values so high or so low that they seem to stand apart from the rest of the data. These data are called outliers. Outliers may represent data collection errors, data entry errors, or simply valid but unusual data values. It is important to identify outliers in the data set and examine the outliers carefully to determine if they are in error. One way to detect outliers is to use a box-and-whisker plot. Data values that fall beyond the limits
Lower limit: Q1 - 1.5 × (IQR)
Upper limit: Q3 + 1.5 × (IQR)
where IQR is the interquartile range, are suspected outliers. In the computer software package Minitab, values beyond these limits are plotted with asterisks (*). Students from a statistics class were asked to record their heights in inches. The heights (as recorded) were
65 72 68 64 60 55 73 71 52 63 61 74 69 67 74 50 4 75 67 62 66 80 64 65
(a) Make a box-and-whisker plot of the data. (b) Find the value of the interquartile range (IQR). (c) Multiply the IQR by 1.5 and find the lower and upper limits. (d) Are there any data values below the lower limit? Above the upper limit? List any suspected outliers. What might be some explanations for the outliers?

Terminology Consider the equation of a least-squares line y = 3 + 2x based on data pairs (x, y). Which variable is the explanatory variable, and which is the response variable?

Terminology Consider the values of the sample correlation coefficient r: close to 1, close to 0, close to -1. Match the values to the appropriate description.
(i) indicates little or no linear relationship between the values of x and y in the ordered pairs (x, y). (ii) indicates that the linear relation between values of x and y in the ordered pairs (x, y) is almost perfect and is such that higher values of x correspond to higher values of y. (iii) indicates that the linear relation between values of x and y in the ordered pairs (x, y) is almost perfect and is such that higher values of x correspond to lower values of y.

Terminology Suppose we have a set of ordered pairs (x, y). If we use a least-squares line to predict y values for x values beyond the observed x values in the data set, are we using extrapolation or interpolation?

Terminology Consider the following terms in a linear regression model: slope, correlation coefficient, residual. Match each term to the appropriate description. (i) the number of units y changes for each unit change in x. (ii) a measure of the strength of linear correlation between the x and y variables. (iii) the difference between the predicted y value for a specified x and the value of y paired with x in a data set.

Statistical Literacy Suppose the scatter diagram of a random sample of data pairs (x, y) shows no linear relationship between x and y. Do you expect the value of the sample correlation coefficient r to be close to 1, -1, or 0?

Critical Thinking Suppose you and a friend each take different random samples of data pairs (x, y) from the same population. Assume the samples are the same size. Based on your sample, you compute r = 0.83. Based on her sample, your friend computes r = 0.79. Is your friend's value for r wrong?

Statistical Literacy When using the least-squares line for prediction, are results usually more reliable for extrapolation or interpolation?

Statistical Literacy Suppose that for x = 3, the predicted value is y = 6. The data pair (3, 8) is part of the sample data. What is the value of the residual for x = 3?

In Problems 9-14, (a) Draw a scatter diagram for the data.
(b) Find x̄, ȳ, b, and the equation of the least-squares line. Plot the line on the scatter diagram of part (a). (c) Find the sample correlation coefficient r and the coefficient of determination r². What percentage of variation in y is explained by the least-squares model?

Desert Ecology: Wildlife Bighorn sheep are beautiful wild animals found throughout the western United States. Data for this problem are based on information taken from The Desert Bighorn, edited by Monson and Sumner (University of Arizona Press). Let x be the age of a bighorn sheep (in years), and let y be the mortality rate (percent that die) for this age group. For example, x = 1, y = 14 means that 14% of the bighorn sheep between 1 and 2 years old die. A random sample of Arizona bighorn sheep gave the following information:
x: 1 2 3 4 5
y: 14 18.9 14.4 19.6 20.0
Complete parts (a) through (c), given Σx = 15; Σy = 86.9; Σx² = 55; Σy² = 1544.73; Σxy = 273.4.

Sociology: Job Changes A sociologist is interested in the relation between x = number of job changes and y = annual salary (in thousands of dollars) for people living in the Nashville area.
A random sample of 10 people employed in Nashville provided the following information:
x (Number of job changes): 4 7 5 6 1 5 9 10 10 3
y (Salary in $1000): 33 37 34 32 32 38 43 37 40 33
Complete parts (a) through (c), given Σx = 60; Σy = 359; Σx² = 442; Σy² = 13,013; Σxy = 2231. (d) If someone had x = 2 job changes, what does the least-squares line predict for y, the annual salary?

Medical: Fat Babies Modern medical practice tells us not to encourage babies to become too fat. Is there a positive correlation between the weight x of a 1-year-old baby and the weight y of the mature adult (30 years old)? A random sample of medical files produced the following information for 14 females:
x (lb): 21 25 23 24 20 15 25 21 17 24 26 22 18 19
y (lb): 125 125 120 125 130 120 145 130 130 130 130 140 110 115
Complete parts (a) through (c), given Σx = 300; Σy = 1775; Σx² = 6572; Σy² = 226,125; Σxy = 38,220. (d) If a female baby weighs 20 pounds at 1 year, what do you predict she will weigh at 30 years of age?

Sales: Insurance Dorothy Kelly sells life insurance for the Prudence Insurance Company. She sells insurance by making visits to her clients' homes. Dorothy believes that the number of sales should depend, to some degree, on the number of visits made.
For the past several years, she has kept careful records of the number of visits (x) she makes each week and the number of people (y) who buy insurance that week. For a random sample of 15 such weeks, the x and y values follow:
x: 11 19 16 13 28 5 20 14 22 7 15 29 8 25 16
y: 3 11 8 5 8 2 5 6 8 3 5 10 6 10 7
Complete parts (a) through (c), given Σx = 248; Σy = 97; Σx² = 4856; Σy² = 731; Σxy = 1825. (d) In a week during which Dorothy makes 18 visits, how many people do you predict will buy insurance from her?

Marketing: Coupons Each box of Healthy Crunch breakfast cereal contains a coupon entitling you to a free package of garden seeds. At the Healthy Crunch home office, they use the weight of incoming mail to determine how many of their employees are to be assigned to collecting coupons and mailing out seed packages on a given day. (Healthy Crunch has a policy of answering all its mail on the day it is received.) Let x = weight of incoming mail and y = number of employees required to process the mail in one working day. A random sample of 8 days gave the following data:
x (lb): 11 20 16 6 12 18 23 25
y (Number of employees): 6 10 9 5 8 14 13 16
Complete parts (a) through (c), given Σx = 131; Σy = 81; Σx² = 2435; Σy² = 927; Σxy = 1487. (d) If Healthy Crunch receives 15 pounds of mail, how many employees should be assigned mail duty that day?

Focus Problem: Changing Population and Crime Rate Let x be a random variable representing percentage change in neighborhood population in the past few years, and let y be a random variable representing crime rate (crimes per 1000 population). A random sample of six Denver neighborhoods gave the following information (Source: Neighborhood Facts, The Piton Foundation).
x: 29 2 11 17 7 6
y: 173 35 132 127 69 53
Complete parts (a) through (c), given Σx = 72; Σy = 589; Σx² = 1340; Σy² = 72,277; Σxy = 9499. (d) For a neighborhood with x = 12% change in population in the past few years, predict the change in the crime rate (per 1000 residents).

1UTA

2UTA

3UTA

4UTA

The data in this section are taken from this source: Based on King, Cuchlaine A. M. Physical Geography. Oxford: Basil Blackwell. Throughout the world, natural ocean beaches are beautiful sights to see. If you have visited natural beaches, you may have noticed that when the gradient or dropoff is steep, the grains of sand tend to be larger. In fact, a man-made beach with the "wrong" size granules of sand tends to be washed away and eventually replaced when the proper size grain is selected by the action of the ocean and the gradient of the bottom. Since man-made beaches are expensive, grain size is an important consideration. In the data that follow, x = median diameter (in millimeters) of granules of sand, and y = gradient of beach slope in degrees on natural ocean beaches.
x: 0.17 0.19 0.22 0.235 0.235 0.30 0.35 0.42 0.85
y: 0.63 0.70 0.82 0.88 1.15 1.50 4.40 7.30 11.30
Suppose you have a truckload of sifted sand in which the median size of granules is 0.38 mm. If you want to put this sand on a beach and you don't want the sand to wash away, then what does the least-squares line predict for the angle of the beach? Note: Heavy storms that produce abnormal waves may also wash out the sand.
However, in the long run, the size of sand granules that remain on the beach or that are brought back to the beach by long-term wave action is determined to a large extent by the angle at which the beach drops off. What range of angles should the beach have if we want to be 90% confident that we are matching the size of our sand granules (0.38 mm) to the proper angle of the beach?

Suppose we now have a truckload of sifted sand in which the median size of the granules is 0.45 mm. Repeat Problem 5.

Note: Answers may vary due to rounding.

Statistical Literacy When drawing a scatter diagram, along which axis is the explanatory variable placed? Along which axis is the response variable placed?

Statistical Literacy Suppose two variables are positively correlated. Does the response variable increase or decrease as the explanatory variable increases?

Statistical Literacy Suppose two variables are negatively correlated. Does the response variable increase or decrease as the explanatory variable increases?

Statistical Literacy Describe the relationship between two variables when the correlation coefficient r is (a) near -1. (b) near 0. (c) near 1.

Critical Thinking: Linear Correlation Look at the following diagrams. Which show high linear correlation, moderate or low linear correlation, or no linear correlation?

Critical Thinking: Linear Correlation Look at the following diagrams. Which show high linear correlation, moderate or low linear correlation, or no linear correlation?

Critical Thinking: Lurking Variables Over the past few years, there has been a strong positive correlation between the annual consumption of diet soda drinks and the number of traffic accidents. (a) Do you think increasing consumption of diet soda drinks causes traffic accidents? Explain. (b) What lurking variables might be causing the increase in one or both of the variables? Explain.

Critical Thinking: Lurking Variables Over the past decade, there has been a strong positive correlation between teacher salaries and prescription drug costs. (a) Do you think paying teachers more causes prescription drugs to cost more? Explain. (b) What lurking variables might be causing the increase in one or both of the variables?
Explain.

Critical Thinking: Lurking Variables Over the past 50 years, there has been a strong negative correlation between average annual income and the record time to run 1 mile. In other words, average annual incomes have been rising while the record time to run 1 mile has been decreasing. (a) Do you think increasing income causes decreasing times to run the mile? Explain. (b) What lurking variables might be causing the change in one or both of the variables? Explain.

Critical Thinking: Lurking Variables Over the past 30 years in the United States, there has been a strong negative correlation between the number of infant deaths at birth and the number of people over age 65. (a) Is the fact that people are living longer causing a decrease in infant mortalities at birth? (b) What lurking variables might be causing the increase in one or both of the variables? Explain.

Interpretation Trevor conducted a study and found that the correlation between the price of a gallon of gasoline and gasoline consumption has a linear correlation coefficient of -0.7. What does this result say about the relationship between price of gasoline and consumption? The study included gasoline prices ranging from $2.70 to $5.30 per gallon. Is it reliable to apply the results of this study to prices of gasoline higher than $5.30 per gallon? Explain.

Interpretation Do people who spend more time on social networking sites spend more time using Twitter? Megan conducted a study and found that the correlation between the times spent on the two activities was 0.8. What does this result say about the relationship between times spent on the two activities? If someone spends more time than average on a social networking site, can you automatically conclude that he or she spends more time than average using Twitter? Explain.

Veterinary Science: Shetland Ponies How much should a healthy Shetland pony weigh? Let x be the age of the pony (in months), and let y be the average weight of the pony (in kilograms).
The following information is based on data taken from The Merck Veterinary Manual (a reference used in most veterinary colleges).
x: 3 6 12 18 24
y: 60 95 140 170 185
(a) Make a scatter diagram and draw the line you think best fits the data. (b) Would you say the correlation is low, moderate, or strong? Positive or negative? (c) Use a calculator to verify that Σx = 63, Σx² = 1089, Σy = 650, Σy² = 95,350, and Σxy = 9930. Compute r. As x increases from 3 to 24 months, does the value of r imply that y should tend to increase or decrease? Explain.

Health Insurance: Administrative Cost The following data are based on information from Domestic Affairs. Let x be the average number of employees in a group health insurance plan, and let y be the average administrative cost as a percentage of claims.
x: 3 7 15 35 75
y: 40 35 30 25 18
(a) Make a scatter diagram and draw the line you think best fits the data. (b) Would you say the correlation is low, moderate, or strong? Positive or negative? (c) Use a calculator to verify that Σx = 135, Σx² = 7133, Σy = 148, Σy² = 4674, and Σxy = 3040. Compute r. As x increases from 3 to 75, does the value of r imply that y should tend to increase or decrease? Explain.

Meteorology: Cyclones Can a low barometer reading be used to predict maximum wind speed of an approaching tropical cyclone? Data for this problem are based on information taken from Weatherwise (Vol. 46, No. 1), a publication of the American Meteorological Society. For a random sample of tropical cyclones, let x be the lowest pressure (in millibars) as a cyclone approaches, and let y be the maximum wind speed (in miles per hour) of the cyclone.
x: 1004 975 992 935 985 932
y: 40 100 65 145 80 150
(a) Make a scatter diagram and draw the line you think best fits the data. (b) Would you say the correlation is low, moderate, or strong? Positive or negative? (c) Use a calculator to verify that Σx = 5823, Σx² = 5,655,779, Σy = 580, Σy² = 65,750, and Σxy = 556,315. Compute r.
As x increases, does the value of r imply that y should tend to increase or decrease? Explain.

Geology: Earthquakes Is the magnitude of an earthquake related to the depth below the surface at which the quake occurs? Let x be the magnitude of an earthquake (on the Richter scale), and let y be the depth (in kilometers) of the quake below the surface at the epicenter. The following is based on information taken from the National Earthquake Information Service of the U.S. Geological Survey. Additional data may be found by visiting the web site for the service.
x   2.9   4.2   3.3   4.5   2.6   3.2   3.4
y   5.0  10.0  11.2  10.0   7.9   3.9   5.5
(a) Make a scatter diagram and draw the line you think best fits the data. (b) Would you say the correlation is low, moderate, or strong? Positive or negative? (c) Use a calculator to verify that Σx = 24.1, Σx² = 85.75, Σy = 53.5, Σy² = 458.31, and Σxy = 190.18. Compute r. As x increases, does the value of r imply that y should tend to increase or decrease? Explain.

Baseball: Batting Averages and Home Runs In baseball, is there a linear correlation between batting average and home run percentage? Let x represent the batting average of a professional baseball player, and let y represent the player's home run percentage (number of home runs per 100 times at bat). A random sample of n = 7 professional baseball players gave the following information (Reference: The Baseball Encyclopedia, Macmillan Publishing Company).
x  0.243  0.259  0.286  0.263  0.268  0.339  0.299
y    1.4    3.6    5.5    3.8    3.5    7.3    5.0
(a) Make a scatter diagram and draw the line you think best fits the data. (b) Would you say the correlation is low, moderate, or high? Positive or negative? (c) Use a calculator to verify that Σx = 1.957, Σx² ≈ 0.553, Σy = 30.1, Σy² = 150.15, and Σxy ≈ 8.753. Compute r. As x increases, does the value of r imply that y should tend to increase or decrease? Explain.

University Crime: FBI Report Do larger universities tend to have more property crime? University crime statistics are affected by a variety of factors.
The surrounding community, accessibility given to outside visitors, and many other factors influence crime rates. Let x be a variable that represents student enrollment (in thousands) on a university campus, and let y be a variable that represents the number of burglaries in a year on the university campus. A random sample of n = 8 universities in California gave the following information about enrollments and annual burglary incidents (Reference: Crime in the United States, Federal Bureau of Investigation).
x  12.5  30.0  24.5  14.3   7.5  27.7  16.2  20.1
y    26    73    39    23    15    30    15    25
(a) Make a scatter diagram and draw the line you think best fits the data. (b) Would you say the correlation is low, moderate, or high? Positive or negative? (c) Using a calculator, verify that Σx = 152.8, Σx² = 3350.98, Σy = 246, Σy² = 10,030, and Σxy = 5488.4. Compute r. As x increases, does the value of r imply that y should tend to increase or decrease? Explain.

19P
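
Part (c) of the correlation problems above asks for the five sums and then r. The listing itself does not state a formula, so the sketch below assumes the standard computational form r = (nΣxy − ΣxΣy) / (√(nΣx² − (Σx)²) · √(nΣy² − (Σy)²)). It uses the university enrollment and burglary data as input; the function name `correlation_from_sums` and the dictionary keys are illustrative choices, not part of the exercises.

```python
import math


def correlation_from_sums(xs, ys):
    """Return (r, sums) using the computational formula for Pearson's r.

    Assumption: this is the standard sums-based formula; the exercise
    listing does not spell out which formula to use.
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("xs and ys must have the same length (at least 2)")

    n = len(xs)
    sums = {
        "sum_x": sum(xs),
        "sum_x2": sum(x * x for x in xs),
        "sum_y": sum(ys),
        "sum_y2": sum(y * y for y in ys),
        "sum_xy": sum(x * y for x, y in zip(xs, ys)),
    }

    numerator = n * sums["sum_xy"] - sums["sum_x"] * sums["sum_y"]
    denominator = (
        math.sqrt(n * sums["sum_x2"] - sums["sum_x"] ** 2)
        * math.sqrt(n * sums["sum_y2"] - sums["sum_y"] ** 2)
    )
    return numerator / denominator, sums


# Data copied from the University Crime problem:
# x = enrollment in thousands, y = burglaries per year.
enrollment = [12.5, 30.0, 24.5, 14.3, 7.5, 27.7, 16.2, 20.1]
burglaries = [26, 73, 39, 23, 15, 30, 15, 25]

r, sums = correlation_from_sums(enrollment, burglaries)

# Print each sum so it can be compared with the values quoted in part (c).
for name, value in sums.items():
    print(f"{name} = {value:.2f}")
print(f"r = {r:.3f}")
```

Swapping in the x and y rows from any of the other problems checks their quoted sums the same way. The sign of r then indicates whether y tends to increase or decrease as x increases.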