Final Exam Practice A Solution

School

Iowa State University

Course

421

Subject

Statistics

Date

Jan 9, 2024

Uploaded by hengxu2017

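Stat 421: Final Exam Formula Sheet

$z_{.005} = 2.576$, $z_{.025} = 1.96$, $z_{.05} = 1.645$

1. $\binom{N}{n} = \dfrac{N!}{n!\,(N-n)!}$
   $s^2 = \dfrac{1}{n-1} \sum_{i=1}^{n} (y_i - \bar{y})^2$

2. $\bar{y} = \dfrac{1}{n} \sum_{i=1}^{n} y_i$
   $\hat{V}(\bar{y}) = \dfrac{s^2}{n} \left(1 - \dfrac{n}{N}\right)$

3. $\hat{t} = N \bar{y}$
   $\hat{V}(\hat{t}) = N^2 \hat{V}(\bar{y})$

4. $\hat{p} = \bar{y}$
   $\hat{V}(\hat{p}) = \dfrac{\hat{p}(1 - \hat{p})}{n-1} \left(1 - \dfrac{n}{N}\right)$

5. $n = \dfrac{z_{\alpha/2}^2 s^2}{e^2 + \dfrac{z_{\alpha/2}^2 s^2}{N}}$
   $n = \dfrac{z_{\alpha/2}^2 N^2 s^2}{e^2 + z_{\alpha/2}^2 N s^2}$

6. $\hat{t}_\psi = \dfrac{1}{n} \sum_{i \in A} Q_i \dfrac{y_i}{\psi_i}$
   $\hat{V}(\hat{t}_\psi) = \dfrac{1}{n(n-1)} \sum_{i \in A} Q_i \left[\dfrac{y_i}{\psi_i} - \hat{t}_\psi\right]^2$
   $\hat{\bar{y}}_\psi = \dfrac{\hat{t}_\psi}{N}$
   $\hat{V}(\hat{\bar{y}}_\psi) = \dfrac{\hat{V}(\hat{t}_\psi)}{N^2}$

7. $\hat{t}_{str} = \sum_{h=1}^{H} \hat{t}_h = \sum_{h=1}^{H} N_h \bar{y}_h$
   $\hat{V}(\hat{t}_{str}) = \sum_{h=1}^{H} \left(1 - \dfrac{n_h}{N_h}\right) N_h^2 \dfrac{s_h^2}{n_h}$
   $\bar{y}_h = \dfrac{\sum_{j=1}^{n_h} y_{hj}}{n_h}$
   $\hat{t}_h = \dfrac{N_h}{n_h} \sum_{j=1}^{n_h} y_{hj} = N_h \bar{y}_h$
   $s_h^2 = \dfrac{\sum_{j=1}^{n_h} (y_{hj} - \bar{y}_h)^2}{n_h - 1}$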
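8. $\hat{p}_{str} = \sum_{h=1}^{H} \dfrac{N_h}{N} \hat{p}_h$
   $\hat{V}(\hat{p}_{str}) = \sum_{h=1}^{H} \left(1 - \dfrac{n_h}{N_h}\right) \left(\dfrac{N_h}{N}\right)^2 \dfrac{\hat{p}_h (1 - \hat{p}_h)}{n_h - 1}$

9. $\hat{B} = \bar{y} / \bar{x}$
   $\hat{V}(\hat{B}) = \left(1 - \dfrac{n}{N}\right) \dfrac{s_e^2}{n \bar{x}_U^2}$
   $e_i = y_i - x_i \hat{B}$
   $s_e^2 = \dfrac{1}{n-1} \sum_{i=1}^{n} (e_i - \bar{e})^2$

10. $\hat{\bar{y}}_r = \hat{B} \bar{x}_U$
    $\hat{V}(\hat{\bar{y}}_r) = \bar{x}_U^2 \hat{V}(\hat{B})$

11. $\hat{B}_1 = \dfrac{s_{xy}}{s_x^2} = \dfrac{r s_y}{s_x}$
    $\hat{B}_0 = \bar{y} - \hat{B}_1 \bar{x}$
    $s_{xy} = \dfrac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})$
    $r = \dfrac{s_{xy}}{s_x s_y}$
    $\hat{\bar{y}}_{reg} = \bar{y} + \hat{B}_1 (\bar{x}_U - \bar{x}) = \hat{B}_0 + \hat{B}_1 \bar{x}_U$
    $\hat{V}(\hat{\bar{y}}_{reg}) = \left(1 - \dfrac{n}{N}\right) \dfrac{s_e^2}{n}$
    $e_i = y_i - \hat{B}_0 - x_i \hat{B}_1$
    $s_e^2 = \dfrac{1}{n-1} \sum_{i=1}^{n} e_i^2$

12. $n_h = n \, \dfrac{N_h S_h / \sqrt{c_h}}{\sum_{l=1}^{H} N_l S_l / \sqrt{c_l}}$
    [The stratified sample-size formula that follows on the original sheet is not recoverable from the source.]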
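13. $\bar{y}_d = \hat{B} = \dfrac{\bar{u}}{\bar{x}}$
    $\hat{V}(\bar{y}_d) = \left(1 - \dfrac{n}{N}\right) \dfrac{1}{n \bar{x}_U^2} \dfrac{\sum_{i=1}^{n} (u_i - \hat{B} x_i)^2}{n-1}$
    $\bar{y}_d = \dfrac{1}{n_d} \sum_{i \in S_d} y_i$
    $\hat{V}(\bar{y}_d) \simeq \left(1 - \dfrac{n}{N}\right) \dfrac{s_{yd}^2}{n_d}$
    $s_{yd}^2 = \dfrac{1}{n_d - 1} \sum_{i \in S_d} (y_i - \bar{y}_d)^2$
    $\hat{t}_d = N_d \bar{y}_d$
    $\hat{t}_d = N \bar{u}$

14. $\hat{\bar{y}}_{post} = \sum_{h=1}^{H} \dfrac{N_h}{N} \bar{y}_{hR}$
    $\hat{V}(\hat{\bar{y}}_{post}) \simeq \sum_{h=1}^{H} \left(\dfrac{N_h}{N}\right)^2 \left(1 - \dfrac{n_{hR}}{N_h}\right) \left(\dfrac{s_{hR}^2}{n_{hR}}\right)$

15. $\hat{t}_{unb} = \dfrac{N}{n} \sum_{i=1}^{n} t_i$
    $\hat{V}(\hat{t}_{unb}) = N^2 \left(1 - \dfrac{n}{N}\right) \dfrac{s_t^2}{n}$
    $s_t^2 = \dfrac{\sum_{i=1}^{n} \left(t_i - \dfrac{\hat{t}_{unb}}{N}\right)^2}{n-1}$
    $\hat{\bar{y}}_{unb} = \dfrac{\hat{t}_{unb}}{K}$
    $\hat{V}(\hat{\bar{y}}_{unb}) = \dfrac{\hat{V}(\hat{t}_{unb})}{K^2}$
    $\hat{\bar{y}}_r = \dfrac{\sum_{i=1}^{n} t_i}{\sum_{i=1}^{n} M_i}$
    $\hat{V}(\hat{\bar{y}}_r) = \left(1 - \dfrac{n}{N}\right) \dfrac{1}{n \bar{M}_U^2} \dfrac{\sum_{i=1}^{n} (t_i - \hat{\bar{y}}_r M_i)^2}{n-1} = \left(1 - \dfrac{n}{N}\right) \dfrac{1}{n \bar{M}_U^2} \dfrac{\sum_{i=1}^{n} M_i^2 (\bar{y}_i - \hat{\bar{y}}_r)^2}{n-1}$
    $\hat{t}_r = K \hat{\bar{y}}_r$
    $\hat{V}(\hat{t}_r) = K^2 \hat{V}(\hat{\bar{y}}_r)$

16. $\hat{t}_i = M_i \bar{y}_i$
    $\bar{y}_i = \dfrac{1}{m_i} \sum_{j=1}^{m_i} y_{ij}$
    $\hat{t}_{unb} = \dfrac{N}{n} \sum_{i=1}^{n} \hat{t}_i$
    $\hat{V}(\hat{t}_{unb}) = N^2 \left(1 - \dfrac{n}{N}\right) \dfrac{s_t^2}{n} + \dfrac{N}{n} \sum_{i=1}^{n} \left(1 - \dfrac{m_i}{M_i}\right) M_i^2 \dfrac{s_i^2}{m_i}$
    $s_t^2 = \dfrac{\sum_{i=1}^{n} \left(\hat{t}_i - \dfrac{\hat{t}_{unb}}{N}\right)^2}{n-1}$
    $\hat{\bar{y}}_r = \dfrac{\sum_{i=1}^{n} M_i \bar{y}_i}{\sum_{i=1}^{n} M_i}$
    $\hat{V}(\hat{\bar{y}}_r) = \left(\dfrac{1}{\bar{M}^2}\right) \left[\left(1 - \dfrac{n}{N}\right) \dfrac{s_r^2}{n} + \dfrac{1}{nN} \sum_{i=1}^{n} \left(1 - \dfrac{m_i}{M_i}\right) M_i^2 \dfrac{s_i^2}{m_i}\right]$
    $s_r^2 = \dfrac{\sum_{i=1}^{n} (M_i \bar{y}_i - M_i \hat{\bar{y}}_r)^2}{n-1} = \dfrac{\sum_{i=1}^{n} M_i^2 (\bar{y}_i - \hat{\bar{y}}_r)^2}{n-1}$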
Stat 421 – Spring 2016    Name: _____________________________
Final Exam, May 5, 2016

This exam has 3 “short answer” questions, each with multiple parts, and 5 True/False questions. The questions cover pages 5-16 of this packet. Please show your work, and remember to include units where relevant. For the short answer computations, if you include a complete formula with numbers, you do not need to complete the calculation.

1. A real estate company wants to understand the composition of houses in a community with a total of 40 houses. The real estate company selects an SRSWOR of 4 houses. For the sampled houses, the real estate company collects information on the size of the garage and the number of bedrooms in the house. The table below contains the collected data. [30 points]

| Sample house ID # | Garage size (square feet) | Number of bedrooms |
|---|---|---|
| 1 | 384 | 2 |
| 2 | 308 | 1 |
| 3 | 484 | 2 |
| 4 | 576 | 4 |

a. First, the real estate company wants to estimate the average number of bedrooms among houses with a garage size of at least 400 square feet.

(i) Define the domain $U_d$ of interest in words. [3 points]

Houses with a garage size of at least 400 square feet.

(ii) Provide a mathematical expression for the domain population parameter of interest. Define the meanings of the symbols used in your expression, including the definition of $y_i$. [3 points]

$\bar{y}_{U_d} = \dfrac{1}{N_d} \sum_{i \in U_d} y_i$, where $N_d$ is the number of elements in $U_d$, and $y_i$ is the number of bedrooms in house $i$.
(iii) Record the values for the following terms. [3 points]

$N = 40$, $n = 4$, $n_d = 2$

(iv) Fill in the columns in the table below for the variables $x_i$ and $u_i$. [6 points]

| Sample house ID # | Garage size (square feet) | Number of bedrooms | $x_i$ | $u_i$ |
|---|---|---|---|---|
| 1 | 384 | 2 | 0 | 0 |
| 2 | 308 | 1 | 0 | 0 |
| 3 | 484 | 2 | 1 | 2 |
| 4 | 576 | 4 | 1 | 4 |
(v) Estimate the average number of bedrooms among households with a garage size of at least 400 square feet. [5 points]

$\bar{y}_d = \dfrac{1}{n_d} \sum_{i \in A_d} y_i = \dfrac{2 + 4}{2} = 3$ bedrooms/house

(vi) Estimate the standard error of the estimator used in (v). [5 points]

With $s_d^2 = \dfrac{(2-3)^2 + (4-3)^2}{2-1} = 2$,

$\hat{V}(\bar{y}_d) = \left(1 - \dfrac{n}{N}\right) \dfrac{s_d^2}{n_d} = \left(1 - \dfrac{4}{40}\right) \dfrac{2}{2}$

$\widehat{SE}(\bar{y}_d) = \sqrt{1 - \dfrac{4}{40}} = 0.95$
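The arithmetic in (v) and (vi) can be checked with a short script. This is only a sketch: the variable names are illustrative, and it assumes, consistent with the table in (iv), that $x_i$ is the indicator that house $i$ is in the domain and $u_i = x_i y_i$.

```python
import math

# Sample data from the table in (iv): garage size (sq ft) and bedrooms per house.
garage = [384, 308, 484, 576]
bedrooms = [2, 1, 2, 4]
N = 40           # houses in the community
n = len(garage)  # houses in the SRSWOR

# Assumed definitions: x_i = 1 if garage >= 400 sq ft, u_i = x_i * y_i.
x = [1 if g >= 400 else 0 for g in garage]
u = [xi * yi for xi, yi in zip(x, bedrooms)]

n_d = sum(x)           # sampled houses in the domain
ybar_d = sum(u) / n_d  # domain mean, part (v)

# Sample variance of y among the sampled domain houses.
s2_d = sum((yi - ybar_d) ** 2 for xi, yi in zip(x, bedrooms) if xi == 1) / (n_d - 1)

# Approximate variance and standard error of the domain mean, part (vi).
var_ybar_d = (1 - n / N) * s2_d / n_d
se_ybar_d = math.sqrt(var_ybar_d)

print(ybar_d, round(se_ybar_d, 2))
```

The script reproduces the domain mean of 3 bedrooms per house and a standard error of about 0.95.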
(vii) Now, the real estate company wants to estimate the total number of bedrooms for the households that have a garage size of at least 400 square feet. The real estate company does not know the total number of households that have a garage size of at least 400 square feet. Using the variable $u_i$ defined in (iv), estimate the total number of bedrooms among households that have a garage size of at least 400 square feet. [5 points]

$\hat{t}_d = \dfrac{N}{n} \sum_{i \in A} u_i = \dfrac{40}{4} (2 + 4) = 60$ bedrooms in houses with garages that are at least 400 square feet

2. A soil scientist wants to estimate the average water erosion (soil loss due to rainfall; units of tons/acre) on crop fields in a region. She selects an SRSWOR of $n = 100$ crop fields from the $N = 1000$ crop fields in the population. Many of the operators of the sampled crop fields refuse to participate in the survey. She decides to use post-stratification to adjust for nonresponse. The post-strata are the two groups defined by whether or not corn was grown in the field last year. The table below contains the population size, respondent sample size, sample mean of the respondents, and sample standard deviation of the respondents for each post-stratum. [20 points]

| Post-stratum ($h$) | Number in population ($N_h$) | Number of respondents ($n_{hR}$) | Mean erosion for respondents ($\bar{y}_{hR}$) (tons/acre) | Standard deviation of erosion for respondents ($s_{hR}$) |
|---|---|---|---|---|
| Corn not grown ($h = 1$) | 250 | 15 | 2 | 1 |
| Corn grown ($h = 2$) | 750 | 35 | 4 | 2 |

a. Why might the simple mean of the 50 respondents be a biased estimator of the overall mean erosion in this region? Provide two reasons. [2 points]

The probability of responding might be related to the level of erosion in the field. For example, if farmers with higher erosion are less likely to respond to the survey, then we would expect the simple mean to have a negative bias for the population mean.

b. Give a formula for the post-stratified estimator of the mean erosion in the whole population, and define the symbols used in your formula. [6 points]

$\hat{\bar{y}}_{post} = \sum_{h=1}^{H} \dfrac{N_h}{N} \bar{y}_{hR}$, where $N_h$ is the population size for stratum $h$, $N$ is the total population size, and $\bar{y}_{hR}$ is the mean of respondents in stratum $h$.
c. Using the formula from (b), estimate the mean erosion in this region using post-stratification. (Remember to include units.) [5 points]

$\hat{\bar{y}}_{post} = \dfrac{250}{1000} \cdot 2 + \dfrac{750}{1000} \cdot 4 = 3.5$ tons/acre

d. What is the weight for the post-stratified estimator for a respondent that grew corn last year? [3 points]

$\dfrac{N_h}{n_{hR}} = \dfrac{750}{35} = 21.43$

e. What assumption justifies the use of post-stratification to adjust for nonresponse? [4 points]
The distribution of soil erosion is the same for the responding and nonresponding portions of the population within each post-stratum, where the post-strata are defined by fields where corn is grown and fields where corn is not grown.
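As a check on parts (c) and (d), the post-stratified estimate and the respondent weights can be computed directly from the table. This is a minimal sketch with illustrative variable names. The variance line applies formula 14 from the formula sheet; it is not part of the exam solution and is included only as an assumption about how that formula would be used here.

```python
import math

# Post-stratum summaries from the Question 2 table
# (index 0: corn not grown, index 1: corn grown).
N_h = [250, 750]       # population counts
n_hR = [15, 35]        # respondent counts
ybar_hR = [2.0, 4.0]   # respondent means (tons/acre)
s_hR = [1.0, 2.0]      # respondent standard deviations
N = sum(N_h)           # 1000 fields in total

# Post-stratified mean, part (c): weighted average of respondent means.
ybar_post = sum(Nh / N * yb for Nh, yb in zip(N_h, ybar_hR))

# Weight attached to each respondent in post-stratum h, part (d).
weights = [Nh / nr for Nh, nr in zip(N_h, n_hR)]

# Approximate variance from formula 14 (an addition, not in the exam solution).
var_post = sum(
    (Nh / N) ** 2 * (1 - nr / Nh) * s ** 2 / nr
    for Nh, nr, s in zip(N_h, n_hR, s_hR)
)
se_post = math.sqrt(var_post)

print(ybar_post, round(weights[1], 2))
```

Each respondent who grew corn represents about 21.43 fields, while each respondent who did not grow corn represents about 16.67.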
3. A researcher from the department of education in Rhode Island is interested in the smoking behaviors among high school students in Rhode Island. Rhode Island has a total of 45,000 high school students enrolled in a total of 60 high schools. The researcher selects an SRSWOR of 2 high schools from the 60 high schools in Rhode Island. From each selected high school, the researcher selects an SRSWOR of 10% of the students in the high school. That is, for both sampled high schools, $m_i / M_i = 0.1$. The researcher asks each sampled student, “Do you smoke cigarettes?” The table below summarizes the collected data for the 2 high schools in the sample. [40 points]

| Sample high school ID # | $M_i$ | $\hat{t}_i$ |
|---|---|---|
| 1 | 400 | 100 |
| 2 | 1000 | 200 |

a. What is the target population? [2 points]

High school students in Rhode Island.

b. Define the variable of interest, $y_{ij}$, for student $j$ in high school $i$. [2 points]

$y_{ij} = 1$ if the student smokes cigarettes, 0 otherwise.

c. Does this study have potential sources of measurement error? Explain. [2 points]

Yes, the student may lie; the question is also not precise in terms of specifying the frequency of smoking that is of interest.
d. Provide numeric values for the following symbols. [7 points]

$N = 60$, $K = 45{,}000$, $n = 2$, $m_1 = 40$, $\pi_1 = 2/60$, $\pi_{j|1} = 0.1$, $\pi_{1,j} = 2/600$

e. Is this a self-weighting design? Why, or why not? [2 points]

Yes, the weight is 300 for all sampled units.
f. Now, we will estimate the total number of high school students in Rhode Island who smoke cigarettes using the unbiased estimator of the total.

(i) Give a mathematical formula for the population parameter of interest and define symbols used that are not given in (d). [2 points]

$t = \sum_{i=1}^{N} t_i$, $\quad t_i = \sum_{j \in U_i} y_{ij}$

(ii) Give a formula for the unbiased estimator of the total number of high school students in Rhode Island who smoke cigarettes. [4 points]

$\hat{t} = \dfrac{N}{n} \sum_{i=1}^{n} \hat{t}_i$, $\quad \hat{t}_i = \sum_{j=1}^{m_i} \dfrac{M_i}{m_i} y_{ij}$

(iii) Estimate the total number of high school students in Rhode Island who smoke cigarettes using the unbiased estimator. [4 points]

$\hat{t} = \dfrac{N}{n} \sum_{i=1}^{n} \hat{t}_i = 60 \left(\dfrac{100 + 200}{2}\right) = 9{,}000$
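The estimate in (iii) and the self-weighting claim in (e) can be verified with a few lines. This sketch uses illustrative variable names and assumes the second-stage sample sizes are exactly 10% of each school's enrollment, as stated in the problem.

```python
# Two-stage design from Question 3.
N = 60                # high schools in Rhode Island (PSUs)
n = 2                 # sampled high schools
M = [400, 1000]       # enrollment in each sampled school
t_hat_i = [100, 200]  # estimated smokers in each sampled school

# Second-stage sample sizes under the stated 10% sampling fraction.
m = [round(0.1 * Mi) for Mi in M]

# Sampling weight for a student in school i: (N / n) * (M_i / m_i).
weights = [(N / n) * (Mi / mi) for Mi, mi in zip(M, m)]

# Unbiased estimator of the total number of smokers, part (f)(iii).
t_unb = N / n * sum(t_hat_i)

print(m, weights, t_unb)
```

Both weights equal 300, which is why the design is self-weighting, and the estimated total is 9,000 students.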
(iv) Give a formula for the standard error of the unbiased estimator of the total number of high school students in Rhode Island who smoke cigarettes. [4 points]

$SE\{\hat{t}\} = \sqrt{N^2 \left(1 - \dfrac{n}{N}\right) \dfrac{S_t^2}{n} + \dfrac{N}{n} \sum_{i \in A} M_i^2 \left(1 - \dfrac{m_i}{M_i}\right) \dfrac{S_i^2}{m_i}}$

(v) Estimate the variance of the unbiased estimator of the total number of high school students in Rhode Island who smoke cigarettes. (Hint: Use the property that a proportion is a special case of a mean to estimate the within-cluster component of the variance.) [4 points]

First, we estimate the between-PSU component:

$N^2 \left(1 - \dfrac{n}{N}\right) \dfrac{S_t^2}{n} = 60^2 \left(1 - \dfrac{2}{60}\right) \dfrac{(100 - 200)^2 / 2}{2} = 8{,}700{,}000$

Because the estimates of the within-PSU components are based on SRS proportions,

$\dfrac{S_i^2}{m_i} = \dfrac{\hat{p}_i (1 - \hat{p}_i)}{m_i - 1} = \dfrac{M_i^{-1} \hat{t}_i \left(1 - M_i^{-1} \hat{t}_i\right)}{m_i - 1}$
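The variance components in (v) can be reproduced numerically. The sketch below uses illustrative variable names and follows the formulas above: the between-PSU term uses the sample variance of the $\hat{t}_i$, and each within-PSU term uses $\hat{p}_i (1 - \hat{p}_i) / (m_i - 1)$ with $\hat{p}_i = \hat{t}_i / M_i$.

```python
import math

# Two-stage design from Question 3.
N, n = 60, 2
M = [400, 1000]       # school enrollments
m = [40, 100]         # sampled students (10% of each school)
t_hat_i = [100, 200]  # estimated smokers per sampled school

# Unbiased total and the sample variance of the estimated PSU totals.
t_unb = N / n * sum(t_hat_i)
s2_t = sum((t - t_unb / N) ** 2 for t in t_hat_i) / (n - 1)

# Between-PSU component: N^2 (1 - n/N) s_t^2 / n.
between = N ** 2 * (1 - n / N) * s2_t / n

# Within-PSU terms: a proportion is a special case of a mean,
# so s_i^2 / m_i = p_i (1 - p_i) / (m_i - 1).
p_hat = [t / Mi for t, Mi in zip(t_hat_i, M)]
s2_over_m = [p * (1 - p) / (mi - 1) for p, mi in zip(p_hat, m)]

# Within-PSU component: (N/n) * sum of M_i^2 (1 - m_i/M_i) s_i^2 / m_i.
within = N / n * sum(
    Mi ** 2 * (1 - mi / Mi) * v for Mi, mi, v in zip(M, m, s2_over_m)
)

se_t = math.sqrt(between + within)
print(round(between), round(within, 2), round(se_t, 2))
```

The between-PSU component dominates: it is more than 100 times the within-PSU component, so sampling more schools would reduce the standard error far more than sampling more students per school.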
The within-PSU component is

$\dfrac{N}{n} \sum_{i \in A} M_i^2 \left(1 - \dfrac{m_i}{M_i}\right) \dfrac{S_i^2}{m_i} = \dfrac{60}{2} \left\{ 400^2 (1 - 0.1) \dfrac{0.25 (0.75)}{39} + 1000^2 (1 - 0.1) \dfrac{0.2 (0.8)}{99} \right\} = 64405.59$

We combine the between-PSU component and the within-PSU component to obtain

$SE\{\hat{t}\} = \sqrt{8{,}700{,}000 + 64405.59} = 2960.47$

g. Use the unbiased estimator to estimate the proportion of high school students in Rhode Island who smoke cigarettes. (Hint: Use your answer to (f) and recall that a proportion is a special case of a mean.) [3 points]

$\hat{p} = \dfrac{9{,}000}{45{,}000} = 0.2$

h. For this example, would you recommend using the unbiased estimator or the ratio estimator? Explain. [4 points]
Ratio estimator, because cluster population sizes vary.

4. TRUE/FALSE: 2 points each.

[T F] Stratified sampling typically leads to estimators of overall population means that are less efficient (higher MSE) than estimators from an SRSWOR of the same number of elements, especially if the strata define groups that are more homogeneous than the overall population.

[T F] Cluster sampling is usually used because of practical reasons, such as the frame structure or to reduce data collection costs, rather than to improve the efficiency. Cluster samples usually lead to less precise estimators (higher MSE) than estimators from an SRSWOR of the same number of elements.

[T F] In probability proportional to size sampling with replacement (PPSWR), we select a with-replacement sample where the probability of selecting an element on a single draw is proportional to the size measure for the element. If the size measure is correlated with the variable of interest, then we expect estimators of overall population means and totals from the PPSWR sample to be more efficient (lower MSE) than estimators from an SRSWOR of the same number of elements.

[T F] An SRSWOR of size $n = 50$ is selected from a population of size $N = 500$. The population mean of an auxiliary variable is known to be $\bar{x}_U = 50$. The sample mean of the same auxiliary variable is $\bar{x} = 60$. The ratio estimator of the population mean of the variable of interest will be larger than the sample mean.

[T F] Systematic sampling is a special case of one-stage cluster sampling, where one cluster is selected.