# Economics

Bootstrapping Regression Models

Appendix to An R and S-PLUS Companion to Applied Regression

John Fox

January 2002

## 1 Basic Ideas

Bootstrapping is a general approach to statistical inference based on building a sampling distribution for a statistic by resampling from the data at hand. The term ‘bootstrapping,’ due to Efron (1979), is an allusion to the expression ‘pulling oneself up by one’s bootstraps’ – in this case, using the sample data as a population from which repeated samples are drawn. At first blush, the approach seems circular, but has been shown to be sound.

Two S libraries for bootstrapping are associated with extensive treatments of the subject: Efron and Tibshirani’s (1993) bootstrap library, and Davison and…
Next, we compute the statistic $T$ for each of the bootstrap samples; that is, $T^*_b = t(S^*_b)$. Then the distribution of $T^*_b$ around the original estimate $T$ is analogous to the sampling distribution of the estimator $T$ around the population parameter $\theta$. For example, the average of the bootstrapped statistics,

$$\bar{T}^* = \hat{E}^*(T^*) = \frac{\sum_{b=1}^{R} T^*_b}{R}$$

estimates the expectation of the bootstrapped statistics; then $B^* = \bar{T}^* - T$ is an estimate of the bias of $T$, that is, $T - \theta$. Similarly, the estimated bootstrap variance of $T^*$,

$$\hat{V}^*(T^*) = \frac{\sum_{b=1}^{R} \left(T^*_b - \bar{T}^*\right)^2}{R - 1}$$

estimates the sampling variance of $T$.

The random selection of bootstrap samples is not an essential aspect of the nonparametric bootstrap: At least in principle, we could enumerate all bootstrap samples of size $n$. Then we could calculate $E^*(T^*)$ and $V^*(T^*)$ exactly, rather than having to estimate them. The number of bootstrap samples, however, is astronomically large unless $n$ is tiny. There are, therefore, two sources of error in bootstrap inference: (1) the error induced by using a particular sample $S$ to represent the population; and (2) the sampling error produced by failing to enumerate all bootstrap samples. The latter source of error can be controlled by making the number of bootstrap replications $R$ sufficiently large.

## 2 Bootstrap Confidence Intervals

There are several approaches to
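
As a brief aside on Section 1, the bootstrap estimates of bias and variance can be sketched in a few lines. This is only an illustrative sketch, not code from the appendix (which works in R and S-PLUS): it uses Python's standard library, and the function name `bootstrap_bias_variance`, its parameters, and the toy data are all assumptions made for the example. Each bootstrap sample is drawn with replacement from the original sample and has the same size n.

```python
import random
import statistics


def bootstrap_bias_variance(sample, statistic, R=2000, seed=0):
    """Nonparametric bootstrap estimates of bias and variance (illustrative).

    `sample` is the observed data, `statistic` is a function t(.) applied to
    a list of observations, and R is the number of bootstrap replications.
    All names here are hypothetical, chosen for this sketch only.
    """
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    n = len(sample)
    t_original = statistic(sample)  # T, computed on the original sample

    # T*_b = t(S*_b): resample n observations with replacement, R times.
    t_star = [statistic(rng.choices(sample, k=n)) for _ in range(R)]

    t_bar = sum(t_star) / R                   # average of the T*_b
    bias = t_bar - t_original                 # B* = T-bar* - T
    variance = sum((t - t_bar) ** 2 for t in t_star) / (R - 1)
    return t_original, t_bar, bias, variance


if __name__ == "__main__":
    # Made-up data, used only to exercise the function.
    data = [6.0, -3.0, 5.0, 3.0, -2.0, 6.0, 14.0, 3.0, -4.0, 2.0]
    t, t_bar, bias, var = bootstrap_bias_variance(data, statistics.mean)
    print(f"T = {t:.3f}, mean of T* = {t_bar:.3f}")
    print(f"bias estimate = {bias:.4f}, variance estimate = {var:.4f}")
```

Increasing `R` reduces only the second source of error noted in Section 1 (failing to enumerate all bootstrap samples). The error from using one particular sample to represent the population is unaffected.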