This class is different every time I teach it: below is what we usually cover in the first semester of an introductory sequence.
Statistics is about stating what you believe about how the world works, based on what you have observed of it. Statistics is the science of experimentation: what we can observe, and all that we can say about it, is the result of our assumptions about the Experiment or Model that generated it.
We always start off with a model: we’ve used the Bernoulli experiment to illustrate.
The simplest possible experiment - only two outcomes, Success or Failure.
We can enumerate the sample space for this experiment: Ω = {Success, Failure}.
And then assign a Random Variable X as a mapping from the Sample Space (Ω) to the real numbers, X : Ω → ℝ.
For example, one particular outcome on the sample space is ω = Success. We typically assign X(ω) = 1 if ω = Success, and 0 otherwise.
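Now we can immediately write a probability distribution for X, with p the probability of Success: P(X = x) = p^x (1 − p)^(1 − x), for x ∈ {0, 1}.
This is its probability mass function (density function for continuous random variables).
Then we can write its distribution function immediately: F(x) = P(X ≤ x), which is 0 for x < 0, 1 − p for 0 ≤ x < 1, and 1 for x ≥ 1.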
And so on with Expectation, Variance, Covariance (if we have more than one random variable)... all functions on the probability mass function. For the Bernoulli, E[X] = p and Var(X) = p(1 − p).
We know that these functions have nice properties and that Variance and Covariance are basically just Expectations: discrete or continuous averages of functions of the random variable, weighted by the probability mass (or density) function.
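As a minimal sketch of the objects above (mass function, distribution function, and Expectation as a weighted average), the following Python uses only the standard library. The function names and the value p = 0.3 are assumptions chosen for illustration.

```python
def bernoulli_pmf(x, p):
    """P(X = x) for X ~ Bernoulli(p); zero off the support {0, 1}."""
    if x == 1:
        return p
    if x == 0:
        return 1.0 - p
    return 0.0


def bernoulli_cdf(x, p):
    """F(x) = P(X <= x): a step function with jumps at 0 and 1."""
    return sum(bernoulli_pmf(k, p) for k in (0, 1) if k <= x)


def expectation(g, p):
    """E[g(X)] as a pmf-weighted average over the support {0, 1}."""
    return sum(g(k) * bernoulli_pmf(k, p) for k in (0, 1))


p = 0.3  # assumed value, for illustration only
mean = expectation(lambda k: k, p)
# Variance is itself an expectation: E[(X - E[X])^2]
variance = expectation(lambda k: (k - mean) ** 2, p)
print(mean, variance)
```

The variance is computed by the same `expectation` helper as the mean, which is the sense in which Variance is "just an Expectation".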
We know how to transform the random variable and generate a new distribution for the new random variable after transformation. We know that expectation is a linear operator, so linear transformations pass straight through it: E[aX + b] = aE[X] + b.
From this we get results on summation and integration ‘passing’ through expectations, for example E[X₁ + X₂] = E[X₁] + E[X₂], and, under independence, results such as Var(X₁ + X₂) = Var(X₁) + Var(X₂).
We use these results to investigate the properties of multiple, repeated, often supposed identical and independent outcomes from the experiment: the Sample.
Suppose we take n observations, X₁, …, Xₙ, from our Bernoulli Random Variable model. We devise an estimator for the probability of success that is just the average of the observed successes: p̂ = (X₁ + … + Xₙ)/n.
We can see right away from the properties of the Expectation that E[p̂] = p and, for independent observations, Var(p̂) = p(1 − p)/n.
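A small simulation can make the behaviour of the sample average concrete. This is a sketch under assumed values (p = 0.3, n = 50, a fixed seed), not part of the course material: it repeats the experiment many times and checks that the estimates centre on p with variance close to p(1 − p)/n.

```python
import random


def sample_bernoulli(n, p, rng):
    """Draw n independent Bernoulli(p) observations coded as 0/1."""
    return [1 if rng.random() < p else 0 for _ in range(n)]


def p_hat(xs):
    """Estimator of p: the average of the observed successes."""
    return sum(xs) / len(xs)


rng = random.Random(0)            # fixed seed so the run is repeatable
p_true, n, reps = 0.3, 50, 20000  # assumed values, for illustration only

estimates = [p_hat(sample_bernoulli(n, p_true, rng)) for _ in range(reps)]
avg_estimate = sum(estimates) / reps
var_estimate = sum((e - avg_estimate) ** 2 for e in estimates) / reps

print(avg_estimate)                            # close to p_true
print(var_estimate, p_true * (1 - p_true) / n)  # close to each other
```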
The CLT tells us that when we average things, the averages tend to a Normal distribution, whatever the initial distribution we draw from (provided it has a finite variance). It is enough to assume that the initial distribution - or model - is stable, that the draws are independent, and that we take enough samples and then average them.
The parameters - i.e. the particular values which give us a particular distribution - of this Normal distribution for the average are governed by the initial mean and variance of the underlying Random Variable (Experiment/Process): for our Bernoulli sample, p̂ is approximately Normal with mean p and variance p(1 − p)/n.
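To see the CLT at work on a distribution that looks nothing like a Normal, the sketch below averages draws from an Exponential(1) distribution (mean 1, variance 1) and standardizes each average. The choice of distribution, n = 100, and the seed are assumptions for illustration; if the Normal approximation is reasonable, roughly 95% of the standardized averages should fall within ±1.96.

```python
import random
from math import sqrt


def standardized_mean(n, rng):
    """Average n Exponential(1) draws (mean 1, variance 1), then standardize."""
    xbar = sum(rng.expovariate(1.0) for _ in range(n)) / n
    return (xbar - 1.0) / sqrt(1.0 / n)


rng = random.Random(1)   # fixed seed so the run is repeatable
n, reps = 100, 10000     # assumed values, for illustration only

zs = [standardized_mean(n, rng) for _ in range(reps)]
z_mean = sum(zs) / reps
within = sum(abs(z) < 1.96 for z in zs) / reps

print(z_mean)   # near 0
print(within)   # near 0.95
```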
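One of the nice properties of the Normal Distribution is that its quantiles are easy to remember and it is closed under linear transformations. If we standardize a statistic T as Z = (T − E[T]) / √Var(T), then by the properties of the Expectation E[Z] = 0 and Var(Z) = 1, and if T ∼ Normal then Z ∼ Normal. Let’s write it again, with T = p̂: Z = (p̂ − p) / √(p(1 − p)/n) is approximately Normal(0, 1).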
But remember: What we really want is to say something about the true state of the Experiment/Model/Nature/The World.
One way is to set up a choice between beliefs about the parameters of our model, say a null hypothesis H₀: p = p₀ against an alternative H₁: p ≠ p₀, and a way to choose one set of plausible values given our tolerance α for choosing the alternative hypothesis when it isn’t true: α = P(reject H₀ | H₀ true).
It is reasonable that when our Z statistic is large in magnitude we should take that as evidence against our null hypothesis and reject it in favour of our alternative hypothesis: we set the Z statistic up that way, as just the difference between what we observe and what we expect under the null, scaled by its standard deviation.
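As a worked example of the Z statistic for a proportion, the sketch below tests a null value p0 against observed counts using the Normal approximation. The function name, the counts (62 successes in 100 trials), and p0 = 0.5 are hypothetical, chosen only to illustrate the arithmetic.

```python
from math import sqrt
from statistics import NormalDist


def z_statistic(successes, n, p0):
    """(observed - expected under the null) / standard deviation under the null."""
    p_hat = successes / n
    se0 = sqrt(p0 * (1 - p0) / n)
    return (p_hat - p0) / se0


def two_sided_p_value(z):
    """P(|Z| >= |z|) for Z ~ Normal(0, 1)."""
    return 2 * (1 - NormalDist().cdf(abs(z)))


# Hypothetical data: 62 successes in 100 trials, null value p0 = 0.5
z = z_statistic(successes=62, n=100, p0=0.5)
p_value = two_sided_p_value(z)
print(z, p_value)
```

Here the observed proportion 0.62 sits 0.12 above the null value, and the null standard deviation is 0.05, so Z is 2.4.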
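Our ability to choose the alternative when in fact it is true we call the power of the test: power = P(reject H₀ | H₁ true).
The Rejection Region is, of course, set by our choice of α: for a two-sided test we reject when |Z| exceeds the 1 − α/2 quantile of the standard Normal (about 1.96 for α = 0.05).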
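The link between α, the Rejection Region, and power can be sketched numerically. The code below assumes a two-sided test of a proportion with the Normal approximation; the names `critical_value` and `power`, and the values p0 = 0.5 and n = 100, are assumptions for illustration.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()


def critical_value(alpha):
    """Reject when |Z| exceeds this 1 - alpha/2 quantile of Normal(0, 1)."""
    return STD_NORMAL.inv_cdf(1 - alpha / 2)


def power(p1, p0, n, alpha=0.05):
    """Approximate P(reject H0: p = p0) when the true probability is p1."""
    zc = critical_value(alpha)
    se0 = sqrt(p0 * (1 - p0) / n)  # spread of p_hat under the null
    se1 = sqrt(p1 * (1 - p1) / n)  # spread of p_hat under the truth p1
    upper = p0 + zc * se0          # reject if p_hat is above this ...
    lower = p0 - zc * se0          # ... or below this
    return (1 - STD_NORMAL.cdf((upper - p1) / se1)) + STD_NORMAL.cdf((lower - p1) / se1)


print(critical_value(0.05))        # about 1.96
print(power(0.5, p0=0.5, n=100))   # equals alpha when the null is true
print(power(0.6, p0=0.5, n=100))   # larger when the truth is far from p0
```

When the truth equals the null value, the probability of rejecting is exactly α, which is the tolerance described above.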
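We can frame the hypothesis testing paradigm in terms of the values that we think are plausible for the parameter of interest, at a particular tolerance α. Writing z for the 1 − α/2 quantile of the standard Normal, we start from the approximate statement P(−z ≤ (p̂ − p) / √(p̂(1 − p̂)/n) ≤ z) ≈ 1 − α. Using algebra to isolate p, this becomes P(p̂ − z √(p̂(1 − p̂)/n) ≤ p ≤ p̂ + z √(p̂(1 − p̂)/n)) ≈ 1 − α, which means that p̂ ± z √(p̂(1 − p̂)/n) is a 100(1 − α) percent Confidence Interval for our true parameter of interest p.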
This is to say that out of M total experiments, where each time we generate an estimator p̂ and the interval built from it, about (1 − α) · M of the intervals should cover p.
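The coverage reading of a Confidence Interval can be checked by simulation. The sketch below repeats a Bernoulli experiment M times, builds the Normal-approximation interval p̂ ± z √(p̂(1 − p̂)/n) each time, and counts how often it covers the true p. The values p = 0.3, n = 200, M = 5000, and the seed are assumptions for illustration; because the interval is approximate, the observed coverage is only close to 1 − α.

```python
import random
from math import sqrt
from statistics import NormalDist


def normal_approx_interval(xs, alpha=0.05):
    """p_hat +/- z * sqrt(p_hat (1 - p_hat) / n), with z the 1 - alpha/2 quantile."""
    n = len(xs)
    p_hat = sum(xs) / n
    z = NormalDist().inv_cdf(1 - alpha / 2)
    half_width = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width


rng = random.Random(2)         # fixed seed so the run is repeatable
p_true, n, M = 0.3, 200, 5000  # assumed values, for illustration only

covered = 0
for _ in range(M):
    xs = [1 if rng.random() < p_true else 0 for _ in range(n)]
    lo, hi = normal_approx_interval(xs)
    covered += lo <= p_true <= hi

coverage = covered / M
print(coverage)  # close to 0.95
```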
Lastly, we can make statements about the estimators using Bayesian inference, if we are willing to explicitly state that the parameter of interest is random, i.e. we are not certain in our belief about it.
This allows us to quantify any prior beliefs about the parameter, look at data, and construct a set of posterior beliefs about the parameter.
In the Bayesian approach, the posterior for a parameter θ, π(θ|x), is a full PDF, or distribution. This distribution is the tool or method by which we conduct inference.
The Bayesian approach augments ‘frequentist’ procedures by including ‘prior’ information about the parameter of interest. In the ‘frequentist approach’ p is a constant, which we estimate, say via the likelihood, Lik(x|p) = ∏_{i=1}^{n} f_p(x_i), for example. We derive estimates using the likelihood of the data, or the sampling distribution. A common estimate is p̂ = x̄.
In the ‘Bayesian approach’ p is an instance of a random variable with a PDF, say π(p), and now we derive estimates using the additional randomness of π(p) via Bayes' Equation: π(p|x) = Lik(x|p) π(p) / m(x).
We get the marginal distribution for the data, m(x), by integrating the parameter out: m(x) = ∫ Lik(x|p) π(p) dp.
Suppose we observe data x_obs = 0, then:
Now, with a full PDF for p, we can find ‘posterior’ estimates of parameters. Contrast this with the ‘frequentist’ approach, where we found point estimates and used the sampling distribution; in the Bayesian approach, the role the sampling distribution played in inference on the parameter is taken over by the posterior distribution.
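To make the posterior concrete, here is a minimal grid sketch for a single Bernoulli observation x_obs = 0. The notes as given here do not state which prior was used, so the uniform prior on (0, 1) below is an assumption, as are the function names and the grid size. Under that assumption the posterior density works out to 2(1 − p).

```python
def grid_posterior(x_obs, prior, m=10001):
    """Posterior density of p on an evenly spaced grid over [0, 1].

    Likelihood for one Bernoulli observation: p^x (1 - p)^(1 - x).
    The marginal m(x) is approximated with the trapezoid rule.
    """
    grid = [i / (m - 1) for i in range(m)]
    unnorm = [p ** x_obs * (1 - p) ** (1 - x_obs) * prior(p) for p in grid]
    h = grid[1] - grid[0]
    marginal = h * (sum(unnorm) - 0.5 * (unnorm[0] + unnorm[-1]))
    posterior = [u / marginal for u in unnorm]
    return grid, posterior, marginal


def uniform_prior(p):
    """Assumed prior: flat on [0, 1]. Not taken from the notes."""
    return 1.0


grid, posterior, marginal = grid_posterior(0, uniform_prior)
print(marginal)         # marginal probability of observing x = 0
print(posterior[0])     # density at p = 0
print(posterior[5000])  # density at p = 0.5
```

Observing a failure shifts belief toward small values of p: the density is highest at p = 0 and falls linearly to zero at p = 1.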
...one possible estimate for the parameter p.
Another possible estimate of p is the posterior mode, the value of p that maximizes π(p|x):
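Two common posterior summaries, the mean and the mode, have closed forms when the prior is a Beta distribution, because a Beta(a, b) prior combined with Bernoulli data gives a Beta(a + successes, b + failures) posterior. The Beta prior, the function names, and the data (3 successes in 10 trials) are assumptions for illustration; the notes do not say which prior was used.

```python
def beta_posterior(a, b, successes, n):
    """Beta(a, b) prior + n Bernoulli trials -> Beta(a + s, b + n - s) posterior."""
    return a + successes, b + (n - successes)


def beta_mean(a, b):
    """Mean of Beta(a, b)."""
    return a / (a + b)


def beta_mode(a, b):
    """Mode of Beta(a, b); this sketch only handles the interior case a, b > 1."""
    if a <= 1 or b <= 1:
        raise ValueError("interior mode formula needs a > 1 and b > 1")
    return (a - 1) / (a + b - 2)


# Hypothetical data: 3 successes in 10 trials, uniform prior = Beta(1, 1)
a_post, b_post = beta_posterior(1, 1, successes=3, n=10)
post_mean = beta_mean(a_post, b_post)
post_mode = beta_mode(a_post, b_post)
print(post_mean, post_mode)
```

With a uniform prior the posterior mode coincides with the sample proportion, while the posterior mean is pulled slightly toward 1/2.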
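In the frequentist approach the Confidence Interval is the interval, say I, such that P(I covers p) = 1 − α, where the interval I is random (it is built from the sample) and p is fixed.
In the Bayesian approach the corresponding interval (usually called a credible interval) is the interval, I, such that P(p ∈ I | x) = 1 − α, where p is random and the probability is computed from the posterior, i.e. given the prior and the observed data.
The 95% Bayes Interval is (a, b) such that ∫_a^b π(p|x) dp = 0.95.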
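One way to find such an (a, b) numerically is to accumulate the posterior density on a grid and read off the 2.5% and 97.5% points, which gives the equal-tailed interval (one of several valid 95% intervals). The sketch below assumes the posterior density 2(1 − p), which is what a uniform prior and a single observation x = 0 would give; that prior, the function name, and the grid size are assumptions for illustration.

```python
def equal_tailed_interval(density, level=0.95, m=100001):
    """Equal-tailed interval for a density on [0, 1], via a trapezoid-rule CDF."""
    h = 1.0 / (m - 1)
    grid = [i * h for i in range(m)]
    dens = [density(p) for p in grid]
    cdf = [0.0]
    for i in range(1, m):
        cdf.append(cdf[-1] + 0.5 * h * (dens[i - 1] + dens[i]))
    total = cdf[-1]  # renormalize in case the density is not exactly normalized
    lower_q = (1 - level) / 2
    upper_q = 1 - lower_q
    a = next(p for p, c in zip(grid, cdf) if c / total >= lower_q)
    b = next(p for p, c in zip(grid, cdf) if c / total >= upper_q)
    return a, b


# Assumed posterior: uniform prior, one observation x = 0, density 2(1 - p)
a, b = equal_tailed_interval(lambda p: 2 * (1 - p))
print(a, b)
```

For this density the endpoints can be checked by hand, since the distribution function is 1 − (1 − p)²: a = 1 − √0.975 and b = 1 − √0.025.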