STAT 120
Statistical inference is the process of drawing conclusions about the entire population based on information in a sample.
Can you drink 5 beers and stay under the 0.08 limit?
Do the observed differences in strike rates between black and white eligible jurors indicate a potential bias, or are the differences just due to chance?
Quantity | Parameter | Statistic |
---|---|---|
Mean | \(\mu\) | \(\bar{x}\) |
Proportion | \(p\) | \(\hat{p}\) |
Std. Dev. | \(\sigma\) | \(s\) |
Correlation | \(\rho\) | \(r\) |
Slope | \(\beta\) | \(b\) |
State whether the quantity described is a parameter or a statistic, and give the correct notation.
A sampling distribution is the distribution of sample statistics computed for different samples of the same size from the same population.
Center: If samples are randomly selected, the sampling distribution will be centered around the population parameter.
Shape: For most of the statistics we consider, if the sample size is large enough the sampling distribution will be symmetric and bell-shaped.
Spread: The uncertainty in a point estimate is measured by the standard error (SE), which is the standard deviation of the sample statistic.
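A minimal sketch of simulating a sampling distribution, using only the Python standard library. The population here is hypothetical (10,000 skewed values with mean near 8), and the function name `simulate_sample_means` is made up for illustration. It checks the center and estimates the SE as the standard deviation of the simulated sample means.

```python
import random
import statistics

def simulate_sample_means(population, n, num_samples, seed=0):
    """Draw num_samples random samples of size n; return each sample's mean."""
    rng = random.Random(seed)
    return [statistics.mean(rng.sample(population, n)) for _ in range(num_samples)]

# Hypothetical population (an assumption for this sketch): skewed values with mean near 8.
pop_rng = random.Random(120)
population = [pop_rng.expovariate(1 / 8) for _ in range(10_000)]

sample_means = simulate_sample_means(population, n=50, num_samples=2000)

mu = statistics.mean(population)         # population parameter
center = statistics.mean(sample_means)   # center of the simulated sampling distribution
se = statistics.stdev(sample_means)      # standard error: SD of the sample means

print(f"mu = {mu:.2f}, center = {center:.2f}, SE = {se:.2f}")
```

With random sampling, `center` should land close to `mu`, and a histogram of `sample_means` should look roughly bell-shaped even though the population is skewed.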
The standard error for the average word size in a random sample of 10 words is closest to
Sample size (n) = how many individuals are in the sample used to compute our stat?
Simulation size (N) = how many random samples did we take from the population to simulate the sampling distribution of our stat?
The SE of your stat gets smaller as \(n\) gets bigger.
Once you’ve simulated a couple hundred samples, the shape/center/spread of the sampling distribution should remain about the same as you increase the simulation size.
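The difference between sample size \(n\) and simulation size \(N\) can be checked with a short simulation. This is a sketch under assumptions: the population is hypothetical (5,000 roughly normal values with mean 8 and SD 3), and `standard_error` is an illustrative helper, not a library function.

```python
import random
import statistics

def standard_error(population, n, num_samples, rng):
    """Estimate the SE of the sample mean from num_samples samples of size n."""
    means = [statistics.mean(rng.sample(population, n)) for _ in range(num_samples)]
    return statistics.stdev(means)

rng = random.Random(2024)
# Hypothetical population (an assumption for this sketch).
population = [rng.gauss(8, 3) for _ in range(5_000)]

# Vary the sample size n, holding the simulation size fixed at N = 1000.
se_by_n = {n: standard_error(population, n, 1000, rng) for n in (5, 20, 80)}

# Vary the simulation size N, holding the sample size fixed at n = 20.
se_by_N = {N: standard_error(population, 20, N, rng) for N in (200, 1000, 4000)}

print("SE by sample size n:    ", se_by_n)
print("SE by simulation size N:", se_by_N)
```

The first dictionary should show the SE shrinking as \(n\) grows. The second should show roughly the same SE for every \(N\), since more simulated samples only give a smoother picture of the same distribution.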
Samples of size 5 are taken from a large population with population mean 8, and the sampling distributions for the sample means are shown. Dataset A (top) and Dataset B (bottom) were collected using different sampling methods. Which dataset (A or B) used random sampling?
Bootstrap: Sample with replacement from the original sample, using the same sample size.
Creating a bootstrap sample is the same as using the data to simulate a “population” that contains an infinite number of copies of the data.
1. Generate a bootstrap sample.
2. Compute the statistic of interest for your bootstrap sample.
3. Repeat steps 1 and 2 many times, then plot the distribution of all your bootstrap statistics.
This is the bootstrap distribution!
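The bootstrap steps above can be sketched in a few lines of Python. The data values and the name `bootstrap_distribution` are hypothetical, chosen only for illustration. Resampling uses `random.choices`, which samples with replacement.

```python
import random
import statistics

def bootstrap_distribution(sample, stat=statistics.mean, num_boot=5000, seed=0):
    """Return num_boot bootstrap statistics computed from the original sample."""
    rng = random.Random(seed)
    n = len(sample)
    boot_stats = []
    for _ in range(num_boot):
        # Generate a bootstrap sample: draw n values with replacement.
        boot_sample = rng.choices(sample, k=n)
        # Compute the statistic of interest for this bootstrap sample.
        boot_stats.append(stat(boot_sample))
    # The collected statistics form the bootstrap distribution.
    return boot_stats

# Hypothetical original sample (an assumption for this sketch).
sample = [3, 5, 4, 7, 2, 6, 4, 5, 8, 3]

boot_means = bootstrap_distribution(sample)
boot_se = statistics.stdev(boot_means)  # SD of the bootstrap statistics estimates the SE

print(f"sample mean = {statistics.mean(sample):.2f}, bootstrap SE = {boot_se:.2f}")
```

The bootstrap distribution is centered near the original sample statistic rather than the unknown population parameter. Its spread is what is used to estimate the SE.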