Confidence Intervals and Bootstrap

STAT 120

Bastola

Let’s start with a question!

The higher the standard error of a statistic, the \(\ldots \ldots\) the uncertainty surrounding the statistic.

  1. higher
  2. lower

Sampling distribution

Sampling distributions. Top n=50, Bottom n=100

Interval Estimation

  • A point estimate almost never equals the population parameter exactly
  • The uncertainty in a point estimate is measured by its standard error (SE)
  • A plausible range of values for the population parameter is more reliable than a single number
  • Interval estimate: an interval of numbers within which the parameter value is believed to fall

A Gallup Poll

How accurate is an estimate of \(60\%\)?





“\(\ldots\) the margin of sampling error is \(\pm 3\) percentage points at the \(95\%\) confidence level.”

  • Interval estimate: \(60\% \pm 3\% = (57\% , 63\%)\)
  • The percentage of American adults who would vote for a Muslim for president is likely between \(57\%\) and \(63\%\).

Margin of Error

  • The margin of error measures how accurate a point estimate is likely to be in estimating a parameter.

  • To determine the margin of error, we can use the statistic’s sampling distribution and standard error

Confidence Intervals

  • A confidence interval is an interval containing the most believable values for a parameter
  • A confidence interval is centered on the point estimate and extends a certain number of standard errors on either side of the estimate
  • The confidence level tells us what percent of intervals, computed the same way from repeated samples, will contain the population parameter.
  • A 95% confidence interval means that if we were to draw numerous samples and calculate their confidence intervals, about 95% of these intervals would be expected to contain the true population parameter.

Confidence Intervals are …

  • always about the population
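  • not probability statements about the parameter (the parameter is fixed, so a computed interval either contains it or does not)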
  • only about population parameters, not individual observations
  • only reliable if the sample statistic they’re based on is an unbiased estimator of the population parameter

Estimating 95% Confidence Interval

If the sampling distribution is relatively symmetric and bell-shaped, a \(95\%\) confidence interval can be estimated using \[\mathbf{statistic} \pm 2 \times \mathbf{SE}\]
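As a rough illustration of the rule above, the sketch below wraps the calculation in a small helper. The function name `ci95` is hypothetical, and the standard error of 0.015 is an assumed value, chosen only so that 2 × SE matches the 3-point margin of error quoted for the Gallup poll; the poll itself does not report an SE.

```r
# Hypothetical helper: approximate 95% interval as statistic +/- 2 * SE.
# Only appropriate when the sampling distribution is roughly
# symmetric and bell-shaped.
ci95 <- function(statistic, se) {
  c(lower = statistic - 2 * se, upper = statistic + 2 * se)
}

# Gallup example: point estimate of 60%.
# The SE of 0.015 is an assumption made for illustration.
gallup_ci <- ci95(statistic = 0.60, se = 0.015)
print(gallup_ci)
```

With these inputs the interval works out to roughly 0.57 to 0.63, which agrees with the interval estimate given for the poll.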

A short demo

Let’s all go to the StatKey web app.


Take Home Points

  • The parameter is fixed
  • The statistic is random (depends on the sample)
  • The interval is also random (depends on the statistic)
  • Confidence level is the proportion of intervals that capture the true parameter
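The points above can be checked with a small simulation. This is only a sketch under stated assumptions: the population is taken to be normal with a known mean of 10 and standard deviation of 2 (values invented for the demonstration), each sample has n = 50, and the SE of the sample mean is computed as `sd(x) / sqrt(n)`, a formula not covered in these slides.

```r
set.seed(120)   # make the simulation reproducible

mu    <- 10     # assumed true population mean (fixed parameter)
sigma <- 2      # assumed population standard deviation
n     <- 50     # sample size
reps  <- 5000   # number of repeated samples

# For each sample, build statistic +/- 2 * SE and record
# whether the interval captures the true mean.
covers <- replicate(reps, {
  x  <- rnorm(n, mean = mu, sd = sigma)
  se <- sd(x) / sqrt(n)
  lower <- mean(x) - 2 * se
  upper <- mean(x) + 2 * se
  lower <= mu && mu <= upper
})

coverage <- mean(covers)   # proportion of intervals containing mu
print(coverage)
```

The parameter `mu` never changes, while each sample produces a different statistic and therefore a different interval. The proportion of intervals that capture `mu` should come out close to 0.95.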

What to do when we only have one sample - BOOTSTRAP!

  • Measuring the standard error directly would require repeated samples from the population
  • With only one sample, we can estimate the SE from a bootstrap distribution
  • Use this SE to compute a confidence interval for the unknown parameter
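The steps above can be sketched in base R, without any extra packages. The code resamples the one observed sample with replacement, which is the usual way bootstrap samples are generated. The data are the six values used in the R example in these slides; the variable names, the seed, and the choice of 10,000 resamples are assumptions made for illustration.

```r
set.seed(120)                          # reproducible resampling

X <- c(20, 24, 19, 23, 22, 16)         # the one observed sample
B <- 10000                             # number of bootstrap samples (assumed)

# Each bootstrap statistic is the mean of a resample of the same
# size as X, drawn with replacement from X.
boot_means <- replicate(B, mean(sample(X, size = length(X), replace = TRUE)))

boot_se <- sd(boot_means)              # bootstrap standard error

# Interval from the SE: statistic +/- 2 * SE
ci_se <- mean(X) + c(-2, 2) * boot_se

# Percentile interval: middle 95% of the bootstrap statistics
ci_pct <- quantile(boot_means, probs = c(0.025, 0.975))

print(boot_se)
print(ci_se)
print(ci_pct)
```

Because the resampling is random, the numbers change slightly from run to run, but the bootstrap SE should land near 1.1 and both intervals should be similar.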

Bootstrap Distribution

A bootstrap distribution is the distribution of many bootstrap statistics.

  • The standard deviation of this distribution is called the bootstrap standard error of the statistic.
  • The bootstrap distribution is centered near the original sample statistic (here, the sample mean).


    ** Bootstrap interval for mean 

 Observed  X : 20.66667 
 Mean of bootstrap distribution: 20.66403 
 Standard error of bootstrap distribution: 1.09819 

 Bootstrap percentile interval
    2.5%    97.5% 
18.50000 22.66667 

        *--------------*

library(CarletonStats)            # load the package
X <- c(20, 24, 19, 23, 22, 16)    # observed sample
boot(X)                           # bootstrap the sample mean

 Group Activity 1

