This example explores the survey dataset from the MASS package, focusing on the Height and Age variables. First, let's summarize those two variables:
survey <- MASS::survey  # load the data
summary(survey$Age)
Min. 1st Qu. Median Mean 3rd Qu. Max.
16.75 17.67 18.58 20.37 20.17 73.00
summary(survey$Height)
Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
150.0 165.0 171.0 172.4 180.0 200.0 28
(a). Proportion Below a Given Value
Q1: Using a normal model built from our sample's mean and standard deviation, what proportion of individuals have a height below 160 cm?
Hint: Calculate the mean and standard deviation for the Height variable first.
# Mean and standard deviation for Height
Height_mean <- mean(survey$Height, na.rm = TRUE)
Height_sd <- sd(survey$Height, na.rm = TRUE)
# Proportion below 160 cm
pnorm(160, mean = Height_mean, sd = Height_sd)
[1] 0.1043305
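One quick way to judge the normal approximation is to compare it with the proportion actually observed in the sample. The sketch below assumes only that the MASS package is installed (it ships with standard R distributions). The names `p_normal` and `p_empirical` are illustrative.

```r
# Load the survey data
survey <- MASS::survey

# Normal-model estimate, as in the answer above
Height_mean <- mean(survey$Height, na.rm = TRUE)
Height_sd <- sd(survey$Height, na.rm = TRUE)
p_normal <- pnorm(160, mean = Height_mean, sd = Height_sd)

# Empirical proportion: share of non-missing heights below 160 cm
# (the logical vector is averaged; NA heights are dropped by na.rm)
p_empirical <- mean(survey$Height < 160, na.rm = TRUE)

c(normal = p_normal, empirical = p_empirical)
```

If the two numbers are close, the normal model describes the lower tail of Height reasonably well.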
(b). Determining a Specific Percentile
Q2: Using a normal model for Age, what age marks the cutoff for the lower 75% of the distribution (the 75th percentile)?
Hint: Determine the mean and standard deviation of the Age variable to find the age at the 75th percentile.
# Mean and standard deviation for Age
age_mean <- mean(survey$Age, na.rm = TRUE)
age_sd <- sd(survey$Age, na.rm = TRUE)
# Age at the 75th percentile
qnorm(0.75, mean = age_mean, sd = age_sd)
[1] 24.74139
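The summary at the top shows that Age is strongly right-skewed: the mean is 20.37, the median is 18.58, and the maximum is 73. So the normal-model answer can differ noticeably from the sample's own 75th percentile, which `summary()` already reported as 20.17. This is a minimal sketch of that comparison. `quantile()` with its default settings uses the same rule as `summary()`.

```r
survey <- MASS::survey

age_mean <- mean(survey$Age, na.rm = TRUE)
age_sd <- sd(survey$Age, na.rm = TRUE)

# 75th percentile implied by a normal model
age_q75_normal <- qnorm(0.75, mean = age_mean, sd = age_sd)

# 75th percentile of the observed ages
age_q75_sample <- unname(quantile(survey$Age, 0.75, na.rm = TRUE))

c(normal = age_q75_normal, sample = age_q75_sample)
```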
(c). Calculating the Middle 95% for Age
Q3: Using the same normal model, what are the age cutoffs that define the middle 95% of the distribution?
Hint: The middle 95% leaves 2.5% in each tail, so it is bounded by the 2.5th and 97.5th percentiles of the Age variable.
# Age at the 2.5th percentile
age_lower <- qnorm(0.025, mean = age_mean, sd = age_sd)
round(age_lower, 2)
[1] 7.69
# Age at the 97.5th percentile
age_upper <- qnorm(0.975, mean = age_mean, sd = age_sd)
round(age_upper, 2)
[1] 33.06
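Because Age is right-skewed, the normal model's lower bound falls well below the youngest student in the sample, who is 16.75 according to the summary above. The sketch below puts the normal-model bounds next to the bounds read directly from the data, as a sanity check on the model rather than a replacement for the exercise. The object names are illustrative.

```r
survey <- MASS::survey
age_mean <- mean(survey$Age)
age_sd <- sd(survey$Age)

# Middle-95% bounds implied by a normal model
normal_bounds <- qnorm(c(0.025, 0.975), mean = age_mean, sd = age_sd)

# Middle-95% bounds taken from the observed ages
sample_bounds <- unname(quantile(survey$Age, c(0.025, 0.975)))

normal_bounds
sample_bounds
min(survey$Age)  # youngest observed age
```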
Problem 2: SAT Verbal scores
Suppose that the verbal SAT scores in a population are normally distributed with a mean \(\mu=580\) and standard deviation \(\sigma = 70\). If \(X\) is shorthand for a verbal SAT score, then we can write this as \(X \sim N(580,70)\).
(a) What proportion of scores are above 650?
Answer: About 15.9% of the scores are above 650.
pnorm(650,mean=580,sd=70) # proportion below
[1] 0.8413447
1-pnorm(650,mean=580,sd=70) # proportion above
[1] 0.1586553
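`pnorm()` can also return the upper-tail area directly through its `lower.tail` argument, which avoids the `1 - ...` step. A short sketch:

```r
# Upper-tail probability in a single call
upper_direct <- pnorm(650, mean = 580, sd = 70, lower.tail = FALSE)
upper_direct

# Same area as the complement used above
upper_complement <- 1 - pnorm(650, mean = 580, sd = 70)
upper_complement
```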
(b) What is the 25th percentile (Q1)?
Answer: The score of about 533 is the 25th percentile, meaning 25% of the scores are below this value.
qnorm(.25,mean=580,sd=70)
[1] 532.7857
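`qnorm()` is the inverse of `pnorm()`, so feeding the percentile back into `pnorm()` should return the original area of 0.25. This is a minimal check:

```r
q1 <- qnorm(0.25, mean = 580, sd = 70)

# pnorm() undoes qnorm(): the area below Q1 is 0.25
pnorm(q1, mean = 580, sd = 70)
```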
(c) What is the IQR for verbal SAT scores in this population? (Hint: find Q1 and Q3)
Answer: The 25th percentile (Q1) is 533 and the 75th percentile (Q3) is 627. The IQR for this normally distributed variable is about 94 points.
q1 <-qnorm(.25,mean=580,sd=70);q1
[1] 532.7857
q3 <-qnorm(.75,mean=580,sd=70);q3
[1] 627.2143
q3-q1
[1] 94.42857
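Q1 and Q3 of a normal distribution sit at `qnorm(0.75)` (about 0.674) standard deviations on either side of the mean. The IQR is therefore always about 1.349 times σ, whatever the mean. The sketch below computes it from that formula and, as a rough check, from a simulated sample. The seed and sample size are arbitrary choices.

```r
sigma <- 70

# For a normal distribution, Q3 - Q1 = 2 * qnorm(0.75) * sigma
iqr_formula <- 2 * qnorm(0.75) * sigma
iqr_formula

# Rough simulation check; set.seed() makes the draw reproducible
set.seed(1)
iqr_simulated <- IQR(rnorm(1e5, mean = 580, sd = sigma))
iqr_simulated
```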
(d) What scores, high or low, will be deemed outliers according to the boxplot rule for outliers?
Answer: Applying the 1.5 × IQR boxplot rule, with the IQR rounded to 94, gives a lower fence of about 392 and an upper fence of about 768. So any score below 392 or above 768 will be called an outlier under this rule.
1.5*94
[1] 141
q1 -1.5*94
[1] 391.7857
q3 +1.5*94
[1] 768.2143
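The fences above multiply the IQR rounded to 94. Carrying the unrounded IQR (about 94.43) instead moves each fence outward by roughly 0.64 points, to about 391.1 and 768.9. This sketch recomputes them without intermediate rounding:

```r
q1 <- qnorm(0.25, mean = 580, sd = 70)
q3 <- qnorm(0.75, mean = 580, sd = 70)
iqr <- q3 - q1  # unrounded IQR

lower_fence <- q1 - 1.5 * iqr
upper_fence <- q3 + 1.5 * iqr
c(lower = lower_fence, upper = upper_fence)
```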
(e) What percent of the population will be deemed an outlier?
Answer: We need the proportion of scores below 392 or above 768. Because the fences are symmetric about the mean of 580, each tail holds about 0.0036 of the distribution. Together, about 0.72% (roughly 0.7%) of the population will be deemed outliers according to the boxplot rule.
pnorm(392,mean=580,sd=70)
[1] 0.003618747
1-pnorm(768,mean=580,sd=70)
[1] 0.003618747
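The two tails can be added in one expression. With unrounded fences, each fence lies exactly 4 × `qnorm(0.75)` (about 2.70) standard deviations from the mean: Q3 sits at 0.674 SD, and 1.5 × IQR adds another 3 × 0.674 SD. The outlier share therefore does not depend on μ or σ and is the same for every normal distribution. A minimal sketch:

```r
# Total share beyond the rounded fences used above (both tails)
total_outliers <- pnorm(392, mean = 580, sd = 70) +
  pnorm(768, mean = 580, sd = 70, lower.tail = FALSE)
total_outliers

# Unrounded fences in z-units: Q3 plus 1.5 times the IQR
z_fence <- qnorm(0.75) + 1.5 * (2 * qnorm(0.75))
general_share <- 2 * pnorm(-z_fence)
general_share
```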
Problem 3: Standard Normal
The standard normal distribution has a mean of 0 and standard deviation of 1.
(a) What percent of SAT scores are at least 1 standard deviation above average?
pnorm(1) # proportion below
[1] 0.8413447
1-pnorm(1) # proportion above
[1] 0.1586553
Answer: About 16% of scores will be at least 1 standard deviation above average. (Note that the score of 580+70 = 650 is 1 standard deviation above average.)
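The link to the SAT example can be checked directly. Standardizing 650 gives z = (650 - 580) / 70 = 1, so the standard-normal calculation and the original-scale calculation return the same upper-tail area. A short sketch:

```r
# Standardize the SAT score that is 1 SD above the mean
z <- (650 - 580) / 70
z

# Upper-tail areas on the standard and original scales
area_standard <- 1 - pnorm(z)
area_original <- 1 - pnorm(650, mean = 580, sd = 70)
all.equal(area_standard, area_original)
```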
(b) How many standard deviations away from average is the 25th percentile of SAT scores?
Answer: The 25th percentile of SAT scores (or of any normally distributed variable) is about 0.67 standard deviations below average. We could also find this value using our answer to Problem 2(b):