This example explores the survey dataset from the MASS package, with a focus on the Height and Age variables. First, let’s examine the survey dataset, paying special attention to Height and Age:

survey <- MASS::survey # load the datasummary(survey$Age)

Min. 1st Qu. Median Mean 3rd Qu. Max.
16.75 17.67 18.58 20.37 20.17 73.00

summary(survey$Height)

Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
150.0 165.0 171.0 172.4 180.0 200.0 28

(a). Proportion Below a Given Value

Q1: What proportion of individuals have a height below 160 cm, based on our sample?

Hint: Calculate the mean and standard deviation for the Height variable first.

Click for answer

# Mean and standard deviation for HeightHeight_mean <-mean(survey$Height, na.rm =TRUE)Height_sd <-sd(survey$Height, na.rm =TRUE)# Proportion below 160 cmpnorm(160, mean = Height_mean, sd =Height_sd)

[1] 0.1043305

(b). Determining a Specific Percentile

Q2: What is the age cutoff for the lower 75% of our sample distribution?

Hint: Determine the mean and standard deviation of the Age variable to find the age at the 75th percentile.

Click for answer

# Mean and standard deviation for Ageage_mean <-mean(survey$Age, na.rm =TRUE)age_sd <-sd(survey$Age, na.rm =TRUE)# Age at the 75th percentileqnorm(0.75, mean = age_mean, sd = age_sd)

[1] 24.74139

(c). Calculating the Middle 95 Percentile for Age

Q3: What are the age cutoffs that define the middle 95% of our sample distribution?

Hint: Calculate the 5th and 95th percentiles for the Age variable to find the ages that bound the middle 95% of the distribution.

Click for answer

# Age at the 25th percentileage_25th <-qnorm(0.05, mean = age_mean, sd = age_sd)age_25th

[1] 9.725181

# Age at the 75th percentileage_75th <-qnorm(0.95, mean = age_mean, sd = age_sd)age_75th

[1] 31.02385

Problem 2: SAT Verbal scores

Suppose that the verbal SAT scores in a population are normally distributed with a mean \(\mu=580\) and standard deviation \(\sigma = 70\). If \(X\) is shorthand for a verbal SAT score, then we can write this as \(X \sim N(580,70)\).

(a) What proportion of scores are above 650?

Click for answer

Answer: About 15.9% of the scores are above 650.

pnorm(650,mean=580,sd=70) # proportion below

[1] 0.8413447

1-pnorm(650,mean=580,sd=70) # proportion above

[1] 0.1586553

(b) What is the 25th percentile (Q1)?

Click for answer

Answer: The score of about 533 is the 25th percentile, meaning 25% of the scores are below this value.

qnorm(.25,mean=580,sd=70)

[1] 532.7857

(c) What is the IQR for verbal SAT scores in this population? (Hint: find Q1 and Q3)

Click for answer

Answer: The 25th percentile (Q1) is 533 and the 75th percentile (Q3) is 627. The IQR for this normally distributed variable is about 94 points.

q1 <-qnorm(.25,mean=580,sd=70);q1

[1] 532.7857

q3 <-qnorm(.75,mean=580,sd=70);q3

[1] 627.2143

q3-q1

[1] 94.42857

(d) What score, high or low, will be deemed an outlier according the boxplot rules for outliers?

Click for answer

Answer: Using the 1.5IQR’s boxplot rule gives a lower fence of 392 and an upper fence of 768. So any score below 392 and above 768 will be called an outlier according to this rule.

1.5*94

[1] 141

q1 -1.5*94

[1] 391.7857

q3 +1.5*94

[1] 768.2143

(e) What percent of the population will be deemed an outlier?

Click for answer

Answer: We need to find the proportion of scores below 392 and above 768. With this symmetric distribution, we find about 0.004 in both tails. About 0.8% of the population will be deemed outliers according to the boxplot rule.

pnorm(392,mean=580,sd=70)

[1] 0.003618747

1-pnorm(768,mean=580,sd=70)

[1] 0.003618747

Problem 3: Standard Normal

The standard normal distribution has a mean of 0 and standard deviation of 1.

(a) What percent of SAT scores are at least 1 standard deviation above average?

Click for answer

pnorm(1) # proportion below

[1] 0.8413447

1-pnorm(1) # proportion above

[1] 0.1586553

Answer: About 16% of scores will be at least 1 standard deviation above average. (Note that the score of 580+70 = 650 is 1 standard deviation above average.)

(b) How many standard deviations away from average is the 25th percentile of SAT scores?

Click for answer

Answer: The 25th percentile of SAT scores (or any normally distributed values) is 0.67 standard deviations below average. We could also find this value using our answer to (1b):