A 2016 study described in The Guardian reported that, in a random sample of US adults in 1994, the rate of gun ownership among women was 9%. A similar random sample in 2015 found that the rate of female gun ownership had risen to 12%. In the section 3.2 handout, we assumed that the SE for the difference in these two sample proportions is 2%. Show how this SE was computed using the appropriate SE formula from chapter 6. Assume that the sample sizes in both 1994 and 2015 were 500.

*Answer:*

We have a 1994 sample proportion of \(\hat{p}_{1994} = 0.09\) and a 2015 sample proportion of \(\hat{p}_{2015} = 0.12\), each from a sample of size 500. The SE of the difference in two sample proportions for a confidence interval is given by:

\[SE = \sqrt{\frac{\hat{p}_{1994} (1 - \hat{p}_{1994} )}{n_{1994}} + \frac{\hat{p}_{2015} (1 - \hat{p}_{2015} )}{n_{2015}}} = \sqrt{\frac{0.09(1 - 0.09)}{500} + \frac{0.12(1 - 0.12)}{500}} = 0.0194 \approx 0.02 \]

Participants in a study to evaluate the accuracy of lie detectors were divided into two groups, with one group reading true material and the other group reading false material, while connected to a lie detector. Both groups received electric shocks to add stress. The two-way table indicates whether the participants were lying or telling the truth and also whether the lie detector indicated they were lying or not.

| | Detector Says Lying | Detector Says Not | Total |
|---|---|---|---|
| Person Lying | 31 | 17 | 48 |
| Person Not | 27 | 21 | 48 |
| Total | 58 | 38 | 96 |

*Answer:*

We are testing \(H_0:p_L = p_N\) vs \(H_a:p_L \neq p_N\). The test statistic is

\[z = \frac{\text{statistic} - \text{null}}{SE} = \frac{(\hat{p}_L - \hat{p}_N) - 0}{\sqrt{\frac{\hat{p}(1-\hat{p})}{n_L} + \frac{\hat{p}(1-\hat{p})}{n_N}}} = \frac{0.6458 - 0.5625}{\sqrt{\frac{0.6042(1-0.6042)}{48} + \frac{0.6042(1-0.6042)}{48}}} = 0.835 \]

Here \(\hat{p} = 58/96 = 0.6042\) is the pooled proportion. This is a two-tail test, and the area to the right of 0.835 in a normal distribution is 0.202, so the p-value is 2(0.202) = 0.404. The R command is: `2*(1-pnorm(0.835))`

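```
# Sample proportions flagged as lying by the detector in each group
pL_hat = 31/48
pN_hat = 27/48
# Pooled proportion: all participants flagged as lying / all participants
pooled_p = 58/96
nL = 48
nN = 48
# The SE uses the pooled proportion because H0 assumes pL = pN
SE = sqrt(pooled_p*(1-pooled_p)*(1/nL + 1/nN))
z = (pL_hat - pN_hat) / SE
# Two-tailed p-value (z is positive here, so double the upper tail)
p_value = 2*(1-pnorm(z))
p_value
```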

`[1] 0.4038223`

We fail to reject \(H_0\) and conclude that there is not enough evidence that a lie detector can tell whether a person is lying or telling the truth.
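As a cross-check on the hand calculation above, base R's `prop.test` can run the same comparison. This is a sketch rather than part of the original solution: the variable names `flagged`, `group_n`, `check`, and `z_from_chisq` are made up for illustration, and `correct = FALSE` is set on the assumption that we want to match the uncorrected z-test (the default applies a continuity correction).

```r
# Counts flagged "Detector Says Lying" in each group, out of 48 per group
flagged <- c(31, 27)
group_n <- c(48, 48)

# correct = FALSE turns off the continuity correction so the result
# lines up with the z-test that uses the pooled SE
check <- prop.test(x = flagged, n = group_n, correct = FALSE)

# prop.test reports a chi-square statistic, which equals z^2 here
z_from_chisq <- sqrt(unname(check$statistic))
z_from_chisq
check$p.value
```

The p-value should agree with the two-tailed value computed from `pnorm`, since the chi-square statistic with 1 degree of freedom is the square of the pooled z statistic.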

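```
# Proportion correctly identified in each group:
# lying people flagged as lying, and not-lying people flagged as not lying
pL_hat <- 31/48
pN_hat <- 21/48
conf_level = 0.95
z_star = qnorm(1-(1-conf_level)/2)
# A confidence interval uses the unpooled SE
SE <- sqrt(pL_hat*(1-pL_hat)/48 + pN_hat*(1-pN_hat)/48)
margin_of_error = z_star * SE
CI_lower = (pL_hat - pN_hat) - margin_of_error
CI_upper = (pL_hat - pN_hat) + margin_of_error
CI = c(CI_lower, CI_upper)
CI
```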

`[1] 0.01339606 0.40327061`

The 95% confidence interval for the difference in proportions is (0.0134, 0.4033). We are 95% confident that the proportion of lying people correctly identified by the lie detector is between 1.3 and 40.3 percentage points larger than the proportion of not-lying people correctly identified by the lie detector.
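For reference, the same interval can be written out by hand with the unpooled SE formula, using the proportions 31/48 = 0.6458 and 21/48 = 0.4375 from the R code and \(z^* = 1.96\) for 95% confidence. The values are rounded, so the last digit may differ slightly from the R output.

\[(0.6458 - 0.4375) \pm 1.96 \cdot \sqrt{\frac{0.6458(1-0.6458)}{48} + \frac{0.4375(1-0.4375)}{48}} \]

\[0.2083 \pm 0.1949 \approx (0.013, 0.403) \]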

Does smoking negatively affect a person’s ability to become pregnant? A study collected data on 678 women who were trying to get pregnant. The two-way table shows the number of women who did and did not become pregnant during the first cycle of trying, by smoking status.

| | Smoker | Non-smoker | Total |
|---|---|---|---|
| Pregnant | 38 | 206 | 244 |
| Not Pregnant | 97 | 337 | 434 |
| Total | 135 | 543 | 678 |

The conditions are met for using the normal distribution (at least 10 values in each cell of the table). We see that the proportion of smokers who got pregnant is 38/135 = 0.281 while the proportion of non-smokers who got pregnant is 206/543 = 0.379. The confidence interval is given by:

\[\text{statistic} \pm z^* \cdot SE \]

\[(\hat{p}_S - \hat{p}_{NS}) \pm z^* \cdot \sqrt{\frac{\hat{p}_S(1- \hat{p}_S)}{n_S} + \frac{\hat{p}_{NS}(1- \hat{p}_{NS})}{n_{NS}}} \]

\[(0.281 - 0.379) \pm 1.645\cdot \sqrt{\frac{0.281(1-0.281)}{135} + \frac{0.379(1-0.379)}{543}} \]

\[-0.098 \pm 0.072 = (-0.170, -0.026) \]

We are 90% confident that the proportion of smokers who get pregnant in the first cycle is between 0.026 and 0.170 less than the proportion of non-smokers who get pregnant in the first cycle. Note that if we had subtracted in the other order, the interval would contain only positive values, but the interpretation would be the same.
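The hand calculation for the 90% interval can be reproduced in R, following the same pattern as the lie detector interval. This is a sketch only: the names `SE_ci`, `margin_of_error_smoking`, and `CI_smoking` are introduced here for illustration and do not appear in the original solution.

```r
# Counts from the smoking table
nS <- 135
nNS <- 543
pS_hat <- 38/nS      # proportion of smokers who became pregnant
pNS_hat <- 206/nNS   # proportion of non-smokers who became pregnant

# Critical value for 90% confidence (about 1.645)
conf_level <- 0.90
z_star <- qnorm(1 - (1 - conf_level)/2)

# A confidence interval uses the unpooled SE
SE_ci <- sqrt(pS_hat*(1 - pS_hat)/nS + pNS_hat*(1 - pNS_hat)/nNS)
margin_of_error_smoking <- z_star * SE_ci

# Interval for pS - pNS (smokers minus non-smokers)
CI_smoking <- (pS_hat - pNS_hat) + c(-1, 1) * margin_of_error_smoking
round(CI_smoking, 3)
```

Because R carries the unrounded proportions through the calculation, the endpoints may differ from the hand calculation in the later decimal places before rounding.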

We are testing \(H_0:p_S = p_{NS}\) vs \(H_a:p_S \neq p_{NS}\). The test statistic is:

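```
# Sample proportions who became pregnant in each group
pS_hat = 38/135
pNS_hat = 206/543
# Pooled proportion: all pregnancies / all women
pooled_p2 = (38+206)/(135+543)
nS = 135
nNS = 543
# The SE uses the pooled proportion because H0 assumes pS = pNS
SE2 = sqrt(pooled_p2*(1-pooled_p2)*(1/nS + 1/nNS))
z2 = (pS_hat - pNS_hat) / SE2
# z2 is negative, so the two-tailed p-value doubles the lower tail
p_value2 = 2*(pnorm(z2))
p_value2
```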

`[1] 0.03394234`
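Written out by hand, the calculation in the R code corresponds to the following, where \(\hat{p} = 244/678 = 0.3599\) is the pooled proportion and the values are rounded:

\[z = \frac{(\hat{p}_S - \hat{p}_{NS}) - 0}{\sqrt{\hat{p}(1-\hat{p})\left(\frac{1}{n_S} + \frac{1}{n_{NS}}\right)}} = \frac{0.2815 - 0.3794}{\sqrt{0.3599(1-0.3599)\left(\frac{1}{135} + \frac{1}{543}\right)}} \approx \frac{-0.0979}{0.0462} \approx -2.12 \]

Because \(z\) is negative, the two-tailed p-value doubles the area to the left of \(z\), which gives approximately 2(0.017) = 0.034.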

Because the p-value of 0.034 is below the 5% significance level, we reject \(H_0\) and conclude that there is a difference in the proportion of women who get pregnant between smokers and non-smokers.