Population Mean
- A confidence interval is a formula that tells us how to use the sample data to calculate an interval that estimates the target parameter
- we always have to state the confidence level: 95%, 99%, etc.
Large Samples
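- the construction is similar to computing a z-score
- because of the Central Limit Theorem, the sample mean $\bar{x}$ is approximately normally distributed for large n, so it is good enough to work with
$z = \frac{\bar{x} - \mu}{\sigma_{\bar{x}}}$, where $\sigma_{\bar{x}} = \sigma/\sqrt{n}$
$z \sim N(0, 1)$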
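- the region is defined by the confidence level as a symmetric range around the center
- the two tails together hold 100% − confidence level = $\alpha$ (so the confidence level is $1 - \alpha$)
- e.g. 95% confidence → the tails share 5%, i.e. 2.5% in each tail
- $-z_{\alpha/2}$ = 2.5th percentile
- $+z_{\alpha/2}$ = 97.5th percentile
- look up these values in any z table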
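A minimal sketch of the standardization above, assuming the population σ is known so that $\sigma_{\bar{x}} = \sigma/\sqrt{n}$. The function name `z_score_mean` and the numbers are hypothetical, chosen only for illustration.

```python
import math

def z_score_mean(x_bar: float, mu: float, sigma: float, n: int) -> float:
    """Standardize a sample mean: z = (x_bar - mu) / (sigma / sqrt(n))."""
    standard_error = sigma / math.sqrt(n)  # sigma of the sampling distribution of x_bar
    return (x_bar - mu) / standard_error

# Hypothetical numbers: n = 100, sample mean 52, mu = 50, sigma = 10
print(z_score_mean(52.0, 50.0, 10.0, 100))  # 2.0
```

A z of 2.0 lies outside ±1.96, i.e. outside the central 95% region.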
Alpha Table
| $\alpha$ | $z_{\alpha/2}$ |
|---|---|
| 20% | 1.282 |
| 15% | 1.440 |
| 10% | 1.645 |
| 5% | 1.960 |
| 1% | 2.576 |
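Instead of a printed z table, the critical values can be computed: $z_{\alpha/2}$ is the $(1 - \alpha/2)$ quantile of N(0,1). A sketch using Python's standard-library `statistics.NormalDist` (Python 3.8+); `z_critical` is a hypothetical helper name. Rounded to three decimals, the printed values agree with the alpha table.

```python
from statistics import NormalDist

def z_critical(alpha: float) -> float:
    """Return z_{alpha/2}: the (1 - alpha/2) quantile of the standard normal."""
    return NormalDist().inv_cdf(1 - alpha / 2)

# Recompute every row of the alpha table
for alpha in (0.20, 0.15, 0.10, 0.05, 0.01):
    print(f"{alpha:.0%}  {z_critical(alpha):.3f}")
```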
Small Samples
- for small samples the Central Limit Theorem approximation no longer holds
- we need another assumption
- instead we use the t distribution
- the t distribution has thicker tails
- i.e. more extreme values are more likely
- the degrees of freedom (df) determine how closely the t distribution matches the standard normal (z) distribution
- usually df = n − 1, i.e. one less than the sample size
- the larger n, the closer the t and z values become
- therefore with large samples the z-value is used and the t-statistic is not needed
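- we assume that the population the sample comes from is normally distributed
- under this assumption we may use the sample standard deviation s in place of σ → normally not allowed for small samples
$t = \frac{\bar{x} - \mu}{s/\sqrt{n}}$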
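A sketch of the t-statistic computed from raw sample data, assuming the population is normal. `statistics.stdev` divides by n − 1, which matches s. The function name `t_statistic`, the sample values and μ are hypothetical. The standard library has no t quantile function, so the critical value $t_{\alpha/2}$ for the given df still comes from a t table.

```python
import math
from statistics import mean, stdev

def t_statistic(sample: list[float], mu: float) -> tuple[float, int]:
    """Return (t, df) for a hypothesized mean mu, using s in place of sigma."""
    n = len(sample)
    s = stdev(sample)                              # sample standard deviation (n - 1 denominator)
    t = (mean(sample) - mu) / (s / math.sqrt(n))   # t = (x_bar - mu) / (s / sqrt(n))
    return t, n - 1                                # degrees of freedom df = n - 1

sample = [4.0, 6.0, 5.0, 7.0, 3.0]   # hypothetical small sample
t, df = t_statistic(sample, mu=4.0)
print(f"t = {t:.3f}, df = {df}")
```

Here $\bar{x} = 5$, $s = \sqrt{2.5}$, so $t = 1/\sqrt{0.5} \approx 1.414$ with df = 4.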
Large Sample Confidence Intervals
Theory
- when asking a binary question (e.g. "Is coffee overpriced?"), the number of "yes" answers follows a binomial distribution
- recall from Probability: a binomial has mean np (the average number of successes) and variance np(1 − p)
- $\hat{p} = \frac{\#1}{n}$ … p-hat is the number of successes over the sample size
- $\hat{p}$ is unbiased, i.e. its expected value equals the true probability: $E[\hat{p}] = p$
- all estimators in QM2 are unbiased
- expressible as $\hat{p} = \frac{1}{n}\sum_{i=1}^{n} \text{Bernoulli}_i(p)$, where each Bernoulli(p) is either 1 or 0
- i.e. sum up all yes (1) and no (0) values and divide by the sample size n
- so $\hat{p}$ is the sample mean of 0/1 values, and by the Central Limit Theorem it is approximately normal as long as n is "large" enough
- $\hat{p} = \bar{x}$
- n is large enough when
- $n\hat{p} \ge 15$ and $n(1 - \hat{p}) \ge 15$
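- therefore $\hat{p} \approx N(\text{mean}(\hat{p}), \text{sd}(\hat{p}))$
- with $\text{mean}(\hat{p}) = p$
- and $\text{sd}(\hat{p}) = \sqrt{\frac{p(1-p)}{n}}$
- which gives $\hat{p} \approx N\left(p, \sqrt{\frac{p(1-p)}{n}}\right)$
- now the z-value:
- $z = \frac{\hat{p} - p}{\sigma_{\hat{p}}} = \frac{\hat{p} - p}{\sqrt{p(1-p)/n}}$
- $z \sim N(0, 1)$ … standard normal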
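- finally, the confidence interval:
- $\hat{p} \pm z_{\alpha/2}\,\sigma_{\hat{p}} \approx \hat{p} \pm z_{\alpha/2}\sqrt{\hat{p}(1-\hat{p})/n}$ (the unknown p in $\sigma_{\hat{p}}$ is replaced by $\hat{p}$)
- $\hat{p} - z_{\alpha/2}\,\sigma_{\hat{p}} \le p \le \hat{p} + z_{\alpha/2}\,\sigma_{\hat{p}}$
- $P(\hat{p} - z_{\alpha/2}\,\sigma_{\hat{p}} \le p \le \hat{p} + z_{\alpha/2}\,\sigma_{\hat{p}}) = 1 - \alpha$ … the probability that the interval contains the true proportion p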
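A minimal sketch of the whole proportion recipe: $\hat{p}$ as the mean of 0/1 answers, the large-sample check, and the interval $\hat{p} \pm z_{\alpha/2}\sqrt{\hat{p}(1-\hat{p})/n}$. The helper names `sample_proportion` and `proportion_ci` and the survey data are hypothetical; the critical value comes from the standard-library `statistics.NormalDist`.

```python
import math
from statistics import NormalDist

def sample_proportion(responses: list[int]) -> float:
    """p_hat = number of successes (1s) / sample size n."""
    return sum(responses) / len(responses)

def proportion_ci(p_hat: float, n: int, confidence: float = 0.95) -> tuple[float, float]:
    """Large-sample CI: p_hat +/- z_{alpha/2} * sqrt(p_hat * (1 - p_hat) / n)."""
    # rule of thumb from the notes: both n*p_hat and n*(1 - p_hat) must be >= 15
    if n * p_hat < 15 or n * (1 - p_hat) < 15:
        raise ValueError("sample too small: need n*p_hat >= 15 and n*(1 - p_hat) >= 15")
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)      # z_{alpha/2}
    se = math.sqrt(p_hat * (1 - p_hat) / n)      # sigma_{p_hat} with p replaced by p_hat
    return p_hat - z * se, p_hat + z * se

# Hypothetical survey: 120 "yes" (1) and 80 "no" (0) answers
responses = [1] * 120 + [0] * 80
p_hat = sample_proportion(responses)
low, high = proportion_ci(p_hat, len(responses))
print(f"p_hat = {p_hat:.2f}, 95% CI: [{low:.3f}, {high:.3f}]")
```

Raising an error when the rule of thumb fails is a design choice of this sketch; the normal approximation is simply not trusted below that threshold.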
Example Coffee
- givens: n = 500, $\hat{p} = 0.6$
- asked for: 95% Confidence Interval
- solution:
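- $z_{\alpha/2}$ (97.5%) = 1.96
- $\sigma_{\hat{p}} \approx \sqrt{\frac{\hat{p}(1-\hat{p})}{n}} = \sqrt{\frac{0.6 \cdot 0.4}{500}} \approx 0.022$
- $0.6 - 1.96 \cdot 0.022 \le p \le 0.6 + 1.96 \cdot 0.022$, i.e. $0.557 \le p \le 0.643$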
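The coffee example can be checked numerically. This sketch takes n = 500 and $\hat{p} = 0.6$ as read from the $\sigma_{\hat{p}}$ computation and uses the table value 1.96; it also verifies the large-sample condition, which the solution above does not state explicitly.

```python
import math

n, p_hat = 500, 0.6   # givens of the coffee example
z = 1.96              # z_{alpha/2} for 95% confidence

# large-sample rule of thumb: n*p_hat >= 15 and n*(1 - p_hat) >= 15 (here 300 and 200)
large_enough = n * p_hat >= 15 and n * (1 - p_hat) >= 15
print("large-sample condition met:", large_enough)

se = math.sqrt(p_hat * (1 - p_hat) / n)   # about 0.0219
lower, upper = p_hat - z * se, p_hat + z * se
print(f"{lower:.3f} <= p <= {upper:.3f}")
```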
More Theory - Error
- $S = z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$ is just the half-width → careful! sometimes the full width is given/asked for
- S is the distance from the center to the edge of the confidence interval
- therefore $n = \frac{z_{\alpha/2}^2\,\sigma^2}{S^2}$
- always round up … n is the minimum sample size; rounding down would give a half-width larger than S
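A sketch of the sample-size formula, assuming σ is known and S is the half-width (halve a given full width first). `required_sample_size` is a hypothetical name; `math.ceil` implements the round-up rule.

```python
import math
from statistics import NormalDist

def required_sample_size(sigma: float, half_width: float, confidence: float = 0.95) -> int:
    """Minimum n such that z_{alpha/2} * sigma / sqrt(n) <= half_width."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)          # z_{alpha/2}
    return math.ceil((z * sigma / half_width) ** 2)  # n = z^2 sigma^2 / S^2, rounded up

# Hypothetical: sigma = 10, desired half-width S = 2, 95% confidence
print(required_sample_size(10, 2))
```

With these numbers $(1.96 \cdot 10 / 2)^2 \approx 96.04$, so at least 97 observations are needed.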