Gentle Intro
- Hypothesis
- following Popper's logic of falsification
- every hypothesis must be falsifiable, i.e. it must be possible to disprove it
- null-Hypothesis
- default value (status quo) for a parameter (until proven false)
- like a defendant in court … innocent until proven guilty
- denoted as H0
- alternative Hypothesis
- deviation from current knowledge
- must be proven to be valid
- denoted as Ha
Example Machine
- a machine must produce parts with a mean diameter of 0.5 inch, so H0: μ = 0.5 and Ha: μ ≠ 0.5
Testing
- possible outcome of any Test
- reject the null hypothesis
- finding a non-white swan (significant result)
- fail to reject the null hypothesis
- even though we fail to reject, we still do not accept the null hypothesis
- there can still be a black swan out there
- in practice, once the sample size is large enough, searching further may be considered “impractical”
- no further research to be done
- rejecting H0 counts as the “positive” result
Errors
- Errors
- $\text{standard error} = \frac{\text{standard deviation}}{\sqrt{n}}$
- standard deviation … describes the data: how spread out the data points are
- standard error … describes the estimate: how precisely the sample mean estimates the population mean
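A minimal sketch of the difference between the two quantities, using only Python's standard library. The diameters are made-up illustration values, not data from the course.

```python
import math
import statistics

# Hypothetical sample of measured diameters in inches (illustration only).
sample = [0.51, 0.49, 0.52, 0.50, 0.48, 0.53, 0.50, 0.49]

n = len(sample)
sd = statistics.stdev(sample)  # sample standard deviation: spread of the data points
se = sd / math.sqrt(n)         # standard error: uncertainty of the sample mean

print(f"n = {n}, standard deviation = {sd:.4f}, standard error = {se:.4f}")
```

The standard error shrinks as n grows, while the standard deviation does not systematically change with sample size.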
Ingredients
- confidence level
- rejection region
- defining when H0 is rejected in favor of Ha
- e.g. reject when the result of some experiment is greater than 5
- rejection region is always outside of confidence interval
- test statistic
- depends on the problem at hand
Interpretation
- when result is inside confidence interval
- i.e. outside the rejection region
- we cannot reject H0, but we still do not accept it
- at the current confidence level
Tests
Population Mean
One-Tailed
- upper $H_a: \Theta > \Theta_0$ or lower $H_a: \Theta < \Theta_0$
- $\Theta$ (the parameter being tested) and $\Theta_0$ (its expected value under H0) are placeholders for the values being compared
- ignored in this course, but the idea is not hard to grasp and the formulas are easy to adjust
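As a rough sketch of how the formulas adjust for the one-tailed case: the whole of α sits in a single tail, so the critical value comes from the α or 1 − α quantile of the standard normal distribution. The value `alpha = 0.05` is an assumption for illustration, and a large sample (normal test statistic) is assumed.

```python
from statistics import NormalDist

alpha = 0.05               # assumed significance level (illustration only)
std_normal = NormalDist()  # standard normal N(0, 1)

# Upper-tailed test (Ha: Θ > Θ0): reject H0 when the test statistic exceeds z_upper.
z_upper = std_normal.inv_cdf(1 - alpha)
# Lower-tailed test (Ha: Θ < Θ0): reject H0 when the test statistic falls below z_lower.
z_lower = std_normal.inv_cdf(alpha)

print(f"upper-tailed: reject if z > {z_upper:.3f}")
print(f"lower-tailed: reject if z < {z_lower:.3f}")
```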
Two-Tailed
- two-tailed $H_a: \Theta \neq \Theta_0$
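- then we collect sample data and get $\bar{x}$ and $\sigma$
- or the sample standard deviation $s$ if $\sigma$ is not known
- therefore for large samples: $t_{stat} = \frac{\bar{x} - \mu_0}{\sigma / \sqrt{n}} \sim N(0,1)$
- for small samples (using $s$): $t_{stat} = \frac{\bar{x} - \mu_0}{s / \sqrt{n}} \sim t$-distribution with $n - 1$ degrees of freedom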
- choose significance level α
- reminder: α = chance of Type I error
- region within confidence interval → do not reject H0
- region outside confidence interval → reject H0
- the confidence interval (for the standardized test statistic) can be constructed without data!
- only distribution type, sample size and α needed
- z-critical value (end points of rejection region)
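- $z_c = \mathrm{qnorm}(1 - \frac{\alpha}{2})$ if n is large (> 30) → CLT
- $t_c = \mathrm{qt}(1 - \frac{\alpha}{2},\ n - 1)$ if n is small and the population is normally distributed ($n - 1$ degrees of freedom)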
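The large-sample two-tailed procedure above can be sketched roughly as follows. `NormalDist().inv_cdf` from Python's standard library plays the role of `qnorm`. The function name `two_tailed_z_test` and the summary statistics are hypothetical, chosen to match the 0.5 inch machine example. The small-sample t case is left out because the standard library has no t-distribution quantile.

```python
import math
from statistics import NormalDist

def two_tailed_z_test(x_bar, mu_0, sigma, n, alpha=0.05):
    """Return (test statistic, critical value, reject H0?) for H0: mu = mu_0."""
    # Standardize the distance between sample mean and hypothesized mean.
    z_stat = (x_bar - mu_0) / (sigma / math.sqrt(n))
    # End point of the rejection region; same role as qnorm(1 - alpha/2).
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    # Reject H0 when the statistic lands outside [-z_crit, +z_crit].
    return z_stat, z_crit, abs(z_stat) > z_crit

# Hypothetical summary statistics (illustration only).
z_stat, z_crit, reject = two_tailed_z_test(x_bar=0.51, mu_0=0.5, sigma=0.03, n=40)
print(f"z = {z_stat:.3f}, critical value = ±{z_crit:.3f}, reject H0: {reject}")
```

Note that `z_crit` depends only on α and the distribution type, not on the data, which is why the non-rejection region can be fixed before sampling.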
Population Proportion
p-Values
- the probability of obtaining a sample at least as “extreme” as the one observed in the data set, assuming that H0 is true
- basically reversing the calculation
- finding the α at which the given $\bar{x}$ lies exactly on the boundary of the (two-sided) CI
- leaving it up to the reader to interpret the result
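- p-value (with $Z \sim N(0,1)$ and $z_{obs}$ the observed test statistic) =
- $2 \cdot P(Z < z_{obs})$ for $\bar{x} < \mu_0$ → $z_{obs}$ will be negative
- $2 \cdot P(Z > z_{obs})$ for $\bar{x} > \mu_0$ → $z_{obs}$ will be positive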
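A minimal sketch of the two-sided p-value for a large-sample (normal) test statistic. Because the standard normal distribution is symmetric, both cases above collapse into one expression using the absolute value of the observed statistic. The function name `two_sided_p_value` and the observed value are hypothetical.

```python
from statistics import NormalDist

def two_sided_p_value(z_obs):
    """Two-sided p-value for an observed z statistic, assuming H0 is true."""
    # Tail probability beyond |z_obs|, doubled to cover both tails.
    return 2 * (1 - NormalDist().cdf(abs(z_obs)))

z_obs = 2.108  # hypothetical observed test statistic (illustration only)
p = two_sided_p_value(z_obs)
print(f"p-value = {p:.4f}")
```

The reader then compares p with whatever α they consider appropriate: H0 is rejected at level α exactly when p < α.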