Setup

Linear Regression
dependent variable (DV) … what I am trying to predict
- continuous, quantitative data
independent variable (IV) … variables I am changing in the experiment
- continuous or categorical
- categorical … Dummy Variable
  - $n$ categories … $n - 1$ dummy variables
    - the category left out (by default the first) is the reference category … all of its dummies are 0
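A minimal sketch of the dummy coding described above, in plain Python. The function name `dummy_code` and the colour data are made up for illustration; statistics packages usually do this step for you.

```python
def dummy_code(values, reference=None):
    """Return (dummy column names, rows of 0/1) for one categorical variable.

    The reference category gets no column, so k categories give k - 1 dummies.
    """
    # Keep categories in order of first appearance.
    categories = list(dict.fromkeys(values))
    if reference is None:
        reference = categories[0]  # first category is the reference
    columns = [c for c in categories if c != reference]
    # One row per observation; the reference category is the all-zero row.
    rows = [[1 if v == c else 0 for c in columns] for v in values]
    return columns, rows


# Hypothetical data: three categories -> two dummy variables.
colors = ["red", "green", "blue", "green", "red"]
columns, rows = dummy_code(colors)
```

Here "red" is the reference category, so every "red" observation is coded as 0 in both dummy columns.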
$R^{2}$ value check
- how much of the spread/variation in the dependent variable the model explains
  - what percentage of the variation in DV values can be predicted from my IV values
- $0 \leq R^{2} \leq 1$ … higher = better … capturing more variation of dataset
  - global test: check with all IVs
    - can also indicate if the model is good/bad
  - local test: check with single/some IVs but not all of them
    - can help isolate correlated IVs
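To make the $R^{2}$ check concrete, here is a rough sketch for the one-IV case, using $R^{2} = 1 - SS_{res} / SS_{tot}$. The function names and the data are invented for illustration; a real analysis would normally use a statistics package.

```python
def fit_simple_regression(x, y):
    """Least-squares fit of y = intercept + slope * x; returns (intercept, slope)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def r_squared(y, y_pred):
    """Share of the variation in y that the predictions account for."""
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - pi) ** 2 for yi, pi in zip(y, y_pred))  # unexplained
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)               # total
    return 1 - ss_res / ss_tot


# Made-up data that roughly follows y = 2x.
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
intercept, slope = fit_simple_regression(x, y)
y_pred = [intercept + slope * xi for xi in x]
r2 = r_squared(y, y_pred)
```

Because the made-up points lie almost on a straight line, `r2` comes out close to 1.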
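$R^{2}$ vs $R_{adj}^{2}$
- $R^{2}$ does not account for the number of variables … it never goes down when an IV is added, even a useless one
  - best used for global test
- $R_{adj}^{2}$ takes the number of variables into account … only gets better if the added IV improves the fit by more than chance would
  - best used for comparing local tests with different numbers of IVs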
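The usual formula is $R_{adj}^{2} = 1 - (1 - R^{2}) \frac{n - 1}{n - p - 1}$, with $n$ observations and $p$ IVs. A small sketch of it follows; the function name and the numbers are made up for illustration.

```python
def adjusted_r_squared(r2, n_obs, n_ivs):
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - p - 1)."""
    if n_obs - n_ivs - 1 <= 0:
        raise ValueError("need more observations than IVs + 1")
    return 1 - (1 - r2) * (n_obs - 1) / (n_obs - n_ivs - 1)


# Made-up numbers: same R^2 and sample size, different number of IVs.
few = adjusted_r_squared(0.80, n_obs=50, n_ivs=2)
many = adjusted_r_squared(0.80, n_obs=50, n_ivs=10)
```

With the same $R^{2}$, the model with more IVs ends up with the lower adjusted value, which is the penalty for the extra variables.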
$p$-value < $\alpha$ … result counts as statistically significant … $\alpha$ is normally 0.05 (5%)
simple regression (only 1 IV) vs multiple regression (multiple IVs)

Assumptions

linear relationship between the DV and the IVs
- linear for linear regression
- exponential for exponential regression, etc
normal distribution of errors
homoscedasticity
- errors have the same variance across all values of the IVs
no multicollinearity
- IVs are not strongly correlated with one another
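One rough way to screen for correlated IVs is to look at the pairwise Pearson correlations. The sketch below is only that: the 0.8 cut-off is an arbitrary assumption, the names and data are invented, and pairwise checks can miss cases where one IV depends on several others at once.

```python
from math import sqrt


def pearson_r(a, b):
    """Pearson correlation coefficient between two equally long lists."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b))
    var_a = sum((ai - mean_a) ** 2 for ai in a)
    var_b = sum((bi - mean_b) ** 2 for bi in b)
    if var_a == 0 or var_b == 0:
        raise ValueError("correlation is undefined for a constant variable")
    return cov / sqrt(var_a * var_b)


def correlated_pairs(ivs, threshold=0.8):
    """Return (name_1, name_2, r) for every IV pair with |r| >= threshold."""
    names = list(ivs)
    flagged = []
    for i, first in enumerate(names):
        for second in names[i + 1:]:
            r = pearson_r(ivs[first], ivs[second])
            if abs(r) >= threshold:  # threshold is an assumed cut-off
                flagged.append((first, second, r))
    return flagged


# Made-up IVs: the two height columns measure the same thing.
ivs = {
    "height_cm": [150, 160, 170, 180, 190],
    "height_in": [59.1, 63.0, 66.9, 70.9, 74.8],
    "age": [34, 21, 45, 29, 38],
}
flagged = correlated_pairs(ivs)
```

Only the two height columns get flagged here, so one of them would be a candidate to drop.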

todo get a sample regression summary

todo get a multiple regression summary

doing a z-transformation rescales every variable to a mean of 0 and a standard deviation of 1
this puts the regression coefficients on the same scale, so we can also use them to compare how strongly each IV affects the DV instead of relying on just the p-value
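A minimal sketch of the z-transformation, $z = (x - \bar{x}) / s$, using only the standard library. The function name and the data are made up; this version uses the population standard deviation, which is an assumption, since some tools use the sample standard deviation instead.

```python
from statistics import mean, pstdev


def z_transform(values):
    """Rescale values to mean 0 and standard deviation 1."""
    mu = mean(values)
    sigma = pstdev(values)  # population SD; statistics.stdev gives the sample SD
    if sigma == 0:
        raise ValueError("cannot z-transform a constant variable")
    return [(v - mu) / sigma for v in values]


# Made-up data.
heights_cm = [150, 160, 170, 180, 190]
z = z_transform(heights_cm)
```

After the transformation, a value of 1 means "one standard deviation above the mean" for every variable, whatever its original unit.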