Setup
- Linear Regression
- dependent variable (DV) … what I am trying to predict
- continuous, quantitative data
- independent variables (IVs) … variables I am changing in the experiment
- continuous or categorical
- categorical … Dummy Variable
- number of categories - 1 = number of dummy variables
- first category is the reference category
- R² value check
- how much of the spread/variation in the dependent variable the model explains
- what percentage of the variation in DV values can be predicted with my IV values
- R² … higher = better … capturing more variation of the dataset
- global test: check with all IVs
- can also indicate if the model is good/bad
- local test: check with single/some IVs but not all of them
- can help isolate correlated IVs
- R² vs adjusted R²
- R² … does not account for the number of variables … never decreases when an IV is added, so it looks better with more variables
- best used for global test
- adjusted R² … takes number of variables into account … only gets better if a new IV explains enough to offset the penalty
- best used for deciding which local test is best
- p-value < α … α is normally 0.05 … 5%
- simple regression (only 1 IV) vs multiple regression (multiple IVs)
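The setup points above (dummy coding, R² vs adjusted R², global and local checks against α) can be sketched in R. This is a minimal sketch, assuming R's built-in `mtcars` dataset as stand-in data. The names `cars` and `fit` and the chosen columns are illustrative, not from these notes, and reading the per-coefficient t-tests as "local tests" is one possible interpretation.

```r
# Assumption: built-in mtcars stands in for real experiment data.
cars <- mtcars
cars$cyl <- factor(cars$cyl)      # categorical IV with 3 categories: 4, 6, 8

# DV = mpg (continuous); IVs = wt (continuous) and cyl (categorical)
fit <- lm(mpg ~ wt + cyl, data = cars)

# Dummy coding: 3 categories -> 2 dummy variables; "4" is the reference category
dummies <- colnames(model.matrix(fit))
print(dummies)

s <- summary(fit)
r2     <- s$r.squared             # share of DV variation explained by the IVs
adj_r2 <- s$adj.r.squared         # same, penalized for the number of IVs

# Global test: F-test over all IVs at once
f <- s$fstatistic
global_p <- unname(pf(f["value"], f["numdf"], f["dendf"], lower.tail = FALSE))

# One way to look at single IVs: the t-test p-value of each coefficient
local_p <- coef(s)[, "Pr(>|t|)"]

alpha <- 0.05
print(c(r2 = r2, adj_r2 = adj_r2, global_p = global_p))
print(local_p < alpha)            # TRUE where a coefficient is significant at alpha
```

The reference category does not get its own column. Its effect is absorbed into the intercept, and each dummy coefficient is read as a difference from that reference.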
Assumptions
- linear relationship between DVs and IVs
- linear for linear regression
- exponential for exponential regression, etc
- normal distribution of errors
- homoscedasticity … errors have constant variance
- no multicollinearity
- IVs are independent of one another
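Two of the assumptions above can be checked numerically. A minimal sketch, assuming `mtcars` as stand-in data and a hypothetical model `fit_a`; the Shapiro-Wilk test and a plain correlation matrix are only one possible choice of checks.

```r
# Assumption: mtcars as stand-in data; fit_a is a hypothetical model name.
fit_a <- lm(mpg ~ wt + hp + disp, data = mtcars)

# Normal distribution of errors: Shapiro-Wilk test on the residuals
# (a small p-value suggests the residuals are not normally distributed)
normality <- shapiro.test(residuals(fit_a))
print(normality$p.value)

# No multicollinearity: pairwise correlations between the IVs;
# values near +1 or -1 mean two IVs carry largely the same information
iv_cor <- cor(mtcars[, c("wt", "hp", "disp")])
print(round(iv_cor, 2))
```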
Interpret Simple Regression Analysis
todo get a sample regression summary
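Until a real summary is added here, one could be produced as follows. A minimal sketch, assuming `mtcars` as stand-in data; `sreg` is a hypothetical model name, not one of the models from these notes.

```r
# Assumption: mtcars as stand-in data; sreg is a hypothetical model name.
sreg <- lm(mpg ~ wt, data = mtcars)   # simple regression: 1 IV (car weight)
sreg_summary <- summary(sreg)
print(sreg_summary)

# Pieces to read off the summary:
print(coef(sreg_summary))             # estimate, std. error, t value, p-value per term
print(sreg_summary$r.squared)         # share of mpg variation explained by wt
```

The `wt` estimate is the expected change in the DV for a one-unit increase in the IV, and its p-value is compared against α.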
Interpret Multiple Regression Analysis
todo get a multiple regression summary
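A multiple regression summary could be produced the same way. A minimal sketch, assuming `mtcars` as stand-in data; `mreg` is a hypothetical model name.

```r
# Assumption: mtcars as stand-in data; mreg is a hypothetical model name.
mreg <- lm(mpg ~ wt + hp, data = mtcars)   # multiple regression: 2 IVs
mreg_summary <- summary(mreg)
print(mreg_summary)

# Each estimate is the expected change in mpg for a one-unit change
# in that IV while the other IVs are held fixed.
print(coef(mreg_summary))

# With more than one IV, adjusted R-squared is the fairer comparison value.
print(c(r2 = mreg_summary$r.squared, adj_r2 = mreg_summary$adj.r.squared))
```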
Z-Transformation
- a z-transformation rescales every variable to a mean of 0 and a standard deviation of 1
- puts the regression coefficients on the same scale, so we can also compare their sizes across IVs instead of relying on the p-value alone
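A minimal sketch of the z-transformation with base R's `scale()`, assuming `mtcars` as stand-in data; `z`, `vars`, and `zreg` are illustrative names.

```r
# Assumption: mtcars as stand-in data; z and zreg are hypothetical names.
vars <- c("mpg", "wt", "hp")
z <- as.data.frame(scale(mtcars[, vars]))   # (x - mean) / sd, column by column

print(round(colMeans(z), 10))               # every mean is 0 (up to rounding)
print(apply(z, 2, sd))                      # every standard deviation is 1

# Regression on z-scores: coefficients are in standard-deviation units,
# so their absolute sizes can be compared across IVs.
zreg <- lm(mpg ~ wt + hp, data = z)
print(coef(zreg))
```

Because every variable is centered, the intercept of `zreg` is effectively zero.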
Adding another IV / Predictor
- some coefficients will change
- some coefficients might flip signs
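A sign flip is easy to reproduce with two strongly correlated predictors. The data below are made up purely for illustration (`x1`, `x2`, `y`, `one_iv`, and `two_iv` are hypothetical names), constructed so that the true effect of `x1` is negative.

```r
# Made-up data: x2 follows x1 closely, so the two IVs are strongly correlated.
x1 <- 1:20
x2 <- x1 + rep(c(-1, 1), times = 10)
y  <- 3 * x2 - 2 * x1 + 0.1 * sin(1:20)   # true effect of x1 is negative

one_iv <- lm(y ~ x1)        # x1 alone also picks up the effect of the missing x2
two_iv <- lm(y ~ x1 + x2)   # x2 added as a second predictor

print(coef(one_iv)["x1"])   # positive
print(coef(two_iv)["x1"])   # negative
```

With only `x1` in the model, its coefficient also carries the effect of the correlated `x2`. Once `x2` is included, each coefficient reflects only its own contribution.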
Model Assumptions
- todo 4 plots of plot(mreg2) … Residuals vs Fitted, normal Q-Q, Scale-Location, Residuals vs Leverage (with Cook’s Distance contours)
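A minimal sketch of the diagnostic plots and Cook's distance, assuming `mtcars` as stand-in data; `mreg_d` is a hypothetical model standing in for `mreg2`. The `pdf(NULL)` call only discards the graphics so the sketch also runs in a script; drop it (and `dev.off()`) to see the plots interactively.

```r
# Assumption: mtcars as stand-in data; mreg_d is a hypothetical stand-in for mreg2.
mreg_d <- lm(mpg ~ wt + hp, data = mtcars)

pdf(NULL)                     # throw-away graphics device for non-interactive runs
par(mfrow = c(2, 2))          # 2 x 2 grid for the four default plots
plot(mreg_d)                  # residuals vs fitted, Q-Q, scale-location, leverage
invisible(dev.off())

# Cook's distance per observation: large values flag influential points
cd <- cooks.distance(mreg_d)
print(head(sort(cd, decreasing = TRUE), 3))
```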
Predict using Regression Analysis
predict(mreg4, newdata=new)
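For `predict()` to work, `new` has to be a data frame whose column names match the IVs of the model. A minimal sketch, assuming `mtcars` as stand-in data; `mreg_p` is a hypothetical stand-in for `mreg4`, and the values in `new` are invented.

```r
# Assumption: mtcars as stand-in data; mreg_p is a hypothetical stand-in for mreg4.
mreg_p <- lm(mpg ~ wt + hp, data = mtcars)

# New observations: one row per case, one column per IV used in the model
new <- data.frame(wt = c(2.5, 3.5), hp = c(100, 150))

pred <- predict(mreg_p, newdata = new)
print(pred)

# Optional: prediction intervals around each predicted value
print(predict(mreg_p, newdata = new, interval = "prediction"))
```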