Simple Linear Regression
- question: “is the price of a house dependent on its size?”
Deterministic vs Probabilistic
- deterministic … exact relationship
- very unrealistic, we are not in a deterministic world
- probabilistic … deterministic component + random error
- oftentimes used
- when modelling we need to take error into account
- error term also absorbs the influences (variables) not taken into account by the model
- sample vs population
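The difference between the two kinds of model can be sketched in a few lines of Python. This is only an illustration: the parameter values `BETA_0`, `BETA_1` and the error spread `sigma` are made-up assumptions, as is the choice of a normally distributed error.

```python
import random

# Hypothetical "true" population parameters (assumed for illustration only).
BETA_0 = 50_000.0   # base price
BETA_1 = 2_000.0    # price increase per unit of size


def deterministic_price(size):
    """Deterministic component: the same size always gives the same price."""
    return BETA_0 + BETA_1 * size


def probabilistic_price(size, rng, sigma=15_000.0):
    """Deterministic component plus a random error with mean 0."""
    return deterministic_price(size) + rng.gauss(0.0, sigma)


rng = random.Random(42)  # fixed seed so the sample is reproducible
for size in [60, 80, 100, 120]:
    # Expected price vs. one observed (sampled) price for the same size.
    print(size, round(deterministic_price(size)), round(probabilistic_price(size, rng)))
```

Each run with a different seed gives a different sample, while the deterministic component (the population relationship) stays the same.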
Theory
Info
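line of means: $E(y) = \beta_0 + \beta_1 x$
- y does not increase, it is expected to increase
- $\beta_0$ (intercept) and $\beta_1$ (slope) are unknown, we have to find/compute them
- we want to find the best fitting parameters $\hat{\beta}_0$ and $\hat{\beta}_1$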
- best … least error (distance) between the actual values and the expected values
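- $e_i$ … error of any data point
- $e_i = y_i - \hat{y}_i$
- $RSS$ … residual sum of squares
- $RSS = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} (y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i)^2$
- minimize $RSS$ to reach $\hat{\beta}_0$ and $\hat{\beta}_1$
- take partial derivatives with respect to $\hat{\beta}_0$ and $\hat{\beta}_1$ and set them to zero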
- solves beautifully, allegedly
- 2 equations with 2 unknowns
- not exam relevant, the minimizing function(s) will be given on the formula sheet, just plug in
- method of least squares
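- coefficient of determination
- $R^2 = 1 - \frac{RSS}{SS_{yy}}$ with $SS_{yy} = \sum_{i=1}^{n} (y_i - \bar{y})^2$
- why…? long formula
- relation of the variation of the error ($RSS$) and the total variation in y ($SS_{yy}$)
- small variation of the error ⇒ most of the variation in y is explained by x ⇒ $R^2$ close to 1
- i.e. the share of the variation in y that is explained by x
- $R^2$ is always between 0 and 1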
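The "just plug in" step can be sketched as follows. This assumes the usual closed-form solution of the two equations, $\hat{\beta}_1 = SS_{xy} / SS_{xx}$ and $\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}$, which may be written differently on the formula sheet. The function names and the five data points are made up for illustration.

```python
def fit_simple_ols(xs, ys):
    """Least-squares estimates (b0, b1) for the line y = b0 + b1 * x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    ss_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    ss_xx = sum((x - x_bar) ** 2 for x in xs)
    b1 = ss_xy / ss_xx
    b0 = y_bar - b1 * x_bar
    return b0, b1


def r_squared(xs, ys, b0, b1):
    """Coefficient of determination: 1 - RSS / SS_yy."""
    y_bar = sum(ys) / len(ys)
    rss = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
    ss_yy = sum((y - y_bar) ** 2 for y in ys)
    return 1 - rss / ss_yy


# Made-up sample: x could be size, y could be price (in arbitrary units).
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [1.0, 1.0, 2.0, 2.0, 4.0]

b0, b1 = fit_simple_ols(xs, ys)
print(round(b0, 3), round(b1, 3), round(r_squared(xs, ys, b0, b1), 3))
```

For this sample the fitted line is roughly $\hat{y} = -0.1 + 0.7x$ with $R^2 \approx 0.82$, so about 82% of the variation in y is explained by x.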
Multiple Regression Model
- question: “which parameters is the price of a house dependent on?”
- e.g. size, # of bedrooms, distance to nearest grocery/school/public transport, has garden
- similar, just more variables: $y = \beta_0 + \beta_1 x_1 + \dots + \beta_k x_k + \varepsilon$
- again method of least squares (OLS estimation)
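- F-test (based on the F distribution) … checks whether the model as a whole is useful, i.e. whether at least one coefficient differs from 0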
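A minimal sketch of least squares with several variables, using only the standard library. It solves the normal equations $(X^T X)\hat{\beta} = X^T y$ by Gaussian elimination and computes the F statistic $F = \frac{(SS_{yy} - RSS)/k}{RSS/(n-k-1)}$. The helper names and the six data points are made up. Turning F into a p-value needs the F distribution (tables or a statistics library) and is not shown here.

```python
def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_multiple_ols(rows, ys):
    """Least-squares estimates [b0, b1, ..., bk] via the normal equations."""
    design = [[1.0] + list(r) for r in rows]  # leading 1 for the intercept
    p = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(p)] for i in range(p)]
    xty = [sum(d[i] * y for d, y in zip(design, ys)) for i in range(p)]
    return solve_linear_system(xtx, xty)


def predict(betas, row):
    return betas[0] + sum(b * x for b, x in zip(betas[1:], row))


def f_statistic(rows, ys, betas):
    """Overall F statistic for a model with k explanatory variables."""
    n, k = len(ys), len(betas) - 1
    y_bar = sum(ys) / n
    rss = sum((y - predict(betas, r)) ** 2 for r, y in zip(rows, ys))
    ss_yy = sum((y - y_bar) ** 2 for y in ys)
    return ((ss_yy - rss) / k) / (rss / (n - k - 1))


# Made-up data: x1 could be size, x2 could be "has garden" (0/1).
rows = [(1, 0), (2, 1), (3, 0), (4, 1), (5, 0), (6, 1)]
ys = [3.1, 7.9, 6.9, 12.1, 11.1, 15.9]

betas = fit_multiple_ols(rows, ys)
print([round(b, 3) for b in betas], round(f_statistic(rows, ys, betas), 1))
```

The data were generated by hand as roughly $1 + 2x_1 + 3x_2$ plus a small error, so the estimates should land close to those values and the F statistic should be large.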
Important
with R … variables whose p-values are marked with stars are significant, keep them in your model; the more stars the merrier (the smaller the p-value)
- in multiple regression, adjusted $R^2$ also takes the number of variables into account
- when comparing models with different numbers of variables, look at the adjusted $R^2$ values
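Why the adjusted value matters can be shown with a small sketch. It assumes the common definition $R^2_{adj} = 1 - (1 - R^2)\frac{n-1}{n-k-1}$ for n data points and k variables. The two models and their $R^2$ values are hypothetical numbers, not results from real data.

```python
def adjusted_r_squared(r_squared, n, k):
    """Adjusted R^2 for a model with k explanatory variables fitted on n data points."""
    return 1 - (1 - r_squared) * (n - 1) / (n - k - 1)


# Hypothetical numbers: two models fitted on the same 20 houses.
small_model = adjusted_r_squared(0.80, n=20, k=2)  # e.g. size, bedrooms
large_model = adjusted_r_squared(0.81, n=20, k=5)  # three more variables added
print(round(small_model, 3), round(large_model, 3))
```

The larger model has the slightly higher plain $R^2$ (0.81 vs 0.80) but the lower adjusted value (about 0.742 vs 0.776), so the extra variables do not pay for themselves here.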