Simple Linear Regression

  • question: “is the price of a house dependent on its size?”

Deterministic vs Probabilistic

  • deterministic … exact relationship
    • very unrealistic, we are not in a deterministic world
  • probabilistic … deterministic component + random error
    • oftentimes used
    • when modelling we need to take error into account
    • error term also takes care of the data points not taken into account
      • sample vs population
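The difference between the two model types can be sketched in a few lines of Python. This is only an illustration: the coefficients `beta0` and `beta1`, the error spread `sigma`, and the house sizes are made-up assumptions, not values from the lecture.

```python
import random

def deterministic(x, beta0=50.0, beta1=2.0):
    """Deterministic model: the same x always gives exactly the same y."""
    return beta0 + beta1 * x

def probabilistic(x, rng, beta0=50.0, beta1=2.0, sigma=10.0):
    """Probabilistic model: deterministic component plus a random error.

    The error is assumed here to be normal with mean 0 and std. dev. sigma.
    """
    return deterministic(x, beta0, beta1) + rng.gauss(0.0, sigma)

rng = random.Random(0)   # fixed seed so the run is reproducible
sizes = [80, 80, 80]     # three hypothetical houses of identical size

exact = [deterministic(x) for x in sizes]
noisy = [probabilistic(x, rng) for x in sizes]

print(exact)  # identical prices for identical sizes
print(noisy)  # prices scatter around the deterministic value
```

Houses of the same size get the same price under the deterministic model, while the probabilistic model lets them differ, which is what real data looks like.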

Theory

Info

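line of means: $E(y) = \beta_0 + \beta_1 x$

  • y does not increase by exactly $\beta_1$ per unit of x, it is expected to increase by that much on average
  • $\beta_0$ (intercept) and $\beta_1$ (slope) are unknown, we have to estimate them from the sample
    • we want to find the best fitting estimates $\hat{\beta}_0$ and $\hat{\beta}_1$
  • best … least error (distance) between the actual values $y_i$ and the predicted values $\hat{y}_i$
    • $e_i = y_i - \hat{y}_i$ … error (residual) of any data point
      • $RSS = \sum_i e_i^2$ … residual sum of squares
  • minimize $RSS$ to reach $\hat{\beta}_0$ and $\hat{\beta}_1$
    • take partial derivatives of $RSS$ with respect to $\hat{\beta}_0$ and $\hat{\beta}_1$ and set them to zero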
    • solves beautifully, allegedly
      • 2 equations with 2 unknowns
    • not exam relevant, the minimizing function(s) will be given on the formula sheet, just plug in
  • method of least squares
  • coefficient of determination $R^2$
    • relates the variation of the error ($RSS$) to the total variation in y ($TSS = \sum_i (y_i - \bar{y})^2$)
      • variation of the error ≤ total variation in y
    • $R^2 = 1 - \frac{RSS}{TSS}$ … share of the variation in y that the model explains
    • $R^2$ is always between 0 and 1
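As a rough sketch of "just plug in", the standard closed-form least-squares solution for one predictor can be written in plain Python. The function names and the size/price data below are invented for illustration, and the formula sheet may use different notation for the same quantities.

```python
def fit_simple_linear_regression(x, y):
    """Return the least-squares estimates (beta0_hat, beta1_hat)."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    # Sums of cross-deviations and squared deviations from the means.
    ss_xy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    ss_xx = sum((xi - x_mean) ** 2 for xi in x)
    beta1_hat = ss_xy / ss_xx                # slope
    beta0_hat = y_mean - beta1_hat * x_mean  # intercept
    return beta0_hat, beta1_hat

def r_squared(x, y, beta0_hat, beta1_hat):
    """Coefficient of determination: 1 - RSS / TSS."""
    y_mean = sum(y) / len(y)
    rss = sum((yi - (beta0_hat + beta1_hat * xi)) ** 2 for xi, yi in zip(x, y))
    tss = sum((yi - y_mean) ** 2 for yi in y)
    return 1 - rss / tss

# Hypothetical sample: house size and price (units are arbitrary).
sizes = [50, 70, 90, 110, 130]
prices = [150, 200, 230, 290, 330]

beta0_hat, beta1_hat = fit_simple_linear_regression(sizes, prices)
r2 = r_squared(sizes, prices, beta0_hat, beta1_hat)
print(beta0_hat, beta1_hat, r2)
```

The two returned estimates are exactly the solution of the "2 equations with 2 unknowns" obtained from the partial derivatives.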

Multiple Regression Model

  • question: “which variables is the price of a house dependent on?”
    • e.g. size, # of bedrooms, distance to nearest grocery/school/public transport, has garden
  • similar, just more variables
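Written out, "just more variables" means the model gains one term per predictor. The form below is the usual textbook one. The symbols $k$ (number of predictors), $x_1, \dots, x_k$ (e.g. size, number of bedrooms) and $\varepsilon$ (random error) are notation assumed here, not taken from the notes.

```latex
y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_k x_k + \varepsilon
```

With $k = 1$ this reduces to the simple linear regression model.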

Important

with R … variables whose p-values are marked with stars are significant, keep them in your model; the more stars the merrier

  • adjusted $R^2$ also takes the number of variables into account
    • when comparing models with a different number of variables, look at the adjusted $R^2$ values
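To see why the adjusted value matters when comparing models, here is a small sketch using the common adjustment formula $1 - (1 - R^2)\frac{n - 1}{n - k - 1}$. The function name and the two example models (their $R^2$ values, sample size `n`, and predictor counts `k`) are made up for illustration.

```python
def adjusted_r_squared(r2, n, k):
    """Adjusted R^2 for a model with k predictors fitted on n observations."""
    if n - k - 1 <= 0:
        raise ValueError("need more observations than predictors plus one")
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

# Two hypothetical models fitted on the same 30 houses.
small_model = adjusted_r_squared(0.80, n=30, k=2)  # e.g. size, bedrooms
large_model = adjusted_r_squared(0.81, n=30, k=6)  # four extra variables

print(small_model, large_model)
```

The larger model has the higher plain $R^2$ but the lower adjusted value, so the four extra variables do not justify themselves here.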