Science of data

Descriptive Statistics

inferential statistics → predictions and estimates about a population based on sample data, e.g. stock market
- confidence intervals
- hypothesis testing
- regression analysis

experimental unit → what we collect data from
population → the full group of experimental units we want to study (itself a subgroup of all global data)
- e.g. all working adults in Austria
variable → characteristic or property of an experimental unit in the population
sample → a subset of the population (should be unbiased, not hand-picked)
- e.g. the working adults in Austria asked by survey X
statistical inference → estimate, prediction, etc on population based on sample
- e.g. the preferred cola brand of a population, estimated from a survey of 1,000 people
measure of reliability → statement about uncertainty
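
The cola survey idea can be sketched in code: the sample proportion is the estimate, and a 95% confidence interval serves as the measure of reliability. The counts (540 of 1,000) are invented for illustration, and the normal-approximation (Wald) interval is only one common choice, not the only way to state the uncertainty.

```python
import math

def proportion_confidence_interval(successes, n, z=1.96):
    """Wald (normal-approximation) interval for a population proportion.

    z=1.96 corresponds to roughly 95% confidence.
    """
    p_hat = successes / n                               # point estimate
    standard_error = math.sqrt(p_hat * (1 - p_hat) / n)
    margin = z * standard_error
    return p_hat, p_hat - margin, p_hat + margin

# Hypothetical survey result: 540 of 1,000 respondents prefer brand A.
estimate, lower, upper = proportion_confidence_interval(540, 1000)
print(f"estimate: {estimate:.3f}, 95% CI: [{lower:.3f}, {upper:.3f}]")
```

The interval gets narrower as n grows, which is why a larger sample gives a more reliable inference.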

qualitative (categorical) variable → cannot be measured in numbers
nominal variable
- classification → e.g. member of religion
- binary data points (yes/no) are always nominal
ordinal variable
- same as nominal variables, but the categories
- have an order
- can be sorted, ranked
  - e.g. course grades
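
A small sketch of the difference: for nominal data only counting makes sense, while ordinal data can also be sorted and ranked. The answers, the grades and the A-to-F grading scale are assumptions made up for this example.

```python
from collections import Counter

# Nominal: no order between categories, so we can only count them.
is_member = ["yes", "no", "yes", "yes", "no"]        # hypothetical answers
mode_label, mode_count = Counter(is_member).most_common(1)[0]

# Ordinal: categories have an order, so sorting and ranking are meaningful.
GRADE_ORDER = ["A", "B", "C", "D", "F"]              # best to worst (assumed scale)
grades = ["C", "A", "F", "B", "B"]                   # hypothetical course grades
ranked = sorted(grades, key=GRADE_ORDER.index)
median_grade = ranked[len(ranked) // 2]              # middle value of the ranking

print(mode_label, mode_count)
print(ranked, median_grade)
```

Note that the distance between grades is still not defined, so a mean grade would not be meaningful here.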

quantitative (continuous, metric) variable → can be measured in numbers
interval variable
- linear scale of values → differences between values are meaningful
  - e.g. temperature in Celsius, clock time
ratio variable
- ratio between 2 values is meaningful
- there must be a natural 0
  - e.g. temperature in Celsius has no natural 0, temperature in Kelvin does
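
The temperature example can be checked numerically: differences agree on both scales, but a ratio only means something on the scale with a natural 0. The two temperatures are arbitrary illustration values.

```python
def celsius_to_kelvin(celsius):
    """Convert a temperature from degrees Celsius to Kelvin."""
    return celsius + 273.15

warm_c, cool_c = 20.0, 10.0   # arbitrary example temperatures

# Interval property: the difference is the same on both scales.
difference_c = warm_c - cool_c
difference_k = celsius_to_kelvin(warm_c) - celsius_to_kelvin(cool_c)

# Ratio property: only meaningful in Kelvin, which has a natural 0.
naive_ratio = warm_c / cool_c                                        # 2.0, misleading
true_ratio = celsius_to_kelvin(warm_c) / celsius_to_kelvin(cool_c)   # about 1.035

print(difference_c, round(difference_k, 2))
print(naive_ratio, round(true_ratio, 3))
```

So 20 °C is not "twice as hot" as 10 °C, even though 20 / 10 = 2.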

representative sample
- mostly done using random samples
- analogy with pot of soup → different parts of the pot can taste different, so stir (randomize) before tasting a spoonful
ensuring reliability of results
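
A minimal sketch of a simple random sample, where every experimental unit has the same chance of being picked. The population (weekly working hours drawn from a normal distribution) and all its numbers are invented for illustration, and the seed is fixed only to make the run reproducible.

```python
import random
import statistics

rng = random.Random(42)   # fixed seed so the sketch is reproducible

# Hypothetical population: weekly working hours of 10,000 adults.
population = [rng.gauss(38, 6) for _ in range(10_000)]

# Simple random sample without replacement: each unit is equally likely.
sample = rng.sample(population, k=500)

population_mean = statistics.mean(population)
sample_mean = statistics.mean(sample)
print(f"population mean: {population_mean:.2f}, sample mean: {sample_mean:.2f}")
```

With a random sample the sample mean should land close to the population mean, which is what makes the sample representative.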

selection bias → certain subsets of the population are under- or overrepresented in the sample
non-response bias → data could not be collected from all experimental units in the sample (e.g. people who don't answer)
measurement error → recorded values are inaccurate or simply wrong at time of analysis
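
Selection bias can be illustrated with a tiny, fully made-up population: if one subgroup is overrepresented in the sample, the estimate drifts away from the true value. The groups, hours and sample compositions below are assumptions chosen so the arithmetic is easy to follow.

```python
import statistics

# Hypothetical population: 60% full-time (40 h/week), 40% part-time (20 h/week).
population = [("full-time", 40)] * 6 + [("part-time", 20)] * 4

def mean_hours(units):
    """Average weekly hours over a list of (group, hours) units."""
    return statistics.mean(hours for _, hours in units)

# Biased sample: full-time workers are overrepresented (80% instead of 60%).
biased_sample = [("full-time", 40)] * 4 + [("part-time", 20)] * 1

# Representative sample: keeps the 60/40 split of the population.
representative_sample = [("full-time", 40)] * 3 + [("part-time", 20)] * 2

print(mean_hours(population))             # true value
print(mean_hours(biased_sample))          # overestimates
print(mean_hours(representative_sample))  # matches the population
```

Non-response bias has the same effect when the units that do not answer differ systematically from those that do.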