
Statistical Thinking for Machine Learning: Lecture 1

Reda Mastouri
UChicago MastersTrack: Coursera
Thank you to Gregory Bernstein for parts of these slides

1

Statistics and machine learning

  • Statistics and probability theory are the foundations of modeling and machine learning

  • Statistical modeling is more concerned with inference: understanding an effect we hope to observe

  • Machine learning is more concerned with prediction: predicting an unknown value based on a sample we observed. Having a representative sample and generalizable results is crucial

  • Fit the data to the desired output and learn from the data!

2

Agenda

3

Agenda

  • Comprehending data
    • Data in the grand scheme of things
    • Understanding your data frame: Rows and columns
    • Different types of data
4

Agenda

  • Comprehending data

    • Data in the grand scheme of things
    • Understanding your data frame: Rows and columns
    • Different types of data
  • Distributions

    • Data types and distributions
    • PDF and CDF
    • Parameters and Method of Moments
    • The Normal Distribution and the Central Limit Theorem
5

Agenda

  • Comprehending data

    • Data in the grand scheme of things
    • Understanding your data frame: Rows and columns
    • Different types of data
  • Distributions

    • Data types and distributions
    • PDF and CDF
    • Parameters and Method of Moments
    • The Normal Distribution and the Central Limit Theorem
  • Sampling

    • How we can use sampling to achieve our goals
    • Descriptive statistics
6

Data structure and data types

7

Where does data fit in the big picture?

  • Defining a problem statement
  • Collecting and storing data
    • recording data, downloading data sets, web scraping or using an API, data structures and data lakes
  • Data
  • Sampling and Modeling
  • Insights, inference, and visualization
8

Data types

Data type     Description
Binary        a value of 0/1, true/false
Categorical   a discrete value with limited possibilities
Continuous    a numerical value that has an infinite range of possibilities
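
A minimal pandas sketch of the three data types (the column names and values are illustrative):

import pandas as pd

# Hypothetical columns illustrating each data type
df = pd.DataFrame({
    "purchased": [True, False, True],        # binary: true/false
    "cut": ["Ideal", "Good", "Premium"],     # categorical: limited possibilities
    "price": [8848.0, 3968.0, 13849.0],      # continuous: any numeric value
})
df["cut"] = df["cut"].astype("category")     # tell pandas the column is categorical
print(df.dtypes)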
9


Rows and columns

11

Rows and columns: Diamond Data Set

  • Two use-cases: supervised and unsupervised learning
carat cut color clarity depth table price x y z id purchased
1.54 Ideal J VS1 62.2 59 8848 7.34 7.38 4.58 1 TRUE
0.91 Ideal E SI2 61.5 56 3968 6.20 6.23 3.82 2 FALSE
2.01 Good H SI2 63.1 55 13849 7.99 8.09 5.07 3 FALSE
1.01 Good E SI1 64.1 62 4480 6.26 6.19 3.99 4 TRUE
1.52 Good F VS2 57.8 59 12283 7.58 7.50 4.36 5 TRUE
1.61 Premium D SI1 61.4 60 13582 7.56 7.51 4.63 6 FALSE
0.33 Ideal G VS1 61.5 56 699 4.45 4.48 2.74 7 FALSE
1.17 Ideal F VS2 61.8 55 8072 6.81 6.74 4.19 8 FALSE
0.30 Premium H VS2 62.6 58 608 4.28 4.22 2.66 9 TRUE
0.70 Ideal G VS2 61.8 57 2929 5.68 5.71 3.52 10 TRUE
12

Rows and columns: Diamond Data Set

  • Two use-cases: supervised and unsupervised learning
carat cut color clarity depth table price x y z id purchased
1.54 Ideal J VS1 62.2 59 8848 7.34 7.38 4.58 1 TRUE
0.91 Ideal E SI2 61.5 56 3968 6.20 6.23 3.82 2 FALSE
2.01 Good H SI2 63.1 55 13849 7.99 8.09 5.07 3 FALSE
1.01 Good E SI1 64.1 62 4480 6.26 6.19 3.99 4 TRUE
1.52 Good F VS2 57.8 59 12283 7.58 7.50 4.36 5 TRUE
1.61 Premium D SI1 61.4 60 13582 7.56 7.51 4.63 6 FALSE
0.33 Ideal G VS1 61.5 56 699 4.45 4.48 2.74 7 FALSE
1.17 Ideal F VS2 61.8 55 8072 6.81 6.74 4.19 8 FALSE
0.30 Premium H VS2 62.6 58 608 4.28 4.22 2.66 9 TRUE
0.70 Ideal G VS2 61.8 57 2929 5.68 5.71 3.52 10 TRUE
  • Independent variables can be used for unsupervised modeling
  • Dependent variables and independent variables can be used together for supervised modeling (a split of the two is sketched below)
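
A sketch of the split, assuming the table above is available as a CSV (the file name diamonds_sample.csv is hypothetical):

import pandas as pd

diamonds = pd.read_csv("diamonds_sample.csv")  # hypothetical file with the columns above

# Unsupervised: independent variables only, no target column
X = diamonds.drop(columns=["id", "purchased"])

# Supervised: the same independent variables plus the dependent variable
y = diamonds["purchased"]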
13

Rows and columns

  • Rows are linked — some rows belong to the same group
  • Are the group means the same?
city_type  population_mil  rainfall_inches
urban      1.2             38
urban      0.75            6
suburban   0.5             14
suburban   0.5             18
rural      0.5             32
rural      0.5             12
  • city_type: a label and categorical independent variable that can be used in a model
  • Potential dependent variables: population_mil and rainfall_inches (their group means are compared in the sketch below)
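
A pandas sketch that compares the group means from the table above:

import pandas as pd

cities = pd.DataFrame({
    "city_type": ["urban", "urban", "suburban", "suburban", "rural", "rural"],
    "population_mil": [1.2, 0.75, 0.5, 0.5, 0.5, 0.5],
    "rainfall_inches": [38, 6, 14, 18, 32, 12],
})
# Mean of each potential dependent variable within each group
print(cities.groupby("city_type")[["population_mil", "rainfall_inches"]].mean())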
14

Categorical variables and dummy encoding

id price clarity cut
1 8848 VS1 Ideal
2 3968 SI2 Ideal
3 13849 SI2 Good
4 4480 SI1 Good
5 12283 VS2 Good
6 13582 SI1 Premium
7 699 VS1 Ideal
8 8072 VS2 Ideal
9 608 VS2 Premium
10 2929 VS2 Ideal
15

Categorical variables and dummy encoding

id price clarity cut
1 8848 VS1 Ideal
2 3968 SI2 Ideal
3 13849 SI2 Good
4 4480 SI1 Good
5 12283 VS2 Good
6 13582 SI1 Premium
7 699 VS1 Ideal
8 8072 VS2 Ideal
9 608 VS2 Premium
10 2929 VS2 Ideal
id price clarity_1 clarity_2 clarity_3 clarity_4 clarity_5 clarity_6 clarity_7 clarity_8 cut_1 cut_2 cut_3 cut_4 cut_5
1 8848 0 0 0 0 1 0 0 0 0 0 0 0 1
2 3968 0 1 0 0 0 0 0 0 0 0 0 0 1
3 13849 0 1 0 0 0 0 0 0 0 1 0 0 0
4 4480 0 0 1 0 0 0 0 0 0 1 0 0 0
5 12283 0 0 0 1 0 0 0 0 0 1 0 0 0
6 13582 0 0 1 0 0 0 0 0 0 0 0 1 0
7 699 0 0 0 0 1 0 0 0 0 0 0 0 1
8 8072 0 0 0 1 0 0 0 0 0 0 0 0 1
9 608 0 0 0 1 0 0 0 0 0 0 0 1 0
10 2929 0 0 0 1 0 0 0 0 0 0 0 0 1
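
A sketch of dummy encoding with pandas; note pd.get_dummies names the indicator columns after the category levels (e.g. clarity_VS1, cut_Ideal) rather than the numeric suffixes shown above:

import pandas as pd

df = pd.DataFrame({
    "id": [1, 2, 3],
    "price": [8848, 3968, 13849],
    "clarity": ["VS1", "SI2", "SI2"],
    "cut": ["Ideal", "Ideal", "Good"],
})
# One 0/1 indicator column per category level
encoded = pd.get_dummies(df, columns=["clarity", "cut"])
print(encoded)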
16

Distributions

17

Distributions depend on the data

18

Distributions depend on the data

  • A Random Variable can follow a certain distribution, depending on the data type

  • A normal distribution describes continuous, numeric data

  • Bernoulli and Binomial distributions describe random variables that take binary values

  • A uniform distribution can handle discrete or continuous data (each case is sampled in the sketch below)
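
A minimal sampling sketch for each case using numpy's Generator API (the parameter values are illustrative):

import numpy as np

rng = np.random.default_rng(0)                       # any seed works; 0 is arbitrary
normal_draws = rng.normal(loc=0, scale=2, size=5)    # continuous, numeric data
bernoulli_draws = rng.binomial(n=1, p=0.3, size=5)   # binary (0/1) values
uniform_draws = rng.uniform(low=0, high=10, size=5)  # continuous uniform data
print(normal_draws, bernoulli_draws, uniform_draws, sep="\n")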

19


Terminology of distributions

  • Moments: Values from our sample that can help us understand our distribution

  • Parameters: The values that define our distribution

    • Normal: \(\mu\), \(\sigma\)
    • Binomial: \(n\), \(p\)
    • Uniform: \(a\), \(b\)
  • Probability Density/Mass Function (PDF/PMF): A description of how likely certain outcomes are in a distribution

  • Cumulative Distribution Function (CDF): A description of how much of a distribution is contained up to a certain point (both are sketched below)
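
A short sketch of both functions for a Normal(0, 2) distribution, assuming scipy is available:

from scipy.stats import norm

x = 1.0
print(norm.pdf(x, loc=0, scale=2))  # density (relative likelihood) of N(0, 2) at x
print(norm.cdf(x, loc=0, scale=2))  # P(X <= x): share of the distribution up to x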

22

Moments of random variables

23

Moments of random variables

  • Moments describe a distribution with a set of attributes
24

Moments of random variables

  • Moments describe a distribution with a set of attributes

  • \(\text{E}[X^n]\): general raw moment

25

Moments of random variables

  • Moments describe a distribution with a set of attributes

  • \(\text{E}[X^n]\): general raw moment

  • \(\text{E}[X]\): first moment, \(\mu=\frac{\sum{x}}{n}\)

26

Moments of random variables

  • Moments describe a distribution with a set of attributes

  • \(\text{E}[X^n]\): general raw moment

  • \(\text{E}[X]\): first moment, \(\mu=\frac{\sum{x}}{n}\)

  • \(\text{E}[X^2]\): second raw moment (unrefined, so contains first moment information)

27

Moments of random variables

  • Moments describe a distribution with a set of attributes

  • \(\text{E}[X^n]\): general raw moment

  • \(\text{E}[X]\): first moment, \(\mu=\frac{\sum{x}}{n}\)

  • \(\text{E}[X^2]\): second raw moment (unrefined, so contains first moment information)

  • \(\text{E}[X^2] - \text{E}[X]^2\): second central moment, \(\sigma^2=\frac{\sum{(x-\mu)^2}}{n}\)

28

Moments of random variables

  • Moments describe a distribution with a set of attributes

  • \(\text{E}[X^n]\): general raw moment

  • \(\text{E}[X]\): first moment, \(\mu=\frac{\sum{x}}{n}\)

  • \(\text{E}[X^2]\): second raw moment (unrefined, so contains first moment information)

  • \(\text{E}[X^2] - \text{E}[X]^2\): second central moment, \(\sigma^2=\frac{\sum{(x-\mu)^2}}{n}\)

  • For continuous data, we can mathematically represent the second central moment:

29

Moments of random variables

  • Moments describe a distribution with a set of attributes

  • \(\text{E}[X^n]\): general raw moment

  • \(\text{E}[X]\): first moment, \(\mu=\frac{\sum{x}}{n}\)

  • \(\text{E}[X^2]\): second raw moment (unrefined, so contains first moment information)

  • \(\text{E}[X^2] - \text{E}[X]^2\): second central moment, \(\sigma^2=\frac{\sum{(x-\mu)^2}}{n}\)

  • For continuous data, we can mathematically represent the second central moment:

$$\int_{-\infty}^{\infty}x^2 f(x)\,dx - \text{E}[X]^2$$ where \(f(x)\) is the PDF
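
A sketch of the same sample moments computed directly with numpy (the simulated data here is illustrative):

import numpy as np

x = np.random.normal(loc=0, scale=2, size=1000)  # illustrative sample
first_moment = np.mean(x)                        # E[X]
second_raw = np.mean(x**2)                       # E[X^2]
second_central = second_raw - first_moment**2    # E[X^2] - E[X]^2
print(second_central, np.var(x))                 # both give the (population) variance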

30

Uniform PMF/PDF, CDF
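
The original slide showed these as figures; a sketch that reproduces comparable plots for a U(0, 10) distribution (the bounds are arbitrary), assuming scipy and matplotlib are available:

import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import uniform

a, b = 0, 10
x = np.linspace(a - 1, b + 1, 500)
dist = uniform(loc=a, scale=b - a)  # scipy parameterizes U(a, b) as loc=a, scale=b-a

fig, (ax1, ax2) = plt.subplots(1, 2)
ax1.plot(x, dist.pdf(x)); ax1.set_title("Uniform PDF")
ax2.plot(x, dist.cdf(x)); ax2.set_title("Uniform CDF")
plt.show()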

31

Normal PDF, CDF

32

Method of Moments Calculation — Normal, Uniform

33

Simulated data: Normal

import random, numpy, math
import statistics as stats, matplotlib.pyplot as plt
import seaborn as sns, pandas as pd

numpy.random.seed(50390)  # seed numpy's RNG (random.seed alone does not affect numpy.random)
sample_df = numpy.random.normal(
    loc=0, scale=2, size=1000
)  # draws from a Normal distribution
sample_df[0:3]
## array([-0.42501843, -0.93878938, -2.88702722])

# Method of moments: the sample mean estimates mu
stats.mean(sample_df)
## -0.051744188640231566

# Solving for sigma using the sample variance
math.sqrt(stats.variance(sample_df))
## 1.9919425634418428

34

Mystery data: Uniform

my_nums = pd.read_csv("my_sample.csv")  # the sample is uniform
my_nums = my_nums["x"].tolist()
my_nums[0:5]
## [8.74912820779718, 9.16765952832066, 6.329065448371691, 4.0551836041268, 8.59842579229735]
print(f"Data length: {len(my_nums)}")
## Data length: 10000
stats.mean(my_nums)  # can we use sample moments to find the parameters?
## 5.513906104243407
stats.variance(my_nums)
## 6.673169412967319
35

Calculating Method of Moments: Uniform

\(\bar{\text{X}}=5.51\) \(\text{s}^2=6.67\)
\(X \sim \text{U}(a, b)\)

\begin{align} \text{E}[X] & = \int_a^b \frac{1}{b-a}x\,dx \\ & = \left.\frac{x^2}{2(b-a)}\right]_a^b \\ & = \frac{b^2}{2(b-a)}-\frac{a^2}{2(b-a)} \\ & = \frac{b^2-a^2}{2(b-a)} = \frac{(b-a)(b+a)}{2(b-a)} \\ \text{E}[X] & = \frac{a+b}{2} \\ \end{align}
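
The variance formula used on the following slides, \(\text{Var}(X)=\frac{(b-a)^2}{12}\), follows in the same way; a short derivation (not on the original slide):

\begin{align} \text{E}[X^2] & = \int_a^b \frac{x^2}{b-a}\,dx = \frac{b^3-a^3}{3(b-a)} = \frac{a^2+ab+b^2}{3} \\ \text{Var}(X) & = \text{E}[X^2]-\text{E}[X]^2 = \frac{a^2+ab+b^2}{3}-\frac{(a+b)^2}{4} \\ & = \frac{4(a^2+ab+b^2)-3(a+b)^2}{12} = \frac{(b-a)^2}{12} \\ \end{align}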

36

Calculating Method of Moments: Uniform

37

Calculating Method of Moments: Uniform

\(\bar{\text{X}}=5.51\) \(\text{s}^2=6.67\)
\(X \sim \text{U}(a, b)\)

\(\text{1) } \text{E}[X]=5.51=\frac{a+b}{2}\) \(\text{2) } \text{Var}(X)=6.67=\frac{(b-a)^2}{12}\)

38

Calculating Method of Moments: Uniform

\(\bar{\text{X}}=5.51\) \(\text{s}^2=6.67\)
\(X \sim \text{U}(a, b)\)

\(\text{1) } \text{E}[X]=5.51=\frac{a+b}{2}\) \(\text{2) } \text{Var}(X)=6.67=\frac{(b-a)^2}{12}\)

\begin{align} \text{Solve eq. 1 for b: } a+b & =11.02 \\ b & =11.02 - a \\ \end{align}

\begin{align} \text{Plug b into eq. 2: } \frac{(11.02-a-a)^2}{12} & =6.67 \\ (11.02-2a)^2 & = 6.67 \cdot 12 \\ 11.02-2a & = \sqrt{6.67 \cdot 12} \\ -2a & = \sqrt{6.67 \cdot 12}-11.02 \\ a & = \frac{\sqrt{6.67 \cdot 12}-11.02}{-2} \\ a & = 1.04 \\ \end{align}

\begin{align} \text{Substitute a back into eq. 1 for b: } b & =11.02 - 1.04 \\ b & =9.98 \\ \end{align}
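
A quick numeric check of the hand calculation, using the closed form implied by the two moment equations (a sketch):

import math

xbar, s2 = 5.51, 6.67           # sample moments from above
half_width = math.sqrt(3 * s2)  # (b - a)^2 / 12 = s2  implies  (b - a) / 2 = sqrt(3 * s2)
a_hat = xbar - half_width
b_hat = xbar + half_width
print(round(a_hat, 2), round(b_hat, 2))  # approximately 1.04 and 9.98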

39

Checking Method of Moments with the data

print(f"min = {min(my_nums)}, max = {max(my_nums)}")  # compare with the estimates a = 1.04, b = 9.98
## min = 1.0011236150749, max = 9.999872184358539

40

Sampling and descriptive statistics

41

Choosing a representative sample

42

Choosing a representative sample

  • Our goal when modeling is to use a sample to draw conclusions about a greater population
43

Choosing a representative sample

  • Our goal when modeling is to use a sample to draw conclusions about a greater population

  • It is important that our sample is representative of our population; ensuring we have a sufficient sample size helps accomplish this goal

44

Choosing a representative sample

  • Our goal when modeling is to use a sample to draw conclusions about a greater population

  • It is important that our sample is representative of our population; ensuring we have a sufficient sample size helps accomplish this goal

  • Method of moments sets Sample Moments = Theoretical Moments! If the sample is not representative, this equation will mislead us

45

Choosing a representative sample

  • Our goal when modeling is to use a sample to draw conclusions about a greater population

  • It is important that our sample is representative of our population; ensuring we have a sufficient sample size helps accomplish this goal

  • Method of moments sets Sample Moments = Theoretical Moments! If the sample is not representative, this equation will mislead us

  • Methods of repeated sampling are also useful; see the Central Limit Theorem

46

Central Limit Theorem

dice = [1, 2, 3, 4, 5, 6]  # a fair die: discrete uniform outcomes
# Central Limit Theorem:
# means of draws from a uniform distribution become approximately normal
rolls = 6
dice_means = [0] * rolls
for i in range(0, rolls):
    this_roll = random.choices(dice, k=rolls)
    dice_means[i] = stats.mean(this_roll)
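
With only 6 sample means the effect is hard to see; a sketch with larger (arbitrary) sizes whose histogram of means looks approximately normal:

import random
import statistics as stats
import matplotlib.pyplot as plt

dice = [1, 2, 3, 4, 5, 6]
n_samples, n_rolls = 10_000, 30   # arbitrary, larger sizes than the slide's 6
means = [stats.mean(random.choices(dice, k=n_rolls)) for _ in range(n_samples)]
plt.hist(means, bins=30)          # the distribution of sample means is roughly normal
plt.show()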
47

Using descriptive statistics from a sample

  • Measures of central tendency:
    • Mean: the arithmetic average
    • Median: the middle value of the sorted data
    • Mode: the most common value
  • Variance
  • Range, interquartile range (IQR)
  • Correlation: the foundation of linear regression (all computed in the sketch below)
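
A sketch computing each of these with pandas, using the diamond prices and carats shown earlier:

import pandas as pd

prices = pd.Series([8848, 3968, 13849, 4480, 12283, 13582, 699, 8072, 608, 2929])
carats = pd.Series([1.54, 0.91, 2.01, 1.01, 1.52, 1.61, 0.33, 1.17, 0.30, 0.70])

print(prices.mean(), prices.median(), prices.mode().iloc[0])  # central tendency
print(prices.var())                                           # sample variance
print(prices.max() - prices.min())                            # range
print(prices.quantile(0.75) - prices.quantile(0.25))          # interquartile range
print(carats.corr(prices))                                    # correlation of carat and price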
48
