Statistics and Probability Theory are the foundations for modeling and machine learning
Statistical Modeling is more concerned with inference: understanding an effect we hope to observe
Machine Learning is more concerned with prediction: being able to predict an unknown value based on a sample we observed. Having a representative sample and generalizable results is crucial
Fit the data to the desired output and learn from the data!
Comprehending data
Distributions
Sampling
| Data | Description |
|---|---|
| Binary | a value of 0/1, true/false |
| Categorical | a discrete value with limited possibilities |
| Continuous | a numerical value with an infinite range of possibilities |
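As a quick illustration, each of these data types maps onto a distinct pandas dtype; a minimal sketch (the mini-DataFrame below is hypothetical, loosely modeled on the diamonds data):

```python
import pandas as pd

# Hypothetical mini-dataset showing the three data types
df = pd.DataFrame({
    "purchased": [True, False, True],                      # binary: true/false
    "cut": pd.Categorical(["Ideal", "Good", "Premium"]),   # categorical: limited possibilities
    "price": [8848.0, 3968.0, 13849.0],                    # continuous: infinite range
})
print(df.dtypes)
```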


| carat | cut | color | clarity | depth | table | price | x | y | z | id | purchased |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1.54 | Ideal | J | VS1 | 62.2 | 59 | 8848 | 7.34 | 7.38 | 4.58 | 1 | TRUE |
| 0.91 | Ideal | E | SI2 | 61.5 | 56 | 3968 | 6.20 | 6.23 | 3.82 | 2 | FALSE |
| 2.01 | Good | H | SI2 | 63.1 | 55 | 13849 | 7.99 | 8.09 | 5.07 | 3 | FALSE |
| 1.01 | Good | E | SI1 | 64.1 | 62 | 4480 | 6.26 | 6.19 | 3.99 | 4 | TRUE |
| 1.52 | Good | F | VS2 | 57.8 | 59 | 12283 | 7.58 | 7.50 | 4.36 | 5 | TRUE |
| 1.61 | Premium | D | SI1 | 61.4 | 60 | 13582 | 7.56 | 7.51 | 4.63 | 6 | FALSE |
| 0.33 | Ideal | G | VS1 | 61.5 | 56 | 699 | 4.45 | 4.48 | 2.74 | 7 | FALSE |
| 1.17 | Ideal | F | VS2 | 61.8 | 55 | 8072 | 6.81 | 6.74 | 4.19 | 8 | FALSE |
| 0.30 | Premium | H | VS2 | 62.6 | 58 | 608 | 4.28 | 4.22 | 2.66 | 9 | TRUE |
| 0.70 | Ideal | G | VS2 | 61.8 | 57 | 2929 | 5.68 | 5.71 | 3.52 | 10 | TRUE |
| city_type | population_mil | rainfall_inches |
|---|---|---|
| urban | 1.2 | 38 |
| urban | 0.75 | 6 |
| suburban | 0.5 | 14 |
| suburban | 0.5 | 18 |
| rural | 0.5 | 32 |
| rural | 0.5 | 12 |
| id | price | clarity | cut |
|---|---|---|---|
| 1 | 8848 | VS1 | Ideal |
| 2 | 3968 | SI2 | Ideal |
| 3 | 13849 | SI2 | Good |
| 4 | 4480 | SI1 | Good |
| 5 | 12283 | VS2 | Good |
| 6 | 13582 | SI1 | Premium |
| 7 | 699 | VS1 | Ideal |
| 8 | 8072 | VS2 | Ideal |
| 9 | 608 | VS2 | Premium |
| 10 | 2929 | VS2 | Ideal |
| id | price | clarity_1 | clarity_2 | clarity_3 | clarity_4 | clarity_5 | clarity_6 | clarity_7 | clarity_8 | cut_1 | cut_2 | cut_3 | cut_4 | cut_5 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 8848 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
| 2 | 3968 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
| 3 | 13849 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 |
| 4 | 4480 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 |
| 5 | 12283 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 |
| 6 | 13582 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 |
| 7 | 699 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
| 8 | 8072 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
| 9 | 608 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 |
| 10 | 2929 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
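A table like the one above can be produced by one-hot encoding the categorical columns; a minimal sketch using pandas `get_dummies` (the output uses `clarity_VS1`-style column labels rather than the numbered `clarity_1 ... cut_5` scheme shown):

```python
import pandas as pd

# A few rows of the diamonds-style data with two categorical columns
df = pd.DataFrame({
    "id": [1, 2, 3],
    "price": [8848, 3968, 13849],
    "clarity": ["VS1", "SI2", "SI2"],
    "cut": ["Ideal", "Ideal", "Good"],
})

# One-hot encode: each category level becomes its own 0/1 indicator column
encoded = pd.get_dummies(df, columns=["clarity", "cut"])
print(encoded.columns.tolist())
```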
A Random Variable can follow a certain distribution, depending on the data type
A normal distribution describes continuous, numeric data
Bernoulli and Binomial distributions describe random variables that take binary values
A uniform distribution can handle discrete or continuous data


Moments: Values from our sample that can help us understand our distribution
Parameters: The values that define our distribution
Probability Density (Mass) Function (PDF)/(PMF): A description of how likely certain outcomes are in a distribution
Cumulative Distribution Function (CDF): A description of how much of a distribution is contained up to a certain point
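For the normal distribution, both functions can be written out directly; a minimal sketch using only the standard library (the helper names `normal_pdf` and `normal_cdf` are ours, not from the slides):

```python
import math

# PDF: relative likelihood of observing a value near x
def normal_pdf(x, mu=0.0, sigma=1.0):
    return math.exp(-((x - mu) ** 2) / (2 * sigma**2)) / (sigma * math.sqrt(2 * math.pi))

# CDF: how much of the distribution is contained up to x, via the error function
def normal_cdf(x, mu=0.0, sigma=1.0):
    return 0.5 * (1 + math.erf((x - mu) / (sigma * math.sqrt(2))))

print(normal_pdf(0))  # peak density at the mean, about 0.3989
print(normal_cdf(0))  # 0.5: half the distribution lies below the mean
```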
Moments describe a distribution with a set of attributes
\(\text{E}[X^n]\): general raw moment
\(\text{E}[X]\): first moment, \(\mu=\frac{\sum{x}}{n}\)
\(\text{E}[X^2]\): second raw moment (unrefined, so contains first moment information)
\(\text{E}[X^2]-\text{E}[X]^2\): second central moment, \(\sigma^2=\frac{\sum{(x-\mu)^2}}{n}\)
For continuous data, we can mathematically represent the second central moment:
$$\int_{-\infty}^{\infty}x^2f(x)dx - \text{E}[X]^2$$ where \(f(x)\) is the PDF
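The identity \(\text{E}[X^2]-\text{E}[X]^2=\sigma^2\) is easy to check empirically; a sketch with a simulated normal sample (the seed and sample size here are arbitrary choices):

```python
import random
import statistics as stats

random.seed(1)
# Sample from N(0, 2), so the true variance is sigma^2 = 4
xs = [random.gauss(0, 2) for _ in range(100_000)]

first = stats.mean(xs)                      # E[X], the first moment
second_raw = stats.mean(x * x for x in xs)  # E[X^2], the second raw moment
second_central = second_raw - first**2      # E[X^2] - E[X]^2

print(second_central)  # close to the true variance, 4
```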




```python
import random, numpy, math
import statistics as stats, matplotlib.pyplot as plt
import seaborn as sns, pandas as pd

random.seed(50390)  # reproducibility

sample_df = numpy.random.normal(loc=0, scale=2, size=1000)  # Normal
sample_df[0:3]
## array([-0.42501843, -0.93878938, -2.88702722])
stats.mean(sample_df)
## -0.051744188640231566
# Solving for sigma using sample variance
math.sqrt(stats.variance(sample_df))
## 1.9919425634418428

my_nums = pd.read_csv("my_sample.csv")  # sample is uniform
my_nums = my_nums["x"].tolist()
my_nums[0:5]
## [8.74912820779718, 9.16765952832066, 6.329065448371691, 4.0551836041268, 8.59842579229735]
## Data length: 10000
stats.mean(my_nums)  # Can we use sample moments to find parameters?
## 5.513906104243407
stats.variance(my_nums)
## 6.673169412967319
```

\(\bar{\text{X}}=5.51\)
\(\text{s}^2=6.67\)
\(X\sim\text{U}(a, b)\)
\begin{align} \text{E}[X] & =\int_{a}^{b}\frac{x}{b-a}dx \\ & = \frac{x^2}{2(b-a)}\bigg]_a^b \\ & = \frac{b^2}{2(b-a)}-\frac{a^2}{2(b-a)} \\ & = \frac{b^2-a^2}{2(b-a)} \\ & = \frac{(b-a)(b+a)}{2(b-a)} \\ & = \frac{a+b}{2} \\ \end{align}
\(\bar{\text{X}}=5.51\) \(\text{s}^2=6.67\)
\(X\sim\text{U}(a, b)\)
\(\text{1) } \text{E}[X]=5.51=\frac{a+b}{2}\) \(\text{2) } \text{V}=6.67=\frac{(b-a)^2}{12}\)
\begin{align} \text{Solve eq. 1 for b: } a+b & =11.02 \\ b & =11.02 - a \\ \end{align}
\begin{align} \text{Plug b into eq. 2: } \frac{(11.02-a-a)^2}{12} & =6.67 \\ 11.02-2a & =\sqrt{12\cdot 6.67} \\ -2a& =\sqrt{12\cdot 6.67}-11.02 \\ a& =\frac{\sqrt{12\cdot 6.67}-11.02}{-2} \\ a& =1.04 \\ \end{align}
\begin{align} \text{Complete eq. 1 for b: } b & =11.02 - 1.04 \\ b & =9.98 \\ \end{align}
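The hand calculation above can be wrapped in a small method-of-moments helper (the function name `uniform_mom` is ours):

```python
import math

# Method of moments for U(a, b): invert xbar = (a+b)/2 and s2 = (b-a)^2 / 12
def uniform_mom(xbar, s2):
    width = math.sqrt(12 * s2)  # b - a
    a = xbar - width / 2
    b = xbar + width / 2
    return a, b

a, b = uniform_mom(5.51, 6.67)
print(round(a, 2), round(b, 2))  # matches the hand calculation: a = 1.04, b = 9.98
```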
```python
print(f"a = {min(my_nums)}, b = {max(my_nums)}")
## a = 1.0011236150749, b = 9.999872184358539
```
Our goal when modeling is to use a sample to draw conclusions about a greater population
It is important that our sample is representative of our population; Ensuring we have a sufficient sample size is helpful to accomplish this goal
Sample Moments = Theoretical Moments! If the sample is not representative, this equation will mislead us
Methods of repeated sampling are also good to utilize (see the Central Limit Theorem)
```python
dice = [1, 2, 3, 4, 5, 6]  # uniform

# Central limit
# Draws from a uniform distribution becoming a normal distribution
rolls = 6
dice_means = [0] * rolls
for i in range(0, rolls):
    this_roll = random.choices(dice, k=rolls)
    dice_means[i] = stats.mean(this_roll)
```
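With only 6 samples of 6 rolls each, the effect is hard to see; a sketch scaling the same idea up (the sample counts below are arbitrary) shows the sample means clustering normally around the die's true mean of 3.5:

```python
import random
import statistics as stats

random.seed(50390)
dice = [1, 2, 3, 4, 5, 6]

# Many repeated samples: by the CLT the distribution of sample means
# approaches a normal distribution centered on the population mean
n_samples, sample_size = 10_000, 30
means = [stats.mean(random.choices(dice, k=sample_size)) for _ in range(n_samples)]

print(stats.mean(means))   # near the theoretical mean, 3.5
print(stats.stdev(means))  # near sigma/sqrt(n) = 1.708/sqrt(30), about 0.31
```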