Commit 7409557

Minor updates
1 parent 72d4db7 commit 7409557

1 file changed: +80 -9 lines changed

lectures/prob_dist.md

Lines changed: 80 additions & 9 deletions
@@ -23,11 +23,11 @@ kernelspec:
 
 ## Outline
 
-In this lecture we give a quick introduction to data and probability distributions using Python
+In this lecture we give a quick introduction to data and probability distributions using Python.
 
 ```{code-cell} ipython3
 :tags: [hide-output]
-!pip install --upgrade yfinance
+! pip install --upgrade yfinance
 ```
 
 ```{code-cell} ipython3
@@ -42,7 +42,7 @@ import seaborn as sns
 
 ## Common distributions
 
-In this section we recall the definitions of some well-known distributions and show how to manipulate them with SciPy.
+In this section we recall the definitions of some well-known distributions and explore how to manipulate them with SciPy.
 
 ### Discrete distributions

@@ -61,7 +61,7 @@ $$ \mathbb P\{X = x_i\} = p(x_i) \quad \text{for } i= 1, \ldots, n $$
 The **mean** or **expected value** of a random variable $X$ with distribution $p$ is
 
 $$
-\mathbb E X = \sum_{i=1}^n x_i p(x_i)
+\mathbb{E}[X] = \sum_{i=1}^n x_i p(x_i)
 $$
 
 Expectation is also called the *first moment* of the distribution.
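
A minimal sketch, not part of the commit, of how the mean formula in this hunk evaluates for a small PMF. The support `xs` and the probabilities `ps` are made-up values chosen only for illustration; exact fractions are used so the arithmetic is easy to check by hand.

```python
from fractions import Fraction

# Hypothetical discrete distribution (illustrative values only):
# support points x_i and their probabilities p(x_i).
xs = [1, 2, 3, 4]
ps = [Fraction(1, 10), Fraction(2, 10), Fraction(3, 10), Fraction(4, 10)]

# A valid PMF must sum to one.
assert sum(ps) == 1

# Mean: E[X] = sum_i x_i * p(x_i)
mean = sum(x * p for x, p in zip(xs, ps))
print(mean)  # 3
```
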
@@ -71,15 +71,15 @@ We also refer to this number as the mean of the distribution (represented by) $p
 The **variance** of $X$ is defined as
 
 $$
-\mathbb V X = \sum_{i=1}^n (x_i - \mathbb E X)^2 p(x_i)
+\mathbb{V}[X] = \sum_{i=1}^n (x_i - \mathbb{E}[X])^2 p(x_i)
 $$
 
 Variance is also called the *second central moment* of the distribution.
 
 The **cumulative distribution function** (CDF) of $X$ is defined by
 
 $$
-F(x) = \mathbb P\{X \leq x\}
+F(x) = \mathbb{P}\{X \leq x\}
 = \sum_{i=1}^n \mathbb 1\{x_i \leq x\} p(x_i)
 $$
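
A hedged sketch, not part of the commit, of the variance and CDF definitions shown in this hunk, written as plain Python functions. The helper names (`mean`, `variance`, `cdf`) and the three-point PMF are assumptions made for illustration; the lecture itself works with SciPy distribution objects.

```python
from fractions import Fraction

# Hypothetical PMF on a small support (illustrative values only).
xs = [0, 1, 2]
ps = [Fraction(1, 4), Fraction(1, 2), Fraction(1, 4)]

def mean(xs, ps):
    """E[X] = sum_i x_i p(x_i)."""
    return sum(x * p for x, p in zip(xs, ps))

def variance(xs, ps):
    """V[X] = sum_i (x_i - E[X])^2 p(x_i)."""
    m = mean(xs, ps)
    return sum((x - m) ** 2 * p for x, p in zip(xs, ps))

def cdf(x, xs, ps):
    """F(x) = sum_i 1{x_i <= x} p(x_i)."""
    return sum(p for xi, p in zip(xs, ps) if xi <= x)

print(mean(xs, ps))      # 1
print(variance(xs, ps))  # 1/2
print(cdf(1, xs, ps))    # 3/4
```

The indicator $\mathbb 1\{x_i \leq x\}$ becomes the `if xi <= x` filter, so `cdf` returns 0 below the smallest support point and 1 at or above the largest.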

@@ -157,6 +157,75 @@ Check that your answers agree with `u.mean()` and `u.var()`.
 ```
 
 
+#### Bernoulli distribution
+
+Another useful (and more interesting) distribution is the Bernoulli distribution.
+
+We can import the uniform distribution on $S = \{1, \ldots, n\}$ from SciPy like so:
+
+```{code-cell} ipython3
+n = 10
+u = scipy.stats.randint(1, n+1)
+```
+
+
+Here's the mean and variance
+
+```{code-cell} ipython3
+u.mean(), u.var()
+```
+
+The formula for the mean is $(n+1)/2$, and the formula for the variance is $(n^2 - 1)/12$.
+
+
+Now let's evaluate the PMF
+
+```{code-cell} ipython3
+u.pmf(1)
+```
+
+```{code-cell} ipython3
+u.pmf(2)
+```
+
+
+Here's a plot of the probability mass function:
+
+```{code-cell} ipython3
+fig, ax = plt.subplots()
+S = np.arange(1, n+1)
+ax.plot(S, u.pmf(S), linestyle='', marker='o', alpha=0.8, ms=4)
+ax.vlines(S, 0, u.pmf(S), lw=0.2)
+ax.set_xticks(S)
+plt.show()
+```
+
+
+Here's a plot of the CDF:
+
+```{code-cell} ipython3
+fig, ax = plt.subplots()
+S = np.arange(1, n+1)
+ax.step(S, u.cdf(S))
+ax.vlines(S, 0, u.cdf(S), lw=0.2)
+ax.set_xticks(S)
+plt.show()
+```
+
+
+The CDF jumps up by $p(x_i)$ at $x_i$.
+
+
+```{exercise}
+:label: prob_ex1
+
+Calculate the mean and variance for this parameterization (i.e., $n=10$)
+directly from the PMF, using the expressions given above.
+
+Check that your answers agree with `u.mean()` and `u.var()`.
+```
+
+
 
 #### Binomial distribution

@@ -170,7 +239,7 @@ Here $\theta \in [0,1]$ is a parameter.
 
 The interpretation of $p(i)$ is: the number of successes in $n$ independent trials with success probability $\theta$.
 
-(If $\theta=0.5$, this is "how many heads in $n$ flips of a fair coin")
+(If $\theta=0.5$, $p(i)$ is the probability of $i$ heads in $n$ flips of a fair coin.)
 
 The mean and variance are
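
A minimal sketch, not part of the commit, of the binomial interpretation discussed in this hunk. The PMF formula is not visible in this excerpt, so the code assumes the standard binomial PMF $p(i) = \binom{n}{i} \theta^i (1-\theta)^{n-i}$; the function name `binomial_pmf` and the values of `n` and `theta` are illustrative choices.

```python
from math import comb, isclose

def binomial_pmf(i, n, theta):
    """Standard binomial PMF: probability of i successes in n trials."""
    return comb(n, i) * theta**i * (1 - theta)**(n - i)

n, theta = 10, 0.5  # illustrative values: 10 flips of a fair coin
pmf = [binomial_pmf(i, n, theta) for i in range(n + 1)]

# The probabilities over i = 0, ..., n sum to one.
assert isclose(sum(pmf), 1.0)

# Mean and variance computed directly from the PMF.
mean = sum(i * p for i, p in enumerate(pmf))
var = sum((i - mean) ** 2 * p for i, p in enumerate(pmf))
print(mean, var)  # 5.0 2.5
```

Here each `pmf[i]` is the probability of exactly `i` heads, and the printed values agree with the standard binomial results $n\theta$ and $n\theta(1-\theta)$.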

@@ -304,7 +373,7 @@ The definition of the mean and variance of a random variable $X$ with distributi
 For example, the mean of $X$ is
 
 $$
-\mathbb E X = \int_{-\infty}^\infty x p(x) dx
+\mathbb{E}[X] = \int_{-\infty}^\infty x p(x) dx
 $$
 
 The **cumulative distribution function** (CDF) of $X$ is defined by
@@ -328,7 +397,7 @@ This distribution has two parameters, $\mu$ and $\sigma$.
 
 It can be shown that, for this distribution, the mean is $\mu$ and the variance is $\sigma^2$.
 
-We can obtain the moments, PDF, and CDF of the normal density as follows:
+We can obtain the moments, PDF and CDF of the normal density as follows:
 
 ```{code-cell} ipython3
 μ, σ = 0.0, 1.0
@@ -700,6 +769,7 @@ The monthly return is calculated as the percent change in the share price over e
 So we will have one observation for each month.
 
 ```{code-cell} ipython3
+:tags: [hide-output]
 df = yf.download('AMZN', '2000-1-1', '2023-1-1', interval='1mo' )
 prices = df['Adj Close']
 data = prices.pct_change()[1:] * 100
@@ -777,6 +847,7 @@ Violin plots are particularly useful when we want to compare different distribut
 For example, let's compare the monthly returns on Amazon shares with the monthly return on Apple shares.
 
 ```{code-cell} ipython3
+:tags: [hide-output]
 df = yf.download('AAPL', '2000-1-1', '2023-1-1', interval='1mo' )
 prices = df['Adj Close']
 data = prices.pct_change()[1:] * 100
