Commit 13e51a8

fix small errors in the formula
1 parent 7a77956 commit 13e51a8

2 files changed

+29
-20
lines changed

lectures/cross_section.md

Lines changed: 1 addition & 1 deletion
@@ -207,7 +207,7 @@ even more extreme observations.
207207

208208
See, for example, {cite}`mandelbrot1963variation` or {cite}`rachev2003handbook`.
209209

210-
210+
(heavy_tail)=
211211
### Other Data
212212

213213
The data we have just seen is said to be "heavy-tailed".

lectures/prob_dist.md

Lines changed: 28 additions & 19 deletions
@@ -15,6 +15,13 @@ kernelspec:
1515

1616
# Distributions and Probabilities
1717

18+
```{index} single: Distributions and Probabilities
19+
```
20+
21+
```{contents} Contents
22+
:depth: 2
23+
```
24+
1825
## Outline
1926

2027
In this lecture we give a quick introduction to data and probability distributions using Python
@@ -162,12 +169,12 @@ Check that your answers agree with `u.mean()` and `u.var()`.
162169
Another useful (and more interesting) distribution is the **binomial distribution** on $S=\{0, \ldots, n\}$, which has PMF
163170

164171
$$
165-
p(i) = \binom{i}{n} \theta^i (1-\theta)^{n-i}
172+
p(i) = \binom{n}{i} \theta^i (1-\theta)^{n-i}
166173
$$
167174

168175
Here $\theta \in [0,1]$ is a parameter.
169176

170-
The interpretatin of $p(i)$ is: the number of successes in $n$ independent trials with success probability $\theta$.
177+
The interpretation of $p(i)$ is: the number of successes in $n$ independent trials with success probability $\theta$.
171178

172179
(If $\theta=0.5$, this is "how many heads in $n$ flips of a fair coin")
173180

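The corrected binomial coefficient can be checked numerically. The sketch below is not part of the commit; the values of `n` and `theta` are illustrative assumptions. It compares the PMF computed from the formula with $\binom{n}{i}$ against `scipy.stats.binom`, and shows why the swapped ordering cannot be right.

```python
from math import comb

import numpy as np
import scipy.stats

n, theta = 10, 0.5  # illustrative values, not taken from the lecture

# PMF from the corrected formula: C(n, i) * theta^i * (1 - theta)^(n - i)
pmf_formula = np.array(
    [comb(n, i) * theta**i * (1 - theta)**(n - i) for i in range(n + 1)]
)

# Reference values from SciPy
pmf_scipy = scipy.stats.binom(n, theta).pmf(np.arange(n + 1))

print(np.allclose(pmf_formula, pmf_scipy))  # the two should agree
print(pmf_formula.sum())                    # a PMF sums to one

# With the arguments swapped, the coefficient vanishes whenever i < n
swapped = comb(3, n)
print(swapped)
```

Because `math.comb(i, n)` is zero for every `i < n`, the swapped form would put all of its mass on $i = n$ and would not sum to one in general.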
@@ -272,7 +279,7 @@ plt.show()
272279

273280
Continuous distributions are represented by a **density function**, which is a function $p$ over $\mathbb R$ (the set of all numbers) such that $p(x) \geq 0$ for all $x$ and
274281

275-
$$ \int_{-\infty}^\infty p(x) = 1 $$
282+
$$ \int_{-\infty}^\infty p(x) dx = 1 $$
276283

277284
We say that random variable $X$ has distribution $p$ if
278285

@@ -294,14 +301,14 @@ The **cumulative distribution function** (CDF) of $X$ is defined by
294301

295302
$$
296303
F(x) = \mathbb P\{X \leq x\}
297-
= \int_{-\infty}^y p(y) dy
304+
= \int_{-\infty}^x p(x) dx
298305
$$
299306

300307
+++ {"user_expressions": []}
301308

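The normalisation condition and the CDF definition can both be checked by numerical integration. This is a minimal sketch, assuming a standard normal density and an arbitrary evaluation point `x`; neither choice comes from the lecture. It writes the CDF as $F(x) = \int_{-\infty}^x p(y)\, dy$, using $y$ as the integration variable to keep it distinct from the upper limit.

```python
import numpy as np
import scipy.stats
from scipy.integrate import quad

u = scipy.stats.norm(0.0, 1.0)  # standard normal, chosen for illustration

# Total mass: integral of the density over the whole real line
total, _ = quad(u.pdf, -np.inf, np.inf)

# CDF at x as the integral of the density from -infinity up to x
x = 0.7  # arbitrary evaluation point
F_x, _ = quad(u.pdf, -np.inf, x)

print(total)          # should be close to 1
print(F_x, u.cdf(x))  # the two values should agree
```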
302309
#### Normal distribution
303310

304-
Perhaps the most famous distribution is the **normal distribution**, which as density
311+
Perhaps the most famous distribution is the **normal distribution**, which has density
305312

306313
$$
307314
p(x) = \frac{1}{\sqrt{2\pi}\sigma}
@@ -312,7 +319,7 @@ This distribution has two parameters, $\mu$ and $\sigma$.
312319

313320
It can be shown that, for this distribution, the mean is $\mu$ and the variance is $\sigma^2$.
314321

315-
We can obtain the moments, PDF, CDF of the normal density via SciPy as follows:
322+
We can obtain the moments, PDF, and CDF of the normal density as follows:
316323

317324
```{code-cell} ipython3
318325
μ, σ = 0.0, 1.0
@@ -376,7 +383,7 @@ It has a nice interpretation: if $X$ is lognormally distributed, then $\log X$ i
376383

377384
It is often used to model variables that are "multiplicative" in nature, such as income or asset prices.
378385

379-
We can obtain the moments, PDF, CDF of the normal density via SciPy as follows:
386+
We can obtain the moments, PDF, and CDF of the normal density as follows:
380387

381388
```{code-cell} ipython3
382389
μ, σ = 0.0, 1.0
@@ -390,10 +397,9 @@ u.mean(), u.var()
390397
```{code-cell} ipython3
391398
μ_vals = [-1, 0, 1]
392399
σ_vals = [0.25, 0.5, 1]
393-
fig, ax = plt.subplots()
394-
395400
x_grid = np.linspace(0, 3, 200)
396401
402+
fig, ax = plt.subplots()
397403
for μ, σ in zip(μ_vals, σ_vals):
398404
u = scipy.stats.lognorm(σ, scale=np.exp(μ))
399405
ax.plot(x_grid, u.pdf(x_grid),
@@ -432,7 +438,7 @@ It is related to the Poisson distribution as it describes the distribution of th
432438

433439
It can be shown that, for this distribution, the mean is $1/\lambda$ and the variance is $1/\lambda^2$.
434440

435-
We can obtain the moments, PDF, CDF of the normal density via SciPy as follows:
441+
We can obtain the moments, PDF, and CDF of the normal density as follows:
436442

437443
```{code-cell} ipython3
438444
λ = 1.0
@@ -446,6 +452,8 @@ u.mean(), u.var()
446452
```{code-cell} ipython3
447453
fig, ax = plt.subplots()
448454
λ_vals = [0.5, 1, 2]
455+
x_grid = np.linspace(0, 6, 200)
456+
449457
for λ in λ_vals:
450458
u = scipy.stats.expon(scale=1/λ)
451459
ax.plot(x_grid, u.pdf(x_grid),
@@ -486,12 +494,13 @@ For example, if $\alpha = \beta = 1$, then the beta distribution is uniform on $
486494

487495
While, if $\alpha = 3$ and $\beta = 2$, then the beta distribution is located more towards 1 as there are more successes than failures.
488496

489-
It can be shown that, for this distribution, the mean is $\alpha / (\alpha + \beta)$ and the variance is $\alpha \beta / (\alpha + \beta)^2 (\alpha + \beta + 1)$.
497+
It can be shown that, for this distribution, the mean is $\alpha / (\alpha + \beta)$ and
498+
the variance is $\alpha \beta / (\alpha + \beta)^2 (\alpha + \beta + 1)$.
490499

491-
We can obtain the moments, PDF, CDF of the normal density via SciPy as follows:
500+
We can obtain the moments, PDF, and CDF of the normal density as follows:
492501

493502
```{code-cell} ipython3
494-
α, β = 1.0, 1.0
503+
α, β = 3.0, 1.0
495504
u = scipy.stats.beta(α, β)
496505
```
497506

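The stated beta moments can be compared with SciPy's values. The sketch below is an illustration rather than part of the commit. It reads the variance as $\alpha \beta / \big((\alpha + \beta)^2 (\alpha + \beta + 1)\big)$, with both factors in the denominator, and uses ASCII names `a` and `b` for $\alpha$ and $\beta$.

```python
import scipy.stats

a, b = 3.0, 1.0  # alpha and beta, matching the updated code cell
u = scipy.stats.beta(a, b)

# Closed-form moments, with the whole product in the denominator
mean_formula = a / (a + b)
var_formula = a * b / ((a + b)**2 * (a + b + 1))

print(u.mean(), mean_formula)
print(u.var(), var_formula)
```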
@@ -500,8 +509,8 @@ u.mean(), u.var()
500509
```
501510

502511
```{code-cell} ipython3
503-
α_vals = [0.5, 1, 50, 250, 3]
504-
β_vals = [3, 1, 100, 200, 1]
512+
α_vals = [0.5, 1, 5, 25, 3]
513+
β_vals = [3, 1, 10, 20, 0.5]
505514
x_grid = np.linspace(0, 1, 200)
506515
507516
fig, ax = plt.subplots()
@@ -541,10 +550,10 @@ It can be shown that, for this distribution, the mean is $\alpha / \beta$ and th
541550

542551
One interpretation is that if $X$ is gamma distributed, then $X$ is the sum of $\alpha$ independent exponentially distributed random variables with mean $1/\beta$.
543552

544-
We can obtain the moments, PDF, CDF of the normal density via SciPy as follows:
553+
We can obtain the moments, PDF, and CDF of the normal density as follows:
545554

546555
```{code-cell} ipython3
547-
α, β = 1.0, 1.0
556+
α, β = 3.0, 2.0
548557
u = scipy.stats.gamma(α, scale=1/β)
549558
```
550559

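The sum-of-exponentials interpretation can be illustrated by simulation when $\alpha$ is a positive integer. The following is a rough sketch under that assumption; the seed and sample size are arbitrary choices, and `a`, `b` stand in for $\alpha$, $\beta$.

```python
import numpy as np
import scipy.stats

a, b = 3.0, 2.0  # alpha and beta, matching the updated code cell
u = scipy.stats.gamma(a, scale=1/b)

# Simulate sums of a independent exponentials, each with mean 1/b
rng = np.random.default_rng(0)  # fixed seed for reproducibility
draws = rng.exponential(scale=1/b, size=(100_000, int(a))).sum(axis=1)

print(u.mean(), a / b)  # mean from SciPy and from the formula alpha / beta
print(draws.mean())     # sample mean of the simulated sums
```

The sample mean should land near $\alpha / \beta$, up to simulation error.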
@@ -742,7 +751,7 @@ plt.show()
742751

743752
When we use a larger bandwidth, the KDE is smoother.
744753

745-
A suitable bandwith is the one that is not too smooth (underfitting) or too wiggly (overfitting).
754+
A suitable bandwidth is not too smooth (underfitting) or too wiggly (overfitting).
746755

747756

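The effect of bandwidth on smoothness can be made concrete with a small experiment. The sketch below uses `scipy.stats.gaussian_kde`, which is an assumption and may differ from the tool used in the lecture; the data, bandwidth factors, and the "wiggliness" measure (total variation of the estimate on a grid) are all illustrative choices.

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(0)
data = rng.standard_normal(200)  # synthetic sample for illustration
x_grid = np.linspace(-4, 4, 400)

def wiggliness(bw):
    """Total variation of the KDE on the grid for bandwidth factor bw."""
    kde = gaussian_kde(data, bw_method=bw)
    return np.abs(np.diff(kde(x_grid))).sum()

small_bw = wiggliness(0.05)  # narrow kernels: many local bumps
large_bw = wiggliness(1.0)   # wide kernels: a single smooth hump

print(small_bw, large_bw)
```

A larger total variation indicates more local bumps, so the narrow bandwidth should score higher than the wide one.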
748757
#### Violin plots
@@ -813,7 +822,7 @@ plt.show()
813822

814823
The match between the histogram and the density is not very bad but also not very good.
815824

816-
One reason is that the normal distribution is not really a good fit for this observed data --- we will discuss this point again when we talk about heavy tailed distributions in TODO add link.
825+
One reason is that the normal distribution is not really a good fit for this observed data --- we will discuss this point again when we talk about {ref}`heavy tailed distributions<heavy_tail>`.
817826

818827
+++ {"user_expressions": []}
819828
