Commit 47a2a3d

misc

1 parent d11b1c7 commit 47a2a3d
File tree

1 file changed: +57, -47 lines

lectures/mle.md

Lines changed: 57 additions & 47 deletions
@@ -24,8 +24,8 @@ from math import exp, log
 
 ## Introduction
 
-Consider a situation where a policymaker is trying to estimate how much revenue a proposed wealth tax
-will raise.
+Consider a situation where a policymaker is trying to estimate how much revenue
+a proposed wealth tax will raise.
 
 The proposed tax is
 
@@ -39,12 +39,15 @@
 
 where $w$ is wealth.
 
-For example, if $a = 0.05$, $b = 0.1$, and $\bar w = 2.5$, this means a
-5% tax on wealth up to 2.5 and 10% tax on wealth in excess of 2.5.
 
-(The units will be 100,000 so 2.5 means 250,000 dollars.)
+For example, if $a = 0.05$, $b = 0.1$, and $\bar w = 2.5$, this means
 
-Here we define $h$
+* a 5% tax on wealth up to 2.5 and
+* a 10% tax on wealth in excess of 2.5.
+
+The unit is 100,000, so $w = 2.5$ means 250,000 dollars.
+
+Let's go ahead and define $h$:
 
 ```{code-cell} ipython3
 def h(w, a=0.05, b=0.1, w_bar=2.5):
@@ -54,18 +57,21 @@ def h(w, a=0.05, b=0.1, w_bar=2.5):
     return a * w_bar + b * (w - w_bar)
 ```
 
-For a population of size $N$, where individual $i$ has wealth $w_i$, the total revenue will be given by
+For a population of size $N$, where individual $i$ has wealth $w_i$, total revenue raised by
+the tax will be
 
 $$
 T = \sum_{i=1}^{N} h(w_i)
 $$
 
-However, in most countries wealth is not observed for all individuals.
+We wish to calculate this quantity.
+
+The problem we face is that, in most countries, wealth is not observed for all individuals.
 
 Collecting and maintaining accurate wealth data for all individuals or households in a country
 is just too hard.
 
-So let's suppose instead that we obtain a sample $w_1, w_2, \cdots, w_n$ telling us the wealth of $n$ individuals.
+So let's suppose instead that we obtain a sample $w_1, w_2, \cdots, w_n$ telling us the wealth of $n$ randomly selected individuals.
 
 For our exercise we are going to use a sample of $n = 10,000$ observations from wealth data in the US in 2016.
 
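The revenue formula in the hunk above is easy to sketch in code. Note that only the second branch of `h` appears in the diff, so the below-threshold branch is reconstructed from the formula for the tax; the sample is simulated, standing in for the US 2016 data, which is not included in this diff.

```python
import numpy as np

def h(w, a=0.05, b=0.1, w_bar=2.5):
    # tax rate a up to the threshold w_bar, rate b above it
    # (the first branch is an assumption; only the second appears in the hunk)
    if w <= w_bar:
        return a * w
    return a * w_bar + b * (w - w_bar)

# hypothetical stand-in for the n = 10,000 US wealth observations
rng = np.random.default_rng(1234)
sample = rng.lognormal(mean=0.0, sigma=1.0, size=10_000)

# total revenue T = sum_i h(w_i) over the sampled individuals
T = sum(h(w) for w in sample)
```

With $a = 0.05$ and $\bar w = 2.5$, a wealth of 2.0 pays 0.10 and a wealth of 3.5 pays $0.05 \times 2.5 + 0.1 \times 1.0 = 0.225$, matching the two bullet points above.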
@@ -140,15 +146,15 @@ Maximum likelihood estimation has two steps:
 2. Estimate the parameter values (e.g., estimate $\mu$ and $\sigma$ for the
    normal distribution)
 
-One reasonable assumption for the wealth is that each
-that $w_i$ is [log-normally distributed](https://en.wikipedia.org/wiki/Log-normal_distribution),
+One possible assumption for the wealth is that each
+$w_i$ is [log-normally distributed](https://en.wikipedia.org/wiki/Log-normal_distribution),
 with parameters $\mu \in (-\infty,\infty)$ and $\sigma \in (0,\infty)$.
 
-This means that $\ln w_i$ is normally distributed with mean $\mu$ and
-standard deviation $\sigma$.
+(This means that $\ln w_i$ is normally distributed with mean $\mu$ and standard deviation $\sigma$.)
 
-You can see that this is a reasonable assumption because if we histogram log wealth
-instead of wealth the picture starts to look something like a bell-shaped curve.
+You can see that this assumption is not completely unreasonable because, if we
+histogram log wealth instead of wealth, the picture starts to look something
+like a bell-shaped curve.
 
 ```{code-cell} ipython3
 ln_sample = np.log(sample)
@@ -157,59 +163,69 @@ ax.hist(ln_sample, density=True, bins=200, histtype='stepfilled', alpha=0.8)
 plt.show()
 ```
 
-+++ {"user_expressions": []}
-
 Now our job is to obtain the maximum likelihood estimates of $\mu$ and $\sigma$, which
 we denote by $\hat{\mu}$ and $\hat{\sigma}$.
 
 These estimates can be found by maximizing the likelihood function given the
 data.
 
 The pdf of a lognormally distributed random variable $X$ is given by:
-$$
-f(x) = \frac{1}{x}\frac{1}{\sigma \sqrt{2\pi}} exp\left(\frac{-1}{2}\left(\frac{\ln x-\mu}{\sigma}\right)\right)^2
-$$
 
-Since $\ln X$ is normally distributed this is the same as
 $$
-f(x) = \frac{1}{x} \phi(x)
+f(x, \mu, \sigma)
+= \frac{1}{x}\frac{1}{\sigma \sqrt{2\pi}}
+\exp\left( -\frac{1}{2} \left( \frac{\ln x-\mu}{\sigma} \right)^2 \right)
 $$
-where $\phi$ is the pdf of $\ln X$ which is normally distibuted with mean $\mu$ and variance $\sigma ^2$.
 
-For a sample $x = (x_1, x_2, \cdots, x_n)$ the _likelihood function_ is given by:
+For our sample $w_1, w_2, \cdots, w_n$, the [likelihood function](https://en.wikipedia.org/wiki/Likelihood_function) is given by
+
 $$
 \begin{aligned}
-L(\mu, \sigma | x_i) = \prod_{i=1}^{n} f(\mu, \sigma | x_i) \\
-L(\mu, \sigma | x_i) = \prod_{i=1}^{n} \frac{1}{x_i} \phi(\ln x_i)
+L(\mu, \sigma | w_i) = \prod_{i=1}^{n} f(w_i, \mu, \sigma)
 \end{aligned}
 $$
 
-Taking $\log$ on both sides gives us the _log likelihood function_ which is:
+The likelihood function can be viewed as both
+
+* the joint distribution of the sample (which is assumed to be IID) and
+* the "likelihood" of parameters $(\mu, \sigma)$ given the data.
+
+Taking logs on both sides gives us the log likelihood function, which is
+
 $$
 \begin{aligned}
-l(\mu, \sigma | x_i) = -\sum_{i=1}^{n} \ln x_i + \sum_{i=1}^n \phi(\ln x_i) \\
-l(\mu, \sigma | x_i) = -\sum_{i=1}^{n} \ln x_i - \frac{n}{2} \ln(2\pi) - \frac{n}{2} \ln \sigma^2 - \frac{1}{2\sigma^2}
-\sum_{i=1}^n (\ln x_i - \mu)^2
+\ell(\mu, \sigma | w_i)
+& = \ln \left[ \prod_{i=1}^{n} f(w_i, \mu, \sigma) \right] \\
+& = -\sum_{i=1}^{n} \ln w_i
+- \frac{n}{2} \ln(2\pi) - \frac{n}{2} \ln \sigma^2 - \frac{1}{2\sigma^2}
+\sum_{i=1}^n (\ln w_i - \mu)^2
 \end{aligned}
 $$
 
 To find where this function is maximised we find its partial derivatives wrt $\mu$ and $\sigma ^2$ and equate them to $0$.
 
-Let's first find the MLE of $\mu$,
+Let's first find the maximum likelihood estimate (MLE) of $\mu$
+
 $$
 \begin{aligned}
-\frac{\delta l}{\delta \mu} = - \frac{1}{2\sigma^2} \times 2 \sum_{i=1}^n (\ln x_i - \mu) = 0 \\
-\Rightarrow \sum_{i=1}^n \ln x_i - n \mu = 0 \\
-\Rightarrow \hat{\mu} = \frac{\sum_{i=1}^n \ln x_i}{n}
+\frac{\partial \ell}{\partial \mu}
+= \frac{1}{\sigma^2} \sum_{i=1}^n (\ln w_i - \mu) = 0 \\
+\implies \sum_{i=1}^n \ln w_i - n \mu = 0 \\
+\implies \hat{\mu} = \frac{\sum_{i=1}^n \ln w_i}{n}
 \end{aligned}
 $$
 
-Now let's find the MLE of $\sigma$,
+Now let's find the MLE of $\sigma$
+
 $$
 \begin{aligned}
-\frac{\delta l}{\delta \sigma^2} = - \frac{n}{2\sigma^2} + \frac{1}{2\sigma^4} \sum_{i=1}^n (\ln x_i - \mu)^2 = 0 \\
-\Rightarrow \frac{n}{2\sigma^2} = \frac{1}{2\sigma^4} \sum_{i=1}^n (\ln x_i - \mu)^2 \\
-\Rightarrow \hat{\sigma} = \left( \frac{\sum_{i=1}^{n}(\ln x_i - \hat{\mu})^2}{n} \right)^{1/2}
+\frac{\partial \ell}{\partial \sigma^2}
+= - \frac{n}{2\sigma^2} + \frac{1}{2\sigma^4}
+\sum_{i=1}^n (\ln w_i - \mu)^2 = 0 \\
+\implies \frac{n}{2\sigma^2} =
+\frac{1}{2\sigma^4} \sum_{i=1}^n (\ln w_i - \mu)^2 \\
+\implies \hat{\sigma} =
+\left( \frac{\sum_{i=1}^{n}(\ln w_i - \hat{\mu})^2}{n} \right)^{1/2}
 \end{aligned}
 $$
 
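The closed-form estimators derived in the hunk above translate directly to code. A minimal sketch, with simulated data standing in for the wealth sample (true parameters $\mu = 0.5$, $\sigma = 1.2$):

```python
import numpy as np

# simulated stand-in for the wealth sample
rng = np.random.default_rng(0)
sample = rng.lognormal(mean=0.5, sigma=1.2, size=10_000)

ln_sample = np.log(sample)

# mu_hat = (1/n) * sum(ln w_i)
mu_hat = np.mean(ln_sample)

# sigma_hat = sqrt( (1/n) * sum( (ln w_i - mu_hat)^2 ) )
sigma_hat = np.sqrt(np.mean((ln_sample - mu_hat) ** 2))
```

With $n = 10{,}000$ observations, both estimates land close to the true parameter values used to generate the data.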
@@ -242,7 +258,7 @@ ax.legend()
 plt.show()
 ```
 
-Our estimated lognormal distribution appears to be a decent fit for the overall data.
+Our estimated lognormal distribution appears to be a reasonable fit for the overall data.
 
 We now use {eq}`eq:est_rev` to calculate total revenue.
 
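The definition of {eq}`eq:est_rev` is outside this diff, but one natural way to estimate total revenue from a fitted distribution is Monte Carlo: draw from the fitted lognormal and average `h`. The parameter values and population size below are illustrative assumptions, not the lecture's estimates.

```python
import numpy as np

def h(w, a=0.05, b=0.1, w_bar=2.5):
    # wealth tax schedule from the lecture
    # (the below-threshold branch is reconstructed; only the other appears above)
    if w <= w_bar:
        return a * w
    return a * w_bar + b * (w - w_bar)

# hypothetical fitted parameters and population size, for illustration only
mu_hat, sigma_hat, N = 0.3, 1.0, 1_000_000

# Monte Carlo estimate of T ~= N * E[h(w)] under the fitted lognormal
rng = np.random.default_rng(42)
draws = rng.lognormal(mean=mu_hat, sigma=sigma_hat, size=100_000)
T_hat = N * np.mean([h(w) for w in draws])
```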
@@ -331,8 +347,6 @@ ax.legend()
 plt.show()
 ```
 
-+++ {"user_expressions": []}
-
 We observe that in this case the fit for the Pareto distribution is not very
 good, so we can probably reject it.
 
@@ -417,8 +431,6 @@ ax.plot(x, dist_pareto_tail.pdf(x), 'k-', lw=0.5, label='pareto pdf')
 plt.show()
 ```
 
-+++ {"user_expressions": []}
-
 The Pareto distribution is a better fit for the right hand tail of our dataset.
 
 ### So what is the best distribution?
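The tail-fit step behind `dist_pareto_tail` is not shown in this diff, but fitting a Pareto to the right tail might look like the sketch below. The data, the 90th-percentile threshold, and the decision to pin `loc`/`scale` are all illustrative assumptions.

```python
import numpy as np
from scipy import stats

# stand-in heavy-tailed data: classic Pareto, tail index 3, scale 1
rng = np.random.default_rng(3)
sample = rng.pareto(3.0, size=10_000) + 1.0

# keep only the top 10% of observations as the "right tail"
threshold = np.quantile(sample, 0.9)
tail = sample[sample >= threshold]

# fix location at 0 and scale at the threshold, so only the tail index b is fitted
b, loc, scale = stats.pareto.fit(tail, floc=0, fscale=threshold)
```

A classic Pareto conditioned on exceeding a threshold is again Pareto with the same tail index, so the fitted `b` should recover the index used to generate the data.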
@@ -432,7 +444,8 @@ One test is to plot the data against the fitted distribution, as we did.
 
 There are other more rigorous tests, such as the [Kolmogorov-Smirnov test](https://en.wikipedia.org/wiki/Kolmogorov%E2%80%93Smirnov_test).
 
-We omit the details.
+We omit such advanced topics (but encourage readers to study them once
+they have completed these lectures).
 
 ## Exercises
 
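The Kolmogorov-Smirnov test mentioned above is available in SciPy. A sketch with simulated data standing in for the wealth sample (note the caveat that estimating parameters from the same data makes the standard KS p-values only approximate):

```python
import numpy as np
from scipy import stats

# stand-in data; in the lecture this would be the wealth sample
rng = np.random.default_rng(7)
sample = rng.lognormal(mean=0.0, sigma=1.0, size=2_000)

# fit a lognormal with location pinned at zero, then test the fit
shape, loc, scale = stats.lognorm.fit(sample, floc=0)
result = stats.kstest(sample, 'lognorm', args=(shape, loc, scale))

# result.statistic is the largest gap between empirical and fitted CDFs;
# a small statistic / large p-value gives no evidence against the lognormal
```

Since the stand-in data really is lognormal, the KS statistic comes out small here; on real wealth data a large statistic would be grounds to reject the candidate distribution.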
@@ -469,8 +482,6 @@ tr_expo = total_revenue(dist_exp)
 tr_expo
 ```
 
-+++ {"user_expressions": []}
-
 ```{solution-end}
 ```
 
@@ -498,7 +509,6 @@ ax.legend()
 plt.show()
 ```
 
-+++ {"user_expressions": []}
 
 Clearly, this distribution is not a good fit for our data.
 