
Commit 1c03be8

2 parents 82f573c + 47a2a3d commit 1c03be8

File tree

1 file changed: +52 −49 lines


lectures/mle.md

Lines changed: 52 additions & 49 deletions
@@ -24,8 +24,8 @@ from math import exp, log
 
 ## Introduction
 
-Consider a situation where a policymaker is trying to estimate how much revenue a proposed wealth tax
-will raise.
+Consider a situation where a policymaker is trying to estimate how much revenue
+a proposed wealth tax will raise.
 
 The proposed tax is
 
@@ -39,12 +39,15 @@ $$
 
 where $w$ is wealth.
 
-For example, if $a = 0.05$, $b = 0.1$, and $\bar w = 2.5$, this means a
-5% tax on wealth up to 2.5 and 10% tax on wealth in excess of 2.5.
 
-(The units will be 100,000 so 2.5 means 250,000 dollars.)
+For example, if $a = 0.05$, $b = 0.1$, and $\bar w = 2.5$, this means
 
-Here we define $h$
+* a 5% tax on wealth up to 2.5 and
+* a 10% tax on wealth in excess of 2.5.
+
+The unit is 100,000, so $w = 2.5$ means 250,000 dollars.
+
+Let's go ahead and define $h$:
 
 ```{code-cell} ipython3
 def h(w, a=0.05, b=0.1, w_bar=2.5):
@@ -54,18 +57,21 @@ def h(w, a=0.05, b=0.1, w_bar=2.5):
         return a * w_bar + b * (w - w_bar)
 ```
 
-For a population of size $N$, where individual $i$ has wealth $w_i$, the total revenue will be given by
+For a population of size $N$, where individual $i$ has wealth $w_i$, total revenue raised by
+the tax will be
 
 $$
 T = \sum_{i=1}^{N} h(w_i)
 $$
 
-However, in most countries wealth is not observed for all individuals.
+We wish to calculate this quantity.
+
+The problem we face is that, in most countries, wealth is not observed for all individuals.
 
 Collecting and maintaining accurate wealth data for all individuals or households in a country
 is just too hard.
 
-So let's suppose instead that we obtain a sample $w_1, w_2, \cdots, w_n$ telling us the wealth of $n$ individuals.
+So let's suppose instead that we obtain a sample $w_1, w_2, \cdots, w_n$ telling us the wealth of $n$ randomly selected individuals.
 
 For our exercise we are going to use a sample of $n = 10,000$ observations from wealth data in the US in 2016.
 
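The hunk above defines the tax schedule `h` and total revenue $T$. For readers who want to run it, here is a standalone sketch; the `w <= w_bar` branch of `h` sits outside the diff context, so its reconstruction is an assumption based on the surrounding text, and the toy sample is illustrative:

```python
# Sketch of the lecture's tax schedule and total revenue T = sum_i h(w_i).
def h(w, a=0.05, b=0.1, w_bar=2.5):
    if w <= w_bar:
        return a * w                    # assumed branch: flat rate a below the threshold
    return a * w_bar + b * (w - w_bar)  # rate b on wealth above w_bar

def total_revenue(wealths):
    return sum(h(w) for w in wealths)

wealth = [1.0, 2.5, 4.0]  # toy sample, in units of $100,000
print(total_revenue(wealth))
```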
@@ -140,15 +146,15 @@ Maximum likelihood estimation has two steps:
 2. Estimate the parameter values (e.g., estimate $\mu$ and $\sigma$ for the
    normal distribution)
 
-One reasonable assumption for the wealth is that each
-that $w_i$ is [log-normally distributed](https://en.wikipedia.org/wiki/Log-normal_distribution),
+One possible assumption for the wealth is that each
+$w_i$ is [log-normally distributed](https://en.wikipedia.org/wiki/Log-normal_distribution),
 with parameters $\mu \in (-\infty,\infty)$ and $\sigma \in (0,\infty)$.
 
-This means that $\ln w_i$ is normally distributed with mean $\mu$ and
-standard deviation $\sigma$.
+(This means that $\ln w_i$ is normally distributed with mean $\mu$ and standard deviation $\sigma$.)
 
-You can see that this is a reasonable assumption because if we histogram log wealth
-instead of wealth the picture starts to look something like a bell-shaped curve.
+You can see that this assumption is not completely unreasonable because, if we
+histogram log wealth instead of wealth, the picture starts to look something
+like a bell-shaped curve.
 
 ```{code-cell} ipython3
 ln_sample = np.log(sample)
@@ -157,8 +163,6 @@ ax.hist(ln_sample, density=True, bins=200, histtype='stepfilled', alpha=0.8)
 plt.show()
 ```
 
-+++ {"user_expressions": []}
-
 Now our job is to obtain the maximum likelihood estimates of $\mu$ and $\sigma$, which
 we denote by $\hat{\mu}$ and $\hat{\sigma}$.
 
@@ -168,55 +172,60 @@ data.
 The pdf of a lognormally distributed random variable $X$ is given by:
 
 $$
-f(x) = \frac{1}{x}\frac{1}{\sigma \sqrt{2\pi}} exp\left(\frac{-1}{2}\left(\frac{\ln x-\mu}{\sigma}\right)\right)^2
+f(x, \mu, \sigma)
+= \frac{1}{x}\frac{1}{\sigma \sqrt{2\pi}}
+\exp\left(-\frac{1}{2}\left(\frac{\ln x-\mu}{\sigma}\right)^2\right)
 $$
 
-Since $\ln X$ is normally distributed this is the same as
-
-$$
-f(x) = \frac{1}{x} \phi(x)
-$$
-
-where $\phi$ is the pdf of $\ln X$ which is normally distibuted with mean $\mu$ and variance $\sigma ^2$.
-
-For a sample $x = (x_1, x_2, \cdots, x_n)$ the _likelihood function_ is given by:
+For our sample $w_1, w_2, \cdots, w_n$, the [likelihood function](https://en.wikipedia.org/wiki/Likelihood_function) is given by
 
 $$
 \begin{aligned}
-L(\mu, \sigma | x_i) &= \prod_{i=1}^{n} f(\mu, \sigma | x_i) \\
-&= \prod_{i=1}^{n} \frac{1}{x_i} \phi(\ln x_i)
+L(\mu, \sigma | w_i) = \prod_{i=1}^{n} f(w_i, \mu, \sigma) \\
 \end{aligned}
 $$
 
+The likelihood function can be viewed as both
+
+* the joint distribution of the sample (which is assumed to be IID) and
+* the "likelihood" of parameters $(\mu, \sigma)$ given the data.
 
-Taking $\log$ on both sides gives us the _log likelihood function_ which is:
+Taking logs on both sides gives us the log likelihood function, which is
 
 $$
 \begin{aligned}
-\ell(\mu, \sigma | x_i) &= -\sum_{i=1}^{n} \ln x_i + \sum_{i=1}^n \phi(\ln x_i) \\
-&= -\sum_{i=1}^{n} \ln x_i - \frac{n}{2} \ln(2\pi) - \frac{n}{2} \ln \sigma^2 - \frac{1}{2\sigma^2} \sum_{i=1}^n (\ln x_i - \mu)^2
+\ell(\mu, \sigma | w_i)
+& = \ln \left[ \prod_{i=1}^{n} f(w_i, \mu, \sigma) \right] \\
+& = -\sum_{i=1}^{n} \ln w_i
+- \frac{n}{2} \ln(2\pi) - \frac{n}{2} \ln \sigma^2 - \frac{1}{2\sigma^2}
+\sum_{i=1}^n (\ln w_i - \mu)^2
 \end{aligned}
 $$
 
 To find where this function is maximised we find its partial derivatives wrt $\mu$ and $\sigma ^2$ and equate them to $0$.
 
-Let's first find the MLE of $\mu$,
+Let's first find the maximum likelihood estimate (MLE) of $\mu$
 
 $$
 \begin{aligned}
-\frac{\delta l}{\delta \mu} = - \frac{1}{2\sigma^2} \times 2 \sum_{i=1}^n (\ln x_i - \mu) &= 0 \\
-\sum_{i=1}^n (\ln x_i - n \mu) &= 0 \\
-\hat{\mu} &= \frac{\sum_{i=1}^n \ln x_i}{n}
+\frac{\delta \ell}{\delta \mu}
+= - \frac{1}{2\sigma^2} \times 2 \sum_{i=1}^n (\ln w_i - \mu) = 0 \\
+\implies \sum_{i=1}^n \ln w_i - n \mu = 0 \\
+\implies \hat{\mu} = \frac{\sum_{i=1}^n \ln w_i}{n}
 \end{aligned}
 $$
 
-Now let's find the MLE of $\sigma$,
+Now let's find the MLE of $\sigma$
 
 $$
 \begin{aligned}
-\frac{\delta l}{\delta \sigma^2} = - \frac{n}{2\sigma^2} + \frac{1}{2\sigma^4} \sum_{i=1}^n (\ln x_i - \mu)^2 &= 0 \\
-\frac{n}{2\sigma^2} &= \frac{1}{2\sigma^4} \sum_{i=1}^n (\ln x_i - \mu)^2 \\
-\hat{\sigma} &= \left( \frac{\sum_{i=1}^{n}(\ln x_i - \hat{\mu})^2}{n} \right)^{1/2}
+\frac{\delta \ell}{\delta \sigma^2}
+= - \frac{n}{2\sigma^2} + \frac{1}{2\sigma^4}
+\sum_{i=1}^n (\ln w_i - \mu)^2 = 0 \\
+\implies \frac{n}{2\sigma^2} =
+\frac{1}{2\sigma^4} \sum_{i=1}^n (\ln w_i - \mu)^2 \\
+\implies \hat{\sigma} =
+\left( \frac{\sum_{i=1}^{n}(\ln w_i - \hat{\mu})^2}{n} \right)^{1/2}
 \end{aligned}
 $$
 
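The closed-form estimators $\hat{\mu}$ and $\hat{\sigma}$ derived in this hunk take only a few lines to check numerically. A stdlib-only sketch (function name and toy sample are illustrative, not from the lecture):

```python
from math import e, log, sqrt

def mle_lognormal(sample):
    """Closed-form MLEs from the derivation above:
    mu_hat    = mean of ln w_i
    sigma_hat = sqrt of the mean squared deviation of ln w_i from mu_hat
    (note the divisor is n, not n - 1)."""
    logs = [log(w) for w in sample]
    n = len(logs)
    mu_hat = sum(logs) / n
    sigma_hat = sqrt(sum((x - mu_hat) ** 2 for x in logs) / n)
    return mu_hat, sigma_hat

# toy sample whose logs are 0, 1, 2, so mu_hat = 1 and sigma_hat = sqrt(2/3)
mu_hat, sigma_hat = mle_lognormal([e ** 0, e ** 1, e ** 2])
```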
@@ -249,7 +258,7 @@ ax.legend()
 plt.show()
 ```
 
-Our estimated lognormal distribution appears to be a decent fit for the overall data.
+Our estimated lognormal distribution appears to be a reasonable fit for the overall data.
 
 We now use {eq}`eq:est_rev` to calculate total revenue.
 
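Once $\hat{\mu}$ and $\hat{\sigma}$ are in hand, total revenue can be approximated as $N \cdot \mathbb{E}[h(W)]$ under the fitted lognormal, which is the idea behind the `{eq}` reference above. A stdlib Monte Carlo sketch; the parameter values and the reconstruction of `h` are placeholders, not the lecture's estimates:

```python
import random

def h(w, a=0.05, b=0.1, w_bar=2.5):
    # assumed reconstruction of the lecture's tax schedule
    if w <= w_bar:
        return a * w
    return a * w_bar + b * (w - w_bar)

def estimated_revenue(mu_hat, sigma_hat, population_size, draws=100_000, seed=1234):
    """Monte Carlo estimate of T = N * E[h(W)] with W ~ Lognormal(mu_hat, sigma_hat)."""
    rng = random.Random(seed)
    mean_tax = sum(h(rng.lognormvariate(mu_hat, sigma_hat)) for _ in range(draws)) / draws
    return population_size * mean_tax

T_hat = estimated_revenue(mu_hat=0.0, sigma_hat=1.0, population_size=1_000)
```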
@@ -337,8 +346,6 @@ ax.legend()
 plt.show()
 ```
 
-+++ {"user_expressions": []}
-
 We observe that in this case the fit for the Pareto distribution is not very
 good, so we can probably reject it.
 
@@ -423,8 +430,6 @@ ax.plot(x, dist_pareto_tail.pdf(x), 'k-', lw=0.5, label='pareto pdf')
 plt.show()
 ```
 
-+++ {"user_expressions": []}
-
 The Pareto distribution is a better fit for the right hand tail of our dataset.
 
 ### So what is the best distribution?
@@ -438,7 +443,8 @@ One test is to plot the data against the fitted distribution, as we did.
 
 There are other more rigorous tests, such as the [Kolmogorov-Smirnov test](https://en.wikipedia.org/wiki/Kolmogorov%E2%80%93Smirnov_test).
 
-We omit the details.
+We omit such advanced topics (but encourage readers to study them once
+they have completed these lectures).
 
 ## Exercises
 
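To give a flavor of the Kolmogorov-Smirnov test mentioned in this hunk: the KS statistic itself is just the largest gap between the empirical CDF and the hypothesized CDF. A sketch using a hand-picked uniform example rather than the lecture's wealth data (the full test also needs critical values or p-values, omitted here):

```python
def ks_statistic(sample, cdf):
    """Largest absolute gap between the empirical CDF of `sample` and `cdf`;
    the KS test rejects the hypothesized distribution when this gap is large
    relative to the sample size."""
    xs = sorted(sample)
    n = len(xs)
    gaps = []
    for i, x in enumerate(xs):
        # the empirical CDF jumps from i/n to (i + 1)/n at x
        gaps.append(abs((i + 1) / n - cdf(x)))
        gaps.append(abs(i / n - cdf(x)))
    return max(gaps)

def uniform_cdf(x):
    return min(max(x, 0.0), 1.0)  # CDF of Uniform(0, 1)

d = ks_statistic([0.125, 0.375, 0.625, 0.875], uniform_cdf)
```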
@@ -475,8 +481,6 @@ tr_expo = total_revenue(dist_exp)
 tr_expo
 ```
 
-+++ {"user_expressions": []}
-
 ```{solution-end}
 ```
 
@@ -504,7 +508,6 @@ ax.legend()
 plt.show()
 ```
 
-+++ {"user_expressions": []}
 
 Clearly, this distribution is not a good fit for our data.
 