@@ -24,8 +24,8 @@ from math import exp, log

## Introduction

- Consider a situation where a policymaker is trying to estimate how much revenue a proposed wealth tax
- will raise.
+ Consider a situation where a policymaker is trying to estimate how much revenue
+ a proposed wealth tax will raise.

The proposed tax is

where $w$ is wealth.

- For example, if $a = 0.05$, $b = 0.1$, and $\bar w = 2.5$, this means a
- 5% tax on wealth up to 2.5 and 10% tax on wealth in excess of 2.5.
- (The units will be 100,000 so 2.5 means 250,000 dollars.)
+ For example, if $a = 0.05$, $b = 0.1$, and $\bar w = 2.5$, this means
- Here we define $h$
+ * a 5% tax on wealth up to 2.5 and
+ * a 10% tax on wealth in excess of 2.5.
+
+ The unit is 100,000, so $w = 2.5$ means 250,000 dollars.
+
+ Let's go ahead and define $h$:

```{code-cell} ipython3
def h(w, a=0.05, b=0.1, w_bar=2.5):
@@ -54,18 +57,21 @@ def h(w, a=0.05, b=0.1, w_bar=2.5):
    return a * w_bar + b * (w - w_bar)
```

- For a population of size $N$, where individual $i$ has wealth $w_i$, the total revenue will be given by
+ For a population of size $N$, where individual $i$ has wealth $w_i$, total revenue raised by
+ the tax will be

$$
T = \sum_{i=1}^{N} h(w_i)
$$

- However, in most countries wealth is not observed for all individuals.
+ We wish to calculate this quantity.
+
+ The problem we face is that, in most countries, wealth is not observed for all individuals.

Collecting and maintaining accurate wealth data for all individuals or households in a country
is just too hard.

- So let's suppose instead that we obtain a sample $w_1, w_2, \cdots, w_n$ telling us the wealth of $n$ individuals.
+ So let's suppose instead that we obtain a sample $w_1, w_2, \cdots, w_n$ telling us the wealth of $n$ randomly selected individuals.

For our exercise we are going to use a sample of $n = 10,000$ observations from wealth data in the US in 2016.
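The hunk above defines the tax $h$ and the revenue formula $T = \sum_i h(w_i)$. As a quick illustration, here is a minimal self-contained sketch; note that the `if` branch of `h` is our guess at the part of the code cell elided from this diff, and the wealth values are made up (not the 2016 US sample).

```python
def h(w, a=0.05, b=0.1, w_bar=2.5):
    # Assumed full body of h: rate a up to the threshold w_bar, rate b above it
    if w <= w_bar:
        return a * w
    return a * w_bar + b * (w - w_bar)

# Hypothetical wealth sample, in units of 100,000 dollars
sample = [0.5, 1.0, 2.0, 3.0, 10.0]

# Total revenue T = sum_i h(w_i)
T = sum(h(w) for w in sample)
print(round(T, 3))  # 1.225
```

For the individual with wealth 10.0, the tax is $0.05 \times 2.5 + 0.1 \times 7.5 = 0.875$, i.e., the marginal rate $b$ applies only to wealth above the threshold.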
@@ -140,15 +146,15 @@ Maximum likelihood estimation has two steps:

2. Estimate the parameter values (e.g., estimate $\mu$ and $\sigma$ for the
normal distribution)

- One reasonable assumption for the wealth is that each
- that $w_i$ is [log-normally distributed](https://en.wikipedia.org/wiki/Log-normal_distribution),
+ One possible assumption for the wealth data is that each
+ $w_i$ is [log-normally distributed](https://en.wikipedia.org/wiki/Log-normal_distribution),

with parameters $\mu \in (-\infty,\infty)$ and $\sigma \in (0,\infty)$.

- This means that $\ln w_i$ is normally distributed with mean $\mu$ and
- standard deviation $\sigma$.
+ (This means that $\ln w_i$ is normally distributed with mean $\mu$ and standard deviation $\sigma$.)

- You can see that this is a reasonable assumption because if we histogram log wealth
- instead of wealth the picture starts to look something like a bell-shaped curve.
+ You can see that this assumption is not completely unreasonable because, if we
+ histogram log wealth instead of wealth, the picture starts to look something
+ like a bell-shaped curve.

```{code-cell} ipython3
ln_sample = np.log(sample)
@@ -157,8 +163,6 @@ ax.hist(ln_sample, density=True, bins=200, histtype='stepfilled', alpha=0.8)
plt.show()
```

- +++ {"user_expressions": []}
-

Now our job is to obtain the maximum likelihood estimates of $\mu$ and $\sigma$, which
we denote by $\hat{\mu}$ and $\hat{\sigma}$.
@@ -168,55 +172,60 @@ data.

The pdf of a lognormally distributed random variable $X$ is given by:

$$
- f(x) = \frac{1}{x}\frac{1}{\sigma \sqrt{2\pi}} exp\left(\frac{-1}{2}\left(\frac{\ln x-\mu}{\sigma}\right)\right)^2
+ f(x, \mu, \sigma)
+ = \frac{1}{x}\frac{1}{\sigma \sqrt{2\pi}}
+ \exp\left(-\frac{1}{2}\left(\frac{\ln x-\mu}{\sigma}\right)^2\right)
$$

- Since $\ln X$ is normally distributed this is the same as
-
- $$
- f(x) = \frac{1}{x} \phi(x)
- $$
-
- where $\phi$ is the pdf of $\ln X$ which is normally distibuted with mean $\mu$ and variance $\sigma ^2$.
-
- For a sample $x = (x_1, x_2, \cdots, x_n)$ the _likelihood function_ is given by:
+ For our sample $w_1, w_2, \cdots, w_n$, the [likelihood function](https://en.wikipedia.org/wiki/Likelihood_function) is given by

$$
\begin{aligned}
- L(\mu, \sigma | x_i) &= \prod_ {i=1}^{n} f(\mu, \sigma | x_i) \\
- &= \prod_ {i=1}^{n} \frac{1}{x_i} \phi(\ln x_i)
+ L(\mu, \sigma | w_i) = \prod_ {i=1}^{n} f(w_i, \mu, \sigma) \\
\end{aligned}
$$

+ The likelihood function can be viewed as both
+
+ * the joint distribution of the sample (which is assumed to be IID) and
+ * the "likelihood" of parameters $(\mu, \sigma)$ given the data.

- Taking $\log$ on both sides gives us the _log likelihood function_ which is:
+ Taking logs on both sides gives us the log likelihood function, which is

$$
\begin{aligned}
- \ell(\mu, \sigma | x_i) &= -\sum_ {i=1}^{n} \ln x_i + \sum_ {i=1}^n \phi(\ln x_i) \\
- &= -\sum_ {i=1}^{n} \ln x_i - \frac{n}{2} \ln(2\pi) - \frac{n}{2} \ln \sigma^2 - \frac{1}{2\sigma^2} \sum_ {i=1}^n (\ln x_i - \mu)^2
+ \ell(\mu, \sigma | w_i)
+ & = \ln \left[ \prod_ {i=1}^{n} f(w_i, \mu, \sigma) \right] \\
+ & = -\sum_ {i=1}^{n} \ln w_i
+ - \frac{n}{2} \ln(2\pi) - \frac{n}{2} \ln \sigma^2 - \frac{1}{2\sigma^2}
+ \sum_ {i=1}^n (\ln w_i - \mu)^2
\end{aligned}
$$
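The log likelihood expression in the hunk above is easy to cross-check numerically, assuming `scipy` is available (note that `scipy.stats.lognorm` uses shape parameter `s` for $\sigma$ and `scale` $= e^{\mu}$):

```python
import numpy as np
from scipy.stats import lognorm

mu, sigma = 0.5, 1.2
w = np.array([0.3, 1.0, 2.5, 7.0])  # a tiny hypothetical sample
n = len(w)

# The log likelihood written out term by term, as in the text
ell = (-np.sum(np.log(w))
       - n / 2 * np.log(2 * np.pi)
       - n / 2 * np.log(sigma**2)
       - np.sum((np.log(w) - mu)**2) / (2 * sigma**2))

# The same quantity via scipy's lognormal logpdf
ell_scipy = np.sum(lognorm.logpdf(w, s=sigma, scale=np.exp(mu)))

print(np.allclose(ell, ell_scipy))  # True
```

The agreement confirms that the $-\sum_i \ln w_i$ term (coming from the $1/x$ factor in the pdf) is what separates this log likelihood from the usual normal one applied to $\ln w_i$.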
To find where this function is maximised we find its partial derivatives wrt $\mu$ and $\sigma ^2$ and equate them to $0$.

- Let's first find the MLE of $\mu$,
+ Let's first find the maximum likelihood estimate (MLE) of $\mu$

$$
\begin{aligned}
- \frac{\delta l}{\delta \mu} = - \frac{1}{2\sigma^2} \times 2 \sum_ {i=1}^n (\ln x_i - \mu) &= 0 \\
- \sum_ {i=1}^n (\ln x_i - n \mu) &= 0 \\
- \hat{\mu} &= \frac{\sum_ {i=1}^n \ln x_i}{n}
+ \frac{\partial \ell}{\partial \mu}
+ = \frac{1}{2\sigma^2} \times 2 \sum_ {i=1}^n (\ln w_i - \mu) = 0 \\
+ \implies \sum_ {i=1}^n \ln w_i - n \mu = 0 \\
+ \implies \hat{\mu} = \frac{\sum_ {i=1}^n \ln w_i}{n}
\end{aligned}
$$

- Now let's find the MLE of $\sigma$,
+ Now let's find the MLE of $\sigma$

$$
\begin{aligned}
- \frac{\delta l}{\delta \sigma^2} = - \frac{n}{2\sigma^2} + \frac{1}{2\sigma^4} \sum_ {i=1}^n (\ln x_i - \mu)^2 &= 0 \\
- \frac{n}{2\sigma^2} &= \frac{1}{2\sigma^4} \sum_ {i=1}^n (\ln x_i - \mu)^2 \\
- \hat{\sigma} &= \left( \frac{\sum_ {i=1}^{n}(\ln x_i - \hat{\mu})^2}{n} \right)^{1/2}
+ \frac{\partial \ell}{\partial \sigma^2}
+ = - \frac{n}{2\sigma^2} + \frac{1}{2\sigma^4}
+ \sum_ {i=1}^n (\ln w_i - \mu)^2 = 0 \\
+ \implies \frac{n}{2\sigma^2} =
+ \frac{1}{2\sigma^4} \sum_ {i=1}^n (\ln w_i - \mu)^2 \\
+ \implies \hat{\sigma} =
+ \left( \frac{\sum_ {i=1}^{n}(\ln w_i - \hat{\mu})^2}{n} \right)^{1/2}
\end{aligned}
$$
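These closed-form estimators are easy to verify numerically. A sketch using a synthetic lognormal sample as a stand-in for the wealth data (the parameter values are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(1234)
mu_true, sigma_true = 0.2, 0.8

# w_i = exp(z_i) with z_i ~ N(mu, sigma^2), so each w_i is lognormal
sample = np.exp(rng.normal(mu_true, sigma_true, size=100_000))

ln_sample = np.log(sample)
mu_hat = ln_sample.mean()                              # MLE of mu: sample mean of logs
sigma_hat = np.sqrt(((ln_sample - mu_hat)**2).mean())  # MLE of sigma

print(mu_hat, sigma_hat)  # both close to the true values 0.2 and 0.8
```

In words: the MLEs are just the sample mean and (uncorrected) sample standard deviation of the logged data, which is why fitting a lognormal by maximum likelihood requires no numerical optimization.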
@@ -249,7 +258,7 @@ ax.legend()
plt.show()
```

- Our estimated lognormal distribution appears to be a decent fit for the overall data.
+ Our estimated lognormal distribution appears to be a reasonable fit for the overall data.

We now use {eq}`eq:est_rev` to calculate total revenue.
@@ -337,8 +346,6 @@ ax.legend()
plt.show()
```

- +++ {"user_expressions": []}
-

We observe that in this case the fit for the Pareto distribution is not very
good, so we can probably reject it.
@@ -423,8 +430,6 @@ ax.plot(x, dist_pareto_tail.pdf(x), 'k-', lw=0.5, label='pareto pdf')
plt.show()
```

- +++ {"user_expressions": []}
-

The Pareto distribution is a better fit for the right hand tail of our dataset.

### So what is the best distribution?
@@ -438,7 +443,8 @@ One test is to plot the data against the fitted distribution, as we did.

There are other more rigorous tests, such as the [Kolmogorov-Smirnov test](https://en.wikipedia.org/wiki/Kolmogorov%E2%80%93Smirnov_test).

- We omit the details.
+ We omit such advanced topics (but encourage readers to study them once
+ they have completed these lectures).
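For readers who want a preview, `scipy.stats.kstest` runs a one-sample Kolmogorov-Smirnov test directly; a sketch on a synthetic lognormal sample (not the wealth data used above):

```python
import numpy as np
from scipy.stats import kstest, lognorm

# Synthetic sample that is lognormal by construction
sample = lognorm.rvs(s=0.8, scale=np.exp(0.2), size=2_000,
                     random_state=np.random.default_rng(0))

# Fit lognormal parameters (loc pinned at 0), then test the fitted distribution
s_hat, loc_hat, scale_hat = lognorm.fit(sample, floc=0)
res = kstest(sample, 'lognorm', args=(s_hat, loc_hat, scale_hat))

# A large p-value means we cannot reject the lognormal hypothesis
print(res.statistic, res.pvalue)
```

One caveat: the KS p-value strictly assumes the parameters were not estimated from the same sample; with fitted parameters it is only approximate.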
## Exercises

@@ -475,8 +481,6 @@ tr_expo = total_revenue(dist_exp)
tr_expo
```

- +++ {"user_expressions": []}
-

```{solution-end}
```
@@ -504,7 +508,6 @@ ax.legend()
plt.show()
```

- +++ {"user_expressions": []}

Clearly, this distribution is not a good fit for our data.