@@ -24,8 +24,8 @@ from math import exp, log
## Introduction

- Consider a situation where a policymaker is trying to estimate how much revenue a proposed wealth tax
- will raise.
+ Consider a situation where a policymaker is trying to estimate how much revenue
+ a proposed wealth tax will raise.

The proposed tax is

where $w$ is wealth.

- For example, if $a = 0.05$, $b = 0.1$, and $\bar w = 2.5$, this means a
- 5% tax on wealth up to 2.5 and 10% tax on wealth in excess of 2.5.

- (The units will be 100,000 so 2.5 means 250,000 dollars.)
+ For example, if $a = 0.05$, $b = 0.1$, and $\bar w = 2.5$, this means

- Here we define $h$
+ * a 5% tax on wealth up to 2.5 and
+ * a 10% tax on wealth in excess of 2.5.
+
+ The unit is 100,000, so $w = 2.5$ means 250,000 dollars.
+
+ Let's go ahead and define $h$:

``` {code-cell} ipython3
def h(w, a=0.05, b=0.1, w_bar=2.5):
@@ -54,18 +57,21 @@ def h(w, a=0.05, b=0.1, w_bar=2.5):
    return a * w_bar + b * (w - w_bar)
```

- For a population of size $N$, where individual $i$ has wealth $w_i$, the total revenue will be given by
+ For a population of size $N$, where individual $i$ has wealth $w_i$, total revenue raised by
+ the tax will be

$$
T = \sum_{i=1}^{N} h(w_i)
$$

- However, in most countries wealth is not observed for all individuals.
+ We wish to calculate this quantity.
+
+ The problem we face is that, in most countries, wealth is not observed for all individuals.

Collecting and maintaining accurate wealth data for all individuals or households in a country
is just too hard.

- So let's suppose instead that we obtain a sample $w_1, w_2, \cdots, w_n$ telling us the wealth of $n$ individuals.
+ So let's suppose instead that we obtain a sample $w_1, w_2, \cdots, w_n$ telling us the wealth of $n$ randomly selected individuals.
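To make the computation of $T$ concrete, here is a small sketch using made-up wealth values (the piecewise form of $h$ is restated from its description above so the cell is self-contained; the toy sample is hypothetical, not the lecture's data):

```{code-cell} ipython3
import numpy as np

def h(w, a=0.05, b=0.1, w_bar=2.5):
    # tax owed: rate a on wealth up to w_bar, rate b on wealth above w_bar
    if w <= w_bar:
        return a * w
    return a * w_bar + b * (w - w_bar)

toy_sample = np.array([0.5, 1.8, 2.5, 4.0, 10.0])  # hypothetical wealth levels
T = sum(h(w) for w in toy_sample)
T
```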
For our exercise we are going to use a sample of $n = 10,000$ observations from wealth data in the US in 2016.
@@ -140,15 +146,15 @@ Maximum likelihood estimation has two steps:
2. Estimate the parameter values (e.g., estimate $\mu$ and $\sigma$ for the
normal distribution)

- One reasonable assumption for the wealth is that each
- that $w_i$ is [log-normally distributed](https://en.wikipedia.org/wiki/Log-normal_distribution),
+ One possible assumption is that each
+ $w_i$ is [log-normally distributed](https://en.wikipedia.org/wiki/Log-normal_distribution),

with parameters $\mu \in (-\infty,\infty)$ and $\sigma \in (0,\infty)$.

- This means that $\ln w_i$ is normally distributed with mean $\mu$ and
- standard deviation $\sigma$.
+ (This means that $\ln w_i$ is normally distributed with mean $\mu$ and standard deviation $\sigma$.)

- You can see that this is a reasonable assumption because if we histogram log wealth
- instead of wealth the picture starts to look something like a bell-shaped curve.
+ You can see that this assumption is not completely unreasonable because, if we
+ histogram log wealth instead of wealth, the picture starts to look something
+ like a bell-shaped curve.

```{code-cell} ipython3
ln_sample = np.log(sample)
@@ -157,59 +163,69 @@ ax.hist(ln_sample, density=True, bins=200, histtype='stepfilled', alpha=0.8)
plt.show()
```

- +++ {"user_expressions": []}
-
Now our job is to obtain the maximum likelihood estimates of $\mu$ and $\sigma$, which
163
167
we denote by $\hat{\mu}$ and $\hat{\sigma}$.
164
168
165
169
These estimates can be found by maximizing the likelihood function given the
166
170
data.
167
171
168
172
The pdf of a lognormally distributed random variable $X$ is given by:
- $$
- f(x) = \frac{1}{x}\frac{1}{\sigma \sqrt{2\pi}} exp\left(\frac{-1}{2}\left(\frac{\ln x-\mu}{\sigma}\right)\right)^2
- $$

- Since $\ln X$ is normally distributed this is the same as

$$
- f(x) = \frac{1}{x} \phi(x)
+ f(x, \mu, \sigma)
+ = \frac{1}{x}\frac{1}{\sigma \sqrt{2\pi}}
+ \exp\left(-\frac{1}{2}\left(\frac{\ln x-\mu}{\sigma}\right)^2\right)
$$
- where $\phi$ is the pdf of $\ln X$ which is normally distibuted with mean $\mu$ and variance $\sigma ^2$.
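As a quick sanity check (not part of the original lecture flow), we can compare this density against `scipy.stats.lognorm`, which parameterizes the lognormal with `s` $= \sigma$ and `scale` $= e^\mu$; the parameter values below are arbitrary illustrations:

```{code-cell} ipython3
import numpy as np
from scipy.stats import lognorm

def f(x, mu, sigma):
    # lognormal pdf exactly as in the display above
    return (1 / x) * (1 / (sigma * np.sqrt(2 * np.pi))) \
        * np.exp(-0.5 * ((np.log(x) - mu) / sigma)**2)

mu, sigma = 0.0, 1.5
x = np.linspace(0.1, 10, 50)
max_diff = np.max(np.abs(f(x, mu, sigma) - lognorm.pdf(x, s=sigma, scale=np.exp(mu))))
max_diff
```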
- For a sample $x = (x_1, x_2, \cdots, x_n)$ the _likelihood function_ is given by:
+ For our sample $w_1, w_2, \cdots, w_n$, the [likelihood function](https://en.wikipedia.org/wiki/Likelihood_function) is given by
+
$$
\begin{aligned}
- L(\mu, \sigma | x_i) = \prod_ {i=1}^{n} f(\mu, \sigma | x_i) \\
- L(\mu, \sigma | x_i) = \prod_ {i=1}^{n} \frac{1}{x_i} \phi(\ln x_i)
+ L(\mu, \sigma | w_i) = \prod_{i=1}^{n} f(w_i, \mu, \sigma)
\end{aligned}
$$

- Taking $\log$ on both sides gives us the _log likelihood function_ which is:
+ The likelihood function can be viewed as both
+
+ * the joint distribution of the sample (which is assumed to be IID) and
+ * the "likelihood" of parameters $(\mu, \sigma)$ given the data.
+
+ Taking logs on both sides gives us the log likelihood function, which is
+
$$
\begin{aligned}
- l(\mu, \sigma | x_i) = -\sum_ {i=1}^{n} \ln x_i + \sum_ {i=1}^n \phi(\ln x_i) \\
- l(\mu, \sigma | x_i) = -\sum_ {i=1}^{n} \ln x_i - \frac{n}{2} \ln(2\pi) - \frac{n}{2} \ln \sigma^2 - \frac{1}{2\sigma^2}
- \sum_ {i=1}^n (\ln x_i - \mu)^2
+ \ell(\mu, \sigma | w_i)
+ & = \ln \left[ \prod_{i=1}^{n} f(w_i, \mu, \sigma) \right] \\
+ & = -\sum_{i=1}^{n} \ln w_i
+ - \frac{n}{2} \ln(2\pi) - \frac{n}{2} \ln \sigma^2 - \frac{1}{2\sigma^2}
+ \sum_{i=1}^n (\ln w_i - \mu)^2
\end{aligned}
$$

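The log likelihood above can be coded directly; as a check (on synthetic draws, not the lecture's wealth sample), it should equal the sum of `scipy`'s `logpdf` values:

```{code-cell} ipython3
import numpy as np
from scipy.stats import lognorm

def log_likelihood(mu, sigma, w):
    # log likelihood from the display above
    n = len(w)
    lw = np.log(w)
    return (- np.sum(lw)
            - n / 2 * np.log(2 * np.pi)
            - n / 2 * np.log(sigma**2)
            - np.sum((lw - mu)**2) / (2 * sigma**2))

rng = np.random.default_rng(42)
w = lognorm(s=0.8, scale=np.exp(1.0)).rvs(size=500, random_state=rng)

mu, sigma = 1.0, 0.8
ours = log_likelihood(mu, sigma, w)
scipys = np.sum(lognorm(s=sigma, scale=np.exp(mu)).logpdf(w))
ours, scipys
```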
To find where this function is maximised we find its partial derivatives with respect to $\mu$ and $\sigma^2$ and equate them to $0$.

- Let's first find the MLE of $\mu$,
+ Let's first find the maximum likelihood estimate (MLE) of $\mu$:
+
$$
\begin{aligned}
- \frac{\delta l}{\delta \mu} = - \frac{1}{2\sigma^2} \times 2 \sum_ {i=1}^n (\ln x_i - \mu) = 0 \\
- \Rightarrow \sum_ {i=1}^n \ln x_i - n \mu = 0 \\
- \Rightarrow \hat{\mu} = \frac{\sum_ {i=1}^n \ln x_i}{n}
+ \frac{\partial \ell}{\partial \mu}
+ = \frac{1}{\sigma^2} \sum_{i=1}^n (\ln w_i - \mu) = 0 \\
+ \implies \sum_{i=1}^n \ln w_i - n \mu = 0 \\
+ \implies \hat{\mu} = \frac{\sum_{i=1}^n \ln w_i}{n}
\end{aligned}
$$

- Now let's find the MLE of $\sigma$,
+ Now let's find the MLE of $\sigma$:
+
$$
\begin{aligned}
- \frac{\delta l}{\delta \sigma^2} = - \frac{n}{2\sigma^2} + \frac{1}{2\sigma^4} \sum_ {i=1}^n (\ln x_i - \mu)^2 = 0 \\
- \Rightarrow \frac{n}{2\sigma^2} = \frac{1}{2\sigma^4} \sum_ {i=1}^n (\ln x_i - \mu)^2 \\
- \Rightarrow \hat{\sigma} = \left( \frac{\sum_ {i=1}^{n}(\ln x_i - \hat{\mu})^2}{n} \right)^{1/2}
+ \frac{\partial \ell}{\partial \sigma^2}
+ = - \frac{n}{2\sigma^2} + \frac{1}{2\sigma^4}
+ \sum_{i=1}^n (\ln w_i - \mu)^2 = 0 \\
+ \implies \frac{n}{2\sigma^2} =
+ \frac{1}{2\sigma^4} \sum_{i=1}^n (\ln w_i - \mu)^2 \\
+ \implies \hat{\sigma} =
+ \left( \frac{\sum_{i=1}^{n}(\ln w_i - \hat{\mu})^2}{n} \right)^{1/2}
\end{aligned}
$$

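These closed forms can be checked numerically; the sketch below uses synthetic lognormal draws with made-up parameters (not the US wealth sample), so the estimates should land close to the true values:

```{code-cell} ipython3
import numpy as np

rng = np.random.default_rng(1234)
true_mu, true_sigma = 0.5, 1.2  # illustrative values, not estimates from the data
sim_sample = np.exp(true_mu + true_sigma * rng.standard_normal(100_000))

ln_s = np.log(sim_sample)
mu_hat = np.mean(ln_s)                            # sample mean of log wealth
sigma_hat = np.sqrt(np.mean((ln_s - mu_hat)**2))  # root mean squared deviation
mu_hat, sigma_hat
```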
@@ -242,7 +258,7 @@ ax.legend()
plt.show()
```

- Our estimated lognormal distribution appears to be a decent fit for the overall data.
+ Our estimated lognormal distribution appears to be a reasonable fit for the overall data.

We now use {eq}`eq:est_rev` to calculate total revenue.

@@ -331,8 +347,6 @@ ax.legend()
plt.show()
```

- +++ {"user_expressions": []}
-
We observe that in this case the fit for the Pareto distribution is not very
good, so we can probably reject it.
@@ -417,8 +431,6 @@ ax.plot(x, dist_pareto_tail.pdf(x), 'k-', lw=0.5, label='pareto pdf')
plt.show()
```

- +++ {"user_expressions": []}
-
The Pareto distribution is a better fit for the right hand tail of our dataset.

### So what is the best distribution?
@@ -432,7 +444,8 @@ One test is to plot the data against the fitted distribution, as we did.
There are other more rigorous tests, such as the [Kolmogorov-Smirnov test](https://en.wikipedia.org/wiki/Kolmogorov%E2%80%93Smirnov_test).

- We omit the details.
+ We omit such advanced topics (but encourage readers to study them once
+ they have completed these lectures).

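For a taste of what such a test looks like in practice, here is a hedged sketch using `scipy.stats.kstest` on synthetic lognormal draws with made-up parameters (not the lecture's data); testing the data against its true distribution should give a small statistic:

```{code-cell} ipython3
import numpy as np
from scipy.stats import kstest, lognorm

rng = np.random.default_rng(0)
data = lognorm(s=1.0, scale=np.exp(0.5)).rvs(size=1000, random_state=rng)

# Kolmogorov-Smirnov test of the sample against the (true) lognormal null
res = kstest(data, lognorm(s=1.0, scale=np.exp(0.5)).cdf)
res.statistic, res.pvalue
```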
## Exercises
@@ -469,8 +482,6 @@ tr_expo = total_revenue(dist_exp)
tr_expo
```

- +++ {"user_expressions": []}
-

```{solution-end}
```
@@ -498,7 +509,6 @@ ax.legend()
plt.show()
```

- +++ {"user_expressions": []}

Clearly, this distribution is not a good fit for our data.