Commit 2c96d34

committed
@mmcky edits
1 parent c251463 commit 2c96d34

2 files changed: +186 -25 lines changed

Lines changed: 21 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,21 @@
1+
year,n_wealth,t_income,l_income
2+
1950,0.8257332034366347,0.44248654139458754,0.5342948198773423
3+
1953,0.8059487586599338,0.42645440609359486,0.5158978980963697
4+
1956,0.8121790488050631,0.4442694287339931,0.5349293526208138
5+
1959,0.7952068741637922,0.43749348077061523,0.5213985948309406
6+
1962,0.808694507657936,0.4435843103853638,0.5345127915054347
7+
1965,0.7904149225687935,0.43763715466663455,0.7487860020887751
8+
1968,0.7982885066993515,0.4208620794438898,0.5242396427381535
9+
1971,0.7911574835420261,0.42333442460902587,0.5576454812313479
10+
1977,0.7571418922185222,0.46187678800902643,0.5704448110072055
11+
1983,0.7494335400643035,0.43934561846447007,0.5662220844385907
12+
1989,0.7715705301674298,0.5115249581654171,0.6013995687471435
13+
1992,0.75081266140553,0.47406506720767927,0.5983592657979551
14+
1995,0.756949238811024,0.4896552355840093,0.5969779516716919
15+
1998,0.7603291991801191,0.49117441585168625,0.5774462841723321
16+
2001,0.7816118750507045,0.5239092994681127,0.6042739644967284
17+
2004,0.7700355469522353,0.48843503839032515,0.598143220179272
18+
2007,0.782141377648697,0.5197156312086194,0.6263452195753192
19+
2010,0.825082529519343,0.5195972120145608,0.6453653328291911
20+
2013,0.8227698931835298,0.5314001749843348,0.649868291777264
21+
2016,0.8342975903562216,0.5541400068900835,0.6706846793375284

lectures/inequality.md

Lines changed: 165 additions & 25 deletions
@@ -61,11 +61,7 @@ In this lecture we discuss standard measures of inequality used in economic rese
6161

6262
For each of these measures, we will look at both simulated and real data.
6363

64-
We need to install the `quantecon` package.
65-
66-
```{code-cell} ipython3
67-
!pip install quantecon
68-
```
64+
+++
6965

7066
We will also use the following imports.
7167

@@ -74,7 +70,7 @@ import pandas as pd
7470
import numpy as np
7571
import matplotlib.pyplot as plt
7672
import random as rd
77-
import quantecon as qe
73+
import wbgapi as wb
7874
```
7975

8076
## The Lorenz curve
@@ -92,7 +88,7 @@ We suppose that the sample $w_1, \ldots, w_n$ has been sorted from smallest to l
9288

9389
To aid our interpretation, suppose that we are measuring wealth
9490

95-
* $w_1$ is the wealth of the poorest member of the population and
91+
* $w_1$ is the wealth of the poorest member of the population, and
9692
* $w_n$ is the wealth of the richest member of the population.
9793

9894
The curve $L$ is just a function $y = L(x)$ that we can plot and interpret.
@@ -187,7 +183,7 @@ distribution and treat these draws as our population.
187183

188184
The straight 45-degree line ($x=L(x)$ for all $x$) corresponds to perfect equality.
189185

190-
The lognormal draws produce a less equal distribution.
186+
The log-normal draws produce a less equal distribution.
191187

192188
For example, if we imagine these draws as being observations of wealth across
193189
a sample of households, then the dashed lines show that the bottom 80\% of
@@ -223,6 +219,8 @@ plt.show()
223219

224220
Next let's look at the real data, focusing on income and wealth in the US in 2016.
225221

222+
(data:survey-consumer-finance)=
223+
226224
The following code block imports a subset of the dataset `SCF_plus`,
227225
which is derived from the [Survey of Consumer Finances](https://en.wikipedia.org/wiki/Survey_of_Consumer_Finances) (SCF).
228226

@@ -333,9 +331,8 @@ The Gini coefficient is defined for the sample above as
333331

334332
$$
335333
G :=
336-
\frac
337-
{\sum_{i=1}^n \sum_{j = 1}^n |w_j - w_i|}
338-
{2n\sum_{i=1}^n w_i}.
334+
\frac{\sum_{i=1}^n \sum_{j = 1}^n |w_j - w_i|}
335+
{2n\sum_{i=1}^n w_i}.
339336
$$ (eq:gini)
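
As a rough illustration, the definition of $G$ above can be transcribed almost term by term into NumPy. This is only a sketch: the function name `gini_from_definition` is hypothetical and is not the lecture's own `gini_coefficient`. It builds the full $n \times n$ matrix of pairwise differences, so it is suitable only for small samples.

```python
import numpy as np

def gini_from_definition(w):
    """Gini coefficient computed directly from the double-sum definition.

    Hypothetical helper for illustration; uses O(n^2) memory.
    """
    w = np.asarray(w, dtype=float)
    n = len(w)
    # Numerator: sum over all pairs (i, j) of |w_j - w_i|
    pairwise = np.abs(w[:, None] - w[None, :]).sum()
    # Denominator: 2 n times the sum of all w_i
    return pairwise / (2 * n * w.sum())

# Perfect equality gives 0; concentrating everything in one of four
# members gives 6 / 8 = 0.75
equal = gini_from_definition([1, 1, 1, 1])
concentrated = gini_from_definition([0, 0, 0, 1])
```

With $n$ members and all wealth held by one of them, this formula gives $(n-1)/n$, which approaches 1 as the sample grows.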
340337
341338
@@ -439,7 +436,7 @@ ginis = []
439436
for σ in σ_vals:
440437
μ = -σ**2 / 2
441438
y = np.exp(μ + σ * np.random.randn(n))
442-
ginis.append(qe.gini_coefficient(y))
439+
ginis.append(gini_coefficient(y))
443440
```
444441
445442
```{code-cell} ipython3
@@ -474,14 +471,131 @@ coefficient.
474471
475472
### Gini coefficient dynamics for US data
476473
477-
Now let's look at Gini coefficients for US data derived from the SCF.
474+
Now let's look at Gini coefficients for US data.
478475
479-
The following code creates a list called `ginis`.
476+
In this section we will get Gini coefficients from the World Bank using the [wbgapi](https://blogs.worldbank.org/opendata/introducing-wbgapi-new-python-package-accessing-world-bank-data) package.
480477
481-
It stores data of Gini coefficients generated from the dataframe `df_income_wealth` and method [gini_coefficient](https://quanteconpy.readthedocs.io/en/latest/tools/inequality.html#quantecon.inequality.gini_coefficient), from [QuantEcon](https://quantecon.org/quantecon-py/) library.
478+
Let's search the World Bank data for "gini" to find the series ID.
482479
483480
```{code-cell} ipython3
484-
:tags: [hide-input]
481+
wb.search("gini")
482+
```
483+
484+
We now know the series ID is `SI.POV.GINI`.
485+
486+
```{tip}
487+
Another, often useful, way to find a series ID is to use the [World Bank data portal](https://data.worldbank.org) and then use `wbgapi` to fetch the data.
488+
```
489+
490+
Let us fetch the data for the USA.
491+
492+
```{code-cell} ipython3
493+
data = wb.data.DataFrame("SI.POV.GINI", "USA")
494+
```
495+
496+
```{code-cell} ipython3
497+
data
498+
```
499+
500+
```{note}
501+
This package often returns data with the year information contained in the columns. This is not always convenient for simple plotting with pandas, so it can be useful to transpose the results before plotting.
502+
```
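
To make the note concrete without a network call, here is a small sketch on a made-up frame in the same wide layout (one row per economy, one `YR`-prefixed column per year). The numbers are invented for illustration and are not World Bank data.

```python
import pandas as pd

# Made-up values in a wide layout: economies in rows, years in columns
wide = pd.DataFrame(
    {"YR2000": [40.1], "YR2001": [40.6], "YR2002": [40.4]},
    index=pd.Index(["USA"], name="economy"),
)

# Transposing moves the years from the columns to the index
long = wide.T
series = long["USA"]

# Strip the "YR" prefix so the index holds integer years
series.index = series.index.str.replace("YR", "").astype(int)
```

After the transpose, each economy is a column, so selecting one gives a time series that `plot` can draw against the year index directly.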
503+
504+
```{code-cell} ipython3
505+
data = data.T
506+
data_usa = data['USA']
507+
```
508+
509+
```{code-cell} ipython3
510+
fig, ax = plt.subplots()
511+
ax = data_usa.plot(ax=ax)
512+
ax.set_ylim(0,data_usa.max()+5)
513+
plt.show()
514+
```
515+
516+
Relative to its full range from 0 to 100, the Gini coefficient shows little variation.
517+
518+
In fact, we can take a quick look across all countries and all years in the World Bank dataset to observe this.
519+
520+
```{code-cell} ipython3
521+
gini_all = wb.data.DataFrame("SI.POV.GINI")
522+
```
523+
524+
```{code-cell} ipython3
525+
# Create a long series with a multi-index of the data to get global min and max values
526+
gini_all = gini_all.unstack(level='economy').dropna()
527+
```
528+
529+
```{code-cell} ipython3
530+
gini_all.plot(kind="hist", title="Gini coefficient");
531+
```
532+
533+
We can see that, across 50 years of data and all countries, the measure varies only between 20 and 65.
534+
535+
This variation would be even smaller for the subset of wealthy countries, so let us zoom in a little on the US data and add some trendlines.
536+
537+
```{code-cell} ipython3
538+
data_usa.index = data_usa.index.map(lambda x: int(x.replace('YR','')))
539+
```
540+
541+
```{code-cell} ipython3
542+
data_usa
543+
```
544+
545+
The data suggests there is a change in trend around the year 1981.
546+
547+
```{code-cell} ipython3
548+
pre_1981 = data_usa[data_usa.index <= 1981]
549+
post_1981 = data_usa[data_usa.index > 1981]
550+
```
551+
552+
```{code-cell} ipython3
553+
# Pre 1981 Data Trend
554+
x1 = pre_1981.dropna().index.values
555+
y1 = pre_1981.dropna().values
556+
a1, b1 = np.polyfit(x1, y1, 1)
557+
558+
# Post 1981 Data Trend
559+
x2 = post_1981.dropna().index.values
560+
y2 = post_1981.dropna().values
561+
a2, b2 = np.polyfit(x2, y2, 1)
562+
```
563+
564+
```{code-cell} ipython3
565+
x = data_usa.dropna().index.values
566+
y = data_usa.dropna().values
567+
plt.scatter(x,y)
568+
plt.plot(x1, a1*x1+b1, 'r-')
569+
plt.plot(x2, a2*x2+b2, 'y-')
570+
plt.title("USA gini coefficient dynamics")
571+
plt.legend(['Gini coefficient', 'Trend (before 1981)', 'Trend (after 1981)'])
572+
plt.ylim(25,45)
573+
plt.ylabel("Gini coefficient")
574+
plt.xlabel("Year")
575+
plt.show()
576+
```
577+
578+
Looking at this graph, you can see that inequality was falling in the USA until 1981, when it appears to have changed course and begun to rise steadily.
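
The sign of each fitted slope is what carries this reading. The sketch below repeats the split-and-fit idea on a synthetic series (made-up numbers, not the World Bank data) that falls until a break year and rises afterwards. The names `break_year`, `slope_pre` and `slope_post` are hypothetical.

```python
import numpy as np

# Synthetic series: falls by 0.3 per year until 1981, then rises by 0.2 per year
years = np.arange(1970, 1993)
values = np.where(
    years <= 1981,
    40 - 0.3 * (years - 1970),
    36.7 + 0.2 * (years - 1981),
)

break_year = 1981
before = years <= break_year

# np.polyfit with degree 1 returns the slope first, then the intercept
slope_pre, intercept_pre = np.polyfit(years[before], values[before], 1)
slope_post, intercept_post = np.polyfit(years[~before], values[~before], 1)

trend_reversed = slope_pre < 0 < slope_post
```

A negative slope before the break and a positive slope after it corresponds to inequality first falling and then rising.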
579+
580+
```{admonition} TODO
581+
:class: warning
582+
Why did the Gini coefficient fall in 2020? I would have expected it to accelerate in the other direction. Or was there a lag in investment returns around COVID?
583+
```
584+
585+
+++
586+
587+
## Comparing income and wealth inequality (the US case)
588+
589+
+++
590+
591+
We can use the data collected above from the {ref}`survey of consumer finances <data:survey-consumer-finance>` to compare the Gini coefficient computed from income data with the one computed from wealth data.
592+
593+
Let's compute the Gini coefficient for net wealth, total income, and labour income.
594+
595+
The following code computes the data. However, to speed up execution, we have precomputed the results and will use those in the subsequent analysis.
596+
597+
```{code-cell} ipython3
598+
import quantecon as qe
485599
486600
varlist = ['n_wealth', # net wealth
487601
't_income', # total income
@@ -508,20 +622,36 @@ for var in varlist:
508622
gini_yr.append(gini)
509623
510624
results[var] = gini_yr
625+
626+
# Convert to DataFrame
627+
results = pd.DataFrame(results, index=years)
628+
results.to_csv("_static/lecture_specific/inequality/usa-gini-nwealth-tincome-lincome.csv", index_label='year')
629+
```
630+
631+
```{code-cell} ipython3
632+
ginis = pd.read_csv("_static/lecture_specific/inequality/usa-gini-nwealth-tincome-lincome.csv", index_col='year')
511633
```
512634
513635
```{code-cell} ipython3
514-
ginis_nw = results['n_wealth'] # net wealth
515-
ginis_ti = results['t_income'] # total income
516-
ginis_li = results['l_income'] # labour income
636+
ginis
517637
```
518638
519639
Let's plot the Gini coefficients for net wealth, labor income and total income.
520640
641+
Looking at each data series, we see an outlier in the labor income Gini coefficient computed for 1965.
642+
643+
For the time being, we will smooth the data by replacing this value with the average of the values on either side of it.
644+
645+
```{admonition} TODO
646+
Figure out why there is such a spike in the data for this year
647+
```
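
As a small worked example of this smoothing step, the sketch below uses the three labor income values for 1962, 1965 and 1968 from the CSV file in this commit. The variable names are hypothetical.

```python
import pandas as pd

# Labor income Gini values for three survey years, copied from the CSV data
l_income = pd.Series(
    {1962: 0.5345127915054347,
     1965: 0.7487860020887751,
     1968: 0.5242396427381535}
)

# Replace the 1965 value with the mean of its two neighbours,
# working on a copy so the raw values stay available
smoothed = l_income.copy()
smoothed.loc[1965] = (l_income.loc[1962] + l_income.loc[1968]) / 2
```

The replacement value is about 0.529, in line with the neighbouring years, whereas the raw 1965 value is about 0.749.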
648+
521649
```{code-cell} ipython3
522-
# use an average to replace an outlier in labor income gini
523-
ginis_li_new = ginis_li
524-
ginis_li_new[5] = (ginis_li[4] + ginis_li[6]) / 2
650+
# Use .loc to avoid chained assignment, which may not update `ginis` in place
ginis.loc[1965, "l_income"] = (ginis.loc[1962, "l_income"] + ginis.loc[1968, "l_income"]) / 2
651+
```
652+
653+
```{code-cell} ipython3
654+
ginis["l_income"].plot()
525655
```
526656
527657
```{code-cell} ipython3
@@ -534,7 +664,7 @@ mystnb:
534664
alt: gini_wealth_us
535665
---
536666
fig, ax = plt.subplots()
537-
ax.plot(years, ginis_nw, marker='o')
667+
ax.plot(years, ginis["n_wealth"], marker='o')
538668
ax.set_xlabel("year")
539669
ax.set_ylabel("gini coefficient")
540670
plt.show()
@@ -550,8 +680,18 @@ mystnb:
550680
alt: gini_income_us
551681
---
552682
fig, ax = plt.subplots()
553-
ax.plot(years, ginis_li_new, marker='o', label="labor income")
554-
ax.plot(years, ginis_ti, marker='o', label="total income")
683+
ax.plot(years, ginis["l_income"], marker='o', label="labor income")
684+
ax.plot(years, ginis["t_income"], marker='o', label="total income")
685+
ax.set_xlabel("year")
686+
ax.set_ylabel("gini coefficient")
687+
ax.legend()
688+
plt.show()
689+
```
690+
691+
```{code-cell} ipython3
692+
fig, ax = plt.subplots()
693+
ax.plot(years, ginis["n_wealth"], marker='o', label="net wealth")
694+
ax.plot(years, ginis["l_income"], marker='o', label="labour income")
555695
ax.set_xlabel("year")
556696
ax.set_ylabel("gini coefficient")
557697
ax.legend()
