diff --git a/lectures/_config.yml b/lectures/_config.yml index 407cc295..1d5a36e6 100644 --- a/lectures/_config.yml +++ b/lectures/_config.yml @@ -44,7 +44,10 @@ sphinx: nb_render_image_options: width: 80% nb_code_prompt_show: "Show {type}" + suppress_warnings: [mystnb.unknown_mime_type, myst.domains] # ------------- + html_js_files: + - https://cdnjs.cloudflare.com/ajax/libs/require.js/2.3.4/require.min.js html_favicon: _static/lectures-favicon.ico html_theme: quantecon_book_theme html_static_path: ['_static'] diff --git a/lectures/_static/lecture_specific/inequality/usa-gini-nwealth-tincome-lincome.csv b/lectures/_static/lecture_specific/inequality/usa-gini-nwealth-tincome-lincome.csv new file mode 100644 index 00000000..bf820364 --- /dev/null +++ b/lectures/_static/lecture_specific/inequality/usa-gini-nwealth-tincome-lincome.csv @@ -0,0 +1,21 @@ +year,n_wealth,t_income,l_income +1950,0.8257332034366338,0.44248654139458626,0.5342948198773412 +1953,0.8059487586599329,0.4264544060935945,0.5158978980963702 +1956,0.8121790488050616,0.44426942873399283,0.5349293526208142 +1959,0.795206874163792,0.43749348077061573,0.5213985948309416 +1962,0.8086945076579359,0.4435843103853645,0.5345127915054341 +1965,0.7904149225687935,0.43763715466663444,0.7487860020887753 +1968,0.7982885066993497,0.4208620794438902,0.5242396427381545 +1971,0.7911574835420259,0.4233344246090255,0.5576454812313466 +1977,0.7571418922185215,0.46187678800902543,0.5704448110072049 +1983,0.7494335400643013,0.439345618464469,0.5662220844385915 +1989,0.7715705301674302,0.5115249581654197,0.601399568747142 +1992,0.7508126614055308,0.4740650672076798,0.5983592657979563 +1995,0.7569492388110265,0.48965523558400603,0.5969779516716903 +1998,0.7603291991801185,0.49117441585168614,0.5774462841723305 +2001,0.7816118750507056,0.5239092994681135,0.6042739644967272 +2004,0.7700355469522361,0.4884350383903255,0.5981432201792727 +2007,0.7821413776486978,0.5197156312086187,0.626345219575322 
+2010,0.8250825295193438,0.5195972120145615,0.6453653328291903 +2013,0.8227698931835303,0.531400174984336,0.6498682917772644 +2016,0.8342975903562234,0.5541400068900825,0.6706846793375284 diff --git a/lectures/inequality.md b/lectures/inequality.md index 0d59aa53..acfded55 100644 --- a/lectures/inequality.md +++ b/lectures/inequality.md @@ -4,7 +4,7 @@ jupytext: extension: .md format_name: myst format_version: 0.13 - jupytext_version: 1.14.5 + jupytext_version: 1.15.1 kernelspec: display_name: Python 3 (ipykernel) language: python @@ -13,7 +13,6 @@ kernelspec: # Income and Wealth Inequality - ## Overview In this section we @@ -26,8 +25,8 @@ In this section we Many historians argue that inequality played a key role in the fall of the Roman Republic. -After defeating Carthage and invading Spain, money flowed into Rome and -greatly enriched those in power. +Following the defeat of Carthage and the invasion of Spain, money flowed into +Rome from across the empire, greatly enriching those in power. Meanwhile, ordinary citizens were taken from their farms to fight for long periods, diminishing their wealth. @@ -41,44 +40,41 @@ with Octavian (Augustus) in 27 BCE. This history is fascinating in its own right, and we can see some parallels with certain countries in the modern world. -Many recent political debates revolve around inequality. - -Many economic policies, from taxation to the welfare state, are -aimed at addressing inequality. +Let's now look at inequality in some of these countries. ### Measurement + +Political debates often revolve around inequality. + One problem with these debates is that inequality is often poorly defined. Moreover, debates on inequality are often tied to political beliefs. -This is dangerous for economists because allowing political beliefs to -shape our findings reduces objectivity. +This is dangerous for economists because allowing political beliefs to shape our findings reduces objectivity.
-To bring a truly scientific perspective to the topic of inequality we must -start with careful definitions. +To bring a truly scientific perspective to the topic of inequality we must start with careful definitions. -In this lecture we discuss standard measures of inequality used in economic research. +Hence we begin by discussing ways that inequality can be measured in economic research. -For each of these measures, we will look at both simulated and real data. - -We will install the following libraries. +We will need to install the following packages ```{code-cell} ipython3 :tags: [hide-output] -!pip install quantecon +!pip install wbgapi plotly ``` -And we use the following imports. +We will also use the following imports. ```{code-cell} ipython3 import pandas as pd import numpy as np import matplotlib.pyplot as plt -import quantecon as qe import random as rd +import wbgapi as wb +import plotly.express as px ``` ## The Lorenz curve @@ -92,27 +88,31 @@ In this section we define the Lorenz curve and examine its properties. The Lorenz curve takes a sample $w_1, \ldots, w_n$ and produces a curve $L$. -We suppose that the sample $w_1, \ldots, w_n$ has been sorted from smallest to largest. +We suppose that the sample has been sorted from smallest to largest. To aid our interpretation, suppose that we are measuring wealth -* $w_1$ is the wealth of the poorest member of the population and +* $w_1$ is the wealth of the poorest member of the population, and * $w_n$ is the wealth of the richest member of the population. The curve $L$ is just a function $y = L(x)$ that we can plot and interpret. 
To create it we first generate data points $(x_i, y_i)$ according to -\begin{equation*} - x_i = \frac{i}{n}, - \qquad - y_i = \frac{\sum_{j \leq i} w_j}{\sum_{j \leq n} w_j}, - \qquad i = 1, \ldots, n -\end{equation*} +```{prf:definition} +:label: define-lorenz + +$$ +x_i = \frac{i}{n}, +\qquad +y_i = \frac{\sum_{j \leq i} w_j}{\sum_{j \leq n} w_j}, +\qquad i = 1, \ldots, n +$$ +``` Now the Lorenz curve $L$ is formed from these data points using interpolation. -(If we use a line plot in Matplotlib, the interpolation will be done for us.) +If we use a line plot in `matplotlib`, the interpolation will be done for us. The meaning of the statement $y = L(x)$ is that the lowest $(100 \times x)$\% of people have $(100 \times y)$\% of all wealth. @@ -123,18 +123,71 @@ The meaning of the statement $y = L(x)$ is that the lowest $(100 In the discussion above we focused on wealth but the same ideas apply to income, consumption, etc. -+++ ### Lorenz curves of simulated data Let's look at some examples and try to build understanding. +First let us construct a `lorenz_curve` function that we can +use in our simulations below. + +It is useful to construct a function that translates an array of +income or wealth data into the cumulative share +of individuals (or households) and the cumulative share of income (or wealth). + +```{code-cell} ipython3 +def lorenz_curve(y): + """ + Calculates the Lorenz Curve, a graphical representation of + the distribution of income or wealth. + + It returns the cumulative share of people (x-axis) and + the cumulative share of income earned. + + Parameters + ---------- + y : array_like(float or int, ndim=1) + Array of income/wealth for each individual. + Unordered or ordered is fine. + + Returns + ------- + cum_people : array_like(float, ndim=1) + Cumulative share of people for each person index (i/n) + cum_income : array_like(float, ndim=1) + Cumulative share of income for each person index + + + References + ---------- + .. 
[1] https://en.wikipedia.org/wiki/Lorenz_curve + + Examples + -------- + >>> a_val, n = 3, 10_000 + >>> y = np.random.pareto(a_val, size=n) + >>> f_vals, l_vals = lorenz_curve(y) + + """ + + n = len(y) + y = np.sort(y) + s = np.zeros(n + 1) + s[1:] = np.cumsum(y) + cum_people = np.zeros(n + 1) + cum_income = np.zeros(n + 1) + for i in range(1, n + 1): + cum_people[i] = i / n + cum_income[i] = s[i] / s[n] + return cum_people, cum_income +``` + In the next figure, we generate $n=2000$ draws from a lognormal distribution and treat these draws as our population. -The straight line ($x=L(x)$ for all $x$) corresponds to perfect equality. +The straight 45-degree line ($x=L(x)$ for all $x$) corresponds to perfect equality. -The lognormal draws produce a less equal distribution. +The lognormal draws produce a less equal distribution. For example, if we imagine these draws as being observations of wealth across a sample of households, then the dashed lines show that the bottom 80\% of @@ -144,7 +197,7 @@ households own just over 40\% of total wealth. --- mystnb: figure: - caption: "Lorenz curve of simulated data" + caption: Lorenz curve of simulated data name: lorenz_simulated --- n = 2000 @@ -152,41 +205,39 @@ sample = np.exp(np.random.randn(n)) fig, ax = plt.subplots() -f_vals, l_vals = qe.lorenz_curve(sample) +f_vals, l_vals = lorenz_curve(sample) ax.plot(f_vals, l_vals, label=f'lognormal sample', lw=2) ax.plot(f_vals, f_vals, label='equality', lw=2) -ax.legend(fontsize=12) - ax.vlines([0.8], [0.0], [0.43], alpha=0.5, colors='k', ls='--') ax.hlines([0.43], [0], [0.8], alpha=0.5, colors='k', ls='--') - -ax.set_ylim((0, 1)) ax.set_xlim((0, 1)) - +ax.set_xlabel("share of households (%)") +ax.set_ylim((0, 1)) +ax.set_ylabel("share of income (%)") +ax.legend() plt.show() ``` ### Lorenz curves for US data -Next let's look at the real data, focusing on income and wealth in the US in -2016. +Next let's look at US data for both income and wealth.
-The following code block imports a subset of the dataset ``SCF_plus``, +(data:survey-consumer-finance)= +The following code block imports a subset of the dataset `SCF_plus` for 2016, which is derived from the [Survey of Consumer Finances](https://en.wikipedia.org/wiki/Survey_of_Consumer_Finances) (SCF). ```{code-cell} ipython3 url = 'https://media.githubusercontent.com/media/QuantEcon/high_dim_data/main/SCF_plus/SCF_plus_mini.csv' df = pd.read_csv(url) -df = df.dropna() -df_income_wealth = df +df_income_wealth = df.dropna() ``` ```{code-cell} ipython3 -df_income_wealth.head() +df_income_wealth.head(n=5) ``` -The following code block uses data stored in dataframe ``df_income_wealth`` to generate the Lorenz curves. +The next code block uses data stored in dataframe `df_income_wealth` to generate the Lorenz curves. (The code is somewhat complex because we need to adjust the data according to population weights supplied by the SCF.) @@ -221,7 +272,7 @@ for var in varlist: rd.shuffle(y) # calculate and store Lorenz curve data - f_val, l_val = qe.lorenz_curve(y) + f_val, l_val = lorenz_curve(y) f_vals.append(f_val) l_vals.append(l_val) @@ -235,69 +286,67 @@ l_vals_nw, l_vals_ti, l_vals_li = L_vals Now we plot Lorenz curves for net wealth, total income and labor income in the US in 2016. +Total income is the sum of households' all income sources, including labor income but excluding capital gains. + +(All income measures are pre-tax.) 
+ ```{code-cell} ipython3 --- mystnb: figure: - caption: "2016 US Lorenz curves" + caption: 2016 US Lorenz curves name: lorenz_us image: alt: lorenz_us --- fig, ax = plt.subplots() - ax.plot(f_vals_nw[-1], l_vals_nw[-1], label=f'net wealth') ax.plot(f_vals_ti[-1], l_vals_ti[-1], label=f'total income') ax.plot(f_vals_li[-1], l_vals_li[-1], label=f'labor income') ax.plot(f_vals_nw[-1], f_vals_nw[-1], label=f'equality') - -ax.legend(fontsize=12) +ax.set_xlabel("share of households (%)") +ax.set_ylabel("share of income/wealth (%)") +ax.legend() plt.show() ``` -Here all the income and wealth measures are pre-tax. -Total income is the sum of households' all income sources, including labor income but excluding capital gains. +One key finding from this figure is that wealth inequality is more extreme than income inequality. -One key finding from this figure is that wealth inequality is significantly -more extreme than income inequality. -+++ -## The Gini coefficient -The Lorenz curve is a useful visual representation of inequality in a -distribution. +## The Gini coefficient -Another popular measure of income and wealth inequality is the Gini coefficient. +The Lorenz curve is a useful visual representation of inequality in a distribution. -The Gini coefficient is just a number, rather than a curve. +Another way to study income and wealth inequality is via the Gini coefficient. In this section we discuss the Gini coefficient and its relationship to the Lorenz curve. -### Definition +### Definition -As before, suppose that the sample $w_1, \ldots, w_n$ has been sorted from -smallest to largest. +As before, suppose that the sample $w_1, \ldots, w_n$ has been sorted from smallest to largest. The Gini coefficient is defined for the sample above as -\begin{equation} - \label{eq:gini} - G := - \frac - {\sum_{i=1}^n \sum_{j = 1}^n |w_j - w_i|} - {2n\sum_{i=1}^n w_i}. 
-\end{equation} +```{prf:definition} +:label: define-gini +$$ +G := +\frac{\sum_{i=1}^n \sum_{j = 1}^n |w_j - w_i|} + {2n\sum_{i=1}^n w_i}. +$$ +``` The Gini coefficient is closely related to the Lorenz curve. In fact, it can be shown that its value is twice the area between the line of -equality and the Lorenz curve (e.g., the shaded area in the following Figure below). +equality and the Lorenz curve (e.g., the shaded area in {numref}`lorenz_gini`). The idea is that $G=0$ indicates complete equality, while $G=1$ indicates complete inequality. @@ -305,38 +354,100 @@ The idea is that $G=0$ indicates complete equality, while $G=1$ indicates comple --- mystnb: figure: - caption: "Shaded Lorenz curve of simulated data" + caption: Shaded Lorenz curve of simulated data name: lorenz_gini - image: - alt: lorenz_gini --- fig, ax = plt.subplots() - -f_vals, l_vals = qe.lorenz_curve(sample) +f_vals, l_vals = lorenz_curve(sample) ax.plot(f_vals, l_vals, label=f'lognormal sample', lw=2) ax.plot(f_vals, f_vals, label='equality', lw=2) - -ax.legend(fontsize=12) - ax.vlines([0.8], [0.0], [0.43], alpha=0.5, colors='k', ls='--') ax.hlines([0.43], [0], [0.8], alpha=0.5, colors='k', ls='--') - ax.fill_between(f_vals, l_vals, f_vals, alpha=0.06) - ax.set_ylim((0, 1)) ax.set_xlim((0, 1)) +ax.text(0.04, 0.5, r'$G = 2 \times$ shaded area') +ax.set_xlabel("share of households (%)") +ax.set_ylabel("share of income/wealth (%)") +ax.legend() +plt.show() +``` + +In fact the Gini coefficient can also be expressed as + +$$ +G = \frac{A}{A+B} +$$ -ax.text(0.04, 0.5, r'$G = 2 \times$ shaded area', fontsize=12) - +where $A$ is the area between the 45-degree line of +perfect equality and the Lorenz curve, while $B$ is the area below the Lorenz curve -- see {numref}`lorenz_gini2`.
+ +```{code-cell} ipython3 +--- +mystnb: + figure: + caption: Lorenz curve and Gini coefficient + name: lorenz_gini2 +--- +fig, ax = plt.subplots() +f_vals, l_vals = lorenz_curve(sample) +ax.plot(f_vals, l_vals, label='lognormal sample', lw=2) +ax.plot(f_vals, f_vals, label='equality', lw=2) +ax.fill_between(f_vals, l_vals, f_vals, alpha=0.06) +ax.fill_between(f_vals, l_vals, np.zeros_like(f_vals), alpha=0.06) +ax.set_ylim((0, 1)) +ax.set_xlim((0, 1)) +ax.text(0.55, 0.4, 'A') +ax.text(0.75, 0.15, 'B') +ax.set_xlabel("share of households (%)") +ax.set_ylabel("share of income/wealth (%)") +ax.legend() plt.show() ``` -### Gini coefficient dynamics of simulated data + + +```{seealso} +The Our World in Data project has a [nice graphical exploration of the Lorenz curve and the Gini coefficient](https://ourworldindata.org/what-is-the-gini-coefficient). +``` + +### Gini coefficient of simulated data Let's examine the Gini coefficient in some simulations. -The following code computes the Gini coefficients for five different -populations. +The code below computes the Gini coefficient from a sample. + +```{code-cell} ipython3 + +def gini_coefficient(y): + r""" + Implements the Gini inequality index + + Parameters + ---------- + y : array_like(float) + Array of income/wealth for each individual. + Ordered or unordered is fine + + Returns + ------- + Gini index: float + The gini index describing the inequality of the array of income/wealth + + References + ---------- + + https://en.wikipedia.org/wiki/Gini_coefficient + """ + n = len(y) + i_sum = np.zeros(n) + for i in range(n): + for j in range(n): + i_sum[i] += abs(y[i] - y[j]) + return np.sum(i_sum) / (2 * n * np.sum(y)) +``` + +Now we can compute the Gini coefficients for five different populations. Each of these populations is generated by drawing from a lognormal distribution with parameters $\mu$ (mean) and $\sigma$ (standard deviation). @@ -348,8 +459,8 @@ In each case we set $\mu = - \sigma^2 / 2$.
This implies that the mean of the distribution does not change with $\sigma$. -(You can check this by looking up the expression for the mean of a lognormal -distribution.) +You can check this by looking up the expression for the mean of a lognormal +distribution. ```{code-cell} ipython3 k = 5 @@ -361,53 +472,140 @@ ginis = [] for σ in σ_vals: μ = -σ**2 / 2 y = np.exp(μ + σ * np.random.randn(n)) - ginis.append(qe.gini_coefficient(y)) + ginis.append(gini_coefficient(y)) ``` +Let's build a function that returns a figure (so that we can use it later in the lecture). + ```{code-cell} ipython3 def plot_inequality_measures(x, y, legend, xlabel, ylabel): - fig, ax = plt.subplots() ax.plot(x, y, marker='o', label=legend) - - ax.set_xlabel(xlabel, fontsize=12) - ax.set_ylabel(ylabel, fontsize=12) - - ax.legend(fontsize=12) - plt.show() + ax.set_xlabel(xlabel) + ax.set_ylabel(ylabel) + ax.legend() + return fig, ax ``` ```{code-cell} ipython3 --- mystnb: figure: - caption: "Gini coefficients of simulated data" + caption: Gini coefficients of simulated data name: gini_simulated - image: - alt: gini_simulated --- -plot_inequality_measures(σ_vals, - ginis, - 'simulated', - '$\sigma$', - 'gini coefficients') +fig, ax = plot_inequality_measures(σ_vals, + ginis, + 'simulated', + r'$\sigma$', + 'Gini coefficients') +plt.show() ``` The plots show that inequality rises with $\sigma$, according to the Gini coefficient. -+++ +### Gini coefficient for income (US data) -### Gini coefficient dynamics for US data +Let's look at the Gini coefficient for the distribution of income in the US. -Now let's look at Gini coefficients for US data derived from the SCF. +We will get pre-computed Gini coefficients (based on income) from the World Bank using the [wbgapi](https://blogs.worldbank.org/opendata/introducing-wbgapi-new-python-package-accessing-world-bank-data) package. -The following code creates a list called ``Ginis``.
+Let's use the `wbgapi` package we imported earlier to search the world bank data for Gini to find the Series ID. - It stores data of Gini coefficients generated from the dataframe ``df_income_wealth`` and method [gini_coefficient](https://quanteconpy.readthedocs.io/en/latest/tools/inequality.html#quantecon.inequality.gini_coefficient), from [QuantEcon](https://quantecon.org/quantecon-py/) library. +```{code-cell} ipython3 +wb.search("gini") +``` + +We now know the series ID is `SI.POV.GINI`. + +(Another way to find the series ID is to use the [World Bank data portal](https://data.worldbank.org) and then use `wbgapi` to fetch the data.) + +To get a quick overview, let's histogram Gini coefficients across all countries and all years in the World Bank dataset. ```{code-cell} ipython3 -:tags: [hide-input] +--- +mystnb: + figure: + caption: Histogram of Gini coefficients + name: gini_histogram +--- +# Fetch gini data for all countries +gini_all = wb.data.DataFrame("SI.POV.GINI") +# remove 'YR' in index and convert to integer +gini_all.columns = gini_all.columns.map(lambda x: int(x.replace('YR',''))) + +# Create a long series with a multi-index of the data to get global min and max values +gini_all = gini_all.unstack(level='economy').dropna() + +# Build a histogram +ax = gini_all.plot(kind="hist", bins=20) +ax.set_xlabel("Gini coefficient") +ax.set_ylabel("frequency") +plt.show() +``` + +We can see in {numref}`gini_histogram` that across 50 years of data and all countries the measure varies between 20 and 65. + +Let us fetch the data `DataFrame` for the USA. + +```{code-cell} ipython3 +data = wb.data.DataFrame("SI.POV.GINI", "USA") +data.head(n=5) +# remove 'YR' in index and convert to integer +data.columns = data.columns.map(lambda x: int(x.replace('YR',''))) +``` + +(This package often returns data with year information contained in the columns. This is not always convenient for simple plotting with pandas so it can be useful to transpose the results before plotting.) 
+ + +```{code-cell} ipython3 +data = data.T # Obtain years as rows +data_usa = data['USA'] # pd.Series of US data +``` + +Let us take a look at the data for the US. + +```{code-cell} ipython3 +--- +mystnb: + figure: + caption: Gini coefficients for income distribution (USA) + name: gini_usa1 +--- +fig, ax = plt.subplots() +ax = data_usa.plot(ax=ax) +ax.set_ylim(data_usa.min()-1, data_usa.max()+1) +ax.set_ylabel("Gini coefficient (income)") +ax.set_xlabel("year") +plt.show() +``` + +As can be seen in {numref}`gini_usa1`, the income Gini +trended upward from 1980 to 2020 and then dropped at the start of the COVID pandemic. + +(compare-income-wealth-usa-over-time)= +### Gini coefficient for wealth (US data) + +In the previous section we looked at the Gini coefficient for income using US data. + +Now let's look at the Gini coefficient for the distribution of wealth. + +We can use the {ref}`Survey of Consumer Finances data ` to look at the Gini coefficient +computed over the wealth distribution. + + +```{code-cell} ipython3 +df_income_wealth.year.describe() +``` + +**Note:** The code below can be used to compute this information over the full dataset.
+ +```{code-cell} ipython3 +:tags: [skip-execution, hide-input, hide-output] + +!pip install quantecon +import quantecon as qe varlist = ['n_wealth', # net wealth 't_income', # total income @@ -416,13 +614,11 @@ varlist = ['n_wealth', # net wealth df = df_income_wealth # create lists to store Gini for each inequality measure - -Ginis = [] +results = {} for var in varlist: # create lists to store Gini - ginis = [] - + gini_yr = [] for year in years: # repeat the observations according to their weights counts = list(round(df[df['year'] == year]['weights'] )) @@ -430,108 +626,227 @@ for var in varlist: y = np.asarray(y) rd.shuffle(y) # shuffle the sequence - + # calculate and store Gini gini = qe.gini_coefficient(y) - ginis.append(gini) + gini_yr.append(gini) - Ginis.append(ginis) -``` + results[var] = gini_yr -```{code-cell} ipython3 -ginis_nw, ginis_ti, ginis_li = Ginis +# Convert to DataFrame +results = pd.DataFrame(results, index=years) +results.to_csv("_static/lecture_specific/inequality/usa-gini-nwealth-tincome-lincome.csv", index_label='year') ``` -Let's plot the Gini coefficients for net wealth, labor income and total income. +However, to speed up execution we will import a pre-computed dataset from the lecture repository. + + ```{code-cell} ipython3 -# use an average to replace an outlier in labor income gini -ginis_li_new = ginis_li -ginis_li_new[5] = (ginis_li[4] + ginis_li[6]) / 2 +ginis = pd.read_csv("_static/lecture_specific/inequality/usa-gini-nwealth-tincome-lincome.csv", index_col='year') +ginis.head(n=5) ``` +Let's plot the Gini coefficients for net wealth. 
+ ```{code-cell} ipython3 --- mystnb: figure: - caption: "Gini coefficients of US net wealth" + caption: Gini coefficients of US net wealth name: gini_wealth_us - image: - alt: gini_wealth_us --- -xlabel = "year" -ylabel = "gini coefficient" - fig, ax = plt.subplots() +ax.plot(years, ginis["n_wealth"], marker='o') +ax.set_xlabel("year") +ax.set_ylabel("Gini coefficient") +plt.show() +``` -ax.plot(years, ginis_nw, marker='o') +The time series for the wealth Gini exhibits a U-shape, falling until the early +1980s and then increasing rapidly. -ax.set_xlabel(xlabel, fontsize=12) -ax.set_ylabel(ylabel, fontsize=12) - -plt.show() + +One possibility is that this change is mainly driven by technology. + +However, we will see below that not all advanced economies experienced similar growth of inequality. + + + + + +### Cross-country comparisons of income inequality + +Earlier in this lecture we used `wbgapi` to get Gini data across many countries and saved it in a variable called `gini_all` + +In this section we will use this data to compare several advanced economies, and +to look at the evolution in their respective income Ginis. + +```{code-cell} ipython3 +data = gini_all.unstack() +data.columns ``` +There are 167 countries represented in this dataset. + +Let us compare three advanced economies: the US, the UK, and Norway + ```{code-cell} ipython3 --- mystnb: figure: - caption: "Gini coefficients of US income" - name: gini_income_us - image: - alt: gini_income_us + caption: Gini coefficients for income (USA, United Kingdom, and Norway) + name: gini_usa_gbr_nor1 --- -xlabel = "year" -ylabel = "gini coefficient" +ax = data[['USA','GBR', 'NOR']].plot() +ax.set_xlabel('year') +ax.set_ylabel('Gini coefficient') +ax.legend(title="") +plt.show() +``` -fig, ax = plt.subplots() +We see that Norway has a shorter time series. + +Let us take a closer look at the underlying data and see if we can rectify this. 
+ +```{code-cell} ipython3 +data[['NOR']].dropna().head(n=5) +``` -ax.plot(years, ginis_li_new, marker='o', label="labor income") -ax.plot(years, ginis_ti, marker='o', label="total income") +The data for Norway in this dataset goes back to 1979 but there are gaps in the time series and matplotlib is not showing those data points. -ax.set_xlabel(xlabel, fontsize=12) -ax.set_ylabel(ylabel, fontsize=12) +We can use the `.ffill()` method to copy and bring forward the last known value in a series to fill in these gaps -ax.legend(fontsize=12) +```{code-cell} ipython3 +--- +mystnb: + figure: + caption: Gini coefficients for income (USA, United Kingdom, and Norway) + name: gini_usa_gbr_nor2 +--- +data['NOR'] = data['NOR'].ffill() +ax = data[['USA','GBR', 'NOR']].plot() +ax.set_xlabel('year') +ax.set_ylabel('Gini coefficient') +ax.legend(title="") plt.show() ``` -We see that, by this measure, inequality in wealth and income has risen -substantially since 1980. +From this plot we can observe that the US has a higher Gini coefficient (i.e. +higher income inequality) when compared to the UK and Norway. + +Norway has the lowest Gini coefficient over the three economies and, moreover, +the Gini coefficient shows no upward trend. + + + +### Gini Coefficient and GDP per capita (over time) + +We can also look at how the Gini coefficient compares with GDP per capita (over time). + +Let's take another look at the US, Norway, and the UK. 
+ +```{code-cell} ipython3 +countries = ['USA', 'NOR', 'GBR'] +gdppc = wb.data.DataFrame("NY.GDP.PCAP.KD", countries) +# remove 'YR' prefix from the column labels and convert to integer +gdppc.columns = gdppc.columns.map(lambda x: int(x.replace('YR',''))) +gdppc = gdppc.T +``` + +We can rearrange the data so that we can plot GDP per capita and the Gini coefficient across years. + +```{code-cell} ipython3 +plot_data = pd.DataFrame(data[countries].unstack()) +plot_data.index.names = ['country', 'year'] +plot_data.columns = ['gini'] +``` + +Now we can get the GDP per capita data into a shape that can be merged with `plot_data`. + +```{code-cell} ipython3 +pgdppc = pd.DataFrame(gdppc.unstack()) +pgdppc.index.names = ['country', 'year'] +pgdppc.columns = ['gdppc'] +plot_data = plot_data.merge(pgdppc, left_index=True, right_index=True) +plot_data.reset_index(inplace=True) +``` + +Now we use Plotly to build a plot with GDP per capita on the y-axis and the Gini coefficient on the x-axis. + +```{code-cell} ipython3 +min_year = plot_data.year.min() +max_year = plot_data.year.max() +``` + +The time series for all three countries start and stop in different years. We will add a year mask to the data to +improve clarity in the chart, including the different end years associated with each country's time series.
+ +```{code-cell} ipython3 +labels = [1979, 1986, 1991, 1995, 2000, 2020, 2021, 2022] + \ + list(range(min_year,max_year,5)) +plot_data.year = plot_data.year.map(lambda x: x if x in labels else None) +``` + +(fig:plotly-gini-gdppc-years)= + +```{code-cell} ipython3 +fig = px.line(plot_data, + x = "gini", + y = "gdppc", + color = "country", + text = "year", + height = 800, + labels = {"gini" : "Gini coefficient", "gdppc" : "GDP per capita"} + ) +fig.update_traces(textposition="bottom right") +fig.show() +``` + +```{only} latex +This figure is built using `plotly` and is {ref}` available on the website ` +``` + +This plot shows that GDP per capita has grown over time in all three Western economies, +with some fluctuations in the Gini coefficient. -The wealth time series exhibits a strong U-shape. +From the early 1980s, the United Kingdom and the US economies both saw increases +in income inequality. + +Interestingly, since the year 2000, the United Kingdom has seen a decline in income inequality while +the US exhibits persistent but stable levels around a Gini coefficient of 40. ## Top shares Another popular measure of inequality is the top shares. -Measuring specific shares is less complex than the Lorenz curve or the Gini -coefficient. In this section we show how to compute top shares. -### Definition +### Definition As before, suppose that the sample $w_1, \ldots, w_n$ has been sorted from smallest to largest. Given the Lorenz curve $y = L(x)$ defined above, the top $100 \times p \%$ share is defined as +```{prf:definition} +:label: top-shares + $$ T(p) = 1 - L (1-p) \approx \frac{\sum_{j\geq i} w_j}{ \sum_{j \leq n} w_j}, \quad i = \lfloor n (1-p)\rfloor -$$(topshares) +$$ (topshares) +``` Here $\lfloor \cdot \rfloor$ is the floor function, which rounds any number down to the integer less than or equal to that number. -+++ - -The following code uses the data from dataframe ``df_income_wealth`` to generate another dataframe ``df_topshares``.
+The following code uses the data from dataframe `df_income_wealth` to generate another dataframe `df_topshares`. -``df_topshares`` stores the top 10 percent shares for the total income, the labor income and net wealth from 1950 to 2016 in US. +`df_topshares` stores the top 10 percent shares for the total income, the labor income and net wealth from 1950 to 2016 in US. ```{code-cell} ipython3 :tags: [hide-input] @@ -545,19 +860,16 @@ df4 = pd.merge(df3, df1, how="left", on=["year"]) df4['r_weights'] = df4['weights'] / df4['r_weights'] # create weighted nw, ti, li - df4['weighted_n_wealth'] = df4['n_wealth'] * df4['r_weights'] df4['weighted_t_income'] = df4['t_income'] * df4['r_weights'] df4['weighted_l_income'] = df4['l_income'] * df4['r_weights'] # extract two top 10% groups by net wealth and total income. - df6 = df4[df4['nw_groups'] == 'Top 10%'] df7 = df4[df4['ti_groups'] == 'Top 10%'] # calculate the sum of weighted top 10% by net wealth, # total income and labor income. - df5 = df4.groupby('year').sum(numeric_only=True).reset_index() df8 = df6.groupby('year').sum(numeric_only=True).reset_index() df9 = df7.groupby('year').sum(numeric_only=True).reset_index() @@ -567,7 +879,6 @@ df5['weighted_t_income_top10'] = df9['weighted_t_income'] df5['weighted_l_income_top10'] = df9['weighted_l_income'] # calculate the top 10% shares of the three variables. - df5['topshare_n_wealth'] = df5['weighted_n_wealth_top10'] / \ df5['weighted_n_wealth'] df5['topshare_t_income'] = df5['weighted_t_income_top10'] / \ @@ -586,34 +897,24 @@ Then let's plot the top shares. 
--- mystnb: figure: - caption: "US top shares" + caption: US top shares name: top_shares_us - image: - alt: top_shares_us --- -xlabel = "year" -ylabel = "top $10\%$ share" - fig, ax = plt.subplots() - ax.plot(years, df_topshares["topshare_l_income"], marker='o', label="labor income") ax.plot(years, df_topshares["topshare_n_wealth"], marker='o', label="net wealth") ax.plot(years, df_topshares["topshare_t_income"], marker='o', label="total income") - -ax.set_xlabel(xlabel, fontsize=12) -ax.set_ylabel(ylabel, fontsize=12) - -ax.legend(fontsize=12) +ax.set_xlabel("year") +ax.set_ylabel("top $10\%$ share") +ax.legend() plt.show() ``` ## Exercises -+++ - ```{exercise} :label: inequality_ex1 @@ -634,8 +935,6 @@ Confirm that higher variance generates more dispersion in the sample, and hence greater inequality. ``` -+++ - ```{solution-start} inequality_ex1 :class: dropdown ``` @@ -664,10 +963,10 @@ l_vals = [] for σ in σ_vals: μ = -σ ** 2 / 2 y = np.exp(μ + σ * np.random.randn(n)) - f_val, l_val = qe._inequality.lorenz_curve(y) + f_val, l_val = lorenz_curve(y) f_vals.append(f_val) l_vals.append(l_val) - ginis.append(qe._inequality.gini_coefficient(y)) + ginis.append(gini_coefficient(y)) topshares.append(calculate_top_share(y)) ``` @@ -675,39 +974,41 @@ for σ in σ_vals: --- mystnb: figure: - caption: "Top shares of simulated data" + caption: Top shares of simulated data name: top_shares_simulated image: alt: top_shares_simulated --- -plot_inequality_measures(σ_vals, - topshares, - "simulated data", - "$\sigma$", - "top $10\%$ share") +fig, ax = plot_inequality_measures(σ_vals, + topshares, + "simulated data", + "$\sigma$", + "top $10\%$ share") +plt.show() ``` ```{code-cell} ipython3 --- mystnb: figure: - caption: "Gini coefficients of simulated data" + caption: Gini coefficients of simulated data name: gini_coef_simulated image: alt: gini_coef_simulated --- -plot_inequality_measures(σ_vals, - ginis, - "simulated data", - "$\sigma$", - "gini coefficient") +fig, ax = 
plot_inequality_measures(σ_vals, + ginis, + "simulated data", + "$\sigma$", + "gini coefficient") +plt.show() ``` ```{code-cell} ipython3 --- mystnb: figure: - caption: "Lorenz curves for simulated data" + caption: Lorenz curves for simulated data name: lorenz_curve_simulated image: alt: lorenz_curve_simulated @@ -735,8 +1036,6 @@ Plot the top shares generated from Lorenz curve and the top shares approximated ``` -+++ - ```{solution-start} inequality_ex2 :class: dropdown ``` @@ -759,24 +1058,20 @@ for f_val, l_val in zip(f_vals_nw, l_vals_nw): --- mystnb: figure: - caption: "US top shares: approximation vs Lorenz" + caption: 'US top shares: approximation vs Lorenz' name: top_shares_us_al image: alt: top_shares_us_al --- -xlabel = "year" -ylabel = "top $10\%$ share" - fig, ax = plt.subplots() ax.plot(years, df_topshares["topshare_n_wealth"], marker='o',\ label="net wealth-approx") ax.plot(years, top_shares_nw, marker='o', label="net wealth-lorenz") -ax.set_xlabel(xlabel, fontsize=12) -ax.set_ylabel(ylabel, fontsize=12) - -ax.legend(fontsize=12) +ax.set_xlabel("year") +ax.set_ylabel("top $10\%$ share") +ax.legend() plt.show() ```