Commit f97b92f

modification
1 parent e1d91ee commit f97b92f

1 file changed: +27 -34 lines changed
lectures/heavy_tails.md

Lines changed: 27 additions & 34 deletions
@@ -125,7 +125,7 @@ Many statisticians and econometricians
 use rules of thumb such as "outcomes more than four or five
 standard deviations from the mean can safely be ignored."
 
-But this is only true when distributions have light tails...
+But this is only true when distributions have light tails.
 
 
 ### When are light tails valid?
@@ -400,7 +400,7 @@ Notice how extreme outcomes are more common.
 
 ### Counter CDFs
 
-For nonnegative random varialbes, one way to visualize the difference between
+For nonnegative random variables, one way to visualize the difference between
 light and heavy tails is to look at the
 **counter CDF** (CCDF).
 
@@ -424,7 +424,7 @@ $$ G_P(x) = x^{- \alpha} $$
 
 This function goes to zero as $x \to \infty$, but much slower than $G_E$.
 
-Here's a plot that illustrates how $G_E$ goes to zero faster that $G_P$.
+Here's a plot that illustrates how $G_E$ goes to zero faster than $G_P$.
 
 ```{code-cell} ipython3
 x = np.linspace(1.5, 100, 1000)
@@ -452,12 +452,12 @@ In the log-log plot, the Pareto CCDF is linear, while the exponential one is
 concave.
 
 This idea is often used to separate light- and heavy-tailed distributions in
-visualations --- we return to this point below.
+visualisations --- we return to this point below.
 
 
 ### Empirical CCDFs
 
-The sample countpart of the CCDF function is the **empirical CCDF**.
+The sample counterpart of the CCDF function is the **empirical CCDF**.
 
 Given a sample $x_1, \ldots, x_n$, the empirical CCDF is given by
 
@@ -529,7 +529,7 @@ We can write this more mathematically as
 ```
 
 It is also common to say that a random variable $X$ with this property
-has a **Pareto tail** with **tail index** $\alpha$ if
+has a **Pareto tail** with **tail index** $\alpha$.
 
 Notice that every Pareto distribution with tail index $\alpha$
 has a **Pareto tail** with **tail index** $\alpha$.
@@ -548,7 +548,7 @@ As mentioned above, heavy tails are pervasive in economic data.
 
 In fact power laws seem to be very common as well.
 
-We now illustrate this by showing the empirical CCDF of
+We now illustrate this by showing the empirical CCDF of heavy tails.
 
 All plots are in log-log, so that a power law shows up as a linear log-log
 plot, at least in the upper tail.
@@ -642,7 +642,7 @@ def extract_wb(varlist=['NY.GDP.MKTP.CD'],
 
 ### Firm size
 
-Here is a plot of the firm size distribution taken from Forbes Global 2000.
+Here is a plot of the firm size distribution for the largest 500 firms in 2020 taken from Forbes Global 2000.
 
 ```{code-cell} ipython3
 :tags: [hide-input]
@@ -652,46 +652,39 @@ df_fs = df_fs[['Country', 'Sales', 'Profits', 'Assets', 'Market Value']]
 fig, ax = plt.subplots(figsize=(6.4, 3.5))
 
 label="firm size (market value)"
+top = 500 # set the cutting for top
 d = df_fs.sort_values('Market Value', ascending=False)
-empirical_ccdf(np.asarray(d['Market Value'])[0:500], ax, label=label, add_reg_line=True)
+empirical_ccdf(np.asarray(d['Market Value'])[:top], ax, label=label, add_reg_line=True)
 
 plt.show()
 ```
 
 ### City size
 
-Here is a plot of the city size distribution for the US, where size is
-measured by population.
+Here are plots of the city size distribution for the US and brazil in 2023 from world population review.
+
+The size is measured by population.
 
 ```{code-cell} ipython3
 :tags: [hide-input]
 
-df_cs_us = pd.read_csv('https://raw.githubusercontent.com/QuantEcon/high_dim_data/update_csdata/cross_section/cities_us.txt', delimiter="\t", header=None)
-df_cs_us = df_cs_us[[0, 3]]
-df_cs_us.columns = 'rank', 'pop'
-x = np.asarray(df_cs_us['pop'])
-citysize = []
-for i in x:
-    i = i.replace(",", "")
-    citysize.append(int(i))
-df_cs_us['pop'] = citysize
-df_cs_br = pd.read_csv('https://media.githubusercontent.com/media/QuantEcon/high_dim_data/update_csdata/cross_section/cities_brazil.csv', delimiter=",", header=None)
-df_cs_br.columns = df_cs_br.iloc[0]
-df_cs_br = df_cs_br[1:401]
-df_cs_br = df_cs_br.astype({"pop2023": float})
+# import population data of cities in 2023 United States and 2023 Brazil from world population review
+df_cs_us = pd.read_csv('https://media.githubusercontent.com/media/QuantEcon/high_dim_data/update_csdata/cross_section/cities_us.csv')
+df_cs_br = pd.read_csv('https://media.githubusercontent.com/media/QuantEcon/high_dim_data/update_csdata/cross_section/cities_brazil.csv')
 
 fig, axes = plt.subplots(1, 2, figsize=(8.8, 3.6))
-empirical_ccdf(np.asarray(df_cs_us['pop']), axes[0], label="US", add_reg_line=True)
+
+empirical_ccdf(np.asarray(df_cs_us["pop2023"]), axes[0], label="US", add_reg_line=True)
 empirical_ccdf(np.asarray(df_cs_br['pop2023']), axes[1], label="Brazil", add_reg_line=True)
 
 plt.show()
 ```
 
 ### Wealth
 
-Here is a plot of the upper tail of the wealth distribution.
+Here is a plot of the upper tail (top 500) of the wealth distribution.
 
-The data is from the Forbes billionaires list.
+The data is from the Forbes Billionaires list in 2020.
 
 ```{code-cell} ipython3
 :tags: [hide-input]
@@ -710,10 +703,11 @@ for i, c in enumerate(countries):
     df_w_c = df_w[df_w['country'] == c].reset_index()
     z = np.asarray(df_w_c['realTimeWorth'])
     # print('number of the global richest 2000 from '+ c, len(z))
-    if len(z) <= 500: # cut-off number: top 500
-        z = z[0:500]
+    top = 500 # cut-off number: top 500
+    if len(z) <= top:
+        z = z[:top]
 
-    empirical_ccdf(z[0:500], axs[i], label=c, xlabel='log wealth', add_reg_line=True)
+    empirical_ccdf(z[:top], axs[i], label=c, xlabel='log wealth', add_reg_line=True)
 
 fig.tight_layout()
 
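A side note on the `top` cut-off in the hunk above: Python slices do not raise when the stop index exceeds the sequence length, and NumPy arrays behave the same way, so `z[:top]` on its own already covers countries with fewer than 500 entries. The sketch below illustrates this with made-up lists. The helper name `take_top` and the sample values are hypothetical and not part of the lecture. It suggests that the `if len(z) <= top` guard may be redundant.

```python
def take_top(values, top=500):
    """Return the first `top` entries; shorter sequences come back unchanged."""
    return values[:top]

short = list(range(3))      # fewer than `top` observations
long = list(range(1000))    # more than `top` observations

print(len(take_top(short)))  # slicing past the end is safe
print(len(take_top(long)))   # truncated to the cut-off
```

If that reading is right, the loop body could call `empirical_ccdf(z[:top], ...)` directly without the preceding `if` block.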
@@ -742,6 +736,8 @@ df_gdp1 = extract_wb(varlist=variable_code,
 ```
 
 ```{code-cell} ipython3
+:tags: [hide-input]
+
 fig, axes = plt.subplots(1, 2, figsize=(8.8, 3.6))
 
 for name, ax in zip(variable_names, axes):
@@ -856,7 +852,7 @@ You will find that
 
 Diversification reduces risk, as expected.
 
-But there is a hidden assumption here: the variance is of returns is finite.
+But there is a hidden assumption here: the variance of returns is finite.
 
 If the distribution is heavy-tailed and the variance is infinite, then this
 logic is incorrect.
@@ -1071,9 +1067,6 @@ Present discounted value of tax revenue will be estimated by
 1. multiplying by the tax rate, and
 1. summing the results with discounting to obtain present value.
 
-If $X$ has the Pareto distribution, then there are positive constants $\bar x$
-and $\alpha$ such that
-
 The Pareto distribution is assumed to take the form {eq}`pareto` with $\bar x = 1$ and $\alpha = 1.05$.
 
 (The value the tail index $\alpha$ is plausible given the data {cite}`gabaix2016power`.)
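For readers of this diff who do not have the lecture open, here is a minimal sketch of the idea behind the empirical CCDF plots touched above. It assumes the standard definition (the share of observations strictly greater than $x$) and fits a least-squares line in log-log coordinates. The function names `empirical_ccdf_points` and `loglog_slope` are hypothetical stand-ins and are not the lecture's `empirical_ccdf` helper. The "sample" is built deterministically from Pareto quantiles with $\bar x = 1$ so the result is reproducible.

```python
import math

def empirical_ccdf_points(sample):
    """Return (x, G_hat(x)) pairs, where G_hat(x) is the share of observations > x.

    Assumes the sample values are distinct.
    """
    xs = sorted(sample)
    n = len(xs)
    # At the k-th smallest value (k = 1..n), n - k observations are strictly larger.
    return [(x, (n - k) / n) for k, x in enumerate(xs, start=1)]

def loglog_slope(points):
    """Least-squares slope of log G_hat on log x, skipping points with G_hat = 0."""
    pairs = [(math.log(x), math.log(g)) for x, g in points if g > 0 and x > 0]
    m = len(pairs)
    mean_x = sum(px for px, _ in pairs) / m
    mean_y = sum(py for _, py in pairs) / m
    sxy = sum((px - mean_x) * (py - mean_y) for px, py in pairs)
    sxx = sum((px - mean_x) ** 2 for px, _ in pairs)
    return sxy / sxx

alpha, n = 1.5, 1000  # illustrative values, not taken from the lecture
# Deterministic Pareto "sample" from the quantile function x = (1 - u)^(-1/alpha)
sample = [(1 - (i + 0.5) / n) ** (-1 / alpha) for i in range(n)]
slope = loglog_slope(empirical_ccdf_points(sample))
print(f"fitted log-log slope: {slope:.3f} (Pareto tail index alpha = {alpha})")
```

Because a Pareto CCDF satisfies $\ln G_P(x) = -\alpha \ln x$, the fitted slope should land close to $-\alpha$, which is why a roughly linear log-log plot is read as evidence of a power law.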
