@@ -125,7 +125,7 @@ Many statisticians and econometricians
 use rules of thumb such as "outcomes more than four or five
 standard deviations from the mean can safely be ignored."

-But this is only true when distributions have light tails...
+But this is only true when distributions have light tails.


 ### When are light tails valid?
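The claim that "four or five standard deviations" rules of thumb depend on light tails can be checked numerically. The sketch below uses only the standard library. It compares the probability of an outcome more than $k$ standard deviations above the mean for two variables: a standard normal, and a Pareto variable with $\bar x = 1$ and $\alpha = 3$. The Pareto parameters are an illustrative assumption ($\alpha > 2$ is needed for the variance to be finite).

```python
import math

def normal_upper_tail(k):
    """P(Z > k) for a standard normal Z."""
    return 0.5 * math.erfc(k / math.sqrt(2))

def pareto_upper_tail_in_sd(k, alpha, x_bar=1.0):
    """P(X > mean + k * sd) for a Pareto(x_bar, alpha) variable; requires alpha > 2."""
    mean = alpha * x_bar / (alpha - 1)
    var = alpha * x_bar**2 / ((alpha - 1) ** 2 * (alpha - 2))
    threshold = mean + k * math.sqrt(var)
    # Pareto CCDF: P(X > x) = (x_bar / x) ** alpha for x >= x_bar
    return (x_bar / threshold) ** alpha

for k in (4, 5):
    print(k, normal_upper_tail(k), pareto_upper_tail_in_sd(k, alpha=3.0))
```

For $k = 4$ and $k = 5$, the Pareto probability is more than a hundred times the normal one. Ignoring such outcomes is therefore unsafe under heavy tails.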
@@ -400,7 +400,7 @@ Notice how extreme outcomes are more common.

 ### Counter CDFs

-For nonnegative random varialbes, one way to visualize the difference between
+For nonnegative random variables, one way to visualize the difference between
 light and heavy tails is to look at the
 **counter CDF** (CCDF).

@@ -424,7 +424,7 @@ $$ G_P(x) = x^{- \alpha} $$

 This function goes to zero as $x \to \infty$, but much slower than $G_E$.

-Here's a plot that illustrates how $G_E$ goes to zero faster that $G_P$.
+Here's a plot that illustrates how $G_E$ goes to zero faster than $G_P$.

 ```{code-cell} ipython3
 x = np.linspace(1.5, 100, 1000)
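On a log-log plot the Pareto CCDF is a straight line, because $\ln G_P(x) = -\alpha \ln x$ is linear in $\ln x$. The exponential CCDF is not, because $\ln G_E(x) = -\alpha x = -\alpha e^{\ln x}$ is concave in $\ln x$. The sketch below checks this by computing slopes in log-log coordinates on the same grid as the plotting cell. It assumes $G_E(x) = e^{-\alpha x}$ and uses $\alpha = 1$ purely for illustration.

```python
import numpy as np

alpha = 1.0  # illustrative tail parameter (an assumption)
x = np.linspace(1.5, 100, 1000)

log_x = np.log(x)
log_G_E = -alpha * x          # log of exp(-alpha * x)
log_G_P = -alpha * log_x      # log of x ** (-alpha)

# Slopes between consecutive grid points in log-log coordinates
slope_E = np.diff(log_G_E) / np.diff(log_x)
slope_P = np.diff(log_G_P) / np.diff(log_x)

print("Pareto slope range:", slope_P.min(), slope_P.max())
print("Exponential slope range:", slope_E.min(), slope_E.max())
```

The Pareto slope stays at $-\alpha$ everywhere. The exponential slope keeps falling as $x$ grows, which is what concavity looks like on the plot.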
@@ -452,12 +452,12 @@ In the log-log plot, the Pareto CCDF is linear, while the exponential one is
 concave.

 This idea is often used to separate light- and heavy-tailed distributions in
-visualations --- we return to this point below.
+visualisations --- we return to this point below.


 ### Empirical CCDFs

-The sample countpart of the CCDF function is the **empirical CCDF**.
+The sample counterpart of the CCDF function is the **empirical CCDF**.

 Given a sample $x_1, \ldots, x_n$, the empirical CCDF is given by

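The formula that follows this sentence is not shown here. Assuming the usual form $\hat G(x) = \frac{1}{n} \sum_{i=1}^n \mathbb{1}\{x_i > x\}$, the empirical CCDF can be evaluated with one sort and a binary search. The helper name `empirical_ccdf_at` is hypothetical. It is not the lecture's plotting function `empirical_ccdf`.

```python
import numpy as np

def empirical_ccdf_at(sample, x):
    """Share of observations strictly greater than each point in x (hypothetical helper)."""
    sample = np.sort(np.asarray(sample, dtype=float))
    n = len(sample)
    # side="right" returns, for each point, the number of observations <= that point
    num_le = np.searchsorted(sample, np.asarray(x, dtype=float), side="right")
    return (n - num_le) / n

data = [3.0, 1.0, 4.0, 1.0, 5.0]
print(empirical_ccdf_at(data, [0.5, 1.0, 3.5, 5.0]))  # expected values: 1.0, 0.6, 0.4, 0.0
```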
@@ -529,7 +529,7 @@ We can write this more mathematically as
 ```

 It is also common to say that a random variable $X$ with this property
-has a **Pareto tail** with **tail index** $\alpha$ if
+has a **Pareto tail** with **tail index** $\alpha$.

 Notice that every Pareto distribution with tail index $\alpha$
 has a **Pareto tail** with **tail index** $\alpha$.
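If $X$ has a Pareto tail with tail index $\alpha$, its CCDF behaves roughly like $c\,x^{-\alpha}$ for large $x$, for some constant $c > 0$. The slope of the empirical CCDF on a log-log scale therefore gives a rough estimate of $\alpha$. The sketch below fits that slope by least squares to simulated Pareto data with $\bar x = 1$ and an illustrative $\alpha = 1.5$. It is one simple and fairly crude estimator. It is not necessarily what the lecture's `add_reg_line=True` option computes.

```python
import numpy as np

rng = np.random.default_rng(1234)
alpha_true = 1.5   # illustrative tail index (an assumption)
n = 10_000

# NumPy's pareto draws a Lomax variable; adding 1 gives a Pareto with x_bar = 1
x = np.sort(rng.pareto(alpha_true, size=n) + 1)

# Empirical CCDF at the i-th smallest observation: (n - i) / n
ccdf = 1.0 - np.arange(1, n + 1) / n

# Drop the largest observation, where the empirical CCDF is zero
log_x = np.log(x[:-1])
log_ccdf = np.log(ccdf[:-1])

slope, intercept = np.polyfit(log_x, log_ccdf, 1)
alpha_hat = -slope
print(f"estimated tail index: {alpha_hat:.3f}")
```

With exact Pareto draws the whole sample lies in the "tail". With real data one would usually fit only the upper tail, as the plots below do.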
@@ -548,7 +548,7 @@ As mentioned above, heavy tails are pervasive in economic data.

 In fact power laws seem to be very common as well.

-We now illustrate this by showing the empirical CCDF of
+We now illustrate this by showing the empirical CCDFs of several heavy-tailed datasets.

 All plots are in log-log, so that a power law shows up as a linear log-log
 plot, at least in the upper tail.
@@ -642,7 +642,7 @@ def extract_wb(varlist=['NY.GDP.MKTP.CD'],

 ### Firm size

-Here is a plot of the firm size distribution taken from Forbes Global 2000.
+Here is a plot of the firm size distribution for the largest 500 firms in 2020 taken from Forbes Global 2000.

 ```{code-cell} ipython3
 :tags: [hide-input]
@@ -652,46 +652,39 @@ df_fs = df_fs[['Country', 'Sales', 'Profits', 'Assets', 'Market Value']]
 fig, ax = plt.subplots(figsize=(6.4, 3.5))

 label="firm size (market value)"
+top = 500  # number of largest firms to keep
 d = df_fs.sort_values('Market Value', ascending=False)
-empirical_ccdf(np.asarray(d['Market Value'])[0:500], ax, label=label, add_reg_line=True)
+empirical_ccdf(np.asarray(d['Market Value'])[:top], ax, label=label, add_reg_line=True)

 plt.show()
 ```

 ### City size

-Here is a plot of the city size distribution for the US, where size is
-measured by population.
+Here are plots of the city size distribution for the US and Brazil in 2023, using data from World Population Review.
+
+Size is measured by population.

 ```{code-cell} ipython3
 :tags: [hide-input]

-df_cs_us = pd.read_csv('https://raw.githubusercontent.com/QuantEcon/high_dim_data/update_csdata/cross_section/cities_us.txt', delimiter="\t", header=None)
-df_cs_us = df_cs_us[[0, 3]]
-df_cs_us.columns = 'rank', 'pop'
-x = np.asarray(df_cs_us['pop'])
-citysize = []
-for i in x:
-    i = i.replace(",", "")
-    citysize.append(int(i))
-df_cs_us['pop'] = citysize
-df_cs_br = pd.read_csv('https://media.githubusercontent.com/media/QuantEcon/high_dim_data/update_csdata/cross_section/cities_brazil.csv', delimiter=",", header=None)
-df_cs_br.columns = df_cs_br.iloc[0]
-df_cs_br = df_cs_br[1:401]
-df_cs_br = df_cs_br.astype({"pop2023": float})
+# import 2023 city population data for the United States and Brazil from World Population Review
+df_cs_us = pd.read_csv('https://media.githubusercontent.com/media/QuantEcon/high_dim_data/update_csdata/cross_section/cities_us.csv')
+df_cs_br = pd.read_csv('https://media.githubusercontent.com/media/QuantEcon/high_dim_data/update_csdata/cross_section/cities_brazil.csv')

 fig, axes = plt.subplots(1, 2, figsize=(8.8, 3.6))
-empirical_ccdf(np.asarray(df_cs_us['pop']), axes[0], label="US", add_reg_line=True)
+
+empirical_ccdf(np.asarray(df_cs_us['pop2023']), axes[0], label="US", add_reg_line=True)
 empirical_ccdf(np.asarray(df_cs_br['pop2023']), axes[1], label="Brazil", add_reg_line=True)

 plt.show()
 ```

 ### Wealth

-Here is a plot of the upper tail of the wealth distribution.
+Here is a plot of the upper tail (top 500) of the wealth distribution.

-The data is from the Forbes billionaires list.
+The data is from the Forbes Billionaires list in 2020.

 ```{code-cell} ipython3
 :tags: [hide-input]
@@ -710,10 +703,11 @@ for i, c in enumerate(countries):
     df_w_c = df_w[df_w['country'] == c].reset_index()
     z = np.asarray(df_w_c['realTimeWorth'])
     # print('number of the global richest 2000 from '+ c, len(z))
-    if len(z) <= 500: # cut-off number: top 500
-        z = z[0:500]
+    top = 500 # cut-off number: top 500
+    if len(z) <= top:
+        z = z[:top]

-    empirical_ccdf(z[0:500], axs[i], label=c, xlabel='log wealth', add_reg_line=True)
+    empirical_ccdf(z[:top], axs[i], label=c, xlabel='log wealth', add_reg_line=True)

 fig.tight_layout()

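In the wealth loop, the guard `if len(z) <= top: z = z[:top]` has no effect. NumPy slicing does not raise when the stop index exceeds the array length, and when `len(z) <= top` the slice simply returns every element. The truncation that matters is the later `z[:top]` passed to `empirical_ccdf`. A minimal check, with arbitrary array lengths chosen for illustration:

```python
import numpy as np

top = 500  # cut-off used in the wealth loop

short = np.arange(120.0)    # fewer than `top` observations (illustrative)
long = np.arange(2000.0)    # more than `top` observations (illustrative)

# Slicing past the end of an array is not an error: it returns what exists
print(len(short[:top]), len(long[:top]))

# Hence a guard like `if len(z) <= top: z = z[:top]` leaves z unchanged
z = short.copy()
if len(z) <= top:
    z = z[:top]
print(np.array_equal(z, short))
```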
@@ -742,6 +736,8 @@ df_gdp1 = extract_wb(varlist=variable_code,
 ```

 ```{code-cell} ipython3
+:tags: [hide-input]
+
 fig, axes = plt.subplots(1, 2, figsize=(8.8, 3.6))

 for name, ax in zip(variable_names, axes):
@@ -856,7 +852,7 @@ You will find that

 Diversification reduces risk, as expected.

-But there is a hidden assumption here: the variance is of returns is finite.
+But there is a hidden assumption here: the variance of returns is finite.

 If the distribution is heavy-tailed and the variance is infinite, then this
 logic is incorrect.
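To see why finite variance matters, the sketch below compares an equally weighted portfolio of 100 independent assets with a single asset. It runs the comparison twice: once with standard normal returns, and once with standard Cauchy returns, a textbook distribution with infinite variance. Both distributions and the portfolio size are illustrative choices. Because the Cauchy variance does not exist, spread is measured by the interquartile range.

```python
import numpy as np

rng = np.random.default_rng(0)
n_assets = 100      # assets in the equally weighted portfolio (illustrative)
n_draws = 20_000    # number of simulated portfolios

def iqr(a):
    """Interquartile range, a spread measure that exists even without a variance."""
    q75, q25 = np.percentile(a, [75, 25])
    return q75 - q25

# Light tails: standard normal returns (finite variance)
normal_single = rng.standard_normal(n_draws)
normal_portfolio = rng.standard_normal((n_draws, n_assets)).mean(axis=1)

# Heavy tails: standard Cauchy returns (infinite variance)
cauchy_single = rng.standard_cauchy(n_draws)
cauchy_portfolio = rng.standard_cauchy((n_draws, n_assets)).mean(axis=1)

print("normal IQR, single vs portfolio:", iqr(normal_single), iqr(normal_portfolio))
print("Cauchy IQR, single vs portfolio:", iqr(cauchy_single), iqr(cauchy_portfolio))
```

For normal returns the interquartile range shrinks by roughly a factor of $\sqrt{100} = 10$. For Cauchy returns it does not shrink at all, because the average of independent standard Cauchy variables is itself standard Cauchy.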
@@ -1071,9 +1067,6 @@ Present discounted value of tax revenue will be estimated by
 1. multiplying by the tax rate, and
 1. summing the results with discounting to obtain present value.

-If $X$ has the Pareto distribution, then there are positive constants $\bar x$
-and $\alpha$ such that
-
 The Pareto distribution is assumed to take the form {eq}`pareto` with $\bar x = 1$ and $\alpha = 1.05$.

 (The value the tail index $\alpha$ is plausible given the data {cite}`gabaix2016power`.)
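The estimation steps listed above can be sketched as a small Monte Carlo routine. The Pareto parameters $\bar x = 1$ and $\alpha = 1.05$ come from the text. The number of firms, the horizon, the tax rate and the discount rate are hypothetical placeholders, since their actual values are not shown here. The sketch also assumes the elided first step draws one Pareto value per firm per year.

```python
import numpy as np

# Hypothetical placeholder parameters (not taken from the exercise)
num_firms = 1_000
num_years = 10
tax_rate = 0.15
discount_rate = 0.05

x_bar, alpha = 1.0, 1.05   # Pareto parameters stated in the text

def pdv_tax_revenue(rng):
    # Assumed first step: Pareto(x_bar, alpha) draws as NumPy Lomax draws plus 1, scaled by x_bar
    draws = x_bar * (rng.pareto(alpha, size=(num_years, num_firms)) + 1)
    # Multiply by the tax rate to get revenue collected in each year
    revenue = tax_rate * draws.sum(axis=1)
    # Discount year t = 0, 1, ... back to the present and sum
    discount = (1 + discount_rate) ** -np.arange(num_years)
    return np.sum(discount * revenue)

rng = np.random.default_rng(42)
estimates = np.array([pdv_tax_revenue(rng) for _ in range(500)])
print("mean:", estimates.mean(), "std:", estimates.std())
```

With $\alpha = 1.05$ the draws have a finite mean but infinite variance. The reported standard deviation can therefore change noticeably from one seed to another.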