@@ -280,7 +280,7 @@ The bottom subfigure shows 120 independent draws from [the Cauchy
 distribution](https://en.wikipedia.org/wiki/Cauchy_distribution), which is
 heavy-tailed.

-``` {code-cell} python3
+``` {code-cell} ipython3
 n = 120
 np.random.seed(11)

@@ -333,7 +333,7 @@ The exponential distribution is a light-tailed distribution.

 Here are some draws from the exponential distribution.

-``` {code-cell} python3
+``` {code-cell} ipython3
 n = 120
 np.random.seed(11)

@@ -382,7 +382,7 @@ is Pareto-distributed with minimum $\bar x$ and tail index $\alpha$.
 Here are some draws from the Pareto distribution with tail index $1$ and minimum
 $1$.

-``` {code-cell} python3
+``` {code-cell} ipython3
 n = 120
 np.random.seed(11)

@@ -426,7 +426,7 @@ This function goes to zero as $x \to \infty$, but much slower than $G_E$.

 Here's a plot that illustrates how $G_E$ goes to zero faster than $G_P$.

-``` {code-cell} python3
+``` {code-cell} ipython3
 x = np.linspace(1.5, 100, 1000)
 fig, ax = plt.subplots()
 alpha = 1.0
@@ -435,11 +435,11 @@ ax.plot(x, x**(- alpha), label='Pareto', alpha=0.8)
 ax.legend()
 plt.show()
 ```
+
 Here's a log-log plot of the same functions, which makes visual comparison a
 bit easier.

-
-``` {code-cell} python3
+``` {code-cell} ipython3
 fig, ax = plt.subplots()
 alpha = 1.0
 ax.loglog(x, np.exp(- alpha * x), label='exponential', alpha=0.8)
@@ -461,13 +461,12 @@ The sample counterpart of the CCDF function is the **empirical CCDF**.

 Given a sample $x_1, \ldots, x_n$, the empirical CCDF is given by

-$$ \hat G(x) = \frac{1}{n} \sum_{i=1}^n \1\{x_i > x\} $$
+$$ \hat G(x) = \frac{1}{n} \sum_{i=1}^n \mathbb 1\{x_i > x\} $$

 Thus, $\hat G(x)$ shows the fraction of the sample that exceeds $x$.

 Here's a figure containing some empirical CCDFs from simulated data.

-
 ``` {code-cell} ipython3
 def eccdf(x, data):
     "Simple empirical CCDF function."
@@ -559,6 +558,7 @@ readers are of course welcome to explore the code (perhaps after examining the f

 ``` {code-cell} ipython3
 :tags: [hide-input]
+
 def empirical_ccdf(data,
                    ax,
                    aw=None, # weights
@@ -616,35 +616,35 @@ def empirical_ccdf(data,
     return np.log(data), y_vals, p_vals
 ```

-
 ``` {code-cell} ipython3
 :tags: [hide-input]
+
 def extract_wb(varlist=['NY.GDP.MKTP.CD'],
-               c='all',
+               c='all_countries',
                s=1900,
                e=2021,
                varnames=None):
     if c == "all_countries":
-        # keep countries only (no aggregated regions)
+        # Keep countries only (no aggregated regions)
         countries = wb.get_countries()
-        countries_code = countries[countries['region'] != 'Aggregates']['iso3c'].values
-
-    df = wb.download(indicator=varlist, country=countries_code, start=s, end=e).stack().unstack(0).reset_index()
-    df = df.drop(['level_1'], axis=1).transpose() # set_index(['year'])
-    if varnames != None:
+        countries_name = countries[countries['region'] != 'Aggregates']['name'].values
+        c = "all"
+
+    df = wb.download(indicator=varlist, country=c, start=s, end=e).stack().unstack(0).reset_index()
+    df = df.drop(['level_1'], axis=1).transpose()
+    if varnames is not None:
         df.columns = varnames
         df = df[1:]
     return df
 ```

-
-
 ### Firm size

 Here is a plot of the firm size distribution taken from Forbes Global 2000.

 ``` {code-cell} ipython3
 :tags: [hide-input]
+
 df_fs = pd.read_csv('https://media.githubusercontent.com/media/QuantEcon/high_dim_data/update_csdata/cross_section/forbes-global2000.csv')
 df_fs = df_fs[['Country', 'Sales', 'Profits', 'Assets', 'Market Value']]
 fig, ax = plt.subplots(figsize=(6.4, 3.5))
@@ -663,6 +663,7 @@ measured by population.

 ``` {code-cell} ipython3
 :tags: [hide-input]
+
 df_cs_us = pd.read_csv('https://raw.githubusercontent.com/QuantEcon/high_dim_data/update_csdata/cross_section/cities_us.txt', delimiter="\t", header=None)
 df_cs_us = df_cs_us[[0, 3]]
 df_cs_us.columns = 'rank', 'pop'
@@ -692,6 +693,7 @@ The data is from the Forbes billionaires list.

 ``` {code-cell} ipython3
 :tags: [hide-input]
+
 df_w = pd.read_csv('https://media.githubusercontent.com/media/QuantEcon/high_dim_data/update_csdata/cross_section/forbes-billionaires.csv')
 df_w = df_w[['country', 'realTimeWorth', 'realTimeRank']].dropna()
 df_w = df_w.astype({'realTimeRank': int})
@@ -724,6 +726,7 @@ Here we show cross-country per capita GDP.

 ``` {code-cell} ipython3
 :tags: [hide-input]
+
 # get gdp and gdp per capita for all regions and countries in 2021

 variable_code = ['NY.GDP.MKTP.CD', 'NY.GDP.PCAP.CD']
@@ -734,7 +737,9 @@ df_gdp1 = extract_wb(varlist=variable_code,
                      s="2021",
                      e="2021",
                      varnames=variable_names)
+```

+``` {code-cell} ipython3
 fig, axes = plt.subplots(1, 2, figsize=(8.8, 3.6))

 for name, ax in zip(variable_names, axes):
@@ -778,7 +783,7 @@ For example, it fails for the Cauchy distribution.
 Let's have a look at the behavior of the sample mean in this case, and see
 whether or not the LLN is still valid.

-``` {code-cell} python3
+``` {code-cell} ipython3
 from scipy.stats import cauchy

 np.random.seed(1234)