Commit fef8440

Hypothesis testing tab done
1 parent 63f4e44 commit fef8440

2 files changed

Lines changed: 29 additions & 8 deletions

File tree

pictures/Chi-distrib.png

22.2 KB

predict_page.py

Lines changed: 29 additions & 8 deletions
@@ -3,7 +3,7 @@
 from PIL import Image
 import numpy as np
 import pandas as pd
-from random import sample
+#from random import sample
 
 from help_functions import return_heroes
 from help_functions import list_to_df
@@ -21,8 +21,8 @@
 
 
 def show_tabs():
-    global tab1, tab2, tab3, tab4
-    tab1, tab2, tab3, tab4 = st.tabs(["Predict", "EDA", 'Hypothesis testing', 'Model'])
+    global tab1, tab2, tab3
+    tab1, tab2, tab3 = st.tabs(["Predict", "EDA", 'Hypothesis testing'])
 
 def show_predict_page():
 
@@ -92,7 +92,7 @@ def show_predict_page():
 code1 = '''list_with_all_heroes_from_df = df[['hero_radiant_1', 'hero_radiant_2', 'hero_radiant_3', 'hero_radiant_4', 'hero_radiant_5',
 'hero_dire_1', 'hero_dire_2', 'hero_dire_3', 'hero_dire_4', 'hero_dire_5']].to_numpy().ravel(order='F')
 pd.Series(list_with_all_heroes_from_df, name='').nunique()
-#this snippet returned - 123'''
+#this snippet returned: 123'''
 st.code(code1, language='python')
 #1.2
 st.subheader("Next, let's look at the five most and least popular heroes.")
@@ -140,7 +140,25 @@ def show_predict_page():
 st.markdown(''' ### $H_A : p_{radiant-win} ≠ p_{dire-win}$ ''')
 st.caption('where p - probability')
 st.dataframe(return_count_victories(), use_container_width=True)
+#1.1
+st.markdown('## First approach: $Chi^2$')
+st.markdown('### Represent our dataset in the a different way')
+st.dataframe(pd.DataFrame([[4108, 3254], [3681, 3681]], columns=['radiant', 'dire'], index=['observed wins', 'expected wins']), use_container_width=True)
+st.markdown(r'''Lets calculate Chi-squared distance by this formula:
+
+$χ^2 = \displaystyle\sum_{i=1}^{n} \frac{(observed_i - expected_i)^2}{expected_i} = \frac{(4108-3681)^2}{3681} + \frac{(3254-3681)^2}{3681} = 99.06$
+
+After we can calculate the p-value by plotting the distance value on the **distribution graph $Chi^2$**.''')
+st.markdown('''**We have one degree of freedom, hence the critical $Chi^2$ value for p-value = 0.05 is 3.84**''')
+chi_distrib = Image.open('pictures/Chi-distrib.png')
+st.image(chi_distrib)
+st.markdown('''The resulting p-value = 2.4e-23 is so small that it cannot be displayed on the chart''')
+st.markdown('''### Conclusion:round_pushpin::
+**Based on the p-value, we reject $H_0$ and we can say that the distribution of wins of the two sides is not uniform :arrow_right: and since
+the match data was collected randomly and independently of any influences, we can say that at least in patch 7.32d, the percentage of wins of the :green[Radiant] side is higher than :red[Dire].**''')
 
+#1.2
+st.markdown('## Second approach: Gaussian approximation')
 st.markdown('''Binomial distribution is our case.
 \n
 Since our $n$ is large, we can approximate the binomial distribution with a Gaussian, and we can directly look up $z$-score in a
@@ -151,10 +169,13 @@ def show_predict_page():
 $n = 7362$, we can safely use a Gaussian approximation and calculate the z-score.''')
 
 st.markdown(r'''
-# $z = \frac{\hat{p} - p_0}{\sqrt{\frac{p_0(1-p_0)}{n}}}$
+### $z = \frac{\hat{p} - p_0}{\sqrt{\frac{p_0(1-p_0)}{n}}} = \frac{0.558 - 0.5}{\sqrt{\frac{0.5(1-0.5)}{7362}}} = 9.95$
 
 where $\hat{p}$ is our estimated probability of 'radiant_win' and $p_0 = 0.5$''')
+st.markdown('''**In the table of the Gaussian distribution, we will not find such limiting z-values, so we calculate the p-value using scipy**''')
+st.code('p_value = scipy.stats.norm.sf(abs(z)) \n#this snippet returned: 1.22e-23', language='python')
+st.markdown('''### Conclusion:round_pushpin::
+**The p-value is much less than the threshold value of 0.05, and we can safely conclude that the probability of "radiant_win" is statistically significantly different from "dire_win".**''')
 
-
-with tab4:
-    st.title('About model')
+# with tab4:
+# st.title('About model')
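
The chi-squared figures quoted in the new tab text (statistic 99.06, critical value 3.84, p-value 2.4e-23) can be checked independently of the app. The following is a minimal sketch that assumes only the counts shown in the diff (4108 Radiant wins, 3254 Dire wins). The helper names `chi2_statistic` and `chi2_sf_df1` are hypothetical and not part of `predict_page.py`. It uses the standard-library `math.erfc` rather than scipy, relying on the fact that a chi-squared variable with one degree of freedom is the square of a standard normal variable.

```python
import math

# Counts taken from the table added in the diff.
observed = [4108, 3254]              # Radiant wins, Dire wins
total = sum(observed)                # 7362 matches
expected = [total / 2, total / 2]    # 3681 per side under H0 (equal win probability)


def chi2_statistic(obs, exp):
    """Pearson chi-squared statistic: sum of (observed - expected)^2 / expected."""
    return sum((o - e) ** 2 / e for o, e in zip(obs, exp))


def chi2_sf_df1(x):
    """Upper-tail probability of a chi-squared variable with ONE degree of freedom.

    For df = 1, P(X > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    This shortcut does not hold for other degrees of freedom.
    """
    return math.erfc(math.sqrt(x / 2))


chi2_stat = chi2_statistic(observed, expected)
chi2_p_value = chi2_sf_df1(chi2_stat)
tail_at_critical = chi2_sf_df1(3.84)  # should be close to 0.05

print(f"chi2 = {chi2_stat:.2f}")
print(f"p-value = {chi2_p_value:.2e}")
print(f"tail probability at 3.84 = {tail_at_critical:.3f}")
```

With two categories there is one degree of freedom, which is why the df = 1 shortcut applies here.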

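
The z-score and p-value in the Gaussian-approximation section can be reproduced the same way. This is a sketch under the same assumption (4108 Radiant wins out of 7362 matches), with the hypothetical helper `normal_sf` standing in for `scipy.stats.norm.sf`, to which it is mathematically equivalent. Because $H_A$ is two-sided ($p_{radiant-win} ≠ p_{dire-win}$), the one-sided tail of about 1.22e-23 is doubled for the two-sided p-value. The doubled value agrees with the 2.4e-23 from the chi-squared approach, as expected, since $z^2$ equals the chi-squared statistic in this setting.

```python
import math

radiant_wins = 4108
n = 7362
p0 = 0.5                       # win probability under H0
p_hat = radiant_wins / n       # estimated probability of 'radiant_win'


def normal_sf(z):
    """Upper-tail probability of the standard normal: 0.5 * erfc(z / sqrt(2))."""
    return 0.5 * math.erfc(z / math.sqrt(2))


# One-proportion z-test statistic, as in the formula added in the diff.
z = (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)

p_one_sided = normal_sf(abs(z))    # the quantity norm.sf(abs(z)) computes
p_two_sided = 2 * p_one_sided      # matches the two-sided alternative H_A

print(f"z = {z:.2f}")
print(f"one-sided p = {p_one_sided:.2e}")
print(f"two-sided p = {p_two_sided:.2e}")
```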