A structured portfolio of extra activities, empirical practice problems, and Python mini-projects for:
- Statistics 1: Descriptive & Foundational Methods
- Statistics 2: Inferential & Probability Models
Explore how theoretical statistics are brought to life in spreadsheets and Python—from hand-calculated frequency tables to advanced algorithmic simulations.
- Descriptive Statistics: Modeling of mean, median, mode, variance, and frequency distributions
- Probability & Simulations: Custom Monte Carlo frameworks—algorithmic dice rolls & randomness
- Theoretical Probability Models: Normal distribution fits on real datasets, empirical rule checks
- Central Limit Theorem (CLT): Sampling engine demonstrating asymptotic normality of sample means
- Bivariate Relationships: Covariance matrix & scatter analysis on financial ratios
- Spreadsheet Modeling: Microsoft Excel (advanced formulas, Analysis ToolPak)
- Python/Notebooks: Google Colab, Jupyter (.ipynb)
- Core Libraries: numpy, pandas, matplotlib, scipy.stats, openpyxl
basic-statistics-projects/
├── Statistics 1 — Descriptive Foundations/
│   ├── STATISTICS-1-EXTRA_ACTIVITY_2/ # Frequency distributions
│   ├── STATISTICS-1-EXTRA_ACTIVITY_3/ # Central tendency
│   └── STATISTICS-1-EXTRA_ACTIVITY_4/ # Categorical/continuous plotting
│
└── Statistics 2 — Probability & Inference/
    ├── STATISTICS-2-EXTRA_ACTIVITY_1/ # Netflix dataset exploration
    ├── STATISTICS-2-EXTRA_ACTIVITY_2/ # Monte Carlo Dice Simulation
    ├── STATISTICS-2-EXTRA_ACTIVITY_3/ # Corporate Ratio Covariance
    ├── STATISTICS-2-EXTRA_ACTIVITY_4/ # Normal Curve Profit Analysis
    └── STATISTICS-2-EXTRA_ACTIVITY_5/ # Central Limit Theorem (CLT)
- Concept: Law of Large Numbers (LLN)
- Implementation: Python .ipynb notebook with stochastic dice rolls and live distribution plots
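The dice-roll idea behind the Law of Large Numbers can be sketched in a few lines. This is a minimal illustration and not the notebook's actual code: the function name `running_mean_of_dice`, the seed, and the number of rolls are assumptions chosen for the example. It tracks the running average of fair six-sided rolls, which should drift toward the theoretical mean of 3.5 as the number of rolls grows.

```python
import numpy as np

def running_mean_of_dice(n_rolls, seed=0):
    """Return the running average after each of n_rolls fair die rolls."""
    rng = np.random.default_rng(seed)          # seeded for reproducibility
    rolls = rng.integers(1, 7, size=n_rolls)   # upper bound is exclusive: faces 1..6
    return np.cumsum(rolls) / np.arange(1, n_rolls + 1)

running = running_mean_of_dice(100_000)        # hypothetical simulation size

# Report how far the running mean is from the theoretical value 3.5.
for n in (10, 100, 1_000, 10_000, 100_000):
    mean_n = running[n - 1]
    print(f"n={n:>6}  mean={mean_n:.4f}  |error|={abs(mean_n - 3.5):.4f}")
```

Plotting `running` against the roll index with matplotlib gives the familiar convergence curve.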
- Concept: Correlation & Covariance ($r$, $\sigma_{XY}$)
- Implementation: Balance-sheet ratios (Current, Quick, Cash), scatter-matrix visualization
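As a rough sketch of the covariance and correlation step, the snippet below builds a small table of liquidity ratios and computes both matrices with pandas. The ratio values are synthetic placeholders generated for illustration, not figures from the activity's dataset, and the column names are assumptions. Note that pandas uses the sample (n - 1) denominator by default.

```python
import numpy as np
import pandas as pd

# Synthetic, made-up ratios for illustration only (not real company data).
rng = np.random.default_rng(42)
n = 50
current = rng.normal(2.0, 0.4, n)
quick = 0.7 * current + rng.normal(0.0, 0.1, n)
cash = 0.3 * current + rng.normal(0.0, 0.1, n)
ratios = pd.DataFrame({"current": current, "quick": quick, "cash": cash})

cov_matrix = ratios.cov()    # sample covariance matrix (ddof=1)
corr_matrix = ratios.corr()  # Pearson correlation matrix

# Manual check of one pair against the definitions of covariance and r.
x = ratios["current"].to_numpy()
y = ratios["quick"].to_numpy()
cov_xy = ((x - x.mean()) * (y - y.mean())).sum() / (n - 1)
r_xy = cov_xy / (x.std(ddof=1) * y.std(ddof=1))

print(cov_matrix.round(4))
print(corr_matrix.round(4))
```

For the visual side, `pandas.plotting.scatter_matrix(ratios)` draws the pairwise scatter plots in one call.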
- Concept: Gaussian fit, Z-scores
  $$f(x) = \frac{1}{\sigma \sqrt{2\pi}} e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2}$$
- Implementation: Google quarterly profits fit to a Normal distribution, probability densities of future returns
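The Gaussian fit and Z-score steps might look roughly like the following. The profit figures here are synthetic placeholders, not Google's reported numbers, and the threshold `x0` is an arbitrary value picked for the example. The helper `normal_pdf` is a hypothetical name that implements the density formula above directly so it can be compared with `scipy.stats.norm.pdf`.

```python
import numpy as np
from scipy import stats

# Synthetic placeholder "quarterly profits" (arbitrary units), not real data.
rng = np.random.default_rng(7)
profits = rng.normal(loc=15.0, scale=3.0, size=40)

# Maximum-likelihood Normal fit: mu is the sample mean, sigma the ddof=0 std.
mu, sigma = stats.norm.fit(profits)

# Z-scores: distance of each observation from the mean in units of sigma.
z = (profits - mu) / sigma

def normal_pdf(x, mu, sigma):
    """Normal density f(x) written out from the formula."""
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

x0 = 18.0                                          # hypothetical profit level
density = normal_pdf(x0, mu, sigma)                # height of the fitted curve at x0
p_above = stats.norm.sf(x0, loc=mu, scale=sigma)   # P(X > x0) under the fit

# Empirical-rule check: share of observations within 1 and 2 sigma.
within_1 = np.mean(np.abs(z) <= 1)
within_2 = np.mean(np.abs(z) <= 2)
print(f"mu={mu:.3f}, sigma={sigma:.3f}, f({x0})={density:.4f}, P(X>{x0})={p_above:.4f}")
print(f"within 1 sigma: {within_1:.2%}, within 2 sigma: {within_2:.2%}")
```

A density value is not itself a probability; probabilities come from areas under the curve, which is why the sketch uses the survival function for `p_above`.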
- Concept: Sampling means, limit theorem
- Implementation: Python engine simulating non-normal populations; batch sampling shows the distribution of sample means approaching a normal shape as the sample size $n$ grows
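A compact version of such a sampling engine is sketched below. It assumes an exponential population as the non-normal source and uses skewness as a simple numeric stand-in for "how normal the sample means look"; the function name `sample_means`, the batch count, and the sample sizes are all choices made for this illustration rather than the activity's actual settings.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

def sample_means(n, n_batches=5000):
    """Draw n_batches samples of size n from Exp(1) and return their means."""
    draws = rng.exponential(scale=1.0, size=(n_batches, n))
    return draws.mean(axis=1)

skewness = {}
spread = {}
for n in (2, 10, 50, 200):
    means = sample_means(n)
    skewness[n] = stats.skew(means)   # 0 for a perfectly symmetric distribution
    spread[n] = means.std()           # theory: sigma / sqrt(n) = 1 / sqrt(n)
    print(f"n={n:>3}  skew={skewness[n]:.3f}  std={spread[n]:.4f}  "
          f"theory std={1 / np.sqrt(n):.4f}")
```

The skewness shrinking toward zero and the spread tracking $1/\sqrt{n}$ are evidence consistent with the CLT, not a proof of it; a histogram of `means` for each $n$ makes the same point visually.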
A. Excel Workspace
- Open the relevant STATISTICS-* activity folder, download the .xlsx file, and explore the formulas.
B. Python & Jupyter
pip install numpy pandas matplotlib scipy openpyxl notebook
jupyter notebook

Alternatively: drop the .ipynb file into Google Colab for instant cloud execution.
- Hands-on descriptive and inferential stats via spreadsheet & code
- Real financial & business data translation to statistical models
- Visualization, reporting, and result interpretation for research portfolios
Saloni Tiwari
🎓 IIT Madras BS Data Science
🎓 B.Sc Mathematics
If you found this portfolio useful, please ⭐ it on GitHub!