A comprehensive data science project analyzing the development of the COVID-19 pandemic using datasets provided by Statistik Austria.
Authors: Yasin Sahin and Sven Oberwalder
This project explores COVID-19-related statistics, focusing on vaccination, recovery, and demographic analysis. The data was cleaned, prepared, and visualized to uncover patterns and correlations.
- Dataset 1: Contains demographic details by political district.
- Dataset 2: Includes economic and educational details by federal states.
- Clone the repository:
git clone https://github.com/Sormy23/DataScience_CoronaStatistics.git
cd DataScience_CoronaStatistics
- Install required libraries:
pip install pandas numpy matplotlib seaborn
- Ensure the datasets are in the data folder.
- Data Cleaning: Removed duplicates, handled null values, and mapped coded data to meaningful names.
- Data Transformation: Converted columns to numerical and nominal types and normalized values for analysis.
- Visualizations:
  - Heatmaps
  - Pairplots
  - Boxplots and Violin Plots
  - Histograms and KDEs
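The cleaning and transformation steps listed above could look roughly like the following. This is a minimal sketch and not the notebook's actual code: the toy data, the column names (`district_code`, `sex_code`, `vaccinated`, `population`) and the `SEX_LABELS` mapping are all hypothetical stand-ins for the Statistik Austria files.

```python
import pandas as pd

# Hypothetical toy data; column names and codes are illustrative only
# and do not come from the Statistik Austria datasets.
raw = pd.DataFrame({
    "district_code": [101, 101, 102, 103, 104],
    "sex_code": [1, 1, 2, 2, 1],
    "vaccinated": [120.0, 120.0, None, 80.0, 200.0],
    "population": [400, 400, 300, 160, 500],
})

# Assumed code-to-label mapping, for illustration only.
SEX_LABELS = {1: "male", 2: "female"}


def clean(df: pd.DataFrame) -> pd.DataFrame:
    """Drop duplicates and nulls, map codes to names, derive and normalize a rate."""
    out = df.drop_duplicates().copy()
    # Handle null values by dropping rows without a vaccination count.
    out = out.dropna(subset=["vaccinated"])
    # Map coded values to meaningful names and store them as a nominal type.
    out["sex"] = out["sex_code"].map(SEX_LABELS).astype("category")
    # Derive a numerical rate, then min-max normalize it to the range [0, 1].
    out["vaccination_rate"] = out["vaccinated"] / out["population"]
    rate = out["vaccination_rate"]
    out["vaccination_rate_norm"] = (rate - rate.min()) / (rate.max() - rate.min())
    return out.reset_index(drop=True)


cleaned = clean(raw)
print(cleaned)
```

Dropping rows with missing values is only one way to handle nulls; imputing them may be more appropriate depending on the column.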
Run the analysis in Datascience_CoronaStatistics.ipynb for detailed visualizations.
Key graphs include:
- Correlation heatmaps to detect patterns.
- Bar charts and boxplots analyzing vaccination rates by demographics.
- KDE plots for density estimations.
- Correlation between education and vaccination rates was identified.
- Economic status strongly influences vaccination likelihood.
- Age and vaccination patterns suggest demographic-specific trends.
This project is licensed under the MIT License.