This Python project scrapes article data from the CERN Courier website, infers the likely sex of each author from their first name, and generates a bar plot of the resulting distribution.
- Clone the repository:
git clone https://github.com/cerncourier/disparity-analysis.git
cd disparity-analysis
- Install the required packages:
pip install -r requirements.txt
Run the script:
python3 thread_main.py
This will:
- Scrape article links from the CERN Courier website.
- Fetch the title, date, and author of each article.
- Infer the likely sex of each author from their first name using the gender-guesser library.
- Generate a CSV file with the processed data.
- Create and save a bar chart showing the sex distribution.
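
The classification and CSV steps above can be sketched roughly as follows. This is a minimal illustration, not the code in `thread_main.py`: the helper names (`first_name`, `tally`, `to_csv`, `make_guesser`) and the CSV columns are assumptions made for the example. The only third-party call used is `gender_guesser.detector.Detector().get_gender`, which is imported lazily so the rest of the sketch runs without the package installed.

```python
import csv
import io
from collections import Counter


def first_name(author):
    """Return the first whitespace-separated token of an author string."""
    parts = author.strip().split()
    return parts[0] if parts else ""


def tally(authors, guess):
    """Count the labels that `guess` returns for each author's first name."""
    return Counter(guess(first_name(author)) for author in authors)


def to_csv(counts):
    """Serialise the label counts as CSV text, sorted by label."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["label", "count"])  # hypothetical column names
    for label, count in sorted(counts.items()):
        writer.writerow([label, count])
    return buffer.getvalue()


def make_guesser():
    """Build a classifier backed by gender-guesser (must be installed)."""
    import gender_guesser.detector as gender  # imported only when needed

    return gender.Detector().get_gender
```

With gender-guesser installed, `tally(authors, make_guesser())` returns a `Counter` of the library's labels, which can be passed to `to_csv` or to a plotting routine. Passing the classifier in as an argument keeps the counting logic easy to test with a stand-in function.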
Distributed under the MIT Licence.