dos-diversity: Examining gender representation in American diplomacy

Objective

This project uses a data extraction, cleaning, and analysis pipeline to examine and visualize gender representation in the State Department directories from 1965, 1982, and 2022. I am most interested in how the percentage of female officers changes over time.

While this project only examines the State Department at three points in time, it can easily be scaled to analyze the State Department at a yearly granularity from 1965 to 2022.

Data

My data source is the State Department Key Officers of Foreign Service Posts series of documents, publicly published from 1965 to 2022. The documents list the assigned officers and their positions for each US embassy.

The data to be extracted generally takes the form of 'RANK(:) First M. Last Name'. Two excerpts of the data from 1965 and 2022 are displayed below. Find these PDFs in the inst folder.
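To make the line format concrete, here is a minimal sketch of a regular expression that separates the rank from the name. The pattern, the function name `parse_officer_line`, and the sample lines are all illustrative assumptions; they are not taken from the directories or from code.ipynb, and real pages will likely need a more forgiving pattern.

```python
import re

# Hypothetical pattern for 'RANK(:) First M. Last Name':
#   rank - 2 to 10 upper-case characters (slashes allowed), optional colon
#   name - two or more capitalized tokens, at least one lower-case letter
OFFICER_LINE = re.compile(
    r"^(?P<rank>[A-Z][A-Z/]{1,9}):?\s+"
    r"(?=.*[a-z])"
    r"(?P<name>[A-Z][A-Za-z.'\-]+(?:\s+[A-Z][A-Za-z.'\-]*)+)$"
)


def parse_officer_line(line):
    """Return (rank, name) if the line looks like an officer entry, else None."""
    match = OFFICER_LINE.match(line.strip())
    if match is None:
        return None
    return match.group("rank"), match.group("name")


# Made-up sample lines, for illustration only.
samples = ["AMB: John M. Smith", "DCM Jane Doe", "FRANCE", "Page 12"]
for sample in samples:
    print(repr(sample), "->", parse_officer_line(sample))
```

Lines that do not fit the pattern, such as country names and page numbers, come back as `None`, which is one simple way to drop them before further processing.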

Methods

In code.ipynb I design a data pipeline that completes the following tasks:

Imports and converts the directory PDF into text using the PyMuPDF PDF manipulation package.
Filters out information that does not include officer ranks or names, such as country names and page numbers, using text analysis tools such as regex.
Extracts officer ranks and names, then uses the nameparser package to extract first names.
Uses the gender_guesser first-name gender classification package to classify officer names.
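The steps above can be sketched roughly as follows. This is not the code from code.ipynb: the function names are hypothetical, and the decision to fold the "mostly_female" and "mostly_male" labels into the main categories while leaving "andy" and "unknown" out of the denominator is an assumption made for this example. The third-party packages are imported inside the functions that need them, so the counting helper can be used on its own.

```python
from collections import Counter


def pdf_to_text(path):
    """Return the text of every page in a PDF, joined with newlines."""
    import pymupdf  # PyMuPDF; older releases are imported as `fitz` instead

    doc = pymupdf.open(path)
    try:
        return "\n".join(page.get_text() for page in doc)
    finally:
        doc.close()


def first_name(full_name):
    """Extract the first name from a string such as 'First M. Last'."""
    from nameparser import HumanName

    return HumanName(full_name).first


def classify_first_names(first_names):
    """Label each first name with a gender_guesser category."""
    import gender_guesser.detector as gender

    detector = gender.Detector()
    return [detector.get_gender(name) for name in first_names]


# Assumption: "mostly_*" labels count toward the main categories, while
# "andy" (androgynous) and "unknown" are excluded from the denominator.
FEMALE_LABELS = {"female", "mostly_female"}
MALE_LABELS = {"male", "mostly_male"}


def female_share(labels):
    """Fraction of classified names labelled female, or None if none classified."""
    counts = Counter(labels)
    female = sum(counts[label] for label in FEMALE_LABELS)
    male = sum(counts[label] for label in MALE_LABELS)
    classified = female + male
    return female / classified if classified else None
```

In use, the text returned by `pdf_to_text` would be filtered down to officer lines, each name passed through `first_name`, and the resulting list handed to `classify_first_names` and then `female_share`. How ambiguous and unknown names are treated changes the reported percentage, so that choice is worth stating alongside any result.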

Results

The percentage of female officers at the State Department has increased by roughly 30 percentage points, from only 3% in 1965 to nearly 35% in 2022. While this is still well below women's share of the global population, which is close to 50%, it represents a significant improvement in gender equality at the State Department.
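The change from 3% to nearly 35% can be expressed either in percentage points or as a relative increase, and the two give very different numbers. The short sketch below shows the arithmetic using the rounded figures quoted above; the function names are illustrative, and 35 stands in for "nearly 35%".

```python
def percentage_point_change(start_pct, end_pct):
    """Absolute difference between two percentages, in percentage points."""
    return end_pct - start_pct


def relative_change(start_pct, end_pct):
    """Change relative to the starting value, expressed in percent."""
    return (end_pct - start_pct) / start_pct * 100


start, end = 3, 35  # rounded shares of female officers in 1965 and 2022
print(percentage_point_change(start, end))  # 32
print(round(relative_change(start, end)))   # 1067
```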

Notes

Required Python packages include:

pandas
re (part of the Python standard library)
pymupdf
nameparser
gender_guesser
matplotlib
seaborn

This project is a replication of fp21's DOS diversity project, which examines the gender and racial diversity of State Department officers between 1965 and 2022.

