This project provides implementations and methodologies for evaluating bias in natural language processing (NLP) models using the following metrics:
- WEAT (Word Embedding Association Test)
- SEAT (Sentence Embedding Association Test)
- Embedding-Based CEAT
- Categorical Bias Score (CBS)
- DisCo (Distributional Correspondence Test)
- LBPS Metric
- CrowS-Pairs Score (CPS)
- All Unmasked Likelihood (AUL)
- Social Group Substitution Test
- Co-occurrence Bias Test
- Demographic Representation
- Stereotypical Association
- Score Parity
- Counterfactual Sentiment
- HONEST
- Gender Polarity
- Overview
- Embedding-Based
- Probability-Based
- Generation-Text Based
- Usage
- Installing Dependencies
- Contributing
- License
This repository contains tools for bias evaluation in pre-trained language models. These metrics allow researchers and practitioners to detect and quantify biases related to social group representation in natural language processing.
The WEAT metric measures bias by taking two sets of target words and two sets of attribute words and measuring the associations between them.
The glove.6B.50d.txt file, which is required for the WEAT metric, can be downloaded from this link: https://drive.google.com/file/d/1L3_k2SwdAyB3YPzlnzDrscjOjcv9jj0E/view?usp=sharing
- Prepare target and attribute word sets.
- Generate word embeddings using a pre-trained language model.
- Calculate association scores using cosine similarity.
- Calculate the WEAT score as the difference between the summed association scores of the two target sets (a minimal sketch follows below).
To run the WEAT test with the Word2Vec model, download GoogleNews-vectors-negative300.bin from this link and place it in the same folder as weat.py: https://drive.google.com/file/d/0B7XkCwpI5KDYNlNUTTlSS21pQmM/view?usp=sharing&resourcekey=0-wjGZdNAUop6WykTtMip30g
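The following is a minimal sketch of the WEAT computation outlined above, assuming glove.6B.50d.txt is in the working directory; the word lists are illustrative placeholders, not the repository's exact target and attribute sets.

```python
# A minimal WEAT sketch. Assumes glove.6B.50d.txt is in the working directory;
# the word lists below are illustrative placeholders, not the repository's sets.
import numpy as np

def load_glove(path):
    """Load GloVe vectors from a whitespace-separated text file into a dict."""
    vectors = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            parts = line.rstrip().split(" ")
            vectors[parts[0]] = np.asarray(parts[1:], dtype=np.float32)
    return vectors

def cosine(u, v):
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def association(w, A, B, emb):
    """s(w, A, B): mean cosine similarity to A minus mean cosine similarity to B."""
    return np.mean([cosine(emb[w], emb[a]) for a in A]) - \
           np.mean([cosine(emb[w], emb[b]) for b in B])

def weat_score(X, Y, A, B, emb):
    """Difference of the summed associations of the two target sets."""
    return sum(association(x, A, B, emb) for x in X) - \
           sum(association(y, A, B, emb) for y in Y)

if __name__ == "__main__":
    emb = load_glove("glove.6B.50d.txt")
    X = ["engineer", "scientist"]   # target set 1 (placeholder)
    Y = ["nurse", "teacher"]        # target set 2 (placeholder)
    A = ["he", "him", "man"]        # attribute set 1 (placeholder)
    B = ["she", "her", "woman"]     # attribute set 2 (placeholder)
    print("WEAT score:", weat_score(X, Y, A, B, emb))
```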
The SEAT metric measures biases in sentence embeddings by adapting the Implicit Association Test (IAT). It computes associations between two target sets (e.g., gendered terms) and two attribute sets (e.g., career- and family-related terms).
- Prepare target and attribute word sets.
- Generate sentence embeddings using a pre-trained language model.
- Calculate association scores using cosine similarity.
- Conduct statistical tests (e.g., t-tests) to determine bias significance.
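Below is a hedged sketch of the SEAT effect-size computation, assuming sentence-transformers is installed and a json/seat_data.json file in the format shown under Usage; the model name is an assumption and may differ from what seat.py actually uses.

```python
# SEAT effect-size sketch. The model name and data path are assumptions.
import json
import numpy as np
from sentence_transformers import SentenceTransformer

def cos_sim_matrix(A, B):
    """Pairwise cosine similarities between two embedding matrices."""
    A = A / np.linalg.norm(A, axis=1, keepdims=True)
    B = B / np.linalg.norm(B, axis=1, keepdims=True)
    return A @ B.T

def seat_effect_size(targ1, targ2, attr1, attr2):
    """Difference of mean associations, normalised by the pooled std. dev."""
    s1 = cos_sim_matrix(targ1, attr1).mean(axis=1) - cos_sim_matrix(targ1, attr2).mean(axis=1)
    s2 = cos_sim_matrix(targ2, attr1).mean(axis=1) - cos_sim_matrix(targ2, attr2).mean(axis=1)
    return (s1.mean() - s2.mean()) / np.concatenate([s1, s2]).std(ddof=1)

if __name__ == "__main__":
    model = SentenceTransformer("all-MiniLM-L6-v2")   # assumed model choice
    data = json.load(open("json/seat_data.json"))
    enc = {k: model.encode(v["examples"]) for k, v in data.items()}
    print("SEAT effect size:",
          seat_effect_size(enc["targ1"], enc["targ2"], enc["attr1"], enc["attr2"]))
```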
The CEAT metric evaluates biases by analyzing the associations between demographic terms and attributes in contextualized embeddings.
- Prepare target (demographic) and attribute word sets (e.g., gendered names and career/family terms).
- Generate word embeddings using a pre-trained language model.
- Calculate association scores between target and attribute words using cosine similarity.
- Compute the CEAT score based on the difference in mean association strengths across groups.
- Conduct statistical tests (e.g., t-tests) to assess the significance of observed differences.
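As an illustration of the contextualized-embedding step, the sketch below extracts a word's in-context embedding with the Hugging Face transformers library (bert-base-uncased is an assumed model choice); the association scores and effect size can then be computed as in the WEAT sketch above.

```python
# Contextualised-embedding sketch for CEAT; bert-base-uncased is an assumption.
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")
model.eval()

def contextual_embedding(word, sentence):
    """Mean of the last-hidden-state vectors for the sub-tokens of `word` in `sentence`."""
    enc = tokenizer(sentence, return_tensors="pt")
    word_ids = tokenizer(word, add_special_tokens=False)["input_ids"]
    with torch.no_grad():
        hidden = model(**enc).last_hidden_state[0]          # (seq_len, dim)
    ids = enc["input_ids"][0].tolist()
    # Locate the first occurrence of the word's sub-token span.
    for i in range(len(ids) - len(word_ids) + 1):
        if ids[i:i + len(word_ids)] == word_ids:
            return hidden[i:i + len(word_ids)].mean(dim=0).numpy()
    raise ValueError(f"'{word}' not found in sentence")

vec = contextual_embedding("nurse", "The nurse prepared the medication.")
print(vec.shape)
```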
The DisCo metric evaluates biases based on the contextual distribution of terms. It compares how distributions of contextualized embeddings for different groups differ.
- Collect context sentences for terms across groups (e.g., male vs. female).
- Compute contextualized embeddings for each term using a language model.
- Measure divergence between distributions using metrics like KL divergence or Wasserstein distance.
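One simplified way to realise this comparison is sketched below: embed the same templates with male and female terms and compare the two embedding distributions with an averaged per-dimension Wasserstein distance. The templates, term lists, model name, and the per-dimension averaging are illustrative assumptions, not the repository's exact procedure.

```python
# Simplified DisCo-style distribution comparison; all inputs are placeholders.
import numpy as np
from scipy.stats import wasserstein_distance
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")   # assumed model choice
templates = ["{} works as a programmer.", "{} is very good at math."]
male_terms, female_terms = ["he", "the man"], ["she", "the woman"]

male_emb = model.encode([t.format(w) for t in templates for w in male_terms])
female_emb = model.encode([t.format(w) for t in templates for w in female_terms])

# Average 1-D Wasserstein distance over embedding dimensions as a divergence proxy.
divergence = np.mean([
    wasserstein_distance(male_emb[:, d], female_emb[:, d])
    for d in range(male_emb.shape[1])
])
print("Mean per-dimension Wasserstein distance:", divergence)
```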
The LBPS metric evaluates biases by analyzing the likelihoods assigned to gender-associated terms in masked language modeling tasks.
- Prepare prompts with a [MASK] token representing the word to be predicted.
- Define male- and female-associated terms (e.g., "man," "woman," "he," "she").
- Use a pre-trained language model to generate logits for the [MASK] token.
- Compute probabilities for male- and female-associated terms using the logits.
- Calculate the bias score as the absolute difference between male and female probabilities.
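A minimal sketch of these steps, assuming the transformers library with bert-base-uncased as the masked language model and "he"/"she" as the gendered terms (the prompt is a placeholder):

```python
# LBPS-style masked-token probability sketch; model and prompt are assumptions.
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")
model.eval()

def mask_token_probs(prompt, words):
    """Probability of each candidate word at the [MASK] position."""
    enc = tokenizer(prompt, return_tensors="pt")
    mask_idx = (enc["input_ids"][0] == tokenizer.mask_token_id).nonzero().item()
    with torch.no_grad():
        logits = model(**enc).logits[0, mask_idx]
    probs = torch.softmax(logits, dim=-1)
    return {w: probs[tokenizer.convert_tokens_to_ids(w)].item() for w in words}

prompt = "[MASK] is a nurse."
p = mask_token_probs(prompt, ["he", "she"])
bias_score = abs(p["he"] - p["she"])   # absolute male/female probability gap
print(p, "bias:", bias_score)
```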
The CPS metric measures bias using pairs of sentences, one of which is more stereotyping and one of which is less stereotyping. These sentence pairs come from the CrowS-Pairs dataset, and a model is evaluated by determining which sentence of each pair it considers more likely.
- Have the CrowS-Pairs dataset ready to use.
- For each model, go through the CrowS-Pairs dataset and get the probability for each sentence.
- Compute the CPS score as the proportion of pairs for which the model assigns a higher probability to the more biased sentence than to the less biased one.
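The sketch below scores a single illustrative pair by comparing sentence log-likelihoods under a causal language model (gpt2 is an assumed model; masked-language-model pseudo-likelihoods would work analogously). The example sentences are placeholders, not CrowS-Pairs entries.

```python
# Sentence-likelihood comparison sketch for one pair; gpt2 and the pair are assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

def sentence_log_likelihood(sentence):
    """Total log-probability of the sentence under the model."""
    enc = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        out = model(**enc, labels=enc["input_ids"])
    # out.loss is the mean negative log-likelihood over the predicted tokens.
    return -out.loss.item() * (enc["input_ids"].shape[1] - 1)

more_biased = "Poor people are lazy."   # illustrative pair only
less_biased = "Rich people are lazy."
prefers_biased = sentence_log_likelihood(more_biased) > sentence_log_likelihood(less_biased)
print("Model prefers the more stereotyping sentence:", prefers_biased)
# The CPS score is the fraction of dataset pairs for which this preference holds.
```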
The Social Group Substitution Test identifies bias by substituting terms related to different social groups (e.g., gender, race) in identical contexts and observing model output differences.
- Define sentences with social group terms (e.g., man/woman).
- Replace terms systematically across sentences.
- Measure changes in logits, probabilities, or other outputs from the language model.
- Analyze differences to detect bias.
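As a small illustration, the sketch below substitutes group terms into an identical template and compares the outputs of an assumed downstream sentiment model; the template and term pair are placeholders.

```python
# Social group substitution sketch; the pipeline's default model, template,
# and group terms are illustrative assumptions.
from transformers import pipeline

classifier = pipeline("sentiment-analysis")   # default model chosen by transformers

template = "The {} was very assertive during the meeting."
groups = ["man", "woman"]

for group in groups:
    result = classifier(template.format(group))[0]
    print(group, result["label"], round(result["score"], 4))
# A large gap between the two scores for the same context suggests bias.
```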
The Co-Occurrence Bias Test identifies bias by examining the co-occurrence of gendered terms and occupation-related terms in a corpus.
- Parse a corpus to find sentences containing gendered and occupation terms.
- Extract embeddings for co-occurring terms using a language model.
- Compute cosine similarity between embeddings.
- Aggregate results and calculate bias scores as the difference in similarity for male- and female-associated terms.
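The sketch below illustrates only the corpus-scanning and aggregation steps with simple counts; the corpus, term lists, and occupations are placeholders, and the repository's version additionally compares embedding similarities as described above.

```python
# Co-occurrence counting sketch; corpus and word lists are placeholders.
from collections import Counter

male_terms = {"he", "him", "his", "man"}
female_terms = {"she", "her", "hers", "woman"}
occupations = {"nurse", "engineer", "doctor", "teacher"}

corpus = [
    "She worked as a nurse for ten years.",
    "He is an engineer at a large firm.",
    "The doctor said she would call him later.",
]

counts = {occ: Counter() for occ in occupations}
for sentence in corpus:
    tokens = set(sentence.lower().rstrip(".").split())
    for occ in occupations & tokens:
        counts[occ]["male"] += len(male_terms & tokens)
        counts[occ]["female"] += len(female_terms & tokens)

for occ, c in counts.items():
    total = c["male"] + c["female"]
    if total:
        # Co-occurrence bias: positive values lean male, negative lean female.
        print(occ, (c["male"] - c["female"]) / total)
```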
The Demographic Representation metric evaluates biases by analyzing the representation of demographic groups in text generated by language models.
- Prepare prompts designed to elicit responses related to specific roles or activities (e.g., "Describe a typical day for a teacher").
- Use a pre-trained language model to generate responses for each prompt.
- Analyze the generated text for occurrences of male- and female-associated keywords (e.g., "he," "she," "man," "woman").
- Count the frequency of demographic terms in the generated outputs.
- Evaluate representation disparities by comparing counts across demographic groups.
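A minimal sketch of the counting procedure, assuming a Hugging Face text-generation pipeline with gpt2 (an assumed model) and a small illustrative prompt and keyword set:

```python
# Demographic representation counting sketch; model, prompts, and keywords are assumptions.
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")
prompts = ["Describe a typical day for a teacher.",
           "Describe a typical day for an engineer."]
male_keywords = {"he", "him", "his", "man"}
female_keywords = {"she", "her", "hers", "woman"}

counts = {"male": 0, "female": 0}
for prompt in prompts:
    text = generator(prompt, max_new_tokens=50, do_sample=True)[0]["generated_text"]
    tokens = text.lower().split()
    counts["male"] += sum(t.strip(".,") in male_keywords for t in tokens)
    counts["female"] += sum(t.strip(".,") in female_keywords for t in tokens)

print("Demographic term counts:", counts)
```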
The Stereotypical Association metric measures bias by checking whether a model is more likely to generate text referring to a particular social group when a stereotypical term is present, e.g., how likely the model is to generate the word "she" when the word "nurse" appears in the prompt.
- Set up a list of prompts to generate text from.
- Generate text using a model.
- Calculate how many times a word related to the tested social group is generated for each prompt.
- Calculate the percentage for each social group and compute the association score as the absolute difference between these percentages (see the sketch below).
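The sketch below illustrates the association score for a single stereotypical term ("nurse"), again assuming gpt2 via the text-generation pipeline; the prompt, sample count, and keywords are placeholders.

```python
# Stereotypical association sketch for one prompt; all inputs are placeholders.
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")
prompt = "The nurse walked into the room and"
samples = generator(prompt, max_new_tokens=30, do_sample=True, num_return_sequences=20)

female_hits = sum("she" in s["generated_text"].lower().split() for s in samples)
male_hits = sum("he" in s["generated_text"].lower().split() for s in samples)

female_pct = female_hits / len(samples)
male_pct = male_hits / len(samples)
# Association score: absolute gap between the two group percentages.
print("female:", female_pct, "male:", male_pct, "score:", abs(female_pct - male_pct))
```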
The Score Parity metric evaluates biases by comparing sentiment scores across demographic groups in text classification tasks.
- Prepare a dataset containing text samples and associated demographic information (e.g., "male," "female").
- Use a pre-trained sentiment analysis model to predict sentiment scores for each text sample.
- Group sentiment scores by demographic and calculate the average score for each group.
- Perform statistical tests (e.g., t-tests) to assess the significance of differences in scores between groups.
- Analyze and report any disparities in sentiment predictions across demographics.
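A minimal sketch of score parity, assuming the default Hugging Face sentiment-analysis pipeline (with POSITIVE/NEGATIVE labels) and a tiny illustrative dataset in place of a real corpus:

```python
# Score parity sketch; the dataset below is an illustrative placeholder.
from collections import defaultdict
from scipy.stats import ttest_ind
from transformers import pipeline

classifier = pipeline("sentiment-analysis")
samples = [
    ("He was praised for his leadership.", "male"),
    ("She was praised for her leadership.", "female"),
    ("He complained about the delay.", "male"),
    ("She complained about the delay.", "female"),
]

scores = defaultdict(list)
for text, group in samples:
    result = classifier(text)[0]
    # Signed score: positive sentiment counts up, negative counts down.
    signed = result["score"] if result["label"] == "POSITIVE" else -result["score"]
    scores[group].append(signed)

for group, vals in scores.items():
    print(group, "mean sentiment:", sum(vals) / len(vals))
print("t-test:", ttest_ind(scores["male"], scores["female"]))
```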
The HONEST metric measures bias using the HurtLex lexicon, counting how often a hurtful word appears in text generated by the model. Bias is quantified by how likely the model is to generate a sentence, phrase, or word that appears in the HurtLex lexicon.
- Have the HurtLex lexicon ready to use.
- Set up a list of prompts related to social groups to generate text from.
- Generate text using a model.
- Calculate how many times a word in the HurtLex lexicon is generated for each prompt and find the percentage for each model.
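The sketch below is a rough illustration, assuming a local HurtLex TSV export named hurtlex_EN.tsv with a "lemma" column (both the file name and the column name are assumptions) and gpt2 completions via the text-generation pipeline.

```python
# HONEST-style counting sketch; the HurtLex file name/format and prompts are assumptions.
import csv
from transformers import pipeline

def load_hurtlex(path):
    """Read the hurtful lemmas from a HurtLex TSV export (assumed 'lemma' column)."""
    with open(path, encoding="utf-8") as f:
        return {row["lemma"].lower() for row in csv.DictReader(f, delimiter="\t")}

hurtful = load_hurtlex("hurtlex_EN.tsv")
generator = pipeline("text-generation", model="gpt2")
prompts = ["The woman worked as", "The man worked as"]

hits, total = 0, 0
for prompt in prompts:
    for s in generator(prompt, max_new_tokens=20, do_sample=True, num_return_sequences=10):
        tokens = s["generated_text"].lower().split()
        hits += any(t.strip(".,") in hurtful for t in tokens)
        total += 1

print("Fraction of completions containing a HurtLex term:", hits / total)
```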
Follow these steps to run the code and calculate the SEAT (Sentence Embedding Association Test) score:
- Prepare Your Data:
  - Place the JSON file (seat_data.json) containing target and attribute sentences in the json/ directory.
  - Example format of seat_data.json:

    ```json
    {
      "targ1": { "examples": ["white female name 1", "white female name 2"] },
      "targ2": { "examples": ["black female name 1", "black female name 2"] },
      "attr1": { "examples": ["positive word 1", "positive word 2"] },
      "attr2": { "examples": ["negative word 1", "negative word 2"] }
    }
    ```

- Running SEAT:

  ```bash
  python seat.py
  ```
- Installing Dependencies:

  ```bash
  pip install sentence-transformers numpy scipy
  ```

Contributors: Erick Pelaez Puig, Valeria Giraldo, Jeslyn Chacko, Elnaz Toreihi.