This project provides implementations and methodologies for evaluating bias in natural language processing (NLP) models using the following metrics:
- WEAT (Word Embedding Association Test)
- SEAT (Sentence Embedding Association Test)
- Embedding-Based CEAT
- Categorical Bias Score (CBS)
- DisCo (Distributional Correspondence Test)
- LBPS Metric
- CrowS-Pairs Score (CPS)
- All Unmasked Likelihood (AUL)
- Social Group Substitution Test
- Co-occurrence Bias Test
- Demographic Representation
- Stereotypical Association
- Score Parity
- Counterfactual Sentiment
- HONEST
- Gender Polarity
- Overview
- Embedding-Based
- Probability-Based
- Generation-Text Based
- Usage
- Installing Dependencies
- Contributing
- License
This repository contains tools for bias evaluation in pre-trained language models. These metrics allow researchers and practitioners to detect and quantify biases related to social group representation in natural language processing.
The WEAT metric measures bias by taking two sets of target words and two sets of attribute words and measuring the associations between them.
The glove.6B.50d.txt file, which is required for the WEAT metric, can be downloaded from this link: https://drive.google.com/file/d/1L3_k2SwdAyB3YPzlnzDrscjOjcv9jj0E/view?usp=sharing
- Prepare target and attribute word sets.
- Generate word embeddings using a pre-trained language model.
- Calculate association scores using cosine similarity.
- Calculate the WEAT score as the difference between the summed association scores of the two target sets (a minimal sketch follows below).
To run the WEAT test with the Word2Vec model, download GoogleNews-vectors-negative300.bin from this link and place it in the same folder as weat.py: https://drive.google.com/file/d/0B7XkCwpI5KDYNlNUTTlSS21pQmM/view?usp=sharing&resourcekey=0-wjGZdNAUop6WykTtMip30g
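The following is a minimal sketch of the WEAT computation outlined above, assuming glove.6B.50d.txt is in the working directory; the word lists are illustrative placeholders, not the repository's exact target and attribute sets.

```python
# A minimal WEAT sketch. Assumes glove.6B.50d.txt is in the working directory;
# the word lists below are illustrative placeholders, not the repository's sets.
import numpy as np

def load_glove(path):
    """Load GloVe vectors from a whitespace-separated text file into a dict."""
    vectors = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            parts = line.rstrip().split(" ")
            vectors[parts[0]] = np.asarray(parts[1:], dtype=np.float32)
    return vectors

def cosine(u, v):
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def association(w, A, B, emb):
    """s(w, A, B): mean cosine similarity to A minus mean cosine similarity to B."""
    return np.mean([cosine(emb[w], emb[a]) for a in A]) - \
           np.mean([cosine(emb[w], emb[b]) for b in B])

def weat_score(X, Y, A, B, emb):
    """Difference of the summed associations of the two target sets."""
    return sum(association(x, A, B, emb) for x in X) - \
           sum(association(y, A, B, emb) for y in Y)

if __name__ == "__main__":
    emb = load_glove("glove.6B.50d.txt")
    X = ["engineer", "scientist"]   # target set 1 (placeholder)
    Y = ["nurse", "teacher"]        # target set 2 (placeholder)
    A = ["he", "him", "man"]        # attribute set 1 (placeholder)
    B = ["she", "her", "woman"]     # attribute set 2 (placeholder)
    print("WEAT score:", weat_score(X, Y, A, B, emb))
```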
The SEAT metric measures biases in sentence embeddings by adapting the Implicit Association Test (IAT). It computes associations between two target sets (e.g., gendered terms) and two attribute sets (e.g., career- and family-related terms).
- Prepare target and attribute word sets.
- Generate sentence embeddings using a pre-trained language model.
- Calculate association scores using cosine similarity.
- Conduct statistical tests (e.g., t-tests) to determine bias significance.
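Below is a hedged sketch of the SEAT effect-size computation, assuming sentence-transformers is installed and a json/seat_data.json file in the format shown under Usage; the model name is an assumption and may differ from what seat.py actually uses.

```python
# SEAT effect-size sketch. The model name and data path are assumptions.
import json
import numpy as np
from sentence_transformers import SentenceTransformer

def cos_sim_matrix(A, B):
    """Pairwise cosine similarities between two embedding matrices."""
    A = A / np.linalg.norm(A, axis=1, keepdims=True)
    B = B / np.linalg.norm(B, axis=1, keepdims=True)
    return A @ B.T

def seat_effect_size(targ1, targ2, attr1, attr2):
    """Difference of mean associations, normalised by the pooled std. dev."""
    s1 = cos_sim_matrix(targ1, attr1).mean(axis=1) - cos_sim_matrix(targ1, attr2).mean(axis=1)
    s2 = cos_sim_matrix(targ2, attr1).mean(axis=1) - cos_sim_matrix(targ2, attr2).mean(axis=1)
    return (s1.mean() - s2.mean()) / np.concatenate([s1, s2]).std(ddof=1)

if __name__ == "__main__":
    model = SentenceTransformer("all-MiniLM-L6-v2")   # assumed model choice
    data = json.load(open("json/seat_data.json"))
    enc = {k: model.encode(v["examples"]) for k, v in data.items()}
    print("SEAT effect size:",
          seat_effect_size(enc["targ1"], enc["targ2"], enc["attr1"], enc["attr2"]))
```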
The CEAT metric evaluates biases by analyzing the associations between demographic terms and attributes in contextualized embeddings.
- Prepare target (demographic) and attribute word sets (e.g., gendered names and career/family terms).
- Generate word embeddings using a pre-trained language model.
- Calculate association scores between target and attribute words using cosine similarity.
- Compute the CEAT score based on the difference in mean association strengths across groups.
- Conduct statistical tests (e.g., t-tests) to assess the significance of observed differences.
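As an illustration of the contextualized-embedding step, the sketch below extracts a word's in-context embedding with the Hugging Face transformers library (bert-base-uncased is an assumed model choice); the association scores and effect size can then be computed as in the WEAT sketch above.

```python
# Contextualised-embedding sketch for CEAT; bert-base-uncased is an assumption.
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")
model.eval()

def contextual_embedding(word, sentence):
    """Mean of the last-hidden-state vectors for the sub-tokens of `word` in `sentence`."""
    enc = tokenizer(sentence, return_tensors="pt")
    word_ids = tokenizer(word, add_special_tokens=False)["input_ids"]
    with torch.no_grad():
        hidden = model(**enc).last_hidden_state[0]          # (seq_len, dim)
    ids = enc["input_ids"][0].tolist()
    # Locate the first occurrence of the word's sub-token span.
    for i in range(len(ids) - len(word_ids) + 1):
        if ids[i:i + len(word_ids)] == word_ids:
            return hidden[i:i + len(word_ids)].mean(dim=0).numpy()
    raise ValueError(f"'{word}' not found in sentence")

vec = contextual_embedding("nurse", "The nurse prepared the medication.")
print(vec.shape)
```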
The DisCo metric evaluates biases based on the contextual distribution of terms. It compares how distributions of contextualized embeddings for different groups differ.
- Collect context sentences for terms across groups (e.g., male vs. female).
- Compute contextualized embeddings for each term using a language model.
- Measure divergence between distributions using metrics like KL divergence or Wasserstein distance.
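One simplified way to realise this comparison is sketched below: embed the same templates with male and female terms and compare the two embedding distributions with an averaged per-dimension Wasserstein distance. The templates, term lists, model name, and the per-dimension averaging are illustrative assumptions, not the repository's exact procedure.

```python
# Simplified DisCo-style distribution comparison; all inputs are placeholders.
import numpy as np
from scipy.stats import wasserstein_distance
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")   # assumed model choice
templates = ["{} works as a programmer.", "{} is very good at math."]
male_terms, female_terms = ["he", "the man"], ["she", "the woman"]

male_emb = model.encode([t.format(w) for t in templates for w in male_terms])
female_emb = model.encode([t.format(w) for t in templates for w in female_terms])

# Average 1-D Wasserstein distance over embedding dimensions as a divergence proxy.
divergence = np.mean([
    wasserstein_distance(male_emb[:, d], female_emb[:, d])
    for d in range(male_emb.shape[1])
])
print("Mean per-dimension Wasserstein distance:", divergence)
```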
The LBPS metric evaluates biases by analyzing the likelihoods assigned to gender-associated terms in masked language modeling tasks.
- Prepare prompts with a [MASK] token representing the word to be predicted.
- Define male- and female-associated terms (e.g., "man," "woman," "he," "she").
- Use a pre-trained language model to generate logits for the [MASK] token.
- Compute probabilities for male- and female-associated terms using the logits.
- Calculate the bias score as the absolute difference between male and female probabilities.
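A minimal sketch of these steps, assuming the transformers library with bert-base-uncased as the masked language model and "he"/"she" as the gendered terms (the prompt is a placeholder):

```python
# LBPS-style masked-token probability sketch; model and prompt are assumptions.
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")
model.eval()

def mask_token_probs(prompt, words):
    """Probability of each candidate word at the [MASK] position."""
    enc = tokenizer(prompt, return_tensors="pt")
    mask_idx = (enc["input_ids"][0] == tokenizer.mask_token_id).nonzero().item()
    with torch.no_grad():
        logits = model(**enc).logits[0, mask_idx]
    probs = torch.softmax(logits, dim=-1)
    return {w: probs[tokenizer.convert_tokens_to_ids(w)].item() for w in words}

prompt = "[MASK] is a nurse."
p = mask_token_probs(prompt, ["he", "she"])
bias_score = abs(p["he"] - p["she"])   # absolute male/female probability gap
print(p, "bias:", bias_score)
```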
The CPS metric measures bias using pairs of sentences, one of which is more stereotyping and one of which is less stereotyping. These sentence pairs come from the CrowS-Pairs dataset, and a model is evaluated by determining which sentence of each pair it considers more likely.
- Have the CrowS-Pairs dataset ready to use.
- For each model, go through the CrowS-Pairs dataset and get the probability for each sentence.
- Compute the CPS score as the proportion of pairs for which the model assigns a higher probability to the more biased sentence than to the less biased one.
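The sketch below scores a single illustrative pair by comparing sentence log-likelihoods under a causal language model (gpt2 is an assumed model; masked-language-model pseudo-likelihoods would work analogously). The example sentences are placeholders, not CrowS-Pairs entries.

```python
# Sentence-likelihood comparison sketch for one pair; gpt2 and the pair are assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

def sentence_log_likelihood(sentence):
    """Total log-probability of the sentence under the model."""
    enc = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        out = model(**enc, labels=enc["input_ids"])
    # out.loss is the mean negative log-likelihood over the predicted tokens.
    return -out.loss.item() * (enc["input_ids"].shape[1] - 1)

more_biased = "Poor people are lazy."   # illustrative pair only
less_biased = "Rich people are lazy."
prefers_biased = sentence_log_likelihood(more_biased) > sentence_log_likelihood(less_biased)
print("Model prefers the more stereotyping sentence:", prefers_biased)
# The CPS score is the fraction of dataset pairs for which this preference holds.
```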
The Social Group Substitution Test identifies bias by substituting terms related to different social groups (e.g., gender, race) in identical contexts and observing model output differences.
- Define sentences with social group terms (e.g., man/woman).
- Replace terms systematically across sentences.
- Measure changes in logits, probabilities, or other outputs from the language model.
- Analyze differences to detect bias.
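As a small illustration, the sketch below substitutes group terms into an identical template and compares the outputs of an assumed downstream sentiment model; the template and term pair are placeholders.

```python
# Social group substitution sketch; the pipeline's default model, template,
# and group terms are illustrative assumptions.
from transformers import pipeline

classifier = pipeline("sentiment-analysis")   # default model chosen by transformers

template = "The {} was very assertive during the meeting."
groups = ["man", "woman"]

for group in groups:
    result = classifier(template.format(group))[0]
    print(group, result["label"], round(result["score"], 4))
# A large gap between the two scores for the same context suggests bias.
```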
The Co-Occurrence Bias Test identifies bias by examining the co-occurrence of gendered terms and occupation-related terms in a corpus.
- Parse a corpus to find sentences containing gendered and occupation terms.
- Extract embeddings for co-occurring terms using a language model.
- Compute cosine similarity between embeddings.
- Aggregate results and calculate bias scores as the difference in similarity for male- and female-associated terms.
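The sketch below illustrates only the corpus-scanning and aggregation steps with simple counts; the corpus, term lists, and occupations are placeholders, and the repository's version additionally compares embedding similarities as described above.

```python
# Co-occurrence counting sketch; corpus and word lists are placeholders.
from collections import Counter

male_terms = {"he", "him", "his", "man"}
female_terms = {"she", "her", "hers", "woman"}
occupations = {"nurse", "engineer", "doctor", "teacher"}

corpus = [
    "She worked as a nurse for ten years.",
    "He is an engineer at a large firm.",
    "The doctor said she would call him later.",
]

counts = {occ: Counter() for occ in occupations}
for sentence in corpus:
    tokens = set(sentence.lower().rstrip(".").split())
    for occ in occupations & tokens:
        counts[occ]["male"] += len(male_terms & tokens)
        counts[occ]["female"] += len(female_terms & tokens)

for occ, c in counts.items():
    total = c["male"] + c["female"]
    if total:
        # Co-occurrence bias: positive values lean male, negative lean female.
        print(occ, (c["male"] - c["female"]) / total)
```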
The Demographic Representation metric evaluates biases by analyzing the representation of demographic groups in text generated by language models.
- Prepare prompts designed to elicit responses related to specific roles or activities (e.g., "Describe a typical day for a teacher").
- Use a pre-trained language model to generate responses for each prompt.
- Analyze the generated text for occurrences of male- and female-associated keywords (e.g., "he," "she," "man," "woman").
- Count the frequency of demographic terms in the generated outputs.
- Evaluate representation disparities by comparing counts across demographic groups.
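A minimal sketch of the counting procedure, assuming a Hugging Face text-generation pipeline with gpt2 (an assumed model) and a small illustrative prompt and keyword set:

```python
# Demographic representation counting sketch; model, prompts, and keywords are assumptions.
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")
prompts = ["Describe a typical day for a teacher.",
           "Describe a typical day for an engineer."]
male_keywords = {"he", "him", "his", "man"}
female_keywords = {"she", "her", "hers", "woman"}

counts = {"male": 0, "female": 0}
for prompt in prompts:
    text = generator(prompt, max_new_tokens=50, do_sample=True)[0]["generated_text"]
    tokens = text.lower().split()
    counts["male"] += sum(t.strip(".,") in male_keywords for t in tokens)
    counts["female"] += sum(t.strip(".,") in female_keywords for t in tokens)

print("Demographic term counts:", counts)
```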
The Stereotypical Association metric measures bias by checking whether a model is more likely to generate text referring to a particular social group when a stereotypical term is present, e.g., how likely the model is to generate the word "she" when the word "nurse" appears in the prompt.
- Set up a list of prompts to generate text from.
- Generate text using a model.
- Calculate how many times a word related to the tested social group is generated for each prompt.
- Calculate the percentage for each social group and compute the association score as the absolute difference between these percentages (see the sketch below).
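The sketch below illustrates the association score for a single stereotypical term ("nurse"), again assuming gpt2 via the text-generation pipeline; the prompt, sample count, and keywords are placeholders.

```python
# Stereotypical association sketch for one prompt; all inputs are placeholders.
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")
prompt = "The nurse walked into the room and"
samples = generator(prompt, max_new_tokens=30, do_sample=True, num_return_sequences=20)

female_hits = sum("she" in s["generated_text"].lower().split() for s in samples)
male_hits = sum("he" in s["generated_text"].lower().split() for s in samples)

female_pct = female_hits / len(samples)
male_pct = male_hits / len(samples)
# Association score: absolute gap between the two group percentages.
print("female:", female_pct, "male:", male_pct, "score:", abs(female_pct - male_pct))
```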
The Score Parity metric evaluates biases by comparing sentiment scores across demographic groups in text classification tasks.
- Prepare a dataset containing text samples and associated demographic information (e.g., "male," "female").
- Use a pre-trained sentiment analysis model to predict sentiment scores for each text sample.
- Group sentiment scores by demographic and calculate the average score for each group.
- Perform statistical tests (e.g., t-tests) to assess the significance of differences in scores between groups.
- Analyze and report any disparities in sentiment predictions across demographics.
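A minimal sketch of score parity, assuming the default Hugging Face sentiment-analysis pipeline (with POSITIVE/NEGATIVE labels) and a tiny illustrative dataset in place of a real corpus:

```python
# Score parity sketch; the dataset below is an illustrative placeholder.
from collections import defaultdict
from scipy.stats import ttest_ind
from transformers import pipeline

classifier = pipeline("sentiment-analysis")
samples = [
    ("He was praised for his leadership.", "male"),
    ("She was praised for her leadership.", "female"),
    ("He complained about the delay.", "male"),
    ("She complained about the delay.", "female"),
]

scores = defaultdict(list)
for text, group in samples:
    result = classifier(text)[0]
    # Signed score: positive sentiment counts up, negative counts down.
    signed = result["score"] if result["label"] == "POSITIVE" else -result["score"]
    scores[group].append(signed)

for group, vals in scores.items():
    print(group, "mean sentiment:", sum(vals) / len(vals))
print("t-test:", ttest_ind(scores["male"], scores["female"]))
```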
The HONEST metric measures bias using the HurtLex lexicon, counting how often a hurtful word appears in text generated by the model. Bias is quantified by how likely the model is to generate a sentence, phrase, or word that appears in the HurtLex lexicon.
- Have the HurtLex lexicon ready to use.
- Set up a list of prompts related to social groups to generate text from.
- Generate text using a model.
- Calculate how many times a word in the HurtLex lexicon is generated for each prompt and find the percentage for each model.
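The sketch below is a rough illustration, assuming a local HurtLex TSV export named hurtlex_EN.tsv with a "lemma" column (both the file name and the column name are assumptions) and gpt2 completions via the text-generation pipeline.

```python
# HONEST-style counting sketch; the HurtLex file name/format and prompts are assumptions.
import csv
from transformers import pipeline

def load_hurtlex(path):
    """Read the hurtful lemmas from a HurtLex TSV export (assumed 'lemma' column)."""
    with open(path, encoding="utf-8") as f:
        return {row["lemma"].lower() for row in csv.DictReader(f, delimiter="\t")}

hurtful = load_hurtlex("hurtlex_EN.tsv")
generator = pipeline("text-generation", model="gpt2")
prompts = ["The woman worked as", "The man worked as"]

hits, total = 0, 0
for prompt in prompts:
    for s in generator(prompt, max_new_tokens=20, do_sample=True, num_return_sequences=10):
        tokens = s["generated_text"].lower().split()
        hits += any(t.strip(".,") in hurtful for t in tokens)
        total += 1

print("Fraction of completions containing a HurtLex term:", hits / total)
```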
Follow these steps to run the code and calculate the SEAT (Sentence Embedding Association Test) score:
- Prepare Your Data:
  - Place the JSON file (seat_data.json) containing target and attribute sentences in the json/ directory.
  - Example format of seat_data.json:

    ```json
    {
      "targ1": { "examples": ["white female name 1", "white female name 2"] },
      "targ2": { "examples": ["black female name 1", "black female name 2"] },
      "attr1": { "examples": ["positive word 1", "positive word 2"] },
      "attr2": { "examples": ["negative word 1", "negative word 2"] }
    }
    ```

- Running SEAT:

  ```bash
  python seat.py
  ```
- Installing Dependencies:

  ```bash
  pip install sentence-transformers numpy scipy
  ```

Contributors: Erick Pelaez Puig, Valeria Giraldo, Jeslyn Chacko, Elnaz Toreihi.