BrennanLagasse/Interpretable-Information-Retrieval

Interpretable Information Retrieval

This research project implements and evaluates interpretability methods for retrieval in RAG (Retrieval-Augmented Generation) systems. It covers three families of methods:

  1. Gradient-based methods - Using gradients to identify important tokens/sentences
  2. Perturbation-based methods - Measuring impact by removing or modifying text elements
  3. Surrogate-based methods - Using SHAP values to explain retrieval scores

The project evaluates these methods using the MS MARCO and SCIREX datasets.
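
As a rough illustration of the perturbation-based idea listed above, the sketch below removes one token at a time and records how much a retrieval score drops. It is not the repository's implementation: `overlap_score` is a toy stand-in for a real retriever, and both function names are hypothetical.

```python
from typing import Callable, List, Tuple


def overlap_score(query: str, document: str) -> float:
    """Toy retrieval score (an assumption for this sketch):
    the fraction of query terms that appear in the document."""
    query_terms = set(query.lower().split())
    doc_terms = set(document.lower().split())
    if not query_terms:
        return 0.0
    return len(query_terms & doc_terms) / len(query_terms)


def leave_one_out_importance(
    query: str,
    document: str,
    score_fn: Callable[[str, str], float],
) -> List[Tuple[str, float]]:
    """Score each document token by the drop in score_fn when it is removed."""
    tokens = document.split()
    base = score_fn(query, document)
    importances = []
    for i, token in enumerate(tokens):
        # Rebuild the document without token i and re-score it.
        perturbed = " ".join(tokens[:i] + tokens[i + 1:])
        importances.append((token, base - score_fn(query, perturbed)))
    return importances


if __name__ == "__main__":
    for token, drop in leave_one_out_importance(
        "neural retrieval", "dense neural retrieval models", overlap_score
    ):
        print(f"{token}: {drop:.2f}")
```

With a real retriever, `score_fn` would presumably wrap the model's query-document similarity. The same loop applies to sentences if the document is split on sentence boundaries instead of whitespace.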

Environment Setup

IMPORTANT: PyTorch must be installed from source first (it will not work properly with conda).

After PyTorch is installed, create the conda environment:

conda env create -f environment.yml

This creates an environment named interpretability_project with Python 3.10.18.

Activate the environment:

conda activate interpretability_project

Methods

1. Gradient-Based Method

Coming soon

2. Perturbation-Based Method

The permutation_method folder contains the implementation and evaluation of permutation-based importance scoring. For methodology and results, see:

3. SHAP-Based Surrogate Method

The shap_method/ folder contains the complete implementation and evaluation of the SHAP-based surrogate method. For detailed methodology, results, and analysis, see:

Note: Other methods will be added soon.
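
To make the quantity behind the SHAP-based method more concrete, the sketch below computes exact Shapley values for the tokens of a short document by enumerating every subset. It is an assumption-laden illustration, not the code in shap_method/: it does not use the shap library (which approximates these values for larger inputs), and `make_value_fn` and `exact_shapley` are hypothetical names built around a toy term-overlap score.

```python
from itertools import combinations
from math import factorial
from typing import Callable, List, Sequence


def make_value_fn(query: str) -> Callable[[List[str]], float]:
    """Build a toy value function (an assumption for this sketch):
    the fraction of query terms covered by the kept tokens."""
    query_terms = set(query.lower().split())

    def value_fn(kept_tokens: List[str]) -> float:
        if not query_terms:
            return 0.0
        kept = {t.lower() for t in kept_tokens}
        return len(query_terms & kept) / len(query_terms)

    return value_fn


def exact_shapley(
    tokens: Sequence[str],
    value_fn: Callable[[List[str]], float],
) -> List[float]:
    """Exact Shapley value of each token; cost grows as 2^n, so keep n small."""
    n = len(tokens)
    values = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight for coalitions of this size that exclude token i.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                without_i = [tokens[j] for j in subset]
                with_i = [tokens[j] for j in sorted(subset + (i,))]
                values[i] += weight * (value_fn(with_i) - value_fn(without_i))
    return values


if __name__ == "__main__":
    doc_tokens = ["dense", "neural", "retrieval"]
    shapley = exact_shapley(doc_tokens, make_value_fn("neural retrieval"))
    for token, value in zip(doc_tokens, shapley):
        print(f"{token}: {value:.3f}")
```

The values sum to the score of the full document minus the score of the empty document, which makes a quick sanity check when swapping in a real retrieval score.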

Dataset

  • MS MARCO v2.1 (microsoft/ms_marco) for retrieval evaluation
  • SCIREX for entity-level evaluation (used in SHAP method)
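
One possible way to peek at MS MARCO v2.1 is through the Hugging Face `datasets` package, sketched below. This assumes `datasets` is installed and network access is available. Whether the project loads the data this way is not stated here, and the helper name `preview_ms_marco` is hypothetical.

```python
from typing import Dict, List


def preview_ms_marco(n: int = 3, split: str = "validation") -> List[Dict]:
    """Return the first n examples of MS MARCO v2.1 without a full download.

    Assumes the Hugging Face `datasets` package is available; it is imported
    lazily so this module can be imported without it.
    """
    from datasets import load_dataset

    # streaming=True iterates over the remote files instead of downloading them all.
    stream = load_dataset("microsoft/ms_marco", "v2.1", split=split, streaming=True)
    return list(stream.take(n))


if __name__ == "__main__":
    for example in preview_ms_marco():
        # Each example carries a query and its candidate passages.
        print(example["query"])
        print(len(example["passages"]["passage_text"]), "candidate passages")
```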

About

Propose, implement, and evaluate three novel metrics for approximating the importance of words and sentences for document retrieval in a RAG setting.
