This is an interpretable information retrieval research project that implements and evaluates three interpretability methods for RAG (Retrieval-Augmented Generation) systems:
- Gradient-based methods - Using gradients to identify important tokens/sentences
- Perturbation-based methods - Measuring impact by removing or modifying text elements
- Surrogate-based methods - Using SHAP values to explain retrieval scores
The project evaluates these methods using the MS MARCO and SCIREX datasets.
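To make the gradient-based idea concrete, here is a minimal illustrative sketch (not the project's implementation): token saliency for a toy retrieval score, with the gradient approximated by finite differences so the example runs without PyTorch. The scoring function and relevance values are hypothetical.

```python
# Toy retrieval score: weighted sum of per-token relevance (hypothetical).
def score(weights, token_relevance):
    return sum(w * r for w, r in zip(weights, token_relevance))

def saliency(weights, token_relevance, eps=1e-6):
    """Approximate d(score)/d(weight_i) for each token via finite differences."""
    base = score(weights, token_relevance)
    grads = []
    for i in range(len(weights)):
        bumped = list(weights)
        bumped[i] += eps
        grads.append((score(bumped, token_relevance) - base) / eps)
    return grads

tokens = ["neural", "retrieval", "the"]
relevance = [0.9, 0.8, 0.05]   # hypothetical per-token relevance
weights = [1.0, 1.0, 1.0]
grads = saliency(weights, relevance)

# Rank tokens by saliency, highest first.
ranked = sorted(zip(tokens, grads), key=lambda t: -t[1])
print(ranked[0][0])  # → neural
```

In the real project the score would come from a differentiable retrieval model and the gradients from autograd; the ranking step is the same.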
IMPORTANT: PyTorch must be installed from source first; installing it through conda does not work properly for this project.
After PyTorch is installed, create the conda environment:

```bash
conda env create -f environment.yml
```

This creates an environment named interpretability_project with Python 3.10.18.
Activate the environment:

```bash
conda activate interpretability_project
```
The permutation_method folder contains the implementation and evaluation of perturbation-based importance scoring. For methodology and results, see:
- pertubation_method/README.md - Overview of the formulation and motivation behind perturbation-based importance scoring
- pertubation_method/permutation_scores_test.py - Implementation of the perturbation-based method, with analysis across multiple settings on two datasets
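The core leave-one-out idea behind perturbation-based scoring can be sketched as follows. This is an illustrative toy, not the project's code: the bag-of-words cosine score stands in for whatever retrieval model is actually used, and all names are hypothetical.

```python
import math
import re
from collections import Counter

def bow(text):
    """Bag-of-words counts over lowercased word tokens."""
    return Counter(re.findall(r"\w+", text.lower()))

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def sentence_importance(query, sentences):
    """Importance of each sentence = score drop when that sentence is removed."""
    full = cosine(bow(query), bow(" ".join(sentences)))
    drops = []
    for i in range(len(sentences)):
        ablated = sentences[:i] + sentences[i + 1:]
        drops.append(full - cosine(bow(query), bow(" ".join(ablated))))
    return drops

query = "transformer attention"
sents = ["Transformers use attention.", "Cats sleep a lot."]
drops = sentence_importance(query, sents)
```

Here the first sentence gets a positive score (removing it hurts relevance) and the second a negative one (removing it actually sharpens the match), which is the ranking signal the method exploits.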
The shap_method/ folder contains the complete implementation and evaluation of the SHAP-based surrogate method. For detailed methodology, results, and analysis, see:
- shap_method/README.md - Implementation documentation
- shap_method/SHAP_METHOD_REPORT.md - Evaluation report for SHAP method
- shap_method/notebooks/ - Jupyter notebooks with EDA and SHAP evaluation outputs
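The surrogate idea underlying SHAP can be illustrated with exact Shapley values over a handful of text units, enumerating all coalitions. This is a self-contained sketch, not the project's implementation (which uses the SHAP library over real retrieval scores); the `value` function is a made-up stand-in for a retrieval score.

```python
from itertools import combinations
from math import factorial

def shapley(n, value):
    """Exact Shapley value for each of n players, given value(coalition)."""
    phis = []
    players = list(range(n))
    for i in players:
        others = [p for p in players if p != i]
        phi = 0.0
        for k in range(len(others) + 1):
            for S in combinations(others, k):
                # Standard Shapley coalition weight |S|!(n-|S|-1)!/n!
                w = factorial(k) * factorial(n - k - 1) / factorial(n)
                phi += w * (value(frozenset(S) | {i}) - value(frozenset(S)))
        phis.append(phi)
    return phis

# Toy score: sentence 0 contributes 0.6, sentence 1 contributes 0.3,
# plus a 0.1 interaction bonus when both are present (all hypothetical).
def value(S):
    v = 0.0
    if 0 in S: v += 0.6
    if 1 in S: v += 0.3
    if S >= {0, 1}: v += 0.1
    return v

phi = shapley(2, value)
# Efficiency property: phi sums to value({0,1}) - value(empty set) = 1.0
```

Exact enumeration is exponential in the number of units, which is why the actual method relies on SHAP's sampling-based approximations; the attribution being approximated is the same quantity.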
Note: Other methods will be added soon.
- MS MARCO v2.1 (microsoft/ms_marco) for retrieval evaluation
- SCIREX for entity-level evaluation (used in SHAP method)