# TransformerSurgery

A library for performing and visualizing brain surgery on transformers, written by Bryce Meyer. It builds on TransformerLens, the mechanistic interpretability library created by Neel Nanda.
With TransformerSurgery you can ablate attention activations in a transformer model using TransformerLens hooks, then generate text and compare it with the output of the unablated model.
## Installation

Create a virtual environment, then run:

```shell
pip install -r requirements.txt
```
## Usage

To run the interactive app:

```shell
streamlit run app.py
```
With the app you can:

- load 6 different transformer models
- compare generated text for the ablated and unablated models
- ablate attention in any layer
- ablate a head, the residual stream, or the MLP
- apply the ablation only at a fixed position
- zero, double, or flip attention activations
- load and apply custom hooks
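The zero, double, and flip ablation modes amount to simple elementwise edits of an activation, optionally restricted to a single sequence position. A minimal sketch of such hook functions (the `(activation, hook)` signature follows TransformerLens's `fwd_hooks` convention; the function names, the `at_position` helper, and the `[batch, pos, ...]` axis layout are assumptions of this sketch, not part of TransformerSurgery's actual code):

```python
def zero_ablate(activation, hook=None):
    # Zero ablation: replace every value with 0.
    return activation * 0

def double_ablate(activation, hook=None):
    # Double ablation: scale every value by 2.
    return activation * 2

def flip_ablate(activation, hook=None):
    # Flip ablation: negate every value.
    return -activation

def at_position(inner_hook, position):
    # Restrict an ablation to one sequence position, assuming the
    # activation is laid out as [batch, pos, ...]; all other
    # positions pass through unchanged.
    def hook_fn(activation, hook=None):
        activation[:, position] = inner_hook(activation[:, position])
        return activation
    return hook_fn
```

In TransformerLens, functions like these are attached with `model.run_with_hooks(tokens, fwd_hooks=[(name, fn)])`, where `name` is an activation point such as an attention output in a given block; the forward pass then runs with the hook applied there.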
## Custom hooks

You can also use your own custom hooks. Just put your hooks in a new Python file in the folder `custom_hooks`. The hooks need to fulfill the following conditions:

- the name of the hook has to start with `hook_`
- the hook takes in a torch tensor and returns a torch tensor
- the hook must respect the dimensions of the attention (layer) it is supposed to act on
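As an illustration, a hypothetical `custom_hooks/my_hooks.py` satisfying these conditions could look like the following (the file name and hook names here are made up, not shipped with the project; each function starts with `hook_` and returns a tensor with the same shape as its input, so the dimensions of the layer it acts on are preserved):

```python
# custom_hooks/my_hooks.py  (hypothetical example file)
# Each hook name starts with "hook_" and maps a torch tensor to a
# tensor of the same shape, as required above.

def hook_halve(tensor):
    # Scale every activation down by half.
    return tensor * 0.5

def hook_clamp_negative(tensor):
    # Zero out negative activations (a ReLU-style edit); the
    # elementwise mask keeps the output shape identical.
    return tensor * (tensor > 0)
```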
## License
This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.