# TransformerSurgery

A library for performing and visualizing brain surgery on transformers, written by Bryce Meyer. It builds on TransformerLens, the mechanistic interpretability library created by Neel Nanda.
With TransformerSurgery you can ablate attention activations in a transformer model using TransformerLens hooks, then generate text and compare it with the output of the unablated model.
## Installation

Create a virtual environment, then run:

```shell
pip install -r requirements.txt
```
## Usage

To run the interactive app:

```shell
streamlit run app.py
```
With the app you can:

- load 6 different transformer models
- compare generated text for the ablated and unablated models
- ablate attention in any layer
- ablate a head, the residual stream, or the MLP
- apply the ablation only at a fixed position
- zero, double, or flip attention activations
- load and apply custom hooks
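The zero, double, and flip ablation modes amount to simple elementwise edits of an activation, optionally restricted to a single sequence position. A minimal sketch of such hook functions (the `(activation, hook)` signature follows TransformerLens's `fwd_hooks` convention; the function names, the `at_position` helper, and the `[batch, pos, ...]` axis layout are assumptions of this sketch, not part of TransformerSurgery's actual code):

```python
def zero_ablate(activation, hook=None):
    # Zero ablation: replace every value with 0.
    return activation * 0

def double_ablate(activation, hook=None):
    # Double ablation: scale every value by 2.
    return activation * 2

def flip_ablate(activation, hook=None):
    # Flip ablation: negate every value.
    return -activation

def at_position(inner_hook, position):
    # Restrict an ablation to one sequence position, assuming the
    # activation is laid out as [batch, pos, ...]; all other
    # positions pass through unchanged.
    def hook_fn(activation, hook=None):
        activation[:, position] = inner_hook(activation[:, position])
        return activation
    return hook_fn
```

In TransformerLens, functions like these are attached with `model.run_with_hooks(tokens, fwd_hooks=[(name, fn)])`, where `name` is an activation point such as an attention output in a given block; the forward pass then runs with the hook applied there.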
## Custom hooks

You can also use your own custom hooks. Just put your hooks in a new Python file in the folder `custom_hooks`. The hooks need to fulfill the following conditions:

- the name of the hook has to start with `hook_`
- the hook takes in a torch tensor and returns a torch tensor
- the hook must respect the dimensions of the attention (layer) it is supposed to act on
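As an illustration, a hypothetical `custom_hooks/my_hooks.py` satisfying these conditions could look like the following (the file name and hook names here are made up, not shipped with the project; each function starts with `hook_` and returns a tensor with the same shape as its input, so the dimensions of the layer it acts on are preserved):

```python
# custom_hooks/my_hooks.py  (hypothetical example file)
# Each hook name starts with "hook_" and maps a torch tensor to a
# tensor of the same shape, as required above.

def hook_halve(tensor):
    # Scale every activation down by half.
    return tensor * 0.5

def hook_clamp_negative(tensor):
    # Zero out negative activations (a ReLU-style edit); the
    # elementwise mask keeps the output shape identical.
    return tensor * (tensor > 0)
```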
## License
This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.