Merge pull request #128 from stanfordnlp/frankaging-patch-1
[Minor] Update README.md
frankaging authored Mar 13, 2024
2 parents 4ac51e4 + 134dd44 commit 96db4e9
Showing 1 changed file with 13 additions and 19 deletions.
32 changes: 13 additions & 19 deletions README.md
@@ -1,7 +1,7 @@
<br />
<div align="center">
<h1 align="center"><img src="https://i.ibb.co/BNkhQH3/pyvene-logo.png"></h1>
<a href="https://nlp.stanford.edu/~wuzhengx/"><strong>Library Paper and Doc Are Forthcoming »</strong></a>
<a href="https://arxiv.org/abs/2403.07809"><strong>Read Our Paper »</strong></a>
</div>

<br />
@@ -241,6 +241,18 @@ intervenable.train_alignment(
```
where you need to pass in a trainable dataset along with your customized loss and metrics functions. The trained interventions can later be saved to disk. You can also use `intervenable.evaluate()` to evaluate your interventions against customized objectives.
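
As a rough illustration of what customized loss and metrics functions might look like, here is a dependency-free sketch. The names `compute_loss` and `compute_metrics`, their signatures, and the use of plain Python lists in place of tensors are assumptions made for illustration only; consult the library documentation for the callbacks that `train_alignment` actually expects.

```python
import math


def compute_loss(logits, labels):
    """Hypothetical loss callback: mean cross-entropy over a batch.

    `logits` is a list of per-example score lists and `labels` is a list of
    integer class indices (both are illustrative stand-ins for tensors).
    """
    total = 0.0
    for row, label in zip(logits, labels):
        # Log-sum-exp with max subtraction for numerical stability.
        m = max(row)
        log_z = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_z - row[label]
    return total / len(labels)


def compute_metrics(logits, labels):
    """Hypothetical metrics callback: fraction of correct argmax predictions."""
    correct = sum(
        1
        for row, label in zip(logits, labels)
        if max(range(len(row)), key=row.__getitem__) == label
    )
    return {"accuracy": correct / len(labels)}
```

Keeping the loss and the metrics in separate functions lets the training objective and the reported evaluation measure vary independently.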

## Citation
If you use this repository, please consider citing the library paper:
```stex
@article{wu2024pyvene,
  title={pyvene: A Library for Understanding and Improving {P}y{T}orch Models via Interventions},
  author={Wu, Zhengxuan and Geiger, Atticus and Arora, Aryaman and Huang, Jing and Wang, Zheng and Goodman, Noah D. and Manning, Christopher D. and Potts, Christopher},
  journal={arXiv preprint arXiv:2403.07809},
  url={https://arxiv.org/abs/2403.07809},
  year={2024}
}
```

## Related Works in Discovering Causal Mechanism of LLMs
If you would like to read more work in this area, here is a list of papers that try to align or discover the causal mechanisms of LLMs.
- [Causal Abstractions of Neural Networks](https://arxiv.org/abs/2106.02997): This paper introduces interchange intervention (a.k.a. activation patching or causal scrubbing). It tries to align a causal model with the model's representations.
@@ -253,21 +265,3 @@ If you would like to read more works on this area, here is a list of papers that
## Star History

[![Star History Chart](https://api.star-history.com/svg?repos=stanfordnlp/pyvene&type=Date)](https://star-history.com/#stanfordnlp/pyvene&Date)

## Citation
Library paper is forthcoming. For now, if you use this repository, please consider citing the relevant papers:
```stex
@article{geiger-etal-2023-DAS,
  title={Finding Alignments Between Interpretable Causal Variables and Distributed Neural Representations},
  author={Geiger, Atticus and Wu, Zhengxuan and Potts, Christopher and Icard, Thomas and Goodman, Noah},
  year={2023},
  journal={arXiv}
}
@inproceedings{wu-etal-2023-Boundless-DAS,
  title={Interpretability at Scale: Identifying Causal Mechanisms in Alpaca},
  author={Wu, Zhengxuan and Geiger, Atticus and Icard, Thomas and Potts, Christopher and Goodman, Noah},
  year={2023},
  booktitle={NeurIPS}
}
```
