Commit eebbcd3

change of plans
1 parent 7f404dc commit eebbcd3

File tree: 1 file changed (+9, -2 lines)


README.md

Lines changed: 9 additions & 2 deletions
@@ -6,7 +6,7 @@ Implementation of <a href="https://arxiv.org/abs/2203.07852">Block Recurrent Tra

This design is SOTA for recurrent transformers line of research, afaict.

-It will also include <a href="https://arxiv.org/abs/2205.14135">flash attention</a> as well as <a href="https://arxiv.org/abs/2203.08913">KNN attention layers</a>
+It will also include <a href="https://arxiv.org/abs/2205.14135">flash attention</a> as well as routed memories of up to 250k tokens using ideas from <a href="https://github.com/lucidrains/CoLT5-attention">this paper</a>

## Appreciation

@@ -70,7 +70,7 @@ $ python train.py
- [x] add <a href="https://github.com/lucidrains/compressive-transformer-pytorch">compressed memories</a>

- [ ] revisit <a href="https://github.com/lucidrains/memformer">memformer</a>
-- [ ] add ability to gate in memorizing transformers knn attention layers
+- [ ] try routing long distance memories of up to 250k using coordinate descent (Wright et al.)

## Citations

@@ -111,5 +111,12 @@ $ python train.py
}
```

+```bibtex
+@inproceedings{Ainslie2023CoLT5FL,
+title = {CoLT5: Faster Long-Range Transformers with Conditional Computation},
+author = {Joshua Ainslie and Tao Lei and Michiel de Jong and Santiago Ontan'on and Siddhartha Brahma and Yury Zemlyanskiy and David Uthus and Mandy Guo and James Lee-Thorp and Yi Tay and Yun-Hsuan Sung and Sumit Sanghai},
+year = {2023}
+}
+```

*Memory is Attention through Time* - Alex Graves
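For context on the new roadmap item ("try routing long distance memories of up to 250k using coordinate descent"), below is a minimal sketch of the kind of differentiable top-k routing used in the CoLT5 line of work, where a relaxed top-k is solved by alternating (coordinate-descent) updates of two dual variables. It assumes a PyTorch setting; the function name `coor_descent_topk`, tensor shapes, and hyperparameters are illustrative placeholders, not code from this repository or from lucidrains/CoLT5-attention.

```python
import torch
import torch.nn.functional as F

def coor_descent_topk(scores, k, n_iters = 20, eps = 0.1):
    # relaxed top-k selection: alternately update the two dual variables of an
    # entropy-regularized top-k problem (coordinate descent), returning soft
    # selection weights in [0, 1] whose per-row sum approaches k
    constant = eps * torch.log(torch.tensor(float(k)))
    a = torch.zeros_like(scores[..., :1])
    b = -scores
    for _ in range(n_iters):
        a = constant - eps * torch.logsumexp((scores + b) / eps, dim = -1, keepdim = True)
        b = -F.relu(scores + a)
    return ((scores + a + b) / eps).exp()

# toy usage (hypothetical shapes): route roughly 64 of 8192 long-distance memory tokens
router_logits = torch.randn(2, 8192)                  # one relevance score per memory token
weights = coor_descent_topk(router_logits, k = 64)    # differentiable selection weights
indices = weights.topk(64, dim = -1).indices          # hard indices used to gather memories
```

Driving `eps` toward zero pushes the weights toward a hard top-k mask while they remain differentiable with respect to the router logits, which is what would allow the memory routing to be trained end to end.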
