π Feature Request
Describe the feature you'd like
Identify and eliminate performance bottlenecks outside of custom attention
Why is it needed?
To get custom attention times within a small-ish factor of full attention
Additional context (optional)
see parent issue