Spectron: Super-Linear Spectral Attention for Efficient Long-Context Language Modeling. Replaces O(n²) self-attention with IFFT(W ⊙ FFT(x)) — 15x faster, 1000x less memory, 10M+ context window.
Updated May 19, 2026 · Python
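The operation named in the description above, IFFT(W ⊙ FFT(x)), can be illustrated with a minimal sketch. This is not taken from the Spectron repository: it assumes a single channel, a power-of-two sequence length, and a complex per-frequency filter `w` standing in for the learned W. The helper names (`fft`, `ifft`, `spectral_mix`, `circular_convolve`) are hypothetical. By the convolution theorem, multiplying element-wise in the frequency domain is equivalent to a circular convolution over token positions, which costs O(n log n) with an FFT rather than O(n²).

```python
import cmath


def fft(values, inverse=False):
    """Recursive radix-2 Cooley-Tukey FFT (unnormalized).

    Assumption for this sketch: len(values) is a power of two.
    """
    n = len(values)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a non-zero power of two")
    if n == 1:
        return [complex(values[0])]
    even = fft(values[0::2], inverse)
    odd = fft(values[1::2], inverse)
    sign = 1 if inverse else -1
    out = [0j] * n
    for k in range(n // 2):
        # Twiddle factor combines the two half-length transforms.
        twiddle = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def ifft(values):
    """Inverse FFT: conjugate-sign transform scaled by 1/n."""
    n = len(values)
    return [v / n for v in fft(values, inverse=True)]


def spectral_mix(x, w):
    """Compute IFFT(w ⊙ FFT(x)) for one channel.

    x: sequence of n token features (one scalar per position).
    w: n complex filter weights, one per frequency bin (hypothetical
       stand-in for a learned spectral filter).
    """
    if len(x) != len(w):
        raise ValueError("x and w must have the same length")
    spectrum = fft(x)
    filtered = [wk * xk for wk, xk in zip(w, spectrum)]
    return ifft(filtered)


def circular_convolve(x, kernel):
    """Direct O(n^2) circular convolution, used only as a reference."""
    n = len(x)
    return [
        sum(kernel[j] * x[(i - j) % n] for j in range(n))
        for i in range(n)
    ]
```

Setting `w = fft(kernel)` makes `spectral_mix(x, w)` agree with `circular_convolve(x, kernel)` up to floating-point error, and an all-ones `w` returns `x` unchanged. A real model would apply this per feature channel and would need extra care (for example, padding or masking) to keep the mixing causal.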
TESM (Token-Entangled State Machine) - A novel LLM architecture based on state entanglement theory, combining State Space Models with local entanglement mechanisms for efficient long-sequence modeling
GLACIER: Mamba with infinite memory. This project integrates the Mamba SSM with ICE-Lite, a virtual memory engine, to solve context rot. By adding persistent, time-aware memory, GLACIER gives Mamba the long-term recall of a Transformer while retaining its $O(N)$ speed. Apache 2.0 licensed, by Dopove.
CRSD is a minimal, research-oriented sequence modeling framework built from scratch to explore state-space models (SSMs) and sequence-to-sequence architectures in PyTorch. It’s designed to be fully reproducible, interpretable, and extensible — suitable both for learning and for building experimental variants such as nonlinear-SSMs, gated decoders