Sparse Transformer Implementation

This repository contains a PyTorch implementation of a Transformer model with sparse attention patterns. The goal is to explore and implement various sparse attention mechanisms that reduce the quadratic cost of full self-attention while maintaining model performance.

Current Features

  • Local sparse attention mechanism (window-based); see the mask sketch after this list
  • Configurable model architecture (layers, heads, dimensions)
  • Basic positional encoding
  • Simple training loop for sequence prediction
  • CPU support
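
The window-based pattern restricts each query to a fixed neighbourhood of key positions. Below is a minimal sketch of how such a mask could be built and applied inside scaled dot-product attention; the local_attention_mask and masked_attention names and the window_size parameter are illustrative, not the repository's API.

import torch

def local_attention_mask(seq_len: int, window_size: int) -> torch.Tensor:
    """Boolean mask where True marks key positions a query may attend to.

    Query position i may attend to key position j iff |i - j| <= window_size.
    """
    positions = torch.arange(seq_len)
    distance = (positions.unsqueeze(0) - positions.unsqueeze(1)).abs()
    return distance <= window_size

def masked_attention(q, k, v, window_size):
    # q, k, v: (batch, heads, seq, d_k)
    d_k = q.size(-1)
    scores = q @ k.transpose(-2, -1) / d_k ** 0.5             # (batch, heads, seq, seq)
    mask = local_attention_mask(q.size(-2), window_size).to(q.device)
    scores = scores.masked_fill(~mask, float("-inf"))          # block out-of-window positions
    return torch.softmax(scores, dim=-1) @ v

# Quick check with random tensors:
q = k = v = torch.randn(1, 2, 16, 32)
out = masked_attention(q, k, v, window_size=4)                 # shape (1, 2, 16, 32)

Because every out-of-window score is set to -inf before the softmax, each output position mixes information only from its local window.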

Installation

# Clone the repository
git clone https://github.com/yourusername/sparse-transformer.git
cd sparse-transformer

# Install dependencies
pip install torch

Quick Start

# Run the example training script
python sparse_transformer.py

Output

When running the training script, you should see output similar to this:

2025-01-31 10:15:23,456 - INFO - Starting training...
2025-01-31 10:15:23,789 - INFO - Generated 1000 sample sequences...
2025-01-31 10:15:23,901 - INFO - Split data into 800 train and 200 validation sequences

2025-01-31 10:15:24,123 - INFO - Epoch 1/5
2025-01-31 10:15:24,456 - INFO - Batch 0, Loss: 4.6573
2025-01-31 10:15:24,789 - INFO - Batch 10, Loss: 4.3291
2025-01-31 10:15:25,012 - INFO - Training Loss: 4.2845
2025-01-31 10:15:25,234 - INFO - Validation Loss: 4.1932
2025-01-31 10:15:25,345 - INFO - Saved new best model checkpoint
2025-01-31 10:15:25,456 - INFO - Epoch completed in 1.33s

[...]

2025-01-31 10:15:35,678 - INFO - Training completed in 12.22s
2025-01-31 10:15:35,789 - INFO - Best validation loss: 3.2456

The model saves checkpoints to ./checkpoints/ whenever the validation loss improves.
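
The checkpointing behaviour described above (save only when the validation loss improves) can be captured in a few lines. The helper below is an illustrative sketch, not a function from the repository; the training script's own logic may differ.

import os
import torch
import torch.nn as nn

def maybe_save_checkpoint(model: nn.Module, val_loss: float, best_val_loss: float,
                          path: str = "checkpoints/best_model.pt") -> float:
    """Save a checkpoint when the validation loss improves; return the running best loss."""
    if val_loss < best_val_loss:
        os.makedirs(os.path.dirname(path), exist_ok=True)
        torch.save({"model_state": model.state_dict(), "val_loss": val_loss}, path)
        return val_loss
    return best_val_loss

# Example with a stand-in model:
best = maybe_save_checkpoint(nn.Linear(8, 8), val_loss=4.19, best_val_loss=float("inf"))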

Architecture

The current implementation includes the following components; a composition sketch follows the list:

  • LocalSparseAttention: Implements window-based sparse attention in which each token attends only to positions within a fixed-size local window
  • SparseTransformerBlock: A single transformer block with sparse attention
  • SparseTransformer: The full model with embedding layer and multiple transformer blocks
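
As a rough illustration of how these components compose, the skeleton below wires an embedding layer, sinusoidal positional encoding, a stack of window-masked attention blocks, and an output projection. It mirrors the component names only loosely: every layer size, the use of nn.MultiheadAttention, and the constructor signature are assumptions, not the repository's actual code.

import math
import torch
import torch.nn as nn

class BlockSketch(nn.Module):
    """Schematic stand-in for SparseTransformerBlock (not the actual implementation)."""
    def __init__(self, d_model, n_heads, window_size):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ff = nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.ReLU(),
                                nn.Linear(4 * d_model, d_model))
        self.norm1, self.norm2 = nn.LayerNorm(d_model), nn.LayerNorm(d_model)
        self.window_size = window_size

    def forward(self, x):
        idx = torch.arange(x.size(1), device=x.device)
        blocked = (idx[None, :] - idx[:, None]).abs() > self.window_size  # True = masked out
        attn_out, _ = self.attn(x, x, x, attn_mask=blocked)
        x = self.norm1(x + attn_out)
        return self.norm2(x + self.ff(x))

class ModelSketch(nn.Module):
    """Schematic stand-in for SparseTransformer (layer sizes are assumptions)."""
    def __init__(self, vocab_size, d_model=128, n_heads=4, n_layers=2,
                 window_size=8, max_len=512):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        # Fixed sinusoidal positional encoding.
        pos = torch.arange(max_len).unsqueeze(1)
        div = torch.exp(torch.arange(0, d_model, 2) * (-math.log(10000.0) / d_model))
        pe = torch.zeros(max_len, d_model)
        pe[:, 0::2], pe[:, 1::2] = torch.sin(pos * div), torch.cos(pos * div)
        self.register_buffer("pe", pe)
        self.blocks = nn.ModuleList(
            [BlockSketch(d_model, n_heads, window_size) for _ in range(n_layers)])
        self.head = nn.Linear(d_model, vocab_size)

    def forward(self, tokens):
        x = self.embed(tokens) + self.pe[: tokens.size(1)]
        for block in self.blocks:
            x = block(x)
        return self.head(x)

logits = ModelSketch(vocab_size=1000)(torch.randint(0, 1000, (2, 64)))  # (2, 64, 1000)

The real classes may arrange the residual connections and normalisation differently; this sketch is only meant to show how the embedding, positional encoding, sparse blocks, and output head fit together.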

TODO List

Phase 1: Core Functionality

  • Add proper data loading and preprocessing
  • Implement validation loop
  • Add model checkpointing
  • Add logging and metrics tracking
  • Write unit tests for core components

Phase 2: Sparse Attention Patterns

  • Implement strided sparse attention (see the mask sketch after this list)
  • Add block sparse attention
  • Implement learned sparsity patterns
  • Create dynamic/adaptive sparsity mechanisms
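
Strided and block patterns can both be expressed as boolean masks over the attention score matrix, in the same way as the local window mask above. Below is a minimal sketch of a causal strided mask in the spirit of Child et al. (2019); the stride parameter and the exact formulation are illustrative and would need to be checked against the paper's variants.

import torch

def strided_attention_mask(seq_len: int, stride: int) -> torch.Tensor:
    """True where attention is allowed under a simple causal strided pattern.

    Each position attends to the previous `stride` positions (a local band)
    and to every `stride`-th earlier position.
    """
    i = torch.arange(seq_len).unsqueeze(1)   # query positions
    j = torch.arange(seq_len).unsqueeze(0)   # key positions
    causal = j <= i
    local = (i - j) < stride
    strided = ((i - j) % stride) == 0
    return causal & (local | strided)

mask = strided_attention_mask(seq_len=16, stride=4)   # (16, 16) boolean mask

Block sparse attention follows the same recipe, with the mask built from fixed-size blocks rather than a band plus strided columns.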

Phase 3: Optimizations

  • Optimize sparse matrix operations
  • Add mixed precision training
  • Implement gradient checkpointing
  • Add multi-GPU support
  • Optimize memory usage

Phase 4: Analysis & Visualization

  • Add attention pattern visualization
  • Create performance benchmarking suite
  • Add sparsity pattern analysis tools
  • Implement attention head importance analysis
  • Create training dynamics visualization

Phase 5: Documentation & Examples

  • Add detailed API documentation
  • Create Jupyter notebook tutorials
  • Add example configurations
  • Write contribution guidelines
  • Create performance comparison benchmarks

Contributing

Contributions are welcome! Please feel free to submit a Pull Request. For major changes, please open an issue first to discuss what you would like to change.

License

MIT

Citation

If you use this code in your research, please cite:

@software{sparse_transformer2025,
  author = {Dean Coulstock},
  title = {Sparse Transformer Implementation},
  year = {2025},
  publisher = {GitHub},
  url = {https://github.com/Sunsvea/sparse-transformer}
}

Contact

Acknowledgments

This implementation draws inspiration from:

  • "Generating Long Sequences with Sparse Transformers" (Child et al., 2019)
  • "Longformer: The Long-Document Transformer" (Beltagy et al., 2020)
