This repository provides an end-to-end workflow for analysing Transformer-based time series classification (TSC) models through mechanistic interpretability methods. It contains ready-to-run notebooks, a modular training script and a collection of pre-trained models.
Author: Matiss Kalnare
Supervisor: Niki van Stein
- Notebooks/ - Interactive notebooks demonstrating the two analysis pipelines
  - Patching.ipynb - Activation patching / causal tracing walkthrough
  - SAE.ipynb - Sparse autoencoder exploration
- IPYNB_to_PY/ - Python script versions of the notebooks
- Utilities/ - Helper code
  - TST_trainer.py - Training/evaluation script and model definition
  - utils.py - Patching and plotting utilities
- TST_models/ - Pre-trained models for several datasets
- SAE_models/ - Example sparse autoencoder weights
- Results/ - Example results (plots, patched predictions, ...)
- requirements.txt - Python package requirements
- Clone the repository and install dependencies
A GPU with CUDA is recommended, but the code also runs on CPU.
git clone https://github.com/mathiisk/TSTpatching.git
cd TSTpatching
pip install -r requirements.txt
Pre-trained weights for common datasets are provided in TST_models. You can immediately run the notebooks to reproduce the experiments.
Open the activation patching notebook:
jupyter notebook Notebooks/Patching.ipynb
or the sparse autoencoder notebook:
jupyter notebook Notebooks/SAE.ipynb
Step through the cells to load a model, run the analysis and display the plots. The notebooks assume the working directory is the repository root.
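If you want to experiment outside the notebook, the core idea behind activation patching is small enough to sketch directly. The snippet below is a minimal illustration using PyTorch forward hooks, not the repository's implementation: the model, the chosen layer and the inputs are placeholders, and the full pipeline (including plotting) lives in Notebooks/Patching.ipynb and Utilities/utils.py.

```python
import torch

def patch_layer_output(model, layer, clean_input, corrupted_input):
    """Record `layer`'s activation on `clean_input`, then re-run the model on
    `corrupted_input` with that activation patched in. Assumes `layer` returns
    a single tensor. All names here are placeholders, not the repo's API."""
    cache = {}

    def save_hook(module, inputs, output):
        cache["clean"] = output.detach()

    def patch_hook(module, inputs, output):
        return cache["clean"]  # returning a value replaces the layer's output

    handle = layer.register_forward_hook(save_hook)
    with torch.no_grad():
        clean_logits = model(clean_input)        # 1) clean run, cache activation
    handle.remove()

    handle = layer.register_forward_hook(patch_hook)
    with torch.no_grad():
        patched_logits = model(corrupted_input)  # 2) corrupted run, patched
    handle.remove()

    return clean_logits, patched_logits
```

Comparing patched_logits with the model's unpatched prediction on corrupted_input indicates how much of the behaviour is mediated by that layer's activation, which is the kind of effect the notebook's patching plots visualise.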
Utilities/TST_trainer.py can train a fresh Transformer on any dataset from timeseriesclassification.com.
python Utilities/TST_trainer.py --dataset DATASET_NAME --epochs NUM_EPOCHS --batch_size BATCH_SIZE
DATASET_NAME should match one of the dataset names on the website, e.g. JapaneseVowels. NUM_EPOCHS defaults to 100 if not provided. BATCH_SIZE defaults to 32.
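For example, training on the JapaneseVowels dataset with the default settings written out explicitly:
python Utilities/TST_trainer.py --dataset JapaneseVowels --epochs 100 --batch_size 32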
The resulting weights are stored as TST_<dataset>.pth under TST_models/.
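To poke at a checkpoint outside the notebooks, you can inspect its contents with plain PyTorch. The sketch below assumes the .pth files store a state_dict and that the JapaneseVowels model exists under TST_models/; to actually reuse the weights, rebuild the model class from Utilities/TST_trainer.py with the same hyperparameters and call load_state_dict on it.

```python
import torch

# Assumption: the checkpoint stores a state_dict rather than a pickled model.
state_dict = torch.load("TST_models/TST_JapaneseVowels.pth", map_location="cpu")
for name, tensor in state_dict.items():
    print(f"{name}: {tuple(tensor.shape)}")
```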
The notebook Notebooks/SAE.ipynb trains a sparse autoencoder on intermediate activations of a Transformer and uses the learned features to highlight interpretable concepts that the model relies on. Pre-trained SAE weights are stored in SAE_models/ and can be loaded by the notebook.
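The exact architecture is defined in the notebook, but the general recipe, reconstructing cached activations through an overcomplete hidden layer with a sparsity penalty, can be sketched as follows. Layer sizes, the penalty weight and all names are illustrative assumptions, not the settings used for the weights in SAE_models/.

```python
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    """Minimal sparse autoencoder sketch: an overcomplete ReLU encoder over
    Transformer activations plus a linear decoder. Sizes are illustrative."""

    def __init__(self, d_act: int, d_hidden: int):
        super().__init__()
        self.encoder = nn.Linear(d_act, d_hidden)
        self.decoder = nn.Linear(d_hidden, d_act)

    def forward(self, activations: torch.Tensor):
        codes = torch.relu(self.encoder(activations))  # sparse feature codes
        recon = self.decoder(codes)                    # reconstructed activations
        return recon, codes

def sae_loss(recon, activations, codes, l1_coeff=1e-3):
    """Reconstruction error plus an L1 penalty that encourages sparse codes."""
    mse = torch.mean((recon - activations) ** 2)
    return mse + l1_coeff * codes.abs().mean()
```

Training such a model on activations collected from one Transformer layer yields feature directions whose strongest-activating inputs can then be inspected for interpretable patterns, which mirrors the workflow in the notebook.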
All figures and intermediate outputs generated by the notebooks are stored under Results/ by default. Separate folders exist for each dataset so you can keep experiments organised.
This code base accompanies a Bachelor thesis exploring whether interpretability techniques from NLP, namely activation patching and sparse autoencoders, can reveal causal mechanisms inside Transformer-based time series classifiers. The provided scripts and notebooks allow anyone to reproduce and extend the experiments.