
Transformer Time Series Interpretability Toolkit

This repository provides an end-to-end workflow for analysing Transformer-based time series classification (TSC) models through mechanistic interpretability methods. It contains ready-to-run notebooks, a modular training script and a collection of pre-trained models.

Author: Matiss Kalnare
Supervisor: Niki van Stein

Repository Structure

Notebooks/             - Interactive notebooks demonstrating the two analysis pipelines
  Patching.ipynb       - Activation patching/causal tracing walkthrough
  SAE.ipynb            - Sparse Autoencoder exploration
  IPYNB_to_PY/         - Python script versions of the notebooks

Utilities/             - Helper code
  TST_trainer.py       - Training/evaluation script and model definition
  utils.py             - Patching and plotting utilities

TST_models/            - Pre-trained models for several datasets
SAE_models/            - Example sparse autoencoder weights
Results/               - Example results (plots, patched predictions, ...)
requirements.txt       - Python package requirements

Installation

  1. Clone the repository and install the dependencies:
    git clone https://github.com/mathiisk/TSTpatching.git
    cd TSTpatching
    pip install -r requirements.txt

A GPU with CUDA is recommended, but the code also runs on CPU.

Quick Start

Pre-trained weights for common datasets are provided in TST_models. You can immediately run the notebooks to reproduce the experiments.

Open the activation patching notebook:

jupyter notebook Notebooks/Patching.ipynb

or the sparse autoencoder notebook:

jupyter notebook Notebooks/SAE.ipynb

Step through the cells to load a model, run the analysis, and display the plots. The notebooks assume the working directory is the repository root.
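For orientation, the snippet below is a minimal sketch of activation patching using plain PyTorch forward hooks. It is not the API of Utilities/utils.py; the model, the target layer handle, and the assumption that the clean and corrupt inputs share the same shape are all illustrative.

# Minimal activation-patching sketch (illustrative only; the notebook relies on
# the helpers in Utilities/utils.py, whose exact interface may differ).
import torch

def run_with_patch(model, clean_x, corrupt_x, layer_module):
    """Record layer_module's activation on the clean input, patch it into a
    run on the corrupt input, and return both sets of logits."""
    cache = {}

    def save_hook(module, inputs, output):
        cache["clean"] = output.detach()

    def patch_hook(module, inputs, output):
        return cache["clean"]  # returning a value overrides the layer's output

    # 1) Record the clean activation.
    handle = layer_module.register_forward_hook(save_hook)
    with torch.no_grad():
        clean_logits = model(clean_x)
    handle.remove()

    # 2) Re-run on the corrupt input with the clean activation patched in.
    handle = layer_module.register_forward_hook(patch_hook)
    with torch.no_grad():
        patched_logits = model(corrupt_x)
    handle.remove()

    return clean_logits, patched_logits

Comparing patched_logits with the model's ordinary prediction on the corrupt input shows how much of the clean behaviour that single layer's activation restores, which is the quantity the patching notebook visualises.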

Training a New Model

Utilities/TST_trainer.py can train a fresh Transformer on any dataset from timeseriesclassification.com.

python Utilities/TST_trainer.py --dataset DATASET_NAME --epochs NUM_EPOCHS --batch_size BATCH_SIZE
  • DATASET_NAME should match one of the names on the website, e.g. JapaneseVowels.
  • NUM_EPOCHS defaults to 100 if not provided.
  • BATCH_SIZE defaults to 32.

The resulting weights are stored as TST_<dataset>.pth under TST_models/.
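For example, to train on the JapaneseVowels dataset with the default settings spelled out explicitly:

python Utilities/TST_trainer.py --dataset JapaneseVowels --epochs 100 --batch_size 32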

Sparse Autoencoders

The notebook Notebooks/SAE.ipynb trains a sparse autoencoder (SAE) on intermediate activations of a trained Transformer in order to surface interpretable features that the model relies on. Pre-trained SAE weights are stored in SAE_models/ and can be loaded by the notebook.
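The sketch below shows the general recipe: reconstruct cached activation vectors through a narrow-to-wide encoder under an L1 sparsity penalty. It is illustrative only; the hidden width, penalty coefficient, and training loop in Notebooks/SAE.ipynb may differ.

# Minimal sparse-autoencoder sketch (illustrative; hyperparameters are assumptions).
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    def __init__(self, d_model, d_hidden):
        super().__init__()
        self.encoder = nn.Linear(d_model, d_hidden)
        self.decoder = nn.Linear(d_hidden, d_model)

    def forward(self, x):
        z = torch.relu(self.encoder(x))   # sparse feature activations
        return self.decoder(z), z

def train_sae(activations, d_hidden=512, l1_coef=1e-3, epochs=50, lr=1e-3):
    """Fit an SAE on a tensor of cached activations with shape (N, d_model)."""
    sae = SparseAutoencoder(activations.shape[1], d_hidden)
    opt = torch.optim.Adam(sae.parameters(), lr=lr)
    for _ in range(epochs):
        recon, z = sae(activations)
        loss = ((recon - activations) ** 2).mean() + l1_coef * z.abs().mean()
        opt.zero_grad()
        loss.backward()
        opt.step()
    return sae

The L1 term pushes most feature activations to zero, so each remaining active feature tends to correspond to a more interpretable pattern in the activations.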

Output & Results

All figures and intermediate outputs generated by the notebooks are stored under Results/ by default. Separate folders exist for each dataset so you can keep experiments organised.

BSc Thesis Context

This code base accompanies a Bachelor thesis exploring whether interpretability techniques from NLP, namely activation patching and sparse autoencoders, can reveal causal mechanisms inside Transformer-based time series classifiers. The provided scripts and notebooks allow anyone to reproduce and extend the experiments.
