Stock market forecasting using Frozen Pretrained Transformers (FPT). This project explores leveraging pretrained Large Language Models (LLMs) as feature extractors for time series prediction on Indian stock market indices.
This project implements a novel approach to stock price forecasting by:
- Freezing most parameters of pretrained LLM backbones
- Fine-tuning only layer normalization and embedding layers
- Using patch-based input embeddings to convert time series data into a format suitable for transformer architectures
The approach is inspired by the observation that pretrained transformers learn general-purpose representations that can transfer to domains beyond natural language.
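The selective-freezing idea can be sketched as follows. This is a minimal illustration on a plain `nn.TransformerEncoder` stand-in, not the project's actual code; the substring heuristic (`norm`/`embed` in parameter names) is an assumption about how the scripts pick trainable parameters:

```python
import torch.nn as nn

def freeze_except_norm_and_embed(model: nn.Module) -> nn.Module:
    """Freeze all parameters except those whose names suggest
    layer-norm or embedding weights (a name-matching heuristic)."""
    for name, param in model.named_parameters():
        param.requires_grad = ("norm" in name.lower()) or ("embed" in name.lower())
    return model

# Stand-in for a pretrained backbone (the real scripts use a Hugging Face model)
encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=64, nhead=4, batch_first=True),
    num_layers=2,
)
freeze_except_norm_and_embed(encoder)

trainable = sum(p.numel() for p in encoder.parameters() if p.requires_grad)
total = sum(p.numel() for p in encoder.parameters())
print(f"trainable: {trainable:,} / {total:,}")
```

With this scheme only a small fraction of the backbone's parameters receive gradients, which is what makes fine-tuning cheap.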
```
┌─────────────────────────────────────────────────────────┐
│                    Input Time Series                    │
│                (seq_len = 60 time steps)                │
└─────────────────────────────────────────────────────────┘
                             │
                             ▼
┌─────────────────────────────────────────────────────────┐
│                 Instance Normalization                  │
│          (zero mean, unit variance per sample)          │
└─────────────────────────────────────────────────────────┘
                             │
                             ▼
┌─────────────────────────────────────────────────────────┐
│                     Patch Embedding                     │
│  (split into 6 patches of size 10, project to d_model)  │
└─────────────────────────────────────────────────────────┘
                             │
                             ▼
┌─────────────────────────────────────────────────────────┐
│                 + Positional Embedding                  │
│                (learnable, 6 positions)                 │
└─────────────────────────────────────────────────────────┘
                             │
                             ▼
┌─────────────────────────────────────────────────────────┐
│                   Frozen LLM Backbone                   │
│    ┌───────────────────────────────────────────────┐    │
│    │      Transformer Layers (weights frozen)      │    │
│    │          Layer Norms (fine-tuned) ✓           │    │
│    │           Embeddings (fine-tuned) ✓           │    │
│    └───────────────────────────────────────────────┘    │
└─────────────────────────────────────────────────────────┘
                             │
                             ▼
┌─────────────────────────────────────────────────────────┐
│                    Output Projection                    │
│             (flatten → linear → prediction)             │
└─────────────────────────────────────────────────────────┘
                             │
                             ▼
┌─────────────────────────────────────────────────────────┐
│                  Reverse Normalization                  │
│                 (restore original scale)                │
└─────────────────────────────────────────────────────────┘
                             │
                             ▼
                     Forecast Output
```
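The front and back of this pipeline (instance normalization, patching, projection, and de-normalization) can be sketched in NumPy; the choice of `d_model = 64` and the weight initialization are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
seq_len, patch_size, d_model = 60, 10, 64

x = rng.normal(loc=100.0, scale=5.0, size=seq_len)  # one 60-day price window

# Instance normalization: zero mean, unit variance per sample
mean, std = x.mean(), x.std()
x_norm = (x - mean) / (std + 1e-8)

# Patching: 60 steps -> 6 non-overlapping patches of 10
patches = x_norm.reshape(seq_len // patch_size, patch_size)  # shape (6, 10)

# Linear projection of each patch to the backbone's hidden size
W = rng.normal(scale=0.02, size=(patch_size, d_model))
tokens = patches @ W                                         # shape (6, 64)

# After the backbone and output head produce y_norm_hat, the forecast is
# de-normalized back to the original price scale:
#   y_hat = y_norm_hat * std + mean
```

The resulting `(6, d_model)` token sequence is what the frozen backbone consumes in place of word embeddings.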
The following pretrained backbones are supported:

| Model | Parameters | Architecture | Reference |
|---|---|---|---|
| GPT-2 | 124M | Decoder-only, causal attention | Radford et al. (2019) |
| BERT | 110M | Encoder-only, bidirectional | Devlin et al. (2019) |
| XLNet | 110M | Permutation LM (AR + AE) | Yang et al. (2019) |
| ALBERT | 12M | Parameter sharing, factorized embeddings | Lan et al. (2020) |
| DistilBERT | 66M | Distilled BERT, fewer layers | Sanh et al. (2019) |
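Loading these backbones through the `transformers` library might look like the sketch below; the checkpoint identifiers are the standard Hugging Face names and are an assumption about what the scripts actually use:

```python
# Assumed Hugging Face checkpoint names for the five backbones
BACKBONES = {
    "GPT-2": "gpt2",
    "BERT": "bert-base-uncased",
    "XLNet": "xlnet-base-cased",
    "ALBERT": "albert-base-v2",
    "DistilBERT": "distilbert-base-uncased",
}

# Any of them can then be loaded uniformly (weights download on first use):
# from transformers import AutoModel
# backbone = AutoModel.from_pretrained(BACKBONES["GPT-2"])
```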
The project supports 5 NSE (National Stock Exchange of India) indices:
| Index | Data Available From | Description |
|---|---|---|
| NIFTY 50 | 1999 | Top 50 companies by market cap |
| NIFTY NEXT 50 | 1999 | Companies ranked 51-100 |
| NIFTY BANK | 2005 | Banking sector index |
| NIFTY FINANCIAL SERVICES | 2012 | Financial services sector |
| NIFTY MIDCAP SELECT | 2022 | Mid-cap companies |
Requirements:

- Python >= 3.14
- CUDA-capable GPU (optional, but recommended)
This project uses PEP 723 inline script metadata. The easiest way to run the scripts is with uv:
```bash
# Install uv
curl -LsSf https://astral.sh/uv/install.sh | sh

# Run any script directly (dependencies are auto-installed)
uv run model.py
uv run backbone_variation.py
uv run ticker_variation.py
```

If you prefer a traditional pip installation:
```bash
# Create a virtual environment
python -m venv .venv
source .venv/bin/activate   # Linux/macOS
# .venv\Scripts\activate    # Windows

# Install dependencies
pip install numpy pandas scikit-learn torch transformers matplotlib requests sentencepiece
```

First, download historical stock data from NSE:
```bash
uv run fetch_data.py
```

This creates CSV files for all 5 indices in the current directory.
Then train the main model:

```bash
uv run model.py
```

This trains a GPT-2-based forecaster on NIFTY 50 data and outputs:
- Training/validation loss per epoch
- Test metrics (RMSE, MAE, MSE)
- Forecast visualization (`forecast_results.png`)
Compare all 5 LLM architectures on NIFTY 50:

```bash
uv run backbone_variation.py
```

Outputs:
- Per-model training logs
- Comparative metrics table
- Multi-panel visualization (`backbone_variation_results.png`)
Test GPT-2 across all 5 stock indices:

```bash
uv run ticker_variation.py
```

Outputs:
- Per-ticker training logs
- Comparative metrics table
- Multi-panel visualization (`ticker_variation_results.png`)
Key hyperparameters (defined in each script's `Config` class):

| Parameter | Default | Description |
|---|---|---|
| `SEQ_LEN` | 60 | Input sequence length (days) |
| `PRED_LEN` | 1 | Prediction horizon (days) |
| `PATCH_SIZE` | 10 | Size of each input patch |
| `BATCH_SIZE` | 32 | Training batch size |
| `EPOCHS` | 20 | Number of training epochs |
| `LEARNING_RATE` | 1e-4 | Adam optimizer learning rate |
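Based on these defaults, the `Config` class might look like the following sketch; the dataclass layout is an assumption, and only the names and values come from the table above:

```python
from dataclasses import dataclass

@dataclass
class Config:
    SEQ_LEN: int = 60            # input window length in trading days
    PRED_LEN: int = 1            # forecast horizon in days
    PATCH_SIZE: int = 10         # time steps per patch (60 / 10 = 6 patches)
    BATCH_SIZE: int = 32         # training batch size
    EPOCHS: int = 20             # number of training epochs
    LEARNING_RATE: float = 1e-4  # Adam learning rate
```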
Project structure:

```
llm-stock-forecaster/
├── LICENSE                  # MIT License
├── README.md                # This file
├── .gitignore               # Git ignore patterns
├── model.py                 # Main FPT model (GPT-2 on NIFTY 50)
├── backbone_variation.py    # Experiment: compare 5 LLM architectures
├── ticker_variation.py      # Experiment: GPT-2 on 5 stock indices
└── fetch_data.py            # Data fetcher for NSE indices
```
Results will vary based on the data fetched (market data changes over time). Example metrics format:
```
=========================================
BACKBONE VARIATION RESULTS (NIFTY 50)
=========================================
Model           Total Params   Trainable    RMSE      MAE
----------------------------------------------------------
GPT-2 (124M)    124,xxx,xxx    x,xxx,xxx    xx.xxxx   xx.xxxx
BERT (110M)     110,xxx,xxx    x,xxx,xxx    xx.xxxx   xx.xxxx
...
```
- Radford, A., et al. (2019). "Language Models are Unsupervised Multitask Learners"
- Devlin, J., et al. (2019). "BERT: Pre-training of Deep Bidirectional Transformers"
- Yang, Z., et al. (2019). "XLNet: Generalized Autoregressive Pretraining"
- Lan, Z., et al. (2020). "ALBERT: A Lite BERT for Self-supervised Learning"
- Sanh, V., et al. (2019). "DistilBERT, a distilled version of BERT"
- Zhou, T., et al. (2023). "One Fits All: Power General Time Series Analysis by Pretrained LM"
This project is licensed under the MIT License - see the LICENSE file for details.