llm-stock-forecaster

Stock market forecasting using Frozen Pretrained Transformers (FPT). This project explores leveraging pretrained Large Language Models (LLMs) as feature extractors for time series prediction on Indian stock market indices.

Overview

Following the "One Fits All" line of work (Zhou et al., 2023), this project approaches stock price forecasting by:

  1. Freezing most parameters of pretrained LLM backbones
  2. Fine-tuning only layer normalization and embedding layers
  3. Using patch-based input embeddings to convert time series data into a format suitable for transformer architectures

The approach is inspired by the observation that pretrained transformers learn general-purpose representations that can transfer to domains beyond natural language.
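The selective freezing in steps 1-2 comes down to a few lines with Hugging Face `transformers`. A minimal sketch, using a tiny randomly initialized GPT-2 so it runs without downloading weights (the actual scripts would load the pretrained checkpoint instead):

```python
from transformers import GPT2Config, GPT2Model

def freeze_backbone(model):
    """Freeze every parameter except layer norms (ln_*) and embeddings (wte, wpe)."""
    for name, param in model.named_parameters():
        param.requires_grad = any(key in name for key in ("ln", "wte", "wpe"))
    return model

# Tiny random config so the sketch runs offline; the project would use
# GPT2Model.from_pretrained("gpt2") to get the pretrained backbone.
model = freeze_backbone(GPT2Model(GPT2Config(n_layer=2, n_embd=64, n_head=2)))
trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
total = sum(p.numel() for p in model.parameters())
```

Only the layer-norm and embedding parameters end up with `requires_grad=True`, so the optimizer touches a small fraction of the backbone.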

Architecture

┌─────────────────────────────────────────────────────────────────┐
│                        Input Time Series                        │
│                    (seq_len=60 time steps)                      │
└─────────────────────────────────────────────────────────────────┘
                                │
                                ▼
┌─────────────────────────────────────────────────────────────────┐
│                    Instance Normalization                       │
│              (zero mean, unit variance per sample)              │
└─────────────────────────────────────────────────────────────────┘
                                │
                                ▼
┌─────────────────────────────────────────────────────────────────┐
│                      Patch Embedding                            │
│         (split into 6 patches of size 10, project to d_model)   │
└─────────────────────────────────────────────────────────────────┘
                                │
                                ▼
┌─────────────────────────────────────────────────────────────────┐
│                   + Positional Embedding                        │
│                  (learnable, 6 positions)                       │
└─────────────────────────────────────────────────────────────────┘
                                │
                                ▼
┌─────────────────────────────────────────────────────────────────┐
│                    Frozen LLM Backbone                          │
│    ┌─────────────────────────────────────────────────────┐      │
│    │  Transformer Layers (weights frozen)                │      │
│    │  Layer Norms (fine-tuned) ✓                         │      │
│    │  Embeddings (fine-tuned) ✓                          │      │
│    └─────────────────────────────────────────────────────┘      │
└─────────────────────────────────────────────────────────────────┘
                                │
                                ▼
┌─────────────────────────────────────────────────────────────────┐
│                      Output Projection                          │
│              (flatten → linear → prediction)                    │
└─────────────────────────────────────────────────────────────────┘
                                │
                                ▼
┌─────────────────────────────────────────────────────────────────┐
│                   Reverse Normalization                         │
│                (restore original scale)                         │
└─────────────────────────────────────────────────────────────────┘
                                │
                                ▼
                         Forecast Output
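The data flow in the diagram can be traced at the shape level. In this sketch the frozen backbone is stubbed with `nn.Identity()` so the snippet is self-contained; dimensions follow the defaults in the Configuration section, and `d_model=768` is GPT-2's hidden size:

```python
import torch
import torch.nn as nn

# Shape-level walk-through of the pipeline diagram above.
batch, seq_len, patch_size, d_model, pred_len = 32, 60, 10, 768, 1
n_patches = seq_len // patch_size                      # 6 patches

x = torch.randn(batch, seq_len)                        # raw price windows
mean = x.mean(dim=1, keepdim=True)
std = x.std(dim=1, keepdim=True)
x_norm = (x - mean) / (std + 1e-8)                     # instance normalization

patches = x_norm.view(batch, n_patches, patch_size)    # (32, 6, 10)
patch_embed = nn.Linear(patch_size, d_model)
pos_embed = nn.Parameter(torch.zeros(n_patches, d_model))
tokens = patch_embed(patches) + pos_embed              # (32, 6, 768)

backbone = nn.Identity()                               # stand-in for frozen GPT-2
hidden = backbone(tokens)                              # (32, 6, 768)

head = nn.Linear(n_patches * d_model, pred_len)
y_norm = head(hidden.flatten(start_dim=1))             # (32, 1)
y = y_norm * (std + 1e-8) + mean                       # reverse normalization
```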

Supported LLM Backbones

| Model | Parameters | Architecture | Reference |
|---|---|---|---|
| GPT-2 | 124M | Decoder-only, causal attention | Radford et al. (2019) |
| BERT | 110M | Encoder-only, bidirectional | Devlin et al. (2019) |
| XLNet | 110M | Permutation LM (AR + AE) | Yang et al. (2019) |
| ALBERT | 12M | Parameter sharing, factorized embeddings | Lan et al. (2020) |
| DistilBERT | 66M | Distilled BERT, fewer layers | Sanh et al. (2019) |
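All five backbones are available on the Hugging Face Hub; a comparison script might map display names to Hub model ids like this (the ids are standard Hub identifiers, but how `backbone_variation.py` actually selects models is an assumption here):

```python
from transformers import AutoModel

# Hub ids for the five backbones compared above.
BACKBONES = {
    "GPT-2": "gpt2",
    "BERT": "bert-base-uncased",
    "XLNet": "xlnet-base-cased",
    "ALBERT": "albert-base-v2",
    "DistilBERT": "distilbert-base-uncased",
}

def load_backbone(name: str):
    """Download and return the pretrained backbone for a display name."""
    return AutoModel.from_pretrained(BACKBONES[name])
```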

Dataset

The project supports 5 NSE (National Stock Exchange of India) indices:

| Index | Data Available From | Description |
|---|---|---|
| NIFTY 50 | 1999 | Top 50 companies by market cap |
| NIFTY NEXT 50 | 1999 | Companies ranked 51-100 |
| NIFTY BANK | 2005 | Banking sector index |
| NIFTY FINANCIAL SERVICES | 2012 | Financial services sector |
| NIFTY MIDCAP SELECT | 2022 | Mid-cap companies |
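Each index arrives as a daily price series; a sliding-window split like the following turns it into supervised pairs matching `SEQ_LEN=60` and `PRED_LEN=1` (a sketch; the repository's exact preprocessing may differ):

```python
import numpy as np

def make_windows(close: np.ndarray, seq_len: int = 60, pred_len: int = 1):
    """Slide a window over the series: 60 days in, next day out."""
    X, y = [], []
    for i in range(len(close) - seq_len - pred_len + 1):
        X.append(close[i : i + seq_len])
        y.append(close[i + seq_len : i + seq_len + pred_len])
    return np.stack(X), np.stack(y)

# Stand-in series; in practice this would come from a fetched index CSV.
close = np.linspace(100.0, 200.0, 200)
X, y = make_windows(close)
```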

Installation

Prerequisites

  • Python >= 3.14
  • CUDA-capable GPU (optional, but recommended)

Using uv (Recommended)

This project uses PEP 723 inline script metadata. The easiest way to run the scripts is with uv:
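An inline-metadata header looks like the following; the dependency list here is inferred from the pip command further down, and the scripts' actual headers may differ:

```python
# /// script
# requires-python = ">=3.14"
# dependencies = [
#   "numpy", "pandas", "scikit-learn", "torch", "transformers",
#   "matplotlib", "requests", "sentencepiece",
# ]
# ///
```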

# Install uv
curl -LsSf https://astral.sh/uv/install.sh | sh

# Run any script directly (dependencies auto-installed)
uv run model.py
uv run backbone_variation.py
uv run ticker_variation.py

Manual Installation

If you prefer traditional pip installation:

# Create virtual environment
python -m venv .venv
source .venv/bin/activate  # Linux/macOS
# .venv\Scripts\activate   # Windows

# Install dependencies
pip install numpy pandas scikit-learn torch transformers matplotlib requests sentencepiece

Usage

1. Fetch Data

First, download historical stock data from NSE:

uv run fetch_data.py

This will create CSV files for all 5 indices in the current directory.

2. Run Main Model (GPT-2 on NIFTY 50)

uv run model.py

This trains a GPT-2 based forecaster on NIFTY 50 data and outputs:

  • Training/validation loss per epoch
  • Test metrics (RMSE, MAE, MSE)
  • Forecast visualization (forecast_results.png)
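The reported metrics are standard; for reference, here is how they are defined (a from-scratch sketch, though the script itself may compute them via scikit-learn):

```python
import numpy as np

def metrics(y_true: np.ndarray, y_pred: np.ndarray) -> dict:
    """MSE, RMSE, and MAE between targets and forecasts."""
    err = y_true - y_pred
    mse = float(np.mean(err ** 2))
    return {"MSE": mse, "RMSE": float(np.sqrt(mse)), "MAE": float(np.mean(np.abs(err)))}

m = metrics(np.array([100.0, 102.0, 101.0]), np.array([101.0, 101.0, 101.0]))
```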

3. Backbone Comparison Experiment

Compare all 5 LLM architectures on NIFTY 50:

uv run backbone_variation.py

Outputs:

  • Per-model training logs
  • Comparative metrics table
  • Multi-panel visualization (backbone_variation_results.png)

4. Ticker Variation Experiment

Test GPT-2 across all 5 stock indices:

uv run ticker_variation.py

Outputs:

  • Per-ticker training logs
  • Comparative metrics table
  • Multi-panel visualization (ticker_variation_results.png)

Configuration

Key hyperparameters (defined in each script's Config class):

| Parameter | Default | Description |
|---|---|---|
| SEQ_LEN | 60 | Input sequence length (days) |
| PRED_LEN | 1 | Prediction horizon (days) |
| PATCH_SIZE | 10 | Size of each input patch |
| BATCH_SIZE | 32 | Training batch size |
| EPOCHS | 20 | Number of training epochs |
| LEARNING_RATE | 1e-4 | Adam optimizer learning rate |
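As a sketch, the table maps onto a Config class like this (field names mirror the table; the scripts' actual classes may differ in detail):

```python
from dataclasses import dataclass

@dataclass
class Config:
    SEQ_LEN: int = 60          # input window length in trading days
    PRED_LEN: int = 1          # forecast horizon in days
    PATCH_SIZE: int = 10       # time steps per patch; must divide SEQ_LEN evenly
    BATCH_SIZE: int = 32       # training batch size
    EPOCHS: int = 20           # number of training epochs
    LEARNING_RATE: float = 1e-4  # Adam learning rate

cfg = Config()
n_patches = cfg.SEQ_LEN // cfg.PATCH_SIZE  # 6 patches feed the backbone
```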

Project Structure

llm-stock-forecaster/
├── LICENSE                  # MIT License
├── README.md                # This file
├── .gitignore               # Git ignore patterns
├── model.py                 # Main FPT model (GPT-2 on NIFTY 50)
├── backbone_variation.py    # Experiment: compare 5 LLM architectures
├── ticker_variation.py      # Experiment: GPT-2 on 5 stock indices
└── fetch_data.py            # Data fetcher for NSE indices

Results

Results will vary based on the data fetched (market data changes over time). Example metrics format:

=========================================
BACKBONE VARIATION RESULTS (NIFTY 50)
=========================================
Model                Total Params      Trainable         RMSE          MAE
----------------------------------------------------------------------------------
GPT-2 (124M)          124,xxx,xxx       x,xxx,xxx       xx.xxxx      xx.xxxx
BERT (110M)           110,xxx,xxx       x,xxx,xxx       xx.xxxx      xx.xxxx
...

References

  • Radford, A., et al. (2019). "Language Models are Unsupervised Multitask Learners"
  • Devlin, J., et al. (2019). "BERT: Pre-training of Deep Bidirectional Transformers"
  • Yang, Z., et al. (2019). "XLNet: Generalized Autoregressive Pretraining"
  • Lan, Z., et al. (2020). "ALBERT: A Lite BERT for Self-supervised Learning"
  • Sanh, V., et al. (2019). "DistilBERT, a distilled version of BERT"
  • Zhou, T., et al. (2023). "One Fits All: Power General Time Series Analysis by Pretrained LM"

License

This project is licensed under the MIT License - see the LICENSE file for details.
