UBC Master of Data Science Capstone Project | In partnership with Trilemma Foundation
Special thanks to our mentor Hedayat Zarkoob from the UBC MDS program for his invaluable guidance and support throughout this project.
Bitcoin transaction fees are unpredictable, often spiking without warning. Most existing tools offer only short-term estimates with limited foresight (<1 hour).
This project tackles that problem by building a system to forecast Bitcoin transaction fees 24 hours ahead.
Our data product includes:
- Custom volatility-aware loss functions for improved evaluation of fee spikes
- Modular, end-to-end pipelines for six forecasting models: HWES, SARIMA, Prophet, XGBoost, DeepAR, and TFT
- Exploratory notebooks detailing data preparation, EDA, model training, and evaluation for each method
- A final report summarizing key findings and benchmarking model performance across metrics
Whether you're exploring volatility, comparing time series models, or forecasting blockchain costs, this repo offers a practical and extensible foundation.
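To give a flavour of the volatility-aware idea, here is a minimal sketch of a volatility-weighted MAE in Python. The function name and weighting scheme are illustrative assumptions for this README, not the exact loss implemented in this repository:

```python
import numpy as np

def volatility_weighted_mae(y_true, y_pred, window=12):
    """Illustrative volatility-aware loss: MAE weighted by local volatility.

    Errors made during volatile periods (fee spikes) are up-weighted, so a
    model that misses spikes is penalized more than plain MAE would suggest.
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    # Rolling standard deviation as a simple proxy for local volatility.
    vol = np.array([y_true[max(0, i - window):i + 1].std()
                    for i in range(len(y_true))])
    # Calm periods get weight ~1; volatile periods get weight > 1.
    weights = 1.0 + vol / (vol.mean() + 1e-8)
    return float(np.mean(weights * np.abs(y_true - y_pred)))
```

On a perfectly flat series the weights collapse to 1 and this reduces to ordinary MAE; during spikes, the same absolute error costs more.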
| Folder / File | Purpose |
|---|---|
| `analysis/` | Jupyter notebooks for the overview, EDA, and walkthroughs of each model. |
| `scripts/` | Main training/evaluation scripts for each model, with helper functions organized in model-specific subfolders. |
| `data/` | Raw data, processed data, and the script for extracting data from the API. |
| `results/` | Plots, tables, and figures generated by the scripts. |
| `reports/` | Project proposal and final report in Quarto format (rendered as PDF). |
| `src/` | Utility functions used across notebooks and scripts. |
| `tests/` | Unit tests for utility functions in `src/`. |
| `environment.yml` | Conda environment configuration file. |
- Clone the repository:

```bash
git clone [email protected]:UBC-MDS/Capstone_SatCast_Trilemma.git
```

- Create and activate the virtual environment:

```bash
conda env create -f environment.yml
conda activate satcast
```

- (Optional) If Jupyter can't find the environment kernel, add it manually by running the following command in the terminal:

```bash
python -m ipykernel install --user --name=satcast
```

If you're new to this project, we recommend starting with one of the following:
- Final Report (PDF): a complete summary of our goals, methodology, EDA, model results, and insights.

  To regenerate the report, you need to have Quarto installed. Then run the following command in a terminal:

  ```bash
  quarto render reports/final/final_report.qmd
  ```

- Comprehensive Overview Notebook: an all-in-one Jupyter notebook that showcases the most important findings of the project.

  To open the notebook, you can use JupyterLab or Jupyter Notebook. Type the following command in a terminal, then navigate to `analysis/comprehensive_overview.ipynb` from the Jupyter interface to ensure all images and links render properly:

  ```bash
  jupyter lab
  ```

This project is designed to be modular and user-friendly, allowing you to explore, run, and reproduce results at different levels of detail.
Note: The trained SARIMA model (`sarima_final_model.pkl`) is not included in the repository. Instead, you can generate it locally by training the model with the following command:

```bash
python scripts/baseline_sarima.py --parquet-path data/raw/mar_5_may_12.parquet
```

You must first generate `sarima_final_model.pkl` for any SARIMA-related scripts, notebooks, or `scripts/analysis.py` to run without errors.
For those who prefer to engage with the project using minimal code while still gaining a comprehensive understanding of the data, models, and results, we recommend reviewing the notebooks in the analysis/ folder.
These are designed to emphasize reasoning, interpretation, and model logic over implementation details.
Please use the same command (`jupyter lab`) in the terminal to open the JupyterLab interface, and use the following table to navigate through the notebooks:
| Item | Notebook | Reading Time |
|---|---|---|
| EDA | `analysis/data_spec.ipynb` | ~10-15 minutes |
| HWES | `analysis/baseline_hwes.ipynb` | ~5 minutes |
| SARIMA | `analysis/baseline_sarima.ipynb` | ~5 minutes |
| XGBoost | `analysis/baseline_xgboost.ipynb` | ~5 minutes |
| Prophet | `analysis/advanced_prophet.ipynb` | ~5 minutes |
| DeepAR | `analysis/advanced_deepar.ipynb` | ~5 minutes |
| TFT | `analysis/advanced_tft.ipynb` | ~5-10 minutes |
Note: You can also navigate to these notebooks directly from the comprehensive overview notebook (`analysis/comprehensive_overview.ipynb`), which includes inline links embedded throughout the summary.
If you're looking to reproduce results, train models, or extend the pipeline, this section is for you.
We offer a modular setup that supports three levels of interaction:
You can skip training and use pretrained models to generate predictions from `scripts/analysis.py` by running the following command in the terminal:

```bash
python scripts/analysis.py
```

If you want to customize hyperparameters or train from scratch, you can run each model's main script:
| Model | Script File | Training Time (est.) | Optimization Required | Time-Saving Mechanism |
|---|---|---|---|---|
| HWES | `scripts/baseline_hwes.py` | ~5 minutes | Yes | Fast, not applicable |
| SARIMA | `scripts/baseline_sarima.py` | ~5 minutes | No | Fast, not applicable |
| XGBoost | `scripts/baseline_xgboost.py` | ~2 hours | Yes | Skip optimization |
| Prophet | `scripts/advanced_prophet.py` | ~3-4 hours | Yes | Skip optimization |
| DeepAR | `scripts/advanced_deepar.py` | ~6 hours | No | Sample data trial |
| TFT | `scripts/advanced_tft.py` | ~8-9 hours | No | Sample data trial |
- Training time is estimated on an Intel i9-13980HX CPU, an RTX 4090 laptop GPU, and Windows 11 Pro; actual time may vary depending on your hardware and configuration.
- Given the different configurations and arguments of each model, command-line options may differ depending on the model you are running; please refer to the top of each script file for detailed usage instructions.
- For models that require hyperparameter tuning (HWES, XGBoost, Prophet), the sample data `data/raw/sample_8_days.parquet` cannot be used, as it is too small to capture the necessary patterns.
- For models that take a long time to train, we have built in time-saving mechanisms:
  - For XGBoost and Prophet, you can use the `--skip-optimization` flag to load pre-tuned hyperparameters. This saves time by skipping the optimization step and directly using the best hyperparameters saved during our best model training.
  - For DeepAR and TFT, we provide a sample dataset (`data/raw/sample_8_days.parquet`) that lets you quickly test the model without waiting through long training times. For full training, however, you should use the larger dataset (`data/raw/mar_5_may_12.parquet`). This is controlled by the `--parquet-path` argument in the command.
- Pay attention to the specific requirements and dependencies of each model, as outlined in its script file.
- If you have played around with the scripts, which may have modified files in the `results/models` folder, and want to reset to the defaults:
  - Copy the official model files from `results/saved_models/` back into `results/models/`.
  - Re-run the analysis script (`scripts/analysis.py`) to ensure that all results are up to date.
  - (Optional) Re-render the final report to reflect the actual project findings.
If you want to customize the training process or experiment with different configurations, you can modify the respective script files in the `scripts/` directory. Each model has its own script, and you can adjust preprocessing, training, and prediction as needed. For each model, you will find:

- A main script (e.g., `baseline_sarima.py`) in the `scripts` folder that can be run from the command line.
- A subfolder (e.g., `scripts/sarima/`) with modularized helper functions for loading, preprocessing, training, forecasting, and evaluation.

Users typically run only the main script; the helper files are imported and not meant to be executed directly.
We have included unit tests for the utility functions used by multiple models in the project. These tests can be found in the tests/ directory and are designed to ensure the correctness of the utility functions.
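For illustration, a test in this style might look like the following. The `smape` metric shown here is a hypothetical stand-in, defined inline so the example is self-contained, and is not necessarily one of the repo's actual utilities:

```python
import numpy as np

def smape(y_true, y_pred):
    """Symmetric mean absolute percentage error, in percent (hypothetical utility)."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    denom = (np.abs(y_true) + np.abs(y_pred)) / 2.0
    return float(np.mean(np.abs(y_true - y_pred) / denom) * 100.0)

def test_smape_perfect_forecast():
    # A perfect forecast should score exactly zero.
    assert smape([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]) == 0.0

def test_smape_is_symmetric():
    # Swapping actuals and predictions must not change the score.
    assert smape([1.0, 2.0], [2.0, 4.0]) == smape([2.0, 4.0], [1.0, 2.0])
```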
To run the function tests, enter the following in the root of the repository:

```bash
pytest
```

We have also provided additional scripts for evaluating model performance across different time-based windows (e.g., expanding, reverse expanding, sliding) for HWES, SARIMA, and XGBoost under the `scripts/experimentation/` directory.
These scripts are designed for deeper insight and are not required to reproduce the main results. They can be run independently to explore how models perform under different time-based conditions.
| Model | Script | Est. Runtime |
|---|---|---|
| SARIMA | `sarima_window.py` | ~1.5 hours |
| XGBoost | `xgboost_window.py` | ~1 hour |
| HWES | `hwes_window.py` | ~15 minutes |
- Runtime estimates are based on a standard compute setup and may vary based on your hardware.
- Please refer to the top of the specific script files for detailed usage instructions and available modes.
- The available arguments for each script include `--parquet-path` to specify the data file and `--mode` to select the time-based windowing strategy (e.g., expanding, reverse expanding, sliding).
- These experiments can only be executed on the full dataset (`data/raw/mar_5_may_12.parquet`) and are not compatible with the sample dataset (`data/raw/sample_8_days.parquet`).
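To make the windowing strategies concrete, here is a small self-contained sketch of how expanding and sliding train/test splits over a time series can be generated. The function below is illustrative only, not the repo's implementation, and it shows just two of the three modes:

```python
def window_splits(n, train_size, test_size, step, mode="expanding"):
    """Return (train_idx, test_idx) index pairs for time-based evaluation.

    "expanding": the training window grows forward from the start of the series.
    "sliding":   the training window keeps a fixed length and moves forward.
    """
    splits = []
    start = 0
    while start + train_size + test_size <= n:
        # The test window always sits immediately after the training window.
        test = list(range(start + train_size, start + train_size + test_size))
        if mode == "expanding":
            train = list(range(0, start + train_size))
        else:  # "sliding"
            train = list(range(start, start + train_size))
        splits.append((train, test))
        start += step
    return splits
```

For example, with `n=10, train_size=4, test_size=2, step=2`, expanding mode yields training windows of lengths 4, 6, and 8, each followed by a 2-point test window, while sliding mode keeps every training window at length 4.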
We'd love for you to contribute to this project! Whether it's adding new forecasting models, improving data pipelines, or fixing bugs, your input is valuable.
Check out our CONTRIBUTING.md file for guidelines on how to get started.
If you are new to open source, look for issues labeled "good first issue" β these are great entry points to begin contributing!
Encountered a problem or have a question?
Please open an issue on this repository, and we'll get back to you as soon as possible.
Jenny Zhang, Ximin Xu, Yajing Liu, Tengwei Wang
This project is licensed under the MIT License.
See the LICENSE.md file for full details.