To see a more recent example of my coding style and best practices, take a look at my Python caching library rapide 🐎
This was a personal project I worked on between June and October 2023 to further hone my skills in data processing, interacting with the Ethereum network, and machine learning.
The initial idea was to scrape per-block data on all holders of any given on-chain crypto asset. Each per-block dataset would contain accurate, live profit-and-loss statistics for every holder, along with other data points. This type of data was not offered by any third-party data brokers at the time, and it would have cost a fortune to acquire through managed nodes.
- ✅ Efficiently query and cache data from a single, self-hosted Ethereum node
- ✅ Per-block, per-address live profit and loss data, fast enough for live use cases
- ✅ Use minimal 3rd party data sources
- ✅ Prepare normalized feature set from large amounts of raw data
- ✅ Train both neural-network and gradient boosting models
- ✅ UI that displays live model inference on a given crypto asset
- ✅ Simple backtesting system for models
- ✅ Uses EdgeDB, with EdgeQL for concise queries
- ✅ Runs even on consumer hardware
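The "per-block, per-address" data above largely comes down to replaying ERC-20 Transfer logs block by block against the local node. As a rough sketch only (this is not the project's code, and it assumes web3.py v6 with a node at a placeholder URL), here is how per-address balance deltas for a single block could be tallied:

```python
# Illustrative sketch, not the project's collection code: tally net ERC-20
# balance changes per address for one block, using raw Transfer logs.
from collections import defaultdict
from web3 import Web3

NODE_URL = "http://localhost:8545"  # assumed local execution-node RPC
TOKEN = "0xC02aaA39b223FE8D0A0e5C4F27eAD9083C756Cc2"  # WETH, as an example
TRANSFER_TOPIC = Web3.to_hex(Web3.keccak(text="Transfer(address,address,uint256)"))

w3 = Web3(Web3.HTTPProvider(NODE_URL))

def holder_deltas(block_number: int) -> dict[str, int]:
    """Net balance change per address for one block, from Transfer logs."""
    logs = w3.eth.get_logs({
        "fromBlock": block_number,
        "toBlock": block_number,
        "address": TOKEN,
        "topics": [TRANSFER_TOPIC],
    })
    deltas: dict[str, int] = defaultdict(int)
    for log in logs:
        # Sender and receiver are the last 20 bytes of the indexed topics
        sender = "0x" + log["topics"][1].hex()[-40:]
        receiver = "0x" + log["topics"][2].hex()[-40:]
        value = int.from_bytes(bytes(log["data"]), "big")
        deltas[sender] -= value
        deltas[receiver] += value
    return dict(deltas)
```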
Looking back on this project, there are a number of improvements and simplifications I would make.
- **Rework Data Features:** At the time of this project I didn't have an understanding of predictive financial models, and simply included all the data I had collected. I compensated with regularization in the training steps, but this was a band-aid. Making a profitable model was not really my focus at the time.
- **Switch to SQLite:** EdgeDB was fun to use, but it introduces too much overhead and complexity given the scope of this project. A simple SQLite implementation for the cached data would probably be faster and give fewer headaches, especially when writing many rows in a single transaction (a minimal sketch of what I mean follows this list). Some of the tables would have to be redesigned to better fit a more traditional style of SQL queries.
- **Testing for Faster Iteration:** This codebase originally grew out of a small single script. Certain components, such as the `Database` and `Hybrid` classes, are fairly complex. I would add a suite of unit and integration tests, especially for these components, before making any further changes. This would be essential if switching to SQLite (see the example test below).
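To make the SQLite point concrete, here is a minimal sketch of the kind of cache layer I have in mind. The table and column names are hypothetical, not the project's actual schema; the key idea is batching all rows for a block into one transaction with `executemany`.

```python
# Hypothetical SQLite cache sketch; schema is illustrative only.
import sqlite3

conn = sqlite3.connect("cache.db")
conn.execute("PRAGMA journal_mode=WAL")  # better concurrency for a local cache
conn.execute(
    """
    CREATE TABLE IF NOT EXISTS holder_pnl (
        block_number INTEGER NOT NULL,
        address      TEXT    NOT NULL,
        balance      TEXT    NOT NULL,   -- stringified uint256
        pnl_usd      REAL    NOT NULL,
        PRIMARY KEY (block_number, address)
    )
    """
)

def write_block(rows: list[tuple[int, str, str, float]]) -> None:
    """Insert all rows for one block in a single transaction."""
    with conn:  # commits on success, rolls back on error
        conn.executemany(
            "INSERT OR REPLACE INTO holder_pnl VALUES (?, ?, ?, ?)",
            rows,
        )
```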
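And for the testing point, a small example of the kind of unit test I would start with, written against the cache sketch above so it needs no network access or fixtures (an in-memory database stands in for the real one):

```python
# Example pytest-style test for the sketched cache layer; everything here is
# self-contained and does not touch the real Database class.
import sqlite3

def make_cache(conn: sqlite3.Connection) -> None:
    conn.execute(
        "CREATE TABLE holder_pnl (block_number INTEGER, address TEXT, "
        "balance TEXT, pnl_usd REAL, PRIMARY KEY (block_number, address))"
    )

def write_block(conn: sqlite3.Connection, rows: list[tuple]) -> None:
    with conn:
        conn.executemany(
            "INSERT OR REPLACE INTO holder_pnl VALUES (?, ?, ?, ?)", rows
        )

def test_rewriting_a_block_does_not_duplicate_rows():
    conn = sqlite3.connect(":memory:")
    make_cache(conn)
    rows = [(1, "0xabc", "100", 5.0)]
    write_block(conn, rows)
    write_block(conn, rows)  # re-collecting the same block should be idempotent
    count = conn.execute("SELECT COUNT(*) FROM holder_pnl").fetchone()[0]
    assert count == 1
```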
- Python 3.11
- poetry for package management
- edgedb-cli to set up the database
- A fully synced, archival, self-hosted Ethereum node
- I recommend reth for execution and lighthouse for consensus
These commands are used to interact with the library:
`poetry run ui`
This prompts for a token pair address, then loads a terminal UI displaying live data and charts for that pair, including PnL spread amongst addresses and model inference results.
`poetry run collect-samples`
Using the pair addresses provided in `samples_test.csv` and `samples_train.csv`, this script pulls all per-block data from the database or node as needed, generates the feature set, then packages it into a compressed JSON file for training.
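As a rough illustration of the packaging step (not the actual collect-samples code, and with made-up field names), the feature set can be written and read back as gzip-compressed JSON like this:

```python
# Illustrative only: pack per-block samples into compressed JSON for training.
import gzip
import json

def save_samples(path: str, samples: list[dict]) -> None:
    """Write one training sample per block as a gzip-compressed JSON array."""
    with gzip.open(path, "wt", encoding="utf-8") as f:
        json.dump(samples, f)

def load_samples(path: str) -> list[dict]:
    with gzip.open(path, "rt", encoding="utf-8") as f:
        return json.load(f)

# Hypothetical sample shape: normalized per-block features plus the target.
save_samples("samples.json.gz", [
    {"block": 17_000_000, "features": [0.12, -0.4, 0.88], "target": 0.02},
])
```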
`poetry run reset-pair`
Prompts for a token pair address and clears the associated data from the database.
`poetry run train`
Begins the model training process. Each epoch is saved in a project directory.
`poetry run train-save-epoch`
Prompts for an epoch number in the training output directory and saves the corresponding model.
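The checkpoint-per-epoch pattern these two commands rely on looks roughly like the sketch below. This is not the project's training loop; the model, data, and output path are placeholders.

```python
# Sketch of per-epoch checkpointing with PyTorch; everything here is a stand-in.
from pathlib import Path
import torch
from torch import nn

out_dir = Path("training_output")  # hypothetical output directory
out_dir.mkdir(exist_ok=True)

model = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 1))
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

x, y = torch.randn(256, 8), torch.randn(256, 1)  # stand-in features / targets

for epoch in range(10):
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    optimizer.step()
    # One checkpoint per epoch, so any epoch can later be promoted to the saved model
    torch.save(model.state_dict(), out_dir / f"epoch_{epoch:03d}.pt")
```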
`poetry run train-lgbm`
Begins the LightGBM training process.
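For reference, the gradient-boosting side of the project corresponds to something like the following minimal LightGBM fit; the hyperparameters and data here are made up, not the project's settings.

```python
# Illustrative LightGBM regression fit with random stand-in data.
import numpy as np
import lightgbm as lgb

rng = np.random.default_rng(0)
X = rng.normal(size=(1_000, 8))  # stand-in normalized features
y = rng.normal(size=1_000)       # stand-in target (e.g. a forward return)

model = lgb.LGBMRegressor(n_estimators=200, learning_rate=0.05)
model.fit(X, y)
model.booster_.save_model("lgbm_model.txt")  # persist for later inference
```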
`poetry run backtest`
One of the last things I experimented with. Runs a basic backtesting suite and outputs the results to a JSON file.
`poetry run backtest-analyze`
Prompts for a backtest results file and outputs some useful interpretive statistics in the terminal.
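The statistics are along the lines of the sketch below; the results-file layout shown here (a flat list of per-trade returns) is hypothetical and not the project's actual format.

```python
# Sketch of the kind of summary backtest-analyze prints; file layout is made up.
import json
from statistics import mean

with open("backtest_results.json") as f:
    returns: list[float] = json.load(f)  # e.g. [0.013, -0.004, ...]

wins = [r for r in returns if r > 0]
print(f"trades:   {len(returns)}")
print(f"win rate: {len(wins) / len(returns):.1%}")
print(f"avg P&L:  {mean(returns):.4%}")
```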