Opteryx-Core is the SQL execution engine behind opteryx.app. It is a fork of Opteryx with a smaller, more opinionated API and configuration surface, shaped around the workloads we run in the hosted service.
This library is designed for fast, read-heavy analytical queries over Parquet-backed data. It handles SQL parsing, planning, predicate pushdown, projection pruning, and execution so you can query datasets from Python without standing up a separate warehouse.
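To make the predicate-pushdown and projection-pruning claims concrete, here is a toy sketch (plain Python, not engine code) of what those optimisations mean: the scan itself filters rows and keeps only the requested columns, so unneeded data is never materialised.

```python
# Toy illustration only -- not Opteryx-Core internals.
rows = [
    {"id": 1, "name": "Mercury", "mass": 0.33},
    {"id": 2, "name": "Venus", "mass": 4.87},
    {"id": 7, "name": "Uranus", "mass": 86.8},
]

def scan(rows, predicate, columns):
    """Apply the filter and column selection during the scan itself."""
    for row in rows:
        if predicate(row):  # predicate pushdown: filter at the data source
            # projection pruning: emit only the columns the query needs
            yield {c: row[c] for c in columns}

result = list(scan(rows, lambda r: r["id"] < 5, ["id", "name"]))
# result == [{"id": 1, "name": "Mercury"}, {"id": 2, "name": "Venus"}]
```

A real Parquet reader does the same thing at the file level: row groups that cannot match the predicate are skipped, and unselected columns are never decoded.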
This project is unapologetically shaped around the needs of opteryx.app. That said, it is still useful as a standalone library, especially if you want to query local Parquet-backed datasets via registered workspaces, embed SQL in a Python service or notebook, or experiment with the engine directly.
```
pip install opteryx-core
```

Import it as:

```python
import opteryx
```

If your current working directory contains local Parquet data, the simplest way to use Opteryx-Core is to register a local workspace and query it with dot-separated names.
```python
import opteryx
from opteryx.connectors import DiskConnector

opteryx.register_workspace("data", DiskConnector)

session = opteryx.session()
result = session.execute_to_arrow(
    "SELECT id, name FROM data.planets WHERE id < 5"
)
print(result)
```

In this model, dataset names are resolved relative to the current working directory. For example, `data.planets` resolves to `./data/planets`, and Opteryx-Core will read the Parquet files it finds there.
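The resolution rule just described can be sketched in a few lines of plain Python (this is an illustration of the naming convention, not the engine's actual resolver):

```python
import os

def resolve_dataset(name: str) -> str:
    """Illustrative only: map a dot-separated dataset name to a
    directory path relative to the current working directory,
    mirroring the rule described above."""
    return os.path.join(".", *name.split("."))

print(resolve_dataset("data.planets"))  # ./data/planets on POSIX systems
```

Each dot introduces one directory level, so deeper names such as `data.space.planets` simply map to deeper paths.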
- Powering the execution layer used by opteryx.app
- Running analytical SQL against local Parquet-backed datasets
- Embedding a query engine inside Python applications, scripts, notebooks, and services
- Working on engine internals such as planning, execution, and Parquet performance
Opteryx-Core works best when paired with the opteryx_catalog library. That is the intended model for named datasets, catalog-backed tables, and the general experience used in opteryx.app.
Typical setup:
```python
import os

import opteryx
from opteryx import set_default_connector
from opteryx.connectors import OpteryxConnector
from opteryx_catalog import OpteryxCatalog

set_default_connector(
    OpteryxConnector,
    catalog=OpteryxCatalog,
    firestore_project=os.environ["GCP_PROJECT_ID"],
    firestore_database=os.environ["FIRESTORE_DATABASE"],
    gcs_bucket=os.environ["GCS_BUCKET"],
)
```

Once configured, you can query catalog-backed datasets using dot-separated names such as `public.space.planets` or `opteryx.ops.billing`.
For local data, Opteryx-Core is typically used through registered workspaces such as `testdata`, `scratch`, or `data`. Queries refer to datasets by dot-separated names relative to the workspace root, for example `testdata.planets`, `testdata.satellites`, or `scratch.signals`.
Opteryx-Core is best thought of as an embedded analytical engine rather than a full end-user platform. If you want a hosted experience, multi-tenant service features, and the broader product workflow, use opteryx.app. If you want the core engine in your own environment, this package gives you that engine directly. If you want the intended table-resolution model, pair it with opteryx_catalog.
If you use Opteryx-Core yourself, we want to hear from you.
- Use it on your own datasets
- Raise bugs when queries, schemas, or performance do not behave as expected
- Open pull requests for fixes, tests, docs, or performance improvements
- Share repro cases, failing queries, and edge-case Parquet files
This project is being actively built, and outside usage helps make it better.
Docs: https://docs.opteryx.app/ • Source: https://github.com/mabel-dev/opteryx-core • License: Apache-2.0