Commit 1a708d2

Update Readme (#112)
As title
1 parent e6d34dc commit 1a708d2

File tree: 1 file changed, +132 -2 lines changed

README.md

Lines changed: 132 additions & 2 deletions
# 🎯 optd

[![codecov](https://codecov.io/gh/cmu-db/optd/graph/badge.svg?token=FYM7I3R3GZ)](https://codecov.io/gh/cmu-db/optd)

optd is a high-performance, extensible optimizer-as-a-service, built to support research in cardinality estimation, adaptive planning, AI-driven optimization, and parallelism. It serves as both a prototype system and a foundation for building production-ready optimizers.

## ✨ Core Features

**🔍 Flexible Search Strategy**: Unlike traditional recursive sub-plan optimizers, optd supports broader, non-recursive search spaces for faster and better plan discovery.

**⚡ Parallelism**:
- *Inter-query*: Optimize multiple queries in parallel while sharing computation
- *Intra-query*: Explore a single plan's search space using many threads

**💾 Persistent Memoization**: The optimizer acts like a database—plans and statistics are stored and reused, enabling adaptivity through feedback from prior executions.

**📝 Rule DSL**: Define transformation rules in a high-level, expressive DSL. Our rule engine is Turing complete, enabling compact definitions of complex transformations like join order enumeration.

**Example Data Type Definition**:
```
data Logical =
  | Join(left: Logical, right: Logical, type: JoinType, predicate: Scalar)
  | Filter(child: Logical, predicate: Scalar)
  | Project(child: Logical, expressions: [Scalar])
  | Sort(child: Logical, order_by: [Bool])
  \ Get(table_name: String)
```

**Example Transformation Rule**:
```
[transformation]
fn (expr: Logical*) join_commute(): Logical? = match expr
  | Join(left, right, Inner, predicate) ->
    let
      left_props = left.properties(),
      right_props = right.properties(),
      left_len = left_props#schema#columns.len(),
      right_len = right_props#schema#columns.len(),

      right_indices = 0..right_len,
      left_indices = 0..left_len,

      remapping = (left_indices.map((i: I64) -> (i, i + right_len)) ++
        right_indices.map((i: I64) -> (i + left_len, i))).to_map(),
    in
      Project(
        Join(right, left, Inner, predicate.remap(remapping)),
        (right_indices ++ left_indices).map((i: I64) -> ColumnRef(i))
      )
  \ _ -> none
```

**🔧 Pluggable Scheduling**: Apply rules using customizable scheduling strategies—from heuristics to AI-guided decisions.

**🔍 Explainability**: Track rule application history for better debugging and plan introspection.

**🔌 Extensibility**: Define custom operators and inherit existing rules. Designed to integrate with standards like Substrait, with a smoother UX than systems like Calcite.

## 🛠️ Usage

optd is currently under development. The costing mechanism is still being implemented, but there is a small demo available. The DSL tooling is more mature.

### Running the Demo

```bash
# Run the demo test (located in optd/src/demo/mod.rs)
cargo test test_optimizer_demo -- --nocapture
```

### CLI Tool

```bash
# Compile a DSL file
cargo run --bin optd-cli -- compile path/to/file.opt

# Compile with verbose output and show intermediate representations
cargo run --bin optd-cli -- compile path/to/file.opt --verbose --show-ast --show-hir

# Compile with mock UDFs for testing
cargo run --bin optd-cli -- compile path/to/file.opt --mock-udfs map get_table_schema properties statistics optimize

# Run functions marked with [run] annotation
cargo run --bin optd-cli -- run-functions path/to/file.opt
```

## 🧮 TODO: How to Perform Costing

Physical expressions need to be costed. Their children are either goals or other physical expressions (called goal members). Let's take the following example: `EXPR(goal_1, sub_expr_2)`.

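As a rough illustration of that shape (a hypothetical Rust sketch with invented names, not optd's actual types or API), a physical expression with mixed goal/member children could be modeled like this:

```rust
// Hypothetical sketch, not the optd API: children are either goals (still to be
// optimized) or already-materialized physical expressions ("goal members").
type GoalId = u64;

enum Child {
    Goal(GoalId),              // e.g. goal_1, resolved later by the optimizer
    Member(Box<PhysicalExpr>), // e.g. sub_expr_2, a concrete physical expression
}

struct PhysicalExpr {
    operator: String, // placeholder for the real physical operator representation
    children: Vec<Child>,
}

fn main() {
    // EXPR(goal_1, sub_expr_2) from the example above.
    let expr = PhysicalExpr {
        operator: "EXPR".into(),
        children: vec![
            Child::Goal(1),
            Child::Member(Box::new(PhysicalExpr {
                operator: "SUB_EXPR".into(),
                children: vec![],
            })),
        ],
    };
    println!("{} has {} children", expr.operator, expr.children.len());
}
```
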
To cost that expression, we have multiple approaches:

### Approach 1: Recursive Optimal Costing
Recursively optimally cost `goal_1` and `sub_expr_2`. This approach is challenging because:
- It requires invalidation whenever we get a better expression for `goal_1` or `sub_expr_2` (see the sketch below)
- It doesn't ensure a global minimum, as greedy approaches are not always optimal
- We cannot support physical→physical optimizations (if that turns out to be useful)

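To make the invalidation problem concrete, here is a minimal, hypothetical sketch (invented types, not the optd API) of recursive optimal costing: each goal's cost is the minimum over its currently known expressions, so any cheaper expression discovered later for a child goal makes every parent cost derived from the old minimum stale.

```rust
// Hypothetical sketch of Approach 1 -- invented types, not the optd API.
use std::collections::HashMap;

type GoalId = u64;

struct CandidateExpr {
    local_cost: f64,          // cost of this operator by itself
    child_goals: Vec<GoalId>, // children expressed as goals
}

struct Memo {
    exprs: HashMap<GoalId, Vec<CandidateExpr>>, // expressions known so far, per goal
}

// Greedy bottom-up costing: take the cheapest known expression for every child
// goal and sum. If a child goal later gains a cheaper expression, every cost
// computed from the old minimum must be invalidated and recomputed.
fn best_cost(memo: &Memo, goal: GoalId) -> f64 {
    memo.exprs[&goal]
        .iter()
        .map(|e| {
            let children: f64 = e.child_goals.iter().map(|g| best_cost(memo, *g)).sum();
            e.local_cost + children
        })
        .fold(f64::INFINITY, f64::min)
}
```
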
### Approach 2: Explore All Possibilities
Explore all possibilities and rely on the scheduler to avoid combinatorial explosion. This is more in line with what we do for transformations and implementations. We can define a costing function in the DSL with the following signature:

```
fn (plan: Physical*) cost(): (f64, Statistics)
```

`Physical*` indicates that it is stored, so it has extra guarantees (e.g., all children are ingested). This mirrors what we use for logical implementations and transformations.

`f64` is the cost, and `Statistics` is any user-defined data type (could be ML weights, histograms, etc.).

When we encounter a goal, we expand it and materialize all physical expressions in that goal (and subgoals!). We need new syntax to expand/cost a nested physical expression. **Idea**: `$` postfix, which means "into costed". The left type should be `Physical*`, which can easily be tested with the type checker.

### Approach 3: Final Approach (Best of All Worlds)

```
// This is a UDF/external function, similar to optimize for implementations
fn (plan: Physical) into_costed(cost: f64, stats: Statistics)
```

```
fn (plan: Physical*) cost(): Physical$
```

In the memo, each physical expression id will have a set of costed expressions:
```
pid -> {pid + cost + stats}
```
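A rough sketch of that bookkeeping (hypothetical Rust types; the real memo structures in optd will differ):

```rust
// Hypothetical sketch of the memo-side mapping above -- not optd's real types.
use std::collections::HashMap;

type PhysicalExprId = u64;

#[derive(Clone, Debug)]
struct Statistics; // user-defined payload: histograms, ML weights, ...

#[derive(Clone, Debug)]
struct CostedExpr {
    pid: PhysicalExprId, // the (possibly transformed) physical expression
    cost: f64,
    stats: Statistics,
}

// Each physical expression id maps to the set of costed expressions derived from it.
type CostedMemo = HashMap<PhysicalExprId, Vec<CostedExpr>>;

fn record(memo: &mut CostedMemo, pid: PhysicalExprId, costed: CostedExpr) {
    memo.entry(pid).or_default().push(costed);
}
```
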
This approach is excellent because:
1. It uses the same updating mechanisms as for implementations and explorations (consistent scheduler!)
2. It allows for further physical→physical transformations
3. You can do whatever you want when costing an expression! Can go as deep as needed, can choose to recursively cost if desired (or not!)
4. Can propagate statistics perfectly

**Only caveat**: Cost pruning has no built-in mechanism, but you can instrument the scheduler.

---

**📧 Contact**: Please reach out to [email protected] for more information about this.
