Commit 1cd6341

Update README.md
1 parent 4cdeb7b commit 1cd6341

1 file changed

+0
-51
lines changed

README.md

Lines changed: 0 additions & 51 deletions
@@ -82,54 +82,3 @@ cargo run --bin optd-cli -- compile path/to/file.opt --mock-udfs map get_table_s

```
# Run functions marked with [run] annotation
cargo run --bin optd-cli -- run-functions path/to/file.opt
```

## TODO: How to Perform Costing

Physical expressions need to be costed. Their children are either goals or other physical expressions (called goal members). Let's take the following example: `EXPR(goal_1, sub_expr_2)`. To cost that expression, we have multiple approaches:

### Approach 1: Recursive Optimal Costing
Recursively optimally cost `goal_1` and `sub_expr_2`. This approach is challenging because:
- It requires invalidation whenever we get a better expression for `goal_1` or `sub_expr_2`
- It doesn't ensure a global minimum, as greedy approaches are not always optimal
- We cannot support physical→physical optimizations (if that turns out to be useful)

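The invalidation concern in Approach 1 can be made concrete with a minimal Rust sketch. This is not optd's implementation: `Memo`, `Expr`, `GoalId`, `add_expr`, `best_cost`, and the additive cost model are all hypothetical names and assumptions chosen for illustration, and the sketch assumes the goal graph has no cycles.

```rust
use std::collections::HashMap;

// Hypothetical identifier for a goal; not an optd type.
type GoalId = usize;

/// Hypothetical physical expression: a local cost plus child goals.
struct Expr {
    local_cost: f64,
    children: Vec<GoalId>,
}

#[derive(Default)]
struct Memo {
    /// Candidate expressions (goal members) per goal.
    goals: HashMap<GoalId, Vec<Expr>>,
    /// Cached best cost per goal; stale as soon as any descendant improves.
    best: HashMap<GoalId, f64>,
}

impl Memo {
    fn add_expr(&mut self, goal: GoalId, expr: Expr) {
        self.goals.entry(goal).or_default().push(expr);
        // Coarse invalidation: drop every cached cost. A finer scheme would
        // have to track which parent goals depend on `goal`.
        self.best.clear();
    }

    /// Recursively cost a goal as the minimum over its member expressions.
    /// Assumes an additive cost model and an acyclic goal graph.
    fn best_cost(&mut self, goal: GoalId) -> f64 {
        if let Some(&cached) = self.best.get(&goal) {
            return cached;
        }
        // Copy out what we need so the recursion can borrow `self` mutably.
        let exprs: Vec<(f64, Vec<GoalId>)> = self
            .goals
            .get(&goal)
            .map(|es| es.iter().map(|e| (e.local_cost, e.children.clone())).collect())
            .unwrap_or_default();
        let mut min = f64::INFINITY;
        for (local, children) in exprs {
            let total = local + children.iter().map(|&g| self.best_cost(g)).sum::<f64>();
            min = min.min(total);
        }
        self.best.insert(goal, min);
        min
    }
}
```

In this sketch, adding a cheaper member to a child goal changes the cached cost of every ancestor, which is why `add_expr` has to clear the cache before the next `best_cost` call returns a correct value.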
### Approach 2: Explore All Possibilities
Explore all possibilities and rely on the scheduler to avoid combinatorial explosion. This is more in line with what we do for transformations and implementations. We can define a costing function in the DSL with the following signature:

```
fn (plan: Physical*) cost(): (f64, Statistics)
```

`Physical*` indicates that it is stored, so it has extra guarantees (e.g., all children are ingested). This mirrors what we use for logical implementations and transformations.

`f64` is the cost, and `Statistics` is any user-defined data type (could be ML weights, histograms, etc.).

When we encounter a goal, we expand it and materialize all physical expressions in that goal (and subgoals!). We need new syntax to expand/cost a nested physical expression. **Idea**: `$` postfix, which means "into costed". The left type should be `Physical*`, which can easily be tested with the type checker.

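The goal expansion described for Approach 2 can be sketched in Rust as follows. This is an illustration under stated assumptions, not the optd DSL or its scheduler: `Plan`, `Child`, `GoalId`, and `materialize` are hypothetical names, goals are modeled as a plain map from id to member expressions, and the goal graph is assumed to be acyclic.

```rust
use std::collections::HashMap;

// Hypothetical identifier for a goal; not an optd type.
type GoalId = usize;

/// Hypothetical child of a physical expression: a goal or a nested expression.
#[derive(Clone, Debug, PartialEq)]
enum Child {
    Goal(GoalId),
    Expr(Box<Plan>),
}

#[derive(Clone, Debug, PartialEq)]
struct Plan {
    op: String,
    children: Vec<Child>,
}

/// Expand every goal into each of its member expressions (recursively) and
/// return all resulting goal-free plans.
fn materialize(plan: &Plan, goals: &HashMap<GoalId, Vec<Plan>>) -> Vec<Plan> {
    // Start from a single empty child list and extend it one child at a time.
    let mut partial: Vec<Vec<Child>> = vec![vec![]];
    for child in &plan.children {
        let options: Vec<Plan> = match child {
            // A goal contributes every materialization of every member.
            Child::Goal(g) => goals
                .get(g)
                .into_iter()
                .flatten()
                .flat_map(|member| materialize(member, goals))
                .collect(),
            Child::Expr(e) => materialize(e, goals),
        };
        // Cartesian product of the prefixes built so far with this child's options.
        let mut next = Vec::new();
        for prefix in &partial {
            for opt in &options {
                let mut children = prefix.clone();
                children.push(Child::Expr(Box::new(opt.clone())));
                next.push(children);
            }
        }
        partial = next;
    }
    partial
        .into_iter()
        .map(|children| Plan { op: plan.op.clone(), children })
        .collect()
}
```

For `EXPR(goal_1, sub_expr_2)` where `goal_1` has two members and `sub_expr_2` has a child goal with three members, this produces six plans. The count multiplies at every level, which is the combinatorial growth the scheduler is expected to keep in check.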
### Approach 3: Final Approach (Best of All Worlds)

```
// This is a UDF/external function, similar to optimize for implementations
fn (plan: Physical) into_costed(cost: f64, stats: Statistics)
```

```
fn (plan: Physical*) cost(): Physical$
```

In the memo, each physical expression id will have a set of costed expressions:
```
pid -> {pid + cost + stats}
```

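One way to picture the `pid -> {pid + cost + stats}` mapping is the Rust sketch below. It is a hypothetical model, not optd's memo: `CostedMemo`, `Costed`, `PhysicalId`, and the `row_count` field of `Statistics` are assumptions made for illustration, and a `Vec` stands in for the set because `f64` is neither hashable nor totally ordered by default.

```rust
use std::collections::HashMap;

// Hypothetical identifier for a physical expression; not an optd type.
type PhysicalId = u64;

/// Placeholder for the user-defined statistics type.
#[derive(Clone, Debug, PartialEq)]
struct Statistics {
    row_count: f64,
}

/// One costed variant of a physical expression: pid + cost + stats.
#[derive(Clone, Debug, PartialEq)]
struct Costed {
    pid: PhysicalId,
    cost: f64,
    stats: Statistics,
}

#[derive(Default)]
struct CostedMemo {
    /// Each physical expression id maps to its set of costed expressions.
    costed: HashMap<PhysicalId, Vec<Costed>>,
}

impl CostedMemo {
    /// Record a costed variant. Returns `true` if it was new, so that a
    /// caller (for example a scheduler) could decide whether to react.
    fn insert(&mut self, entry: Costed) -> bool {
        let set = self.costed.entry(entry.pid).or_default();
        if set.contains(&entry) {
            return false;
        }
        set.push(entry);
        true
    }

    /// Cheapest costed variant currently known for `pid`, if any.
    fn cheapest(&self, pid: PhysicalId) -> Option<&Costed> {
        self.costed
            .get(&pid)?
            .iter()
            .min_by(|a, b| a.cost.total_cmp(&b.cost))
    }
}
```

Keeping every costed variant, rather than only the cheapest, is what leaves room for different statistics per variant; `cheapest` is just one possible query on top of that set.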
This approach is excellent because:
1. It uses the same updating mechanisms as for implementations and explorations (consistent scheduler!)
2. It allows for further physical→physical transformations
3. You can do whatever you want when costing an expression! Can go as deep as needed, can choose to recursively cost if desired (or not!)
4. Can propagate statistics perfectly

**Only caveat**: Cost pruning has no built-in mechanism, but you can instrument the scheduler.

---

**📧 Contact**: Please reach out to [email protected] for more information about this.

0 commit comments
