Skip to content

Commit c39b65c

Browse files
committed
fix JOIN logic to avoid extraneous NULLs, documentation updates
1 parent 2e0d009 commit c39b65c

File tree

4 files changed

+290
-381
lines changed

4 files changed

+290
-381
lines changed

doc/user/content/sql/explain-analyze.md

Lines changed: 175 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -8,6 +8,13 @@ menu:
88

99
`EXPLAIN ANALYZE` uses [`mz_introspection`](https://materialize.com/docs/sql/system-catalog/mz_introspection/) sources to report on the performance of indexes and materialized views.
1010

11+
When optimizing a query, it helps to be able to attribute 'cost' to its parts,
12+
starting with how much time is spent computing in each part overall. For example, Materialize
13+
reports the time spent in each _dataflow operator_ in
14+
[`mz_introspection.mz_compute_operator_durations_histogram`](/sql/system-catalog/mz_introspection/#mz_compute_operator_durations_histogram).
15+
Using `EXPLAIN ANALYZE`, we can attribute the time spent in each operator to the higher-level, more
16+
intelligible LIR operators.
17+
1118
{{< warning >}}
1219
`EXPLAIN` is not part of Materialize's stable interface and is not subject to
1320
our backwards compatibility guarantee. The syntax and output of `EXPLAIN` may
@@ -52,7 +59,174 @@ When running `EXPLAIN ANALYZE` for `CPU` or `MEMORY`, you can specify `WITH SKEW
5259

5360
Under the hood, `EXPLAIN ANALYZE` runs SQL queries that correlate [`mz_introspection` performance information](https://materialize.com/docs/sql/system-catalog/mz_introspection/) with the LIR operators in [`mz_introspection.mz_lir_mapping`](../../sql/system-catalog/mz_introspection/#mz_lir_mapping).
5461

55-
<!-- TODO add example or two -->
62+
The attribution examples in this
63+
section reference the `wins_by_item` index (and the underlying `winning_bids`
64+
view) from the [quickstart
65+
guide](/get-started/quickstart/#step-2-create-the-source):
66+
67+
```sql
68+
CREATE SOURCE auction_house
69+
FROM LOAD GENERATOR AUCTION
70+
(TICK INTERVAL '1s', AS OF 100000)
71+
FOR ALL TABLES;
72+
73+
CREATE VIEW winning_bids AS
74+
SELECT DISTINCT ON (a.id) b.*, a.item, a.seller
75+
FROM auctions AS a
76+
JOIN bids AS b
77+
ON a.id = b.auction_id
78+
WHERE b.bid_time < a.end_time
79+
AND mz_now() >= a.end_time
80+
ORDER BY a.id, b.amount DESC, b.bid_time, b.buyer;
81+
82+
CREATE INDEX wins_by_item ON winning_bids (item);
83+
```
84+
85+
### `EXPLAIN ANALYZE MEMORY`
86+
87+
When we run `EXPLAIN ANALYZE MEMORY FOR INDEX wins_by_item`, we get back a report on each operator in the dataflow for `wins_by_item`, annotated with memory usage and the number of records:
88+
89+
| operator | total_memory | total_records |
90+
| :------------------------------- | -----------: | ------------: |
91+
| Arrange 7 | 1116 kB | 39976 |
92+
|   Get::PassArrangements u274 | null | null |
93+
| TopK::Basic 5 | 100 MB | 1899733 |
94+
|   Join::Differential 2 » 4 | null | null |
95+
|     Arrange 3 | 5672 kB | 220375 |
96+
|       Get::PassArrangements u271 | null | null |
97+
|     Arrange 1 | 3501 kB | 188785 |
98+
|       Get::Collection u270 | null | null |
99+
100+
The results show the [`TopK`](/transform-data/idiomatic-materialize-sql/top-k/) is overwhelmingly responsible for memory usage.
101+
102+
103+
### `EXPLAIN ANALYZE CPU`
104+
105+
When we run `EXPLAIN ANALYZE CPU FOR INDEX wins_by_item`, we get back a report on each operator in the dataflow for `wins_by_item`, annotated with total time spent in each operator (not inclusive of its child operators):
106+
107+
| operator | total_elapsed |
108+
| :------------------------------- | --------------: |
109+
| Arrange 7 | 00:00:03.14266 |
110+
|   Get::PassArrangements u274 | null |
111+
| TopK::Basic 5 | 00:00:44.079592 |
112+
|   Join::Differential 2 » 4 | 00:00:06.372705 |
113+
|     Arrange 3 | 00:00:21.49465 |
114+
|       Get::PassArrangements u271 | 00:00:00.066236 |
115+
|     Arrange 1 | 00:00:11.212331 |
116+
|       Get::Collection u270 | 00:00:00.190449 |
117+
118+
We can see CPU and memory usage simultaneously by running `EXPLAIN ANALYZE CPU, MEMORY FOR INDEX wins_by_item`:
119+
120+
| operator | total_elapsed | total_memory | total_records |
121+
| :------------------------------- | --------------: | -----------: | ------------: |
122+
| Arrange 7 | 00:00:03.151386 | 1234 kB | 42359 |
123+
|   Get::PassArrangements u274 | null | null | |
124+
| TopK::Basic 5 | 00:00:44.347959 | 105 MB | 2013547 |
125+
|   Join::Differential 2 » 4 | 00:00:06.389385 | null | null |
126+
|     Arrange 3 | 00:00:21.558754 | 5431 kB | 233533 |
127+
|       Get::PassArrangements u271 | 00:00:00.06644 | null | null |
128+
|     Arrange 1 | 00:00:11.246103 | 3546 kB | 191168 |
129+
|       Get::Collection u270 | 00:00:00.190935 | null | null |
130+
131+
The order of `CPU` and `MEMORY` in the statement determines the order of the output columns.
132+
133+
### `EXPLAIN ANALYZE ... WITH SKEW`
134+
135+
[Worker skew](/transform-data/dataflow-troubleshooting/#is-work-distributed-equally-across-workers) occurs when your data do not end up getting evenly
136+
partitioned between workers.
137+
Worker skew can only happen when your cluster has more than one worker.
138+
You can query
139+
[`mz_catalog.mz_cluster_replica_sizes`](/sql/system-catalog/mz_catalog/#mz_cluster_replica_sizes)
140+
to determine how many workers a given cluster size has; in our example, there are 2 workers.
141+
Extreme cases of skew can cause serious performance issues; `EXPLAIN ANALYZE` can help you identify this scenario.
142+
143+
You can explain `MEMORY`, `CPU`, or both with `WITH SKEW`, which adds per worker and average worker performance numbers for each operator, along with each worker's ratio compared to the average. Here is an example of `EXPLAIN ANALYZE MEMORY WITH SKEW FOR INDEX wins_by_item`:
144+
145+
| operator | worker_id | memory_ratio | worker_memory | avg_memory | total_memory | records_ratio | worker_records | avg_records | total_records |
146+
| :------------------------------- | --------: | -----------: | ------------: | ---------: | -----------: | ------------: | -------------: | ----------: | ------------: |
147+
| Arrange 7 | 0 | 1.24 | 768 kB | 619 kB | 1238 kB | 1.2 | 25485 | 21246 | 42492 |
148+
| Arrange 7 | 1 | 0.76 | 470 kB | 619 kB | 1238 kB | 0.8 | 17007 | 21246 | 42492 |
149+
|   Get::PassArrangements u274 | null | null | null | null | null | null | null | null | null |
150+
| TopK::Basic 5 | 0 | 1 | 53 MB | 53 MB | 105 MB | 1 | 1011624 | 1010259.5 | 2020519 |
151+
| TopK::Basic 5 | 1 | 1 | 52 MB | 53 MB | 105 MB | 1 | 1008895 | 1010259.5 | 2020519 |
152+
|   Join::Differential 2 » 4 | null | null | null | null | null | null | null | null | null |
153+
|     Arrange 3 | 0 | 1 | 2726 kB | 2724 kB | 5448 kB | 1 | 117297 | 117167.5 | 234335 |
154+
|     Arrange 3 | 1 | 1 | 2722 kB | 2724 kB | 5448 kB | 1 | 117038 | 117167.5 | 234335 |
155+
|       Get::PassArrangements u271 | null | null | null | null | null | null | null | null | null |
156+
|     Arrange 1 | 0 | 1 | 1779 kB | 1778 kB | 3556 kB | 1 | 95955 | 95750 | 191500 |
157+
|     Arrange 1 | 1 | 1 | 1778 kB | 1778 kB | 3556 kB | 1 | 95545 | 95750 | 191500 |
158+
|       Get::Collection u270 | null | null | null | null | null | null | null | null | null |
159+
160+
The `ratio` column tells you whether a worker is particularly over- or
161+
under-loaded:
162+
163+
- a `ratio` below 1 indicates a worker doing a below average amount of work.
164+
165+
- a `ratio` above 1 indicates a worker doing an above average amount of work.
166+
167+
While there will always be some amount of variation, very high ratios indicate a
168+
skewed workload.
169+
Here the memory ratios are close to 1, indicating there is very little worker skew.
170+
171+
### `EXPLAIN ANALYZE HINTS`
172+
173+
The [`mz_introspection.mz_expected_group_size_advice`](/sql/system-catalog/mz_introspection/#mz_expected_group_size_advice) introspection source offers hints on sizing `TopK` operators. `EXPLAIN ANALYZE HINTS` can annotate your query with these hints, which can be helpful for identifying which `TopK` should get which hint. Each `TopK` operator will have an [associated `DISTINCT ON INPUT GROUP SIZE`
174+
query hint](/transform-data/idiomatic-materialize-sql/top-k/#query-hints-1):
175+
Here is an example of `EXPLAIN ANALYZE HINTS FOR INDEX wins_by_item` (which has only one `TopK`):
176+
177+
| operator | levels | to_cut | hint | savings |
178+
| :------------------------------- | -----: | -----: | ----: | ------: |
179+
| Arrange 7 | null | null | null | null |
180+
|   Get::PassArrangements u274 | null | null | null | null |
181+
| TopK::Basic 5 | 8 | 6 | 255.0 | 75 MB |
182+
|   Join::Differential 2 » 4 | null | null | null | null |
183+
|     Arrange 3 | null | null | null | null |
184+
|       Get::PassArrangements u271 | null | null | null | null |
185+
|     Arrange 1 | null | null | null | null |
186+
|       Get::Collection u270 | null | null | null | null |
187+
188+
Here, the hinted `DISTINCT ON INPUT GROUP SIZE` is `255.0`. We can re-create our view and index using the hint as follows:
189+
190+
```sql
191+
DROP VIEW winning_bids CASCADE;
192+
193+
CREATE VIEW winning_bids AS
194+
SELECT DISTINCT ON (a.id) b.*, a.item, a.seller
195+
FROM auctions AS a
196+
JOIN bids AS b
197+
ON a.id = b.auction_id
198+
WHERE b.bid_time < a.end_time
199+
AND mz_now() >= a.end_time
200+
OPTIONS (DISTINCT ON INPUT GROUP SIZE = 255) -- use hint!
201+
ORDER BY a.id,
202+
b.amount DESC,
203+
b.bid_time,
204+
b.buyer;
205+
206+
CREATE INDEX wins_by_item ON winning_bids (item);
207+
```
208+
209+
Re-running the `TopK`-hints query will show only `null` hints; there are no
210+
hints because our `TopK` is now appropriately sized. But if we re-run our [query
211+
for attributing memory usage](#attributing-memory-usage), we can see that our
212+
`TopK` operator uses a third of the 100MB of memory it was using before:
213+
214+
| operator | total_memory | total_records |
215+
| -------------------------------- | ------------ | ------------- |
216+
| Arrange 7 | 1093 kB | 42720 |
217+
|   Get::PassArrangements u286 | null | null |
218+
| TopK::Basic 5 | 30 MB | 625638 |
219+
|   Join::Differential 2 » 4 | null | null |
220+
|     Arrange 3 | 5447 kB | 235570 |
221+
|       Get::PassArrangements u271 | null | null |
222+
|     Arrange 1 | 3485 kB | 191730 |
223+
|       Get::Collection u270 | null | null |
224+
225+
### `EXPLAIN ANALYZE ... AS SQL`
226+
227+
Under the hood, `EXPLAIN ANALYZE` issues SQL queries against [`mz_introspection`](https://materialize.com/docs/sql/system-catalog/mz_introspection/) sources.
228+
You can append `AS SQL` to any `EXPLAIN ANALYZE` statement to see the SQL that would be run (without running it).
229+
You can then customize this SQL to report finer grained or other information.
56230

57231
## Privileges
58232

0 commit comments

Comments
 (0)