✨ Features
- persist column accelerator indexes inline (zero-copy mmap)
- build + persist chunk-zone indexes for .csv.splayed stores
- variable-length STR group keys via the wide-key path
- persisted STR dictionary index for int-code group-by
- STR has no null — kdb+ model (empty "" is the value)
- SYM has no null — empty symbol ' is a value (atom-level flip)
- render empty symbol as ', remove the 0Ns literal
- pure int-width resolver csv_resolve_int_width + unit test
- INT schema type -> auto narrowest int width on csv.splayed
- per-morsel selection emitter (rowsel_builder + emit_segment)
- fuse expr-tree WHERE into the row selection (skip the bool vec)
- fuse in WHERE into the row selection
- sel_compact keep-set param (project columns) + batched multi-gather
- window special form exposing the OP_WINDOW kernel
- SYM-column hash index (grouped attr on symbol columns)
- persist an explicit SYM grouped hash index in splayed columns
- support whole-column verbs (distinct/asc/desc/reverse) as select projections
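The "auto narrowest int width" rule for INT columns on csv.splayed can be pictured as a range check over a column's observed values. This is a minimal sketch that assumes the resolver only looks at the column's min/max. `resolve_int_width` and the `int_width_t` tags are hypothetical names and do not reproduce the real `csv_resolve_int_width` signature. Null-sentinel handling, which the INT32_MIN sentinel test touches on, is left out. The I16 floor follows the note that U8/BOOL would render as hex/bool.

```c
#include <stdint.h>

/* Hypothetical width tags; the real resolver's return type is not shown in these notes. */
typedef enum { WIDTH_I16 = 16, WIDTH_I32 = 32, WIDTH_I64 = 64 } int_width_t;

/* Pick the narrowest signed width that holds every value in [lo, hi].
 * Never narrower than I16: U8/BOOL would change how values render,
 * so integer columns stay decimal. Null sentinels are ignored here. */
static int_width_t resolve_int_width(int64_t lo, int64_t hi) {
    if (lo >= INT16_MIN && hi <= INT16_MAX) return WIDTH_I16;
    if (lo >= INT32_MIN && hi <= INT32_MAX) return WIDTH_I32;
    return WIDTH_I64;
}
```

For example, a column spanning 0..200 would land on I16 rather than U8 because of the floor, while -40000..5 needs I32.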
🐛 Bug fixes
- skip a header row that matches explicitly-supplied column names
- blank STR field is empty string, not null (consistent with SYM)
- ray_pool_of must gate every header on the full signature
- guard str_vec_from_parts pool offset; drop dead null gather path
- fall back for >cap-task filters + zero-init rowsel builder (review)
- fold (quote X) in DAG predicates so it matches the 'X form
- carry adaptive-width SYM keys from splayed sources in asof/window-join
- read adaptive-width SYM keys via ray_read_sym in group/update-by
- window form — short first/last names, reject bad frame, order guard, restrict funcs, error tests
- key persisted SYM hash index in file-local domain
- drop write-only valid_ncols (clang -Werror on macOS)
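The header-row fix comes down to comparing the first row against the caller's explicit column names before parsing it as data. This is a minimal sketch that assumes the fields are already split and are compared byte for byte. `row_matches_names` is a hypothetical helper, and quoting, whitespace trimming, and case rules are deliberately omitted.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* True when a row's fields repeat the explicitly supplied column names
 * exactly (same count, same order, same bytes). In that case the loader
 * can treat the row as a header and skip it. */
static bool row_matches_names(const char *const *fields, size_t nfields,
                              const char *const *names, size_t nnames) {
    if (nfields != nnames) return false;
    for (size_t i = 0; i < nfields; i++)
        if (strcmp(fields[i], names[i]) != 0) return false;
    return true;
}
```

A row such as `sym,price` with names `{"sym", "price"}` would be skipped. A data row such as `AAPL,101.5` would still be parsed.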
⚡ Performance
- skip morsels a fused BOOL predicate decides from chunk-zone extrema
- prefilter computed group keys for any selective WHERE, not just top-N
- parallel filtered group-by + inline STR key descriptors
- lazy, arena-backed SYM-domain atom materialization
- adaptive small-hash -> radix promotion for misrouted high-card groups
- O(1) empty-string find: skip the full reverse-index build for <> ''
- route multi-key by:distinct through the dict-linked agg path
- wide (STR) group keys in agg_group_keys + admit them to the distinct path
- projection pushdown into a nested no-aggregate distinct
- avalanche the group-by tuple hash to kill structured-key collisions
- hoist per-row key-type dispatch out of the dense group-by hot loops
- skip the dense-plan min/max prescan for width-bounded SYM keys
- hoist the no-null fast path in the streaming accumulators
- skip dense min/max prescan for small-domain W32/W64 SYM keys
- presence-bitmap dedup for small-domain SYM columns
- cap per-group HT at 2*domain_count for SYM input
- order filter chains by selectivity, not just eval cost
- only trust equality selectivity, defer rest to cost
- don't assume SYM equality is selective (cardinality unknown)
- bulk-build STR group-key gather (no per-row append)
- len-only fast-path for STR ==/!= empty constant
- parallelize STR-key row_gid probe in count-distinct
- dict-code row_gid for STR count-distinct (q13 520->395ms)
- bulk-build dict_codes_to_str (q13 395->260ms)
- route dict-STR multi-key groups to the DAG path (q16 7581->119ms)
- dict-STR EQ/NE filter + STR sort key (q24/q25/q26 35-55x)
- dict-STR presence dedup instead of string hashset (q05 595->110ms)
- route non-dict STR multi-key groups to the DAG path (q39 1186->137ms)
- drop dead row-index from radix record when no row-dependent agg
- SIMD-specialized small-set integer membership fast path
- skip DA min/max prescan when a known key cardinality proves DA infeasible
- project the filtered-group compact to keys+agg-inputs only
- iterate survivors, not all rows, for sparse filtered group-by
- hashed dense-I64 find for vector args (was O(n*m) boxed list)
- compact sparse WHERE selections before grouped expression aggregates
- dense-key abort + no-qsort in hash probes; SYM support in the IN probe
- compact sparse WHERE selections before projected expression columns
- hash-group + cursor-merge asof strategy, sort-merge as fallback
- CSR grouped hash-index layout; asof joins answer from index slices
- xasc stamps the verified sorted marker on its primary key column
- multi-eq-key asof via first-key CSR index slice walks
- parted-layout package — per-slice asof verify, eq+range consult, sparse chained-filter refine
- xasc/xdesc already-in-order detection
- vectorized + pool-parallel SYM vector compares; parallel already-sorted scan
- sweep the FN_ATOMIC vector-dispatch holes — SYM ordering, STR compares, nullable-vector DAG routing; all pool-parallel
- SYM verdict-LUT membership kernel; typed eval-level in routing
- fused SUM/AVG(a*b) product inputs; arith-of-aggs decomposition
- age free blocks before releasing their pages
- incremental cursor for the page-release pass — bounded work at every maintenance point
📚 Documentation
- correct filter-reorder sort comment for selectivity key
- drop stale BOOL/U8 refs after I16-floor (comments + dead map cases)
- note the ascending-order dependency of the survivor list
- correct SYM aux-layout comments for the new hash index
🔧 Maintenance & internal
- refactor(agg): move the aggregation engine off libc malloc onto the buddy heap
- refactor: remove the last libc malloc from src/ — buddy for transient, sys for global
- refactor: generalize the no-agg distinct path + harden the empty-sym find
- refactor(eval): remove dead disabled integer fast-path in atomic_map_binary_op
- refactor: remove 6 dead (uncalled) internal functions
- refactor: remove dead ray_heap_current_id
- refactor(compile): remove dead struct field compiler_t::dbg_len
- test: eliminate two never-run (always-skipped) tests
- parse hyphenated quoted symbols
- chore(ci): steer contributors to dev; auto-retarget stray master PRs
- test(str): cover empty-string compare matrix (eq, left-scalar, non-empty no-fire)
- refactor(sym): remove dead SYM null branches (no-null cleanup)
- test(sym): empty-symbol count-distinct parity + final no-null acceptance (zero 0Ns)
- refactor(sym): fix no-null comment rot + remove dead SYM-compare branches (final review)
- test(csv): add INT32_MIN sentinel test + drop redundant stdbool include
- test(csv): >1M-row multi-chunk INT round-trip + AUTO-marker guard (review)
- change(csv): cap INT auto-width floor at I16 (U8/BOOL render hex/bool — integers stay decimal)
- harden(rowsel): document builder-local seg contract + asserts + init OOM guard
- test(query): nullable fused-in equivalence + mixed-AND fallback note
- refactor(group): move group-key cardinality hint into ray_vm_t (VM ctx) + fix bound comment
- test(filter): assert per-column values across gather batch boundary + keep-set spanning
- refactor: remove RAY_NO_FUSED_SEL env-flag hack (no runtime toggle for optimizations)
- test(group): route sel_group fixture through the parallel path; honest coverage comment
- refactor(query): remove count-compare result-memoization cache
Full changelog: v2.1.9...v2.2.0