v2.2.0

github-actions released this 02 Jul 21:37
8af9b0f

✨ Features

  • persist column accelerator indexes inline (zero-copy mmap)
  • build + persist chunk-zone indexes for .csv.splayed stores
  • variable-length STR group keys via the wide-key path
  • persisted STR dictionary index for int-code group-by
  • STR has no null — kdb+ model (empty "" is the value)
  • SYM has no null — empty symbol ' is a value (atom-level flip)
  • render empty symbol as ', remove the 0Ns literal
  • pure int-width resolver csv_resolve_int_width + unit test
  • INT schema type -> auto narrowest int width on csv.splayed
  • per-morsel selection emitter (rowsel_builder + emit_segment)
  • fuse expr-tree WHERE into the row selection (skip the bool vec)
  • fuse in WHERE into the row selection
  • sel_compact keep-set param (project columns) + batched multi-gather
  • window special form exposing the OP_WINDOW kernel
  • SYM-column hash index (grouped attr on symbol columns)
  • persist an explicit SYM grouped hash index in splayed columns
  • support whole-column verbs (distinct/asc/desc/reverse) as select projections
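
The INT auto-width feature above can be illustrated with a small sketch. This is not the project's `csv_resolve_int_width`; the function name `resolve_int_width`, its `(lo, hi)` signature, and the string width names are assumptions made for illustration. It only shows the general idea of picking the narrowest signed width that holds a column's observed range, with the floor at I16 as described under Maintenance, and it does not reserve any value as a sentinel.

```python
# Hypothetical sketch of a "narrowest int width" resolver.
# Names, signature, and return values are illustrative assumptions,
# not the project's actual API.
WIDTHS = [
    ("I16", -(2**15), 2**15 - 1),  # assumed floor: nothing narrower than I16
    ("I32", -(2**31), 2**31 - 1),
    ("I64", -(2**63), 2**63 - 1),
]


def resolve_int_width(lo, hi):
    """Return the narrowest width name whose range contains [lo, hi]."""
    if lo > hi:
        raise ValueError("empty range: lo must be <= hi")
    for name, wmin, wmax in WIDTHS:
        if wmin <= lo and hi <= wmax:
            return name
    raise OverflowError("range does not fit in a 64-bit signed integer")
```

Because the function depends only on its two arguments, it can be unit-tested without touching any file, which is presumably what "pure" refers to in the feature entry.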

🐛 Bug fixes

  • skip a header row that matches explicitly-supplied column names
  • blank STR field is empty string, not null (consistent with SYM)
  • ray_pool_of must gate every header on the full signature
  • guard str_vec_from_parts pool offset; drop dead null gather path
  • fall back for >cap-task filters + zero-init rowsel builder (review)
  • fold (quote X) in DAG predicates so it matches the 'X form
  • carry adaptive-width SYM keys from splayed sources in asof/window-join
  • read adaptive-width SYM keys via ray_read_sym in group/update-by
  • window form — short first/last names, reject bad frame, order guard, restrict funcs, error tests
  • key persisted SYM hash index in file-local domain
  • drop write-only valid_ncols (clang -Werror on macOS)
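
As a rough illustration of the first fix in this list (skipping a header row that matches explicitly supplied column names), the sketch below drops the first CSV row only when it equals the caller's names. The helper `read_rows` and its exact matching rule (whitespace-trimmed, case-sensitive comparison) are assumptions; the release notes do not specify how the real loader compares them.

```python
import csv
import io


def read_rows(text, names):
    """Parse CSV text; drop the first row if it repeats the supplied names.

    Hypothetical helper: the matching rule here is an assumption.
    """
    rows = list(csv.reader(io.StringIO(text)))
    if rows and [cell.strip() for cell in rows[0]] == list(names):
        rows = rows[1:]  # first row is a header, not data
    return rows
```

With this rule, a file that starts with `a,b` and is loaded with names `["a", "b"]` loses its first row, while a file whose first row is data is left untouched.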

⚡ Performance

  • skip morsels a fused BOOL predicate decides from chunk-zone extrema
  • prefilter computed group keys for any selective WHERE, not just top-N
  • parallel filtered group-by + inline STR key descriptors
  • lazy, arena-backed SYM-domain atom materialization
  • adaptive small-hash -> radix promotion for misrouted high-card groups
  • O(1) empty-string find — skip the full reverse-index build for <> ''
  • route multi-key by: distinct through the dict-linked agg path
  • wide (STR) group keys in agg_group_keys + admit them to the distinct path
  • projection pushdown into a nested no-aggregate distinct
  • avalanche the group-by tuple hash to kill structured-key collisions
  • hoist per-row key-type dispatch out of the dense group-by hot loops
  • skip the dense-plan min/max prescan for width-bounded SYM keys
  • hoist the no-null fast path in the streaming accumulators
  • skip dense min/max prescan for small-domain W32/W64 SYM keys
  • presence-bitmap dedup for small-domain SYM columns
  • cap per-group HT at 2*domain_count for SYM input
  • order filter chains by selectivity, not just eval cost
  • only trust equality selectivity, defer rest to cost
  • don't assume SYM equality is selective (cardinality unknown)
  • bulk-build STR group-key gather (no per-row append)
  • len-only fast-path for STR ==/!= empty constant
  • parallelize STR-key row_gid probe in count-distinct
  • dict-code row_gid for STR count-distinct (q13 520->395ms)
  • bulk-build dict_codes_to_str (q13 395->260ms)
  • route dict-STR multi-key groups to the DAG path (q16 7581->119ms)
  • dict-STR EQ/NE filter + STR sort key (q24/q25/q26 35-55x)
  • dict-STR presence dedup instead of string hashset (q05 595->110ms)
  • route non-dict STR multi-key groups to the DAG path (q39 1186->137ms)
  • drop dead row-index from radix record when no row-dependent agg
  • SIMD-specialized small-set integer membership fast path
  • skip DA min/max prescan when a known key cardinality proves DA infeasible
  • project the filtered-group compact to keys+agg-inputs only
  • iterate survivors, not all rows, for sparse filtered group-by
  • hashed dense-I64 find for vector args (was O(n*m) boxed list)
  • compact sparse WHERE selections before grouped expression aggregates
  • dense-key abort + no-qsort in hash probes; SYM support in the IN probe
  • compact sparse WHERE selections before projected expression columns
  • hash-group + cursor-merge asof strategy, sort-merge as fallback
  • CSR grouped hash-index layout; asof joins answer from index slices
  • xasc stamps the verified sorted marker on its primary key column
  • multi-eq-key asof via first-key CSR index slice walks
  • parted-layout package — per-slice asof verify, eq+range consult, sparse chained-filter refine
  • xasc/xdesc already-in-order detection
  • vectorized + pool-parallel SYM vector compares; parallel already-sorted scan
  • sweep the FN_ATOMIC vector-dispatch holes — SYM ordering, STR compares, nullable-vector DAG routing; all pool-parallel
  • SYM verdict-LUT membership kernel; typed eval-level in routing
  • fused SUM/AVG(a*b) product inputs; arith-of-aggs decomposition
  • age free blocks before releasing their pages
  • incremental cursor for the page-release pass — bounded work at every maintenance point
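
The first entry in this list, deciding a predicate from chunk-zone extrema, follows the general zone-map technique. The sketch below is a minimal Python illustration of that technique for a single `x > c` predicate, not the engine's implementation; the function names, the fixed chunk size, and the `scanned` counter are assumptions added for demonstration.

```python
def build_zones(values, chunk):
    """Compute (min, max) per fixed-size chunk of the column."""
    zones = []
    for start in range(0, len(values), chunk):
        part = values[start:start + chunk]
        zones.append((min(part), max(part)))
    return zones


def select_greater(values, zones, chunk, c):
    """Return (row indices where values[i] > c, number of chunks scanned).

    A chunk is skipped when its max rules every row out, accepted whole
    when its min rules every row in, and scanned row by row otherwise.
    """
    out = []
    scanned = 0
    for k, (zmin, zmax) in enumerate(zones):
        start = k * chunk
        stop = min(start + chunk, len(values))
        if zmax <= c:
            continue  # no row can match: skip the chunk
        if zmin > c:
            out.extend(range(start, stop))  # every row matches
            continue
        scanned += 1
        out.extend(i for i in range(start, stop) if values[i] > c)
    return out, scanned
```

For example, with `values = [1, 2, 3, 10, 11, 12, 4, 20, 5]`, a chunk size of 3, and `c = 5`, the first chunk is skipped, the second is accepted whole, and only the third is actually scanned.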

📚 Documentation

  • correct filter-reorder sort comment for selectivity key
  • drop stale BOOL/U8 refs after I16-floor (comments + dead map cases)
  • note the ascending-order dependency of the survivor list
  • correct SYM aux-layout comments for the new hash index

🔧 Maintenance & internal

  • refactor(agg): move the aggregation engine off libc malloc onto the buddy heap
  • refactor: remove the last libc malloc from src/ — buddy for transient, sys for global
  • refactor: generalize the no-agg distinct path + harden the empty-sym find
  • refactor(eval): remove dead disabled integer fast-path in atomic_map_binary_op
  • refactor: remove 6 dead (uncalled) internal functions
  • refactor: remove dead ray_heap_current_id
  • refactor(compile): remove dead struct field compiler_t::dbg_len
  • test: eliminate two never-run (always-skipped) tests
  • parse hyphenated quoted symbols
  • chore(ci): steer contributors to dev; auto-retarget stray master PRs
  • test(str): cover empty-string compare matrix (eq, left-scalar, non-empty no-fire)
  • refactor(sym): remove dead SYM null branches (no-null cleanup)
  • test(sym): empty-symbol count-distinct parity + final no-null acceptance (zero 0Ns)
  • refactor(sym): fix no-null comment rot + remove dead SYM-compare branches (final review)
  • test(csv): add INT32_MIN sentinel test + drop redundant stdbool include
  • test(csv): >1M-row multi-chunk INT round-trip + AUTO-marker guard (review)
  • change(csv): cap INT auto-width floor at I16 (U8/BOOL render hex/bool — integers stay decimal)
  • harden(rowsel): document builder-local seg contract + asserts + init OOM guard
  • test(query): nullable fused-in equivalence + mixed-AND fallback note
  • refactor(group): move group-key cardinality hint into ray_vm_t (VM ctx) + fix bound comment
  • test(filter): assert per-column values across gather batch boundary + keep-set spanning
  • refactor: remove RAY_NO_FUSED_SEL env-flag hack (no runtime toggle for optimizations)
  • test(group): route sel_group fixture through the parallel path; honest coverage comment
  • refactor(query): remove count-compare result-memoization cache

Full changelog: v2.1.9...v2.2.0