LIMIT on coordinate-only columns returns wrong row count #10

@alxmrs

Summary

When executing a SQL query that selects only coordinate columns with a LIMIT clause, the Rust CLI returns fewer rows than expected. The issue appears to be that LIMIT is applied to the DictionaryArray’s unique values rather than the expanded Cartesian product rows.

Bug revealed in #9

Steps to Reproduce

  1. Generate test data:

./scripts/generate_data.sh

  2. Run the following query via zarr-cli:

CREATE EXTERNAL TABLE data STORED AS ZARR LOCATION 'data/synthetic_v3.zarr'
SELECT lat FROM data LIMIT 11

  3. Expected: 11 rows (from the 700-row Cartesian product: 7 time × 10 lat × 10 lon)

  4. Actual: 10 rows (the number of unique lat values in the DictionaryArray)
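For reference, the expected and actual counts can be checked with a small Python sketch. It uses only the dimension sizes stated in this issue; the `(time, lat, lon)` ordering and the variable names are assumptions for illustration, not taken from the reader code.

```python
from itertools import product

# Dimension sizes from this issue: 7 time x 10 lat x 10 lon.
# The (time, lat, lon) ordering is an assumption for illustration.
time = range(7)
lat = range(10)
lon = range(10)

# Every (time, lat, lon) combination is one row of the flattened table.
rows = list(product(time, lat, lon))
total_rows = len(rows)

limit = 11
# LIMIT should cap the expanded row count, not the unique coordinate count.
expected_rows = min(limit, total_rows)
# Number of distinct lat values, which is what the CLI currently returns.
unique_lat = len(set(lat))

print(total_rows, expected_rows, unique_lat)
```

Since the limit (11) is smaller than the table (700 rows) but larger than the number of unique lat values (10), this query is about the smallest one that exposes the mismatch.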

Minimal Reproduction

# Build CLI
cargo build --bin zarr-cli
# Run query
echo "CREATE EXTERNAL TABLE data STORED AS ZARR LOCATION 'data/synthetic_v3.zarr'
SELECT lat FROM data LIMIT 11
quit" | ./target/debug/zarr-cli

Analysis

The Rust implementation uses `DictionaryArray` for coordinate columns like `lat`, `lon`, and `time`. When a query selects only coordinate columns:

  • `lat` has 10 unique dictionary values [0, 1, 2, ..., 9]
  • The full table has 700 rows where each lat value appears 70 times
  • `SELECT lat FROM data LIMIT 11` should return 11 rows

The LIMIT appears to be incorrectly applied to the dictionary values (10 unique) rather than the expanded index array (700 rows).
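To make the suspected failure mode concrete, here is a rough pure-Python model of a dictionary-encoded column. It is not the Arrow API and not the actual reader code: `values`, `keys`, `limit_on_values`, and `limit_on_rows` are hypothetical names, and the row-major `(time, lat, lon)` layout is an assumption.

```python
# Hypothetical model of a dictionary-encoded column (not the Arrow API).
n_time, n_lat, n_lon = 7, 10, 10

# Dictionary values: the 10 unique lat coordinates.
values = list(range(n_lat))

# One key per table row, assuming a row-major (time, lat, lon) layout:
# the lat index changes every n_lon rows and wraps every n_lat * n_lon rows.
keys = [(row // n_lon) % n_lat for row in range(n_time * n_lat * n_lon)]


def limit_on_values(values, keys, n):
    """Suspected behaviour: LIMIT truncates the dictionary values."""
    return values[:n]


def limit_on_rows(values, keys, n):
    """Expected behaviour: LIMIT truncates the keys, then decodes them."""
    return [values[k] for k in keys[:n]]


suspected = limit_on_values(values, keys, 11)
expected = limit_on_rows(values, keys, 11)
print(len(suspected), len(expected))
```

Under these assumptions, truncating the values can never yield more than 10 rows, while truncating the keys yields the requested 11. That matches the counts reported above, though the real cause in the Rust code still needs to be confirmed.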

Workaround

Include at least one data variable column in the SELECT to force full row expansion:

SELECT lat, temperature FROM data LIMIT 11 -- Works correctly

Environment

  • Found via Hypothesis property-based testing
  • Test file: `python/tests/test_integration.py`
  • Relevant Rust files: `src/reader/zarr_reader.rs`, `src/physical_plan/zarr_exec.rs`
