
docs: mosaic CLI page (review of #66)#5

Closed
jianguotian wants to merge 12 commits into main from docs/mosaic-cli

jianguotian (Owner) commented Jun 18, 2026

Docs for the mosaic CLI, addressing Jingsong Li's review on apache#66.

What

  • New docs/cli.html documenting all nine commands (schema, meta, cat, pages, head, footer, column-size, buckets, dictionary), with text and JSON example output and an options table.
  • Quick-start in cli/README.md; CLI added to the navigation on every doc page.

This was a fork preview of apache#66, which has the same head (924ac4b) and supersedes it. Closing.

mingfeng and others added 12 commits June 16, 2026 04:22
Mosaic previously shipped no viewer tooling: inspecting a file meant
writing Rust against the library API. Add a `mosaic` binary (a new `cli`
workspace crate) mirroring parquet-cli:

- schema: column names, Arrow types, nullability, bucket assignment
- meta:   row groups, rows, per-column stats (null_count/min/max)
- cat:    first N rows as a table, with -n and --columns projection
- pages:  per-column encoding (plain/const/dict/all_null) + slot size
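How `cat` might combine `-n` row limiting with `--columns` projection, as a minimal sketch over a hypothetical in-memory `Table` (the real command reads through MosaicReader, whose row API is not shown here). `head_projected` is an illustrative name, and rejecting unknown column names is an assumption:

```rust
/// Hypothetical in-memory table standing in for decoded rows: column names
/// plus rows of optional cell text, where `None` is a null cell.
pub struct Table {
    pub columns: Vec<String>,
    pub rows: Vec<Vec<Option<String>>>,
}

/// Keep the first `n` rows and, when `projection` is given, only the named
/// columns in the requested order. Unknown names are rejected (assumption).
pub fn head_projected(
    table: &Table,
    n: usize,
    projection: Option<&[&str]>,
) -> Result<Table, String> {
    // Resolve the requested names to column indices; default to all columns.
    let indices: Vec<usize> = match projection {
        None => (0..table.columns.len()).collect(),
        Some(names) => names
            .iter()
            .map(|name| {
                table
                    .columns
                    .iter()
                    .position(|c| c == name)
                    .ok_or_else(|| format!("unknown column: {name}"))
            })
            .collect::<Result<Vec<usize>, String>>()?,
    };
    let columns = indices.iter().map(|&i| table.columns[i].clone()).collect();
    // Take at most `n` rows, copying only the projected cells.
    let rows = table
        .rows
        .iter()
        .take(n)
        .map(|row| indices.iter().map(|&i| row[i].clone()).collect::<Vec<_>>())
        .collect();
    Ok(Table { columns, rows })
}
```

Resolving names to indices once keeps the per-row work to plain index lookups.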

All commands support --json. The reader runs over a new file-backed
InputFile that uses positional reads (pread). Core gains three small
read-only accessors used by
`pages`: BucketReader::encodings(), ColumnPageReader::encoding(), and
MosaicReader::page_infos(). No format/behavior change; 199 core tests pass.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
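The first commit describes the file-backed InputFile only as using pread. A minimal Unix-only sketch, assuming a hypothetical two-method `InputFile` trait (the real trait's methods are not shown in this PR) and a hypothetical `LocalFile` type, built on `std::os::unix::fs::FileExt::read_exact_at`:

```rust
use std::fs::File;
use std::io;
use std::os::unix::fs::FileExt;
use std::path::Path;

/// Hypothetical input abstraction: random-access reads plus total length.
pub trait InputFile {
    fn len(&self) -> u64;
    fn read_at(&self, offset: u64, len: usize) -> io::Result<Vec<u8>>;
}

/// File-backed implementation using positional reads (pread on Unix).
pub struct LocalFile {
    file: File,
    len: u64,
}

impl LocalFile {
    pub fn open(path: &Path) -> io::Result<Self> {
        let file = File::open(path)?;
        let len = file.metadata()?.len();
        Ok(Self { file, len })
    }
}

impl InputFile for LocalFile {
    fn len(&self) -> u64 {
        self.len
    }

    fn read_at(&self, offset: u64, len: usize) -> io::Result<Vec<u8>> {
        let mut buf = vec![0u8; len];
        // read_exact_at retries pread until the buffer is full and returns
        // an UnexpectedEof error if the file ends first.
        self.file.read_exact_at(&mut buf, offset)?;
        Ok(buf)
    }
}
```

Positional reads take `&self` and never move a shared file cursor, so one handle can serve reads for many columns without seeking or locking.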
Add a core regression test for MosaicReader::page_infos asserting
plain/dict/const detection on a paged-bucket file, and CLI unit tests
for the fmt helpers (JSON escaping, value/encoding rendering, NDJSON
null handling, table truncation).
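The fmt helpers themselves are not shown on this page. A minimal sketch of what JSON escaping, NDJSON null handling and table-cell truncation involve, assuming cell values are already rendered as text; `json_string`, `ndjson_row` and `truncate_cell` are hypothetical names:

```rust
/// Escape `s` as a JSON string literal. RFC 8259 requires escaping the
/// quotation mark, the reverse solidus and control characters U+0000..U+001F.
pub fn json_string(s: &str) -> String {
    let mut out = String::with_capacity(s.len() + 2);
    out.push('"');
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            // Remaining control characters use the \uXXXX form.
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out.push('"');
    out
}

/// One NDJSON line: a JSON object per row, with null cells written as `null`.
pub fn ndjson_row(columns: &[&str], row: &[Option<&str>]) -> String {
    let fields: Vec<String> = columns
        .iter()
        .zip(row)
        .map(|(name, value)| {
            let rendered = match value {
                Some(s) => json_string(s),
                None => "null".to_string(),
            };
            format!("{}:{}", json_string(name), rendered)
        })
        .collect();
    format!("{{{}}}", fields.join(","))
}

/// Cut a table cell to `max` characters (expects max >= 1), marking the cut
/// with an ellipsis. Counts chars, not bytes, so multi-byte text is not split.
pub fn truncate_cell(s: &str, max: usize) -> String {
    if s.chars().count() <= max {
        return s.to_string();
    }
    let mut out: String = s.chars().take(max.saturating_sub(1)).collect();
    out.push('…');
    out
}
```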
Drive the mosaic binary against a fixture file (via CARGO_BIN_EXE) and
assert stdout for schema/meta/pages/cat, --json output, projection,
row truncation and missing-file failure. No external dev-deps.
Adds docs/cli.html documenting the mosaic inspector (schema/meta/pages/cat,
text + JSON) with a parquet-cli command mapping and design-difference table,
addressing the review asks on apache#66. Adds CLI to the nav across doc pages.
Align the viewer command set with parquet-cli/arrow-rs by adding head
(alias of cat), footer (magic/version/buckets/compression), column-size
(on-disk bytes per column) and dictionary (dump dict-encoded entries).
Core gains compression()/dict_values()/dictionary() read-only accessors.
e2e tests cover the new commands.
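column-size reports on-disk bytes per column. Assuming per-row-group sizes are summed into one total per column (the commit does not say whether the command aggregates or prints per row group), the aggregation could look like this; `column_sizes` and its `(column, bytes)` input shape are hypothetical:

```rust
/// Sum on-disk bytes per column across row groups, keeping the order in
/// which columns first appear. Each row group is a list of (column, bytes).
pub fn column_sizes(row_groups: &[Vec<(String, u64)>]) -> Vec<(String, u64)> {
    let mut totals: Vec<(String, u64)> = Vec::new();
    for group in row_groups {
        for (column, bytes) in group {
            // Linear lookup keeps first-seen order; fine for typical column counts.
            match totals.iter_mut().find(|(name, _)| name == column) {
                Some((_, total)) => *total += bytes,
                None => totals.push((column.clone(), *bytes)),
            }
        }
    }
    totals
}
```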
Mosaic's column-bucket grouping has no parquet equivalent. Add a
buckets command printing, per row group, each bucket's kind
(empty/monolithic/paged), on-disk size and member columns. Core gains
MosaicReader::bucket_infos(). e2e covered.
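A sketch of how the buckets text output might be rendered, with hypothetical `BucketKind`/`BucketInfo` types standing in for whatever `MosaicReader::bucket_infos()` returns. Only the three kinds come from the commit; the field names and the line layout are illustrative, not the CLI's actual format:

```rust
use std::fmt;

/// The three bucket kinds named in the commit.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum BucketKind {
    Empty,
    Monolithic,
    Paged,
}

impl fmt::Display for BucketKind {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.write_str(match self {
            BucketKind::Empty => "empty",
            BucketKind::Monolithic => "monolithic",
            BucketKind::Paged => "paged",
        })
    }
}

/// Hypothetical per-bucket summary: kind, on-disk size, member columns.
pub struct BucketInfo {
    pub kind: BucketKind,
    pub bytes: u64,
    pub columns: Vec<String>,
}

/// Render one row group's buckets, one line per bucket (illustrative layout).
pub fn render_row_group(index: usize, buckets: &[BucketInfo]) -> String {
    let mut out = format!("row group {index}\n");
    for (i, b) in buckets.iter().enumerate() {
        out.push_str(&format!(
            "  bucket {i}: {} {} bytes [{}]\n",
            b.kind,
            b.bytes,
            b.columns.join(", ")
        ));
    }
    out
}
```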
Align dictionary column selection with parquet-cli's -c flag instead of
a positional argument; update e2e.
Complete --json output across all 9 commands. In `dictionary`,
dict-encoded columns emit an array of entries and row groups without a
dictionary emit null. e2e tests extended.
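The dictionary --json rule (an array of entries for dict-encoded data, null for row groups without a dictionary) can be sketched as follows, assuming entries are already rendered as strings and that a column's output is one JSON array with an element per row group. `escape_json`, `dictionary_json` and `column_dictionary_json` are hypothetical names:

```rust
/// Minimal JSON string escaping: quote, backslash and control characters.
fn escape_json(s: &str) -> String {
    let mut out = String::from("\"");
    for c in s.chars() {
        match c {
            '"' | '\\' => {
                out.push('\\');
                out.push(c);
            }
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out.push('"');
    out
}

/// One row group: an array when a dictionary exists, `null` otherwise.
pub fn dictionary_json(entries: Option<&[&str]>) -> String {
    match entries {
        Some(values) => {
            let items: Vec<String> = values.iter().map(|v| escape_json(v)).collect();
            format!("[{}]", items.join(","))
        }
        None => "null".to_string(),
    }
}

/// Whole column: one element per row group, in row-group order.
pub fn column_dictionary_json(row_groups: &[Option<Vec<&str>>]) -> String {
    let parts: Vec<String> = row_groups
        .iter()
        .map(|rg| dictionary_json(rg.as_deref()))
        .collect();
    format!("[{}]", parts.join(","))
}
```

Emitting `null` rather than an empty array keeps "no dictionary" distinguishable from "a dictionary with zero entries".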
Expand docs/cli.html and cli/README.md to cover every command
(schema/meta/footer/buckets/pages/dictionary/column-size/cat/head) with
usage and example output. Drop all comparison content per maintainer
preference.
Remove the near-trivial encoding_names mapping test; extend footer and
buckets e2e to cover their --json output, improving CLI feature coverage.
The e2e tests carry their own fixture writer; the standalone gen.rs
example duplicated it and was unreferenced.
jianguotian changed the title from "docs: mosaic CLI page + parquet-cli comparison (review of #66)" to "docs: mosaic CLI page (review of #66)" on Jun 22, 2026