docs: mosaic CLI page (review of #66) #5
Closed
jianguotian wants to merge 12 commits into
Mosaic previously shipped no viewer tooling; inspecting a file meant writing Rust against the library API. Add a `mosaic` binary (a new `cli` workspace crate) mirroring parquet-cli:

- schema: column names, Arrow types, nullability, bucket assignment
- meta: row groups, rows, per-column stats (null_count/min/max)
- cat: first N rows as a table, with -n and --columns projection
- pages: per-column encoding (plain/const/dict/all_null) + slot size

All commands support --json. The reader is driven over a new file-backed InputFile (pread). Core gains three small read-only accessors used by `pages`: BucketReader::encodings(), ColumnPageReader::encoding(), and MosaicReader::page_infos(). No format/behavior change; 199 core tests pass.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
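For readers unfamiliar with a pread-style input, the following is a minimal sketch of a file-backed reader built on positioned reads. It is not Mosaic's actual `InputFile` implementation: the `FileInput` type and its `open`, `len`, and `read_range` methods are hypothetical names chosen for illustration, and the sketch assumes a Unix target, where `std::os::unix::fs::FileExt::read_exact_at` is available.

```rust
use std::fs::File;
use std::io;
use std::os::unix::fs::FileExt; // Unix-only: positioned reads (pread)
use std::path::Path;

/// Hypothetical file-backed input; an illustration, not Mosaic's `InputFile`.
pub struct FileInput {
    file: File,
    len: u64,
}

impl FileInput {
    /// Open the file and cache its length so ranges can be bounds-checked.
    pub fn open(path: &Path) -> io::Result<Self> {
        let file = File::open(path)?;
        let len = file.metadata()?.len();
        Ok(Self { file, len })
    }

    /// Total file size in bytes.
    pub fn len(&self) -> u64 {
        self.len
    }

    /// Read exactly `len` bytes starting at `offset`.
    /// A positioned read does not move a shared file cursor, so `&self` suffices.
    pub fn read_range(&self, offset: u64, len: usize) -> io::Result<Vec<u8>> {
        let end = offset
            .checked_add(len as u64)
            .ok_or_else(|| io::Error::new(io::ErrorKind::InvalidInput, "range overflows u64"))?;
        if end > self.len {
            return Err(io::Error::new(
                io::ErrorKind::UnexpectedEof,
                "range extends past end of file",
            ));
        }
        let mut buf = vec![0u8; len];
        self.file.read_exact_at(&mut buf, offset)?;
        Ok(buf)
    }
}
```

Because no seek state is involved, a reader along these lines can fetch a footer and then individual pages in any order through a shared reference.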
Add a core regression test for MosaicReader::page_infos asserting plain/dict/const detection on a paged-bucket file, and CLI unit tests for the fmt helpers (json escaping, value/encoding rendering, ndjson null handling, table truncation).
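As an illustration of the kind of helper the JSON-escaping tests exercise, here is a minimal sketch of string escaping for JSON output. The `escape_json` name and signature are assumptions made for this example; the actual fmt helpers in the `cli` crate may be structured differently.

```rust
/// Escape `s` for use inside a JSON string literal (without the surrounding quotes).
/// Hypothetical helper for illustration; not taken from the mosaic `cli` crate.
pub fn escape_json(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            // Remaining control characters must be written as \u00XX escapes.
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            // Everything else, including non-ASCII text, passes through unchanged.
            c => out.push(c),
        }
    }
    out
}
```

A unit test for such a helper would typically cover quotes, backslashes, control characters, and non-ASCII passthrough.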
Drive the mosaic binary against a fixture file (via CARGO_BIN_EXE) and assert stdout for schema/meta/pages/cat, --json output, projection, row truncation and missing-file failure. No external dev-deps.
Adds docs/cli.html documenting the mosaic inspector (schema/meta/pages/cat, text + JSON) with a parquet-cli command mapping and design-difference table, addressing the review asks on apache#66. Adds CLI to the nav across doc pages.
Align the viewer command set with parquet-cli/arrow-rs: head (alias of cat), footer (magic/version/buckets/compression), column-size (on-disk bytes per column), dictionary (dump dict-encoded entries). Core gains compression()/dict_values()/dictionary() read-only accessors. e2e tests cover the new commands.
Mosaic's column-bucket grouping has no parquet equivalent. Add a buckets command printing, per row group, each bucket's kind (empty/monolithic/paged), on-disk size and member columns. Core gains MosaicReader::bucket_infos(). e2e covered.
Align dictionary column selection with parquet-cli's -c flag instead of a positional argument; update e2e.
Completes JSON output across all 9 commands; dict columns emit an array, non-dict row groups emit null. e2e extended.
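The array-or-null convention described above can be sketched as follows. This is an assumption-laden illustration rather than the CLI's code: `dictionary_json` is a hypothetical function, entries are modeled as plain strings, and only quotes and backslashes are escaped for brevity.

```rust
/// Render one row group's dictionary for JSON output:
/// `Some(entries)` becomes a JSON array, `None` (not dict-encoded) becomes `null`.
/// Hypothetical helper; escaping is simplified to quotes and backslashes.
pub fn dictionary_json(entries: Option<&[&str]>) -> String {
    match entries {
        None => "null".to_string(),
        Some(values) => {
            let quoted: Vec<String> = values
                .iter()
                .map(|v| format!("\"{}\"", v.replace('\\', "\\\\").replace('"', "\\\"")))
                .collect();
            format!("[{}]", quoted.join(","))
        }
    }
}
```

Keeping `null` distinct from an empty array lets a consumer tell "this row group is not dictionary-encoded" apart from "dictionary-encoded with zero entries".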
Expand docs/cli.html and cli/README.md to cover every command (schema/meta/footer/buckets/pages/dictionary/column-size/cat/head) with usage and example output. Drop all comparison content per maintainer preference.
Remove the near-trivial encoding_names mapping test; extend footer and buckets e2e to cover their --json output, improving CLI feature coverage.
The e2e tests carry their own fixture writer; the standalone gen.rs example duplicated it and was unreferenced.
Docs for the mosaic CLI, addressing Jingsong Li's review on apache#66.
What
- docs/cli.html: schema / meta / cat / pages / head / footer / column-size / buckets / dictionary with text + JSON examples and an options table.
- cli/README.md quick-start; CLI added to nav across doc pages.

This was a fork preview of apache#66. Superseded by apache#66 (same head 924ac4b), closing.