Skip to content

rust: implement ListRuns RPC #4385

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 45 commits into from
Nov 25, 2020
Merged

rust: implement ListRuns RPC #4385

merged 45 commits into from
Nov 25, 2020

Conversation

wchargin
Copy link
Contributor

@wchargin wchargin commented Nov 24, 2020

Summary:
This patch connects the //tensorboard/data/server main module to the
Commit and LogdirLoader infrastructure, and implements ListRuns as
a proof of concept.

For now, we just take a logdir path as a positional command line
argument. A future patch will bring nicer command-line argument parsing.

Test Plan:
Unit tests included. Note that testing these data provider methods is
nice and easy: you just call them. No mocking or networking required.

For a manual grpc_cli test, start just the data server, and run:

$ grpc_cli --channel_creds_type=insecure \
>   --protofiles tensorboard/data/proto/data_provider.proto \
>   call localhost:6806 TensorBoardDataProvider.ListRuns '' 2>/dev/null
runs {
  name: "lr_1E-03,conv=1,fc=2"
  start_time: 1563406327
}
// etc.

For an end-to-end test, cherry-pick #4356, then run both the data server
and a copy of TensorBoard concurrently:

bazel run //tensorboard/data/server -- ~/tensorboard_data/mnist &
bazel run //tensorboard -- \
    --bind_all \
    --generic_data true \
    --grpc_data_provider localhost:6806 \
    ;

Then navigate to localhost:6006/data/runs and note that the response
includes a correct list of runs.

wchargin-branch: rust-listruns

Summary:
This module performs conversions from legacy on-disk data formats. It
will correspond to both the `data_compat` and `dataclass_compat` modules
from Python TensorBoard. For now, we have just one function, to
determine the initial summary metadata of a time series. And, for now,
we only implement this for scalars, to keep things simple.

Test Plan:
Unit tests included.

wchargin-branch: rust-data-compat
wchargin-source: ef510b422b00a66a23efc1406947844fa8d1a798
Summary:
Our errors now have nice string formatting, easier `From` conversions,
and `std::error::Error` implementations.

Test Plan:
Included some tests for the `Display` implementations. They’re often not
necessary—one benefit of deriving traits is that you can be confident in
the implementation without manually testing it. But sometimes, if the
format string is non-trivial, it can be nice to actually see the full
text written out.

wchargin-branch: rust-use-thiserror
wchargin-source: 0f8009c3b96619f2eb4649353487dc996b45a712
Summary:
We’d defined a `Step` [newtype] in the `reservoir` module, but that type
will be useful more broadly. We thus move it into a new `types` module,
and define `WallTime` and `Tag` siblings, which we’ll need shortly.

[newtype]: https://doc.rust-lang.org/rust-by-example/generics/new_types.html

Test Plan:
Most behavior is just `#[derive]`d; unit tests included for the rest.

wchargin-branch: rust-types-module
wchargin-source: b0f5475b247ef8f9cb8bf2e0bf94d2b648b8009e
…rigin/wchargin-rust-use-thiserror' and 'origin/wchargin-rust-types-module' into HEAD
wchargin-branch: rust-run-loader
wchargin-source: 35359d6e233139973edac8b2c1440cc48ed4e710
wchargin-branch: rust-data-compat
wchargin-source: 467974fa4e0e55dea0872bb1394a768a38811ac5
wchargin-branch: rust-run-loader
wchargin-source: d190e3d45c304d976a3fc14a73d99572a6ad377f
Summary:
This patch refactors `run`’s `StagePayload` type into an `EventValue`
type in `data_compat`. This way, we can define enriching conversions
from `EventValue`s to committed values in a layer that’s not specific to
the run loader.

We also implement `initial_metadata` for the `GraphDef` variant, for
symmetry and to make the abstraction clearer. The run loader still
doesn’t handle graphs.

Test Plan:
Unit tests expanded for the new code, and continue to cover the old.

wchargin-branch: rust-promote-eventvalue
wchargin-source: 96af44556af064cf9ef933154ca3a522a8829798
wchargin-branch: rust-run-loader
wchargin-source: 170c77aae53df6716520df779e4838d3fc3b93b9

# Conflicts:
#	tensorboard/data/server/data_compat.rs
wchargin-branch: rust-run-loader
wchargin-source: 170c77aae53df6716520df779e4838d3fc3b93b9
wchargin-branch: rust-promote-eventvalue
wchargin-source: 106b4519aba1a966d8273b6b2509f1798131dc5a
Summary:
This patch introduces the *commit*, which is the link between loading
threads and serving threads. The commit structure is designed for
concurrency: many threads can read from it at once, and when the commit
is updated, a loader acquires exclusive access only to its own run. This
improves latency for both readers and writers.

The commit itself doesn’t have much logic; it’s mostly just the correct
composition of substructures, most notably the reservoir basins.

See also the RustBoard technical design doc (#4350) for discussion.

Test Plan:
There’s one method on `commit::TimeSeries`; it has tests.

wchargin-branch: rust-commit-type
wchargin-source: d69b4e0034b68d1d8dadc6aa0f6dc3cc48efccde
Summary:
This patch integrates `RunLoader` and `Commit`. A run loader now commits
all changes at the end of its load cycle. This isn’t sufficient—we also
need it to periodically commit while it’s still loading. But it’s a good
start for now.

Much of the weight is actually in `data_compat`, which owns the logic
for “enriching” raw events into their class-specific representations.
This way, that logic can be reused: e.g., for an alternate consumer that
exposes a stream of all events, without downsampling.

Test Plan:
The new enrichment functionality has dedicated tests, and existing tests
for the run loader have been updated to operate on a `Commit` rather
than poking into the internals.

wchargin-branch: rust-commit-run-loader
wchargin-source: 1e0ae52eb9cd9ead6f53c9716306a5fcce178e06
wchargin-branch: rust-run-loader
wchargin-source: 793ff3e287c2571fd1bc06d5c1940ed3b3bdc796
wchargin-branch: rust-promote-eventvalue
wchargin-source: 432923653efb6fe768bdd507acac833672ec3e06
wchargin-branch: rust-commit-type
wchargin-source: 1850f510bac1f515fa4b7f73c1ec9a7a47fea283
wchargin-branch: rust-commit-run-loader
wchargin-source: 14baca16a82293a24a1b7f58869305f3b4ff7a6f

# Conflicts:
#	tensorboard/data/server/run.rs
wchargin-branch: rust-commit-run-loader
wchargin-source: 14baca16a82293a24a1b7f58869305f3b4ff7a6f
wchargin-branch: rust-promote-eventvalue
wchargin-source: dd7917e2bd9084f8ff048657d94572168a221a7e

# Conflicts:
#	tensorboard/data/server/data_compat.rs
#	tensorboard/data/server/run.rs
wchargin-branch: rust-promote-eventvalue
wchargin-source: dd7917e2bd9084f8ff048657d94572168a221a7e
wchargin-branch: rust-commit-type
wchargin-source: 38179a4b9af7538f133ee04f771ac895a8db21a9
wchargin-branch: rust-commit-run-loader
wchargin-source: f84d5842282dba99385fba34ece75f6ac7f82d2e
wchargin-branch: rust-commit-type
wchargin-source: b62b0669a50b5086a1f6af18effe62fbfd1e5431
wchargin-branch: rust-commit-run-loader
wchargin-source: a8130d52ce4b02636f59cab02ebef5cb27055fc9
wchargin-branch: rust-commit-run-loader
wchargin-source: a8130d52ce4b02636f59cab02ebef5cb27055fc9
wchargin-branch: rust-commit-run-loader
wchargin-source: 869544fd17ac2d740e60f1501b9540490d06906b

# Conflicts:
#	tensorboard/data/server/commit.rs
wchargin-branch: rust-commit-run-loader
wchargin-source: 869544fd17ac2d740e60f1501b9540490d06906b
wchargin-branch: rust-commit-run-loader
wchargin-source: e0ee69aea5631ccc6f2197922249d766aaa84cb6
wchargin-branch: rust-commit-run-loader
wchargin-source: e0ee69aea5631ccc6f2197922249d766aaa84cb6
wchargin-branch: rust-commit-run-loader
wchargin-source: 3f18bbb73a892785e6dfe5255b2831289ac3e789

# Conflicts:
#	tensorboard/data/server/run.rs
wchargin-branch: rust-commit-run-loader
wchargin-source: 3f18bbb73a892785e6dfe5255b2831289ac3e789
Summary:
A new `LogdirLoader` type owns `RunLoader`s for each run and wires them
up to a shared `Commit`. It is the logdir loader that finds event files
in the filesystem and feeds them to the run loaders.

For ease of review, this patch implements event file discovery but does
not actually load data. Stay tuned.

Test Plan:
Unit tests included. One test covers an edge case involving directories
whose names are not valid Unicode. This test only runs on Unix. We could
make it run on Windows, too, but I don’t have a Windows box to test on,
and the code under test is portable: the only platform-specific code is
the test setup. So run tests on a `cfg(unix)` platform, such as Linux or
macOS, for best coverage.

wchargin-branch: rust-logdir-loader-discovery
wchargin-source: dfb479587d0b1361dc39840dc7f535d3f944166d
Summary:
This patch connects `LogdirLoader` to `RunLoader` by actually calling
the `RunLoader::reload` method.

Test Plan:
Tests adapted and expanded.

wchargin-branch: rust-logdir-loader-data
wchargin-source: 8a363898cc9970a816da0898f1ce0ea78bfd98ea
wchargin-branch: rust-logdir-loader-discovery
wchargin-source: 761b320491045f4ceb6ced9eb596242ba860e7a9

# Conflicts:
#	tensorboard/data/server/data_compat.rs
wchargin-branch: rust-logdir-loader-discovery
wchargin-source: 761b320491045f4ceb6ced9eb596242ba860e7a9
wchargin-branch: rust-logdir-loader-data
wchargin-source: 5f6ccd3318e76347bc3f07d78c459070fe15e65d
Summary:
This patch connects the `//tensorboard/data/server` main module to the
`Commit` and `LogdirLoader` infrastructure, and implements `ListRuns` as
a proof of concept.

For now, we just take a logdir path as a positional command line
argument. A future patch will bring nicer command-line argument parsing.

Test Plan:
Run both the data server and a copy of TensorBoard concurrently:

```
bazel run //tensorboard/data/server -- ~/tensorboard_data/mnist &
bazel run //tensorboard -- \
    --bind_all \
    --generic_data true \
    --grpc_data_provider localhost:6806 \
    ;
```

Then navigate to `localhost:6006/data/runs` and note that the response
includes a correct list of runs. Or, for a manual `grpc_cli` test, start
just the data server, and run:

```
$ grpc_cli --channel_creds_type=insecure \
>   --protofiles tensorboard/data/proto/data_provider.proto \
>   call localhost:6806 TensorBoardDataProvider.ListRuns '' 2>/dev/null
runs {
  name: "lr_1E-03,conv=1,fc=2"
  start_time: 1563406327
}
// etc.
```

Unit tests also included. Note that testing these data provider methods
is nice and easy: you just call them. No mocking or networking required.

wchargin-branch: rust-listruns
wchargin-source: eb471eca310905eb45d730936279b57bdae70927
@wchargin wchargin added the core:rustboard //tensorboard/data/server/... label Nov 24, 2020
@google-cla google-cla bot added the cla: yes label Nov 24, 2020
@wchargin wchargin requested a review from nfelt November 24, 2020 08:56
@wchargin
Copy link
Contributor Author

Cf.: #4387 for grpc_cli tips; #4356 to cherry-pick in if you want to
run the end-to-end test.

wchargin-branch: rust-logdir-loader-discovery
wchargin-source: 44cd7a803ba04324ce11a892d8a8414792dae95c
wchargin-branch: rust-logdir-loader-data
wchargin-source: 29289362b8ff8294883bd381401613a6269e2ebc
wchargin-branch: rust-listruns
wchargin-source: 8187059ac28210897b1c90e4a31cd8237993ec4b
wchargin-branch: rust-logdir-loader-data
wchargin-source: ab88209f4c0f5874b7c0e80a7c670a8f7cef810d

# Conflicts:
#	tensorboard/data/server/logdir.rs
wchargin-branch: rust-logdir-loader-data
wchargin-source: ab88209f4c0f5874b7c0e80a7c670a8f7cef810d
wchargin-branch: rust-listruns
wchargin-source: bf3865887dddcdf516bc73806ea40f1f51ca0738
Base automatically changed from wchargin-rust-logdir-loader-data to master November 24, 2020 21:26
wchargin-branch: rust-listruns
wchargin-source: 805e652280c9c72e2f1ea5cd0be2ae74de1538bb
wchargin-branch: rust-listruns
wchargin-source: 4314f515f85db55f912bccc1ea0eb22078e6d783
.map(|(Run(name), start_time)| data::Run {
name,
start_time: start_time.into(),
..Default::default()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I gather the point of this line is to be future-proof in case we add more fields to Run? Is there any mechanism that enforces this by default or is just a convention we'll need to uphold ourselves?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah—specifically, so that proto updates that add new fields don’t
introduce compilation errors.

I was considering adding a tiny “style guide” to DEVELOPMENT.md with
things like this, so I’ll probably go ahead and do that.

There is a #[non_exhaustive] attribute that you can add to structs to
indicate that they may have more fields later, preventing callers
outside the crate from initializing them directly but still permitting
them to access public fields. This doesn’t suffice because (a) we’re in
the same crate (though we could change that) and (b) it doesn’t permit
functional record updates (..Default::default()):

error[E0639]: cannot create non-exhaustive struct using struct expression
 --> main.rs:3:13
  |
3 |       let s = sub::S {
  |  _____________^
4 | |         a: 1,
5 | |         b: 2,
6 | |         ..Default::default()
7 | |     };
  | |_____^

The RFC threads have a lot of comments, and from skimming them it’s not
obvious to me whether this was “intended, by design” or merely “artifact
of how #[non_exhaustive] desugars to a hidden private field”. (It was
definitely known at the time.)

Another potential check for this would be in Clippy: “lint if you
construct a value at a type that implements prost::Message without
using a functional record update”. There is no such check currently.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reading through the next PR, it's really rather noisy to have the ..Default::default() everywhere we create a proto, especially in tests :/ Given the fact that we control these protos ourselves, I wonder if on net maybe it's easier for places where we don't expect to add fields to just relax this and accept that we'll need to make modifications to the code if we update the proto?

The other alternative that came to mind would be to initialize structs to default-empty, then mutate to populate fields; needing mutation is a downside although I think in some of the more convoluted multi-layer struct cases this may be significantly more concise.

I can see why neither of these is ideal, it just feels distracting IMO to have every single proto struct need the extra syntax. Would be nice if there was some way to specify on the struct definition that they can be initialized a subset of fields if the rest are Default (basically like optional arguments), or if at least maybe the functional update syntax supported plain .. as sugar for ..Default::default().

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just seeing this. For posterity, further discussion at #4399.

commit
}

fn sample_handler() -> DataProviderHandler {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll leave it up to you (and would also be fine to do in a follow up or punt for now), but I try to avoid data fixtures unless it's really pretty complicated or expensive to set up the right data. In this case it seems like it wouldn't be so bad to 1) create a commit, 2) explicitly populate it with the data we want via test helpers, and then 3) wrap a handler around it and do the assertions. I just find the assertions to be a lot more meaningful when I can see the data they depend on in the same test method vs having to go dig up the fixture.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay; noted, thanks!. I’m certainly sympathetic to this. I think I might
punt it for now since a shared fixture suffices for this set of changes.
But I’m happy to go back and reswizzle things after that.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in #4433.

@wchargin wchargin merged commit 5f3b099 into master Nov 25, 2020
@wchargin wchargin deleted the wchargin-rust-listruns branch November 25, 2020 17:22
wchargin added a commit that referenced this pull request Dec 7, 2020
Summary:
In #4385, @nfelt suggested using test helpers to easily populate a
commit rather than using a single fixture across all test cases. This
patch implements such test helpers and uses them in the server tests.
The net diff to the test code (not the helpers) is −32 lines.

Test Plan:
Existing tests pass. The new test helpers mostly don’t have explicit
tests, but that’s okay since they’re exercised by the main test code.

wchargin-branch: rust-test-data-builders
wchargin-source: 31164b4d2b4a1fb9bbabc584249e1d3715c17a3e
wchargin added a commit that referenced this pull request Dec 9, 2020
Summary:
In #4385, @nfelt suggested using test helpers to easily populate a
commit rather than using a single fixture across all test cases. This
patch implements such test helpers and uses them in the server tests.
The net diff to the test code (not the helpers) is −32 lines.

Test Plan:
Existing tests pass. The new test helpers mostly don’t have explicit
tests, but that’s okay since they’re exercised by the main test code.

wchargin-branch: rust-test-data-builders
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cla: yes core:rustboard //tensorboard/data/server/...
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants