-
Notifications
You must be signed in to change notification settings - Fork 1.7k
rust: implement ListRuns
RPC
#4385
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Summary: This module performs conversions from legacy on-disk data formats. It will correspond to both the `data_compat` and `dataclass_compat` modules from Python TensorBoard. For now, we have just one function, to determine the initial summary metadata of a time series. And, for now, we only implement this for scalars, to keep things simple. Test Plan: Unit tests included. wchargin-branch: rust-data-compat wchargin-source: ef510b422b00a66a23efc1406947844fa8d1a798
Summary: Our errors now have nice string formatting, easier `From` conversions, and `std::error::Error` implementations. Test Plan: Included some tests for the `Display` implementations. They’re often not necessary—one benefit of deriving traits is that you can be confident in the implementation without manually testing it. But sometimes, if the format string is non-trivial, it can be nice to actually see the full text written out. wchargin-branch: rust-use-thiserror wchargin-source: 0f8009c3b96619f2eb4649353487dc996b45a712
Summary: We’d defined a `Step` [newtype] in the `reservoir` module, but that type will be useful more broadly. We thus move it into a new `types` module, and define `WallTime` and `Tag` siblings, which we’ll need shortly. [newtype]: https://doc.rust-lang.org/rust-by-example/generics/new_types.html Test Plan: Most behavior is just `#[derive]`d; unit tests included for the rest. wchargin-branch: rust-types-module wchargin-source: b0f5475b247ef8f9cb8bf2e0bf94d2b648b8009e
…rigin/wchargin-rust-use-thiserror' and 'origin/wchargin-rust-types-module' into HEAD
wchargin-branch: rust-run-loader wchargin-source: 35359d6e233139973edac8b2c1440cc48ed4e710
wchargin-branch: rust-data-compat wchargin-source: 467974fa4e0e55dea0872bb1394a768a38811ac5
wchargin-branch: rust-run-loader wchargin-source: d190e3d45c304d976a3fc14a73d99572a6ad377f
Summary: This patch refactors `run`’s `StagePayload` type into an `EventValue` type in `data_compat`. This way, we can define enriching conversions from `EventValue`s to committed values in a layer that’s not specific to the run loader. We also implement `initial_metadata` for the `GraphDef` variant, for symmetry and to make the abstraction clearer. The run loader still doesn’t handle graphs. Test Plan: Unit tests expanded for the new code, and continue to cover the old. wchargin-branch: rust-promote-eventvalue wchargin-source: 96af44556af064cf9ef933154ca3a522a8829798
wchargin-branch: rust-run-loader wchargin-source: 170c77aae53df6716520df779e4838d3fc3b93b9 # Conflicts: # tensorboard/data/server/data_compat.rs
wchargin-branch: rust-run-loader wchargin-source: 170c77aae53df6716520df779e4838d3fc3b93b9
wchargin-branch: rust-promote-eventvalue wchargin-source: 106b4519aba1a966d8273b6b2509f1798131dc5a
Summary: This patch introduces the *commit*, which is the link between loading threads and serving threads. The commit structure is designed for concurrency: many threads can read from it at once, and when the commit is updated, a loader acquires exclusive access only to its own run. This improves latency for both readers and writers. The commit itself doesn’t have much logic; it’s mostly just the correct composition of substructures, most notably the reservoir basins. See also the RustBoard technical design doc (#4350) for discussion. Test Plan: There’s one method on `commit::TimeSeries`; it has tests. wchargin-branch: rust-commit-type wchargin-source: d69b4e0034b68d1d8dadc6aa0f6dc3cc48efccde
Summary: This patch integrates `RunLoader` and `Commit`. A run loader now commits all changes at the end of its load cycle. This isn’t sufficient—we also need it to periodically commit while it’s still loading. But it’s a good start for now. Much of the weight is actually in `data_compat`, which owns the logic for “enriching” raw events into their class-specific representations. This way, that logic can be reused: e.g., for an alternate consumer that exposes a stream of all events, without downsampling. Test Plan: The new enrichment functionality has dedicated tests, and existing tests for the run loader have been updated to operate on a `Commit` rather than poking into the internals. wchargin-branch: rust-commit-run-loader wchargin-source: 1e0ae52eb9cd9ead6f53c9716306a5fcce178e06
wchargin-branch: rust-run-loader wchargin-source: 793ff3e287c2571fd1bc06d5c1940ed3b3bdc796
wchargin-branch: rust-promote-eventvalue wchargin-source: 432923653efb6fe768bdd507acac833672ec3e06
wchargin-branch: rust-commit-type wchargin-source: 1850f510bac1f515fa4b7f73c1ec9a7a47fea283
wchargin-branch: rust-commit-run-loader wchargin-source: 14baca16a82293a24a1b7f58869305f3b4ff7a6f # Conflicts: # tensorboard/data/server/run.rs
wchargin-branch: rust-commit-run-loader wchargin-source: 14baca16a82293a24a1b7f58869305f3b4ff7a6f
wchargin-branch: rust-promote-eventvalue wchargin-source: dd7917e2bd9084f8ff048657d94572168a221a7e # Conflicts: # tensorboard/data/server/data_compat.rs # tensorboard/data/server/run.rs
wchargin-branch: rust-promote-eventvalue wchargin-source: dd7917e2bd9084f8ff048657d94572168a221a7e
wchargin-branch: rust-commit-type wchargin-source: 38179a4b9af7538f133ee04f771ac895a8db21a9
wchargin-branch: rust-commit-run-loader wchargin-source: f84d5842282dba99385fba34ece75f6ac7f82d2e
wchargin-branch: rust-commit-type wchargin-source: b62b0669a50b5086a1f6af18effe62fbfd1e5431
wchargin-branch: rust-commit-run-loader wchargin-source: a8130d52ce4b02636f59cab02ebef5cb27055fc9
wchargin-branch: rust-commit-run-loader wchargin-source: a8130d52ce4b02636f59cab02ebef5cb27055fc9
wchargin-branch: rust-commit-run-loader wchargin-source: 869544fd17ac2d740e60f1501b9540490d06906b # Conflicts: # tensorboard/data/server/commit.rs
wchargin-branch: rust-commit-run-loader wchargin-source: 869544fd17ac2d740e60f1501b9540490d06906b
wchargin-branch: rust-commit-run-loader wchargin-source: e0ee69aea5631ccc6f2197922249d766aaa84cb6
wchargin-branch: rust-commit-run-loader wchargin-source: e0ee69aea5631ccc6f2197922249d766aaa84cb6
wchargin-branch: rust-commit-run-loader wchargin-source: 3f18bbb73a892785e6dfe5255b2831289ac3e789 # Conflicts: # tensorboard/data/server/run.rs
wchargin-branch: rust-commit-run-loader wchargin-source: 3f18bbb73a892785e6dfe5255b2831289ac3e789
Summary: A new `LogdirLoader` type owns `RunLoader`s for each run and wires them up to a shared `Commit`. It is the logdir loader that finds event files in the filesystem and feeds them to the run loaders. For ease of review, this patch implements event file discovery but does not actually load data. Stay tuned. Test Plan: Unit tests included. One test covers an edge case involving directories whose names are not valid Unicode. This test only runs on Unix. We could make it run on Windows, too, but I don’t have a Windows box to test on, and the code under test is portable: the only platform-specific code is the test setup. So run tests on a `cfg(unix)` platform, such as Linux or macOS, for best coverage. wchargin-branch: rust-logdir-loader-discovery wchargin-source: dfb479587d0b1361dc39840dc7f535d3f944166d
Summary: This patch connects `LogdirLoader` to `RunLoader` by actually calling the `RunLoader::reload` method. Test Plan: Tests adapted and expanded. wchargin-branch: rust-logdir-loader-data wchargin-source: 8a363898cc9970a816da0898f1ce0ea78bfd98ea
wchargin-branch: rust-logdir-loader-discovery wchargin-source: 761b320491045f4ceb6ced9eb596242ba860e7a9 # Conflicts: # tensorboard/data/server/data_compat.rs
wchargin-branch: rust-logdir-loader-discovery wchargin-source: 761b320491045f4ceb6ced9eb596242ba860e7a9
wchargin-branch: rust-logdir-loader-data wchargin-source: 5f6ccd3318e76347bc3f07d78c459070fe15e65d
Summary: This patch connects the `//tensorboard/data/server` main module to the `Commit` and `LogdirLoader` infrastructure, and implements `ListRuns` as a proof of concept. For now, we just take a logdir path as a positional command line argument. A future patch will bring nicer command-line argument parsing. Test Plan: Run both the data server and a copy of TensorBoard concurrently: ``` bazel run //tensorboard/data/server -- ~/tensorboard_data/mnist & bazel run //tensorboard -- \ --bind_all \ --generic_data true \ --grpc_data_provider localhost:6806 \ ; ``` Then navigate to `localhost:6006/data/runs` and note that the response includes a correct list of runs. Or, for a manual `grpc_cli` test, start just the data server, and run: ``` $ grpc_cli --channel_creds_type=insecure \ > --protofiles tensorboard/data/proto/data_provider.proto \ > call localhost:6806 TensorBoardDataProvider.ListRuns '' 2>/dev/null runs { name: "lr_1E-03,conv=1,fc=2" start_time: 1563406327 } // etc. ``` Unit tests also included. Note that testing these data provider methods is nice and easy: you just call them. No mocking or networking required. wchargin-branch: rust-listruns wchargin-source: eb471eca310905eb45d730936279b57bdae70927
wchargin-branch: rust-logdir-loader-discovery wchargin-source: 44cd7a803ba04324ce11a892d8a8414792dae95c
wchargin-branch: rust-logdir-loader-data wchargin-source: 29289362b8ff8294883bd381401613a6269e2ebc
wchargin-branch: rust-listruns wchargin-source: 8187059ac28210897b1c90e4a31cd8237993ec4b
wchargin-branch: rust-logdir-loader-data wchargin-source: ab88209f4c0f5874b7c0e80a7c670a8f7cef810d # Conflicts: # tensorboard/data/server/logdir.rs
wchargin-branch: rust-logdir-loader-data wchargin-source: ab88209f4c0f5874b7c0e80a7c670a8f7cef810d
wchargin-branch: rust-listruns wchargin-source: bf3865887dddcdf516bc73806ea40f1f51ca0738
wchargin-branch: rust-listruns wchargin-source: 805e652280c9c72e2f1ea5cd0be2ae74de1538bb
wchargin-branch: rust-listruns wchargin-source: 4314f515f85db55f912bccc1ea0eb22078e6d783
.map(|(Run(name), start_time)| data::Run { | ||
name, | ||
start_time: start_time.into(), | ||
..Default::default() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I gather the point of this line is to be future-proof in case we add more fields to Run
? Is there any mechanism that enforces this by default or is just a convention we'll need to uphold ourselves?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah—specifically, so that proto updates that add new fields don’t
introduce compilation errors.
I was considering adding a tiny “style guide” to DEVELOPMENT.md
with
things like this, so I’ll probably go ahead and do that.
There is a #[non_exhaustive]
attribute that you can add to structs to
indicate that they may have more fields later, preventing callers
outside the crate from initializing them directly but still permitting
them to access public fields. This doesn’t suffice because (a) we’re in
the same crate (though we could change that) and (b) it doesn’t permit
functional record updates (..Default::default()
):
error[E0639]: cannot create non-exhaustive struct using struct expression
--> main.rs:3:13
|
3 | let s = sub::S {
| _____________^
4 | | a: 1,
5 | | b: 2,
6 | | ..Default::default()
7 | | };
| |_____^
The RFC threads have a lot of comments, and from skimming them it’s not
obvious to me whether this was “intended, by design” or merely “artifact
of how #[non_exhaustive]
desugars to a hidden private field”. (It was
definitely known at the time.)
Another potential check for this would be in Clippy: “lint if you
construct a value at a type that implements prost::Message
without
using a functional record update”. There is no such check currently.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reading through the next PR, it's really rather noisy to have the ..Default::default()
everywhere we create a proto, especially in tests :/ Given the fact that we control these protos ourselves, I wonder if on net maybe it's easier for places where we don't expect to add fields to just relax this and accept that we'll need to make modifications to the code if we update the proto?
The other alternative that came to mind would be to initialize structs to default-empty, then mutate to populate fields; needing mutation is a downside although I think in some of the more convoluted multi-layer struct cases this may be significantly more concise.
I can see why neither of these is ideal, it just feels distracting IMO to have every single proto struct need the extra syntax. Would be nice if there was some way to specify on the struct definition that they can be initialized a subset of fields if the rest are Default
(basically like optional arguments), or if at least maybe the functional update syntax supported plain ..
as sugar for ..Default::default()
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just seeing this. For posterity, further discussion at #4399.
commit | ||
} | ||
|
||
fn sample_handler() -> DataProviderHandler { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll leave it up to you (and would also be fine to do in a follow up or punt for now), but I try to avoid data fixtures unless it's really pretty complicated or expensive to set up the right data. In this case it seems like it wouldn't be so bad to 1) create a commit, 2) explicitly populate it with the data we want via test helpers, and then 3) wrap a handler around it and do the assertions. I just find the assertions to be a lot more meaningful when I can see the data they depend on in the same test method vs having to go dig up the fixture.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay; noted, thanks!. I’m certainly sympathetic to this. I think I might
punt it for now since a shared fixture suffices for this set of changes.
But I’m happy to go back and reswizzle things after that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done in #4433.
Summary: In #4385, @nfelt suggested using test helpers to easily populate a commit rather than using a single fixture across all test cases. This patch implements such test helpers and uses them in the server tests. The net diff to the test code (not the helpers) is −32 lines. Test Plan: Existing tests pass. The new test helpers mostly don’t have explicit tests, but that’s okay since they’re exercised by the main test code. wchargin-branch: rust-test-data-builders wchargin-source: 31164b4d2b4a1fb9bbabc584249e1d3715c17a3e
Summary: In #4385, @nfelt suggested using test helpers to easily populate a commit rather than using a single fixture across all test cases. This patch implements such test helpers and uses them in the server tests. The net diff to the test code (not the helpers) is −32 lines. Test Plan: Existing tests pass. The new test helpers mostly don’t have explicit tests, but that’s okay since they’re exercised by the main test code. wchargin-branch: rust-test-data-builders
Summary:
This patch connects the
//tensorboard/data/server
main module to theCommit
andLogdirLoader
infrastructure, and implementsListRuns
asa proof of concept.
For now, we just take a logdir path as a positional command line
argument. A future patch will bring nicer command-line argument parsing.
Test Plan:
Unit tests included. Note that testing these data provider methods is
nice and easy: you just call them. No mocking or networking required.
For a manual
grpc_cli
test, start just the data server, and run:For an end-to-end test, cherry-pick #4356, then run both the data server
and a copy of TensorBoard concurrently:
Then navigate to
localhost:6006/data/runs
and note that the responseincludes a correct list of runs.
wchargin-branch: rust-listruns