Skip to content

rust: use thiserror for error types #4341

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Nov 18, 2020
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions tensorboard/data/server/BUILD
Original file line number Diff line number Diff line change
Expand Up @@ -40,6 +40,7 @@ rust_library(
"//third_party/rust:prost",
"//third_party/rust:rand",
"//third_party/rust:rand_chacha",
"//third_party/rust:thiserror",
"//third_party/rust:tonic",
],
)
Expand Down
30 changes: 8 additions & 22 deletions tensorboard/data/server/event_file.rs
Original file line number Diff line number Diff line change
Expand Up @@ -35,34 +35,20 @@ pub struct EventFileReader<R> {
}

/// Error returned by [`EventFileReader::read_event`].
#[derive(Debug)]
#[derive(Debug, thiserror::Error)]
pub enum ReadEventError {
/// The record failed its checksum.
InvalidRecord(ChecksumError),
#[error(transparent)]
InvalidRecord(#[from] ChecksumError),
/// The record passed its checksum, but the contained protocol buffer is invalid.
InvalidProto(DecodeError),
#[error(transparent)]
InvalidProto(#[from] DecodeError),
/// The record is a valid `Event` proto, but its `wall_time` is `NaN`.
#[error("NaN wall time at step {}", .0.step)]
NanWallTime(Event),
/// An error occurred reading the record. May or may not be fatal.
ReadRecordError(ReadRecordError),
}

impl From<DecodeError> for ReadEventError {
fn from(e: DecodeError) -> Self {
ReadEventError::InvalidProto(e)
}
}

impl From<ChecksumError> for ReadEventError {
fn from(e: ChecksumError) -> Self {
ReadEventError::InvalidRecord(e)
}
}

impl From<ReadRecordError> for ReadEventError {
fn from(e: ReadRecordError) -> Self {
ReadEventError::ReadRecordError(e)
}
#[error(transparent)]
ReadRecordError(#[from] ReadRecordError),
}

impl ReadEventError {
Expand Down
12 changes: 10 additions & 2 deletions tensorboard/data/server/masked_crc.rs
Original file line number Diff line number Diff line change
Expand Up @@ -15,7 +15,7 @@ limitations under the License.

//! Checksums as used by TFRecords.

use std::fmt::{self, Debug};
use std::fmt::{self, Debug, Display};

/// A CRC-32C (Castagnoli) checksum that has undergone a masking permutation.
///
Expand All @@ -30,7 +30,13 @@ pub struct MaskedCrc(pub u32);
// Implement `Debug` manually to use zero-padded hex output.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this comment still relevant for Debug, now that it writes Masked({}) without using the :#010x padding option?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I think so. The point of the comment is to explain why we don’t
just write #[derive(Debug)]. This patch changes the implementation,
but not the behavior. If we derived Debug, the output would still be
MaskedCrc(4660) rather than MaskedCrc(0x1234), so the comment is
still as helpful as it was before.

In fact, I think it’s a bit more helpful than before, since before the
:#010x? was at least immediately following in the code. Now, if you
don’t look at the Display implementation, it’d be reasonable to ask,
“this just wraps the Display in the struct name; at that point, why
not just derive Debug?”—which the comment answers.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess the tests answer it, too. :-)

impl Debug for MaskedCrc {
fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
write!(f, "MaskedCrc({:#010x?})", self.0)
write!(f, "MaskedCrc({})", self)
}
}

impl Display for MaskedCrc {
fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't really understand why the '_ is needed in fmt::Formatter<'_>.

In Rust docs, some examples don't include it [1], while others do [2]. From what I gather, '_ is declaring a lifetime annotation for the Formatter, but we aren't using '_ anywhere else. Could you help me understand why it is needed?

[1] https://doc.rust-lang.org/std/fmt/struct.Formatter.html#examples
[2] https://doc.rust-lang.org/std/fmt/trait.UpperHex.html#examples

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You’re right—it isn’t needed here. You can read '_ as “infer the
correct lifetime”. The general feature is called lifetime elision, and
the '_ syntax was introduced in rust-lang/rfcs#2115; see conversation
for discussion, or the diff for the final text:

  • You can write '_ to explicitly elide a lifetime, and it is deprecated to
    entirely leave off lifetime arguments for non-& types

The reason that it’s here is that I write trait implementations by
typing impl Display for MaskedCrc {} and then asking rust-analyzer to
fill out method stubs for all required methods (<Space>j1 in my
editor), which generates:

impl Display for Foo {
    fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
        todo!()
    }
}

Given that it’s optional, I don’t have a strong preference. In the
particular case of fmt::Formatter, I do actually find it helpful to
recall that Formatter is parameterized by a lifetime, since this
affects how you can use it and isn’t entirely obvious. As the RFC says:
“The '_ marker makes clear to the reader that borrowing is
happening
, which might not otherwise be clear.”

I’m also weakly inclined to deem it acceptable to write the code that
rust-analyzer generates, so that I don’t have to always correct it. But
if we want to have a policy of eliding lifetimes implicitly rather than
implicitly absent a strong reason, I would be open to considering that.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the links and explanation!

The RFC sheds a lot of light for me on why they decided it was good to explicitly use the '_ placeholder. I don't yet have an opinion on when we should do it, but Rust reference also says "For lifetimes in paths, using '_ is preferred", so in this case it's seems idiomatic.

write!(f, "{:#010x?}", self.0)
}
}

Expand Down Expand Up @@ -78,9 +84,11 @@ mod tests {
#[test]
fn test_debug() {
let long_crc = MaskedCrc(0xf1234567);
assert_eq!(format!("{}", long_crc), "0xf1234567");
assert_eq!(format!("{:?}", long_crc), "MaskedCrc(0xf1234567)");

let short_crc = MaskedCrc(0x00000123);
assert_eq!(format!("{}", short_crc), "0x00000123");
assert_eq!(format!("{:?}", short_crc), "MaskedCrc(0x00000123)");
}
}
43 changes: 34 additions & 9 deletions tensorboard/data/server/tf_record.rs
Original file line number Diff line number Diff line change
Expand Up @@ -64,7 +64,8 @@ pub struct TfRecord {
}

/// A buffer's checksum was computed, but it did not match the expected value.
#[derive(Debug, PartialEq, Eq)]
#[derive(Debug, PartialEq, Eq, thiserror::Error)]
#[error("checksum mismatch: got {got}, want {want}")]
pub struct ChecksumError {
/// The actual checksum of the buffer.
pub got: MaskedCrc,
Expand Down Expand Up @@ -112,31 +113,29 @@ impl TfRecord {
}

/// Error returned by [`TfRecordReader::read_record`].
#[derive(Debug)]
#[derive(Debug, thiserror::Error)]
pub enum ReadRecordError {
/// Length field failed checksum. The file is corrupt, and reading must abort.
#[error("length checksum mismatch: got {}, want {}", .0.got, .0.want)]
BadLengthCrc(ChecksumError),
/// No fatal errors so far, but the record is not complete. Call `read_record` again with the
/// same state buffer once new data may be available.
///
/// This includes the "trivial truncation" case where there are no bytes in a new record, so
/// repeatedly reading records from a file of zero or more well-formed records will always
/// finish with a `Truncated` error.
#[error("record truncated")]
Truncated,
/// Record is too large to be represented in memory on this system.
///
/// In principle, it would be possible to recover from this error, but in practice this should
/// rarely occur since serialized protocol buffers do not exceed 2 GiB in size. Thus, no
/// recovery codepath has been implemented, so reading must abort.
#[error("record too large to fit in memory ({0} bytes)")]
TooLarge(u64),
/// Underlying I/O error. May be retryable if the underlying error is.
Io(io::Error),
}

impl From<io::Error> for ReadRecordError {
fn from(e: io::Error) -> Self {
ReadRecordError::Io(e)
}
#[error(transparent)]
Io(#[from] io::Error),
}

impl<R: Debug> Debug for TfRecordReader<R> {
Expand Down Expand Up @@ -405,6 +404,32 @@ mod tests {
}
}

#[test]
fn test_error_display() {
let e = ReadRecordError::BadLengthCrc(ChecksumError {
got: MaskedCrc(0x01234567),
want: MaskedCrc(0xfedcba98),
});
assert_eq!(
e.to_string(),
"length checksum mismatch: got 0x01234567, want 0xfedcba98"
);

let e = ReadRecordError::Truncated;
assert_eq!(e.to_string(), "record truncated");

let e = ReadRecordError::TooLarge(999);
assert_eq!(
e.to_string(),
"record too large to fit in memory (999 bytes)"
);

let io_error = io::Error::new(io::ErrorKind::BrokenPipe, "pipe machine broke");
let expected_message = io_error.to_string();
let e = ReadRecordError::Io(io_error);
assert_eq!(e.to_string(), expected_message);
}

#[test]
fn test_from_data() {
let test_cases = vec![
Expand Down