-
Notifications
You must be signed in to change notification settings - Fork 147
RUST-2228 Normalize conversions between raw types and base crate types #580
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@@ -246,9 +246,53 @@ impl<'a> RawBsonRef<'a> { | |||
} | |||
} | |||
|
|||
/// Convert this [`RawBsonRef`] to the equivalent [`RawBson`]. | |||
pub fn to_raw_bson(self) -> RawBson { | |||
#[inline] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This diff lies. The body of to_raw_bson
was moved to the From<RawBsonRef<'a>> for RawBson
impl and append_to
was untouched.
/// This wrapper has no effect on serialization behavior. | ||
#[derive(Copy, Clone, PartialEq, Eq, PartialOrd, Ord, Debug, Hash, Default)] | ||
#[repr(transparent)] | ||
pub struct Utf8Lossy<T>(pub T); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using this as the type-level flag for both lossy decoding and lossy deserialization means there's now a very nice equivalence between
let lossy_doc: Utf8Lossy<Document> = deserialize_from_slice(buf)?;
and
let lossy_doc: Utf8Lossy<Document> = RawDocument::decode_from_bytes(buf)?.try_into()?;
Beyond being aesthetically pleasing, I think this will help with discoverability.
src/raw/document.rs
Outdated
@@ -633,7 +577,8 @@ impl ToOwned for RawDocument { | |||
type Owned = RawDocumentBuf; | |||
|
|||
fn to_owned(&self) -> Self::Owned { | |||
self.to_raw_document_buf() | |||
// unwrap is ok here because we already verified the bytes in `RawDocumentRef::new` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this copied comment is outdated - this should reference RawDocument::decode_from_bytes
This also got me thinking that decode_from_bytes
reads as a little confusing for these method names; "decode" implies to me that the bytes are being parsed, but we're actually just checking a few invariants and then storing the bytes as-is. I think reverting to from_bytes
could be more clear. WDYT? (Sorry I missed this in #554!)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm of two minds on the name. You're definitely right that we're not really doing any decoding here, so this could certainly mislead users to think this is an expensive call when it's not.
The other side, though, is that one of the big goals of #554 was to make it clear where the library was directly working with BSON ("encoding/decoding") and when it was interfacing with serde ("serializing/deserializing"). Constructing a RawDocument
is the entry point to the decode side of that; if we drop that then things like the lossy utf8 example becomes
let lossy_doc: Utf8Lossy<Document> = RawDocument::from_bytes(buf)?.try_into()?;
which is no longer explicit about which of decoding or deserializing it is.
I'm not sure what the right way forward is here; if we wanted to be Strictly Correct it would be something like RawDocument::from_bytes
but then RawDocument::decode_i32
instead of get_i32
which gets annoying to work with and doesn't solve the issue when going via traits like try_into
.
Maybe something like RawDocument[Buf]::lazy_decode
? That's explicit about both which side of the API it's on and that it's not doing most of the work upfront.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
one of the big goals of #554 was to make it clear where the library was directly working with BSON ("encoding/decoding") and when it was interfacing with serde ("serializing/deserializing"). Constructing a RawDocument is the entry point to the decode side of that
That makes sense - I agree that lazy_decode
would probably be more explicit, but with these considerations I'm also okay with leaving the names as-is. I think it's helpful to have from_bytes
somewhere in the name to align with the usually pattern of from_<type>
in constructors, and lazy_decode_from_bytes
might be more verbose than it's worth.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With the benefit of taking time to think, I'm now leaning towards the perspective that, now that serde is optional, encoding/decoding is the default mode of operation, and so I think we could drop the encode_
/decode_
prefix without loss of clarity as long as the serde
methods keep their prefixes. Specifically:
Document::encode_to_vec
=> to_vec
Document::encode_to_writer
=> to_writer
Document::decode_from_reader
=> from_reader
RawDocumentBuf::decode_from_bytes
=> from_bytes
RawDocumentBuf::decode_from_reader
=> from_reader
RawDocument::decode_from_bytes
=> from_bytes
WDYT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sgtm!
src/raw/document.rs
Outdated
@@ -633,7 +577,8 @@ impl ToOwned for RawDocument { | |||
type Owned = RawDocumentBuf; | |||
|
|||
fn to_owned(&self) -> Self::Owned { | |||
self.to_raw_document_buf() | |||
// unwrap is ok here because we already verified the bytes in `RawDocumentRef::new` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
one of the big goals of #554 was to make it clear where the library was directly working with BSON ("encoding/decoding") and when it was interfacing with serde ("serializing/deserializing"). Constructing a RawDocument is the entry point to the decode side of that
That makes sense - I agree that lazy_decode
would probably be more explicit, but with these considerations I'm also okay with leaving the names as-is. I think it's helpful to have from_bytes
somewhere in the name to align with the usually pattern of from_<type>
in constructors, and lazy_decode_from_bytes
might be more verbose than it's worth.
c6e36e7
to
86e8f23
Compare
RUST-2228
This fills in the gaps and removes the redundant explicit conversion methods, making all conversions be via the standard traits. As part of that, I promoted
Utf8LossyDeserialization
to a more generalUtf8Lossy
wrapper, and tidied up and expanded the documentation in theraw
module.