Skip to content

RUST-2228 Normalize conversions between raw types and base crate types #580

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 9 commits into from
Jul 28, 2025

Conversation

abr-egn
Copy link
Contributor

@abr-egn abr-egn commented Jul 23, 2025

RUST-2228

This fills in the gaps and removes the redundant explicit conversion methods, making all conversions be via the standard traits. As part of that, I promoted Utf8LossyDeserialization to a more general Utf8Lossy wrapper, and tidied up and expanded the documentation in the raw module.

@@ -246,9 +246,53 @@ impl<'a> RawBsonRef<'a> {
}
}

/// Convert this [`RawBsonRef`] to the equivalent [`RawBson`].
pub fn to_raw_bson(self) -> RawBson {
#[inline]
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This diff lies. The body of to_raw_bson was moved to the From<RawBsonRef<'a>> for RawBson impl and append_to was untouched.

/// This wrapper has no effect on serialization behavior.
#[derive(Copy, Clone, PartialEq, Eq, PartialOrd, Ord, Debug, Hash, Default)]
#[repr(transparent)]
pub struct Utf8Lossy<T>(pub T);
Copy link
Contributor Author

@abr-egn abr-egn Jul 23, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using this as the type-level flag for both lossy decoding and lossy deserialization means there's now a very nice equivalence between

let lossy_doc: Utf8Lossy<Document> = deserialize_from_slice(buf)?;

and

let lossy_doc: Utf8Lossy<Document> = RawDocument::decode_from_bytes(buf)?.try_into()?;

Beyond being aesthetically pleasing, I think this will help with discoverability.

@abr-egn abr-egn marked this pull request as ready for review July 23, 2025 14:47
@abr-egn abr-egn requested a review from a team as a code owner July 23, 2025 14:47
@abr-egn abr-egn requested a review from isabelatkinson July 23, 2025 14:47
@@ -633,7 +577,8 @@ impl ToOwned for RawDocument {
type Owned = RawDocumentBuf;

fn to_owned(&self) -> Self::Owned {
self.to_raw_document_buf()
// unwrap is ok here because we already verified the bytes in `RawDocumentRef::new`
Copy link
Contributor

@isabelatkinson isabelatkinson Jul 23, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this copied comment is outdated - this should reference RawDocument::decode_from_bytes

This also got me thinking that decode_from_bytes reads as a little confusing for these method names; "decode" implies to me that the bytes are being parsed, but we're actually just checking a few invariants and then storing the bytes as-is. I think reverting to from_bytes could be more clear. WDYT? (Sorry I missed this in #554!)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm of two minds on the name. You're definitely right that we're not really doing any decoding here, so this could certainly mislead users to think this is an expensive call when it's not.

The other side, though, is that one of the big goals of #554 was to make it clear where the library was directly working with BSON ("encoding/decoding") and when it was interfacing with serde ("serializing/deserializing"). Constructing a RawDocument is the entry point to the decode side of that; if we drop that then things like the lossy utf8 example becomes

let lossy_doc: Utf8Lossy<Document> = RawDocument::from_bytes(buf)?.try_into()?;

which is no longer explicit about which of decoding or deserializing it is.

I'm not sure what the right way forward is here; if we wanted to be Strictly Correct it would be something like RawDocument::from_bytes but then RawDocument::decode_i32 instead of get_i32 which gets annoying to work with and doesn't solve the issue when going via traits like try_into.

Maybe something like RawDocument[Buf]::lazy_decode? That's explicit about both which side of the API it's on and that it's not doing most of the work upfront.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

one of the big goals of #554 was to make it clear where the library was directly working with BSON ("encoding/decoding") and when it was interfacing with serde ("serializing/deserializing"). Constructing a RawDocument is the entry point to the decode side of that

That makes sense - I agree that lazy_decode would probably be more explicit, but with these considerations I'm also okay with leaving the names as-is. I think it's helpful to have from_bytes somewhere in the name to align with the usually pattern of from_<type> in constructors, and lazy_decode_from_bytes might be more verbose than it's worth.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With the benefit of taking time to think, I'm now leaning towards the perspective that, now that serde is optional, encoding/decoding is the default mode of operation, and so I think we could drop the encode_/decode_ prefix without loss of clarity as long as the serde methods keep their prefixes. Specifically:

Document::encode_to_vec => to_vec
Document::encode_to_writer => to_writer
Document::decode_from_reader => from_reader
RawDocumentBuf::decode_from_bytes => from_bytes
RawDocumentBuf::decode_from_reader => from_reader
RawDocument::decode_from_bytes => from_bytes

WDYT?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sgtm!

isabelatkinson
isabelatkinson previously approved these changes Jul 25, 2025
@@ -633,7 +577,8 @@ impl ToOwned for RawDocument {
type Owned = RawDocumentBuf;

fn to_owned(&self) -> Self::Owned {
self.to_raw_document_buf()
// unwrap is ok here because we already verified the bytes in `RawDocumentRef::new`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

one of the big goals of #554 was to make it clear where the library was directly working with BSON ("encoding/decoding") and when it was interfacing with serde ("serializing/deserializing"). Constructing a RawDocument is the entry point to the decode side of that

That makes sense - I agree that lazy_decode would probably be more explicit, but with these considerations I'm also okay with leaving the names as-is. I think it's helpful to have from_bytes somewhere in the name to align with the usually pattern of from_<type> in constructors, and lazy_decode_from_bytes might be more verbose than it's worth.

@abr-egn abr-egn force-pushed the RUST-2228/conversions branch from c6e36e7 to 86e8f23 Compare July 28, 2025 10:51
@abr-egn abr-egn requested a review from isabelatkinson July 28, 2025 11:03
@abr-egn abr-egn merged commit 5622ad4 into mongodb:main Jul 28, 2025
11 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants