RUST-2228 Normalize conversions between raw types and base crate types #580

abr-egn · 2025-07-23T13:54:10Z

This fills in the gaps and removes the redundant explicit conversion methods, making all conversions be via the standard traits. As part of that, I promoted Utf8LossyDeserialization to a more general Utf8Lossy wrapper, and tidied up and expanded the documentation in the raw module.

abr-egn · 2025-07-23T13:57:41Z

src/raw/bson_ref.rs

@@ -246,9 +246,53 @@ impl<'a> RawBsonRef<'a> {
        }
    }

-    /// Convert this [`RawBsonRef`] to the equivalent [`RawBson`].
-    pub fn to_raw_bson(self) -> RawBson {
+    #[inline]


This diff lies. The body of to_raw_bson was moved to the From<RawBsonRef<'a>> for RawBson impl and append_to was untouched.

abr-egn · 2025-07-23T14:14:33Z

src/utf8_lossy.rs

+/// This wrapper has no effect on serialization behavior.
+#[derive(Copy, Clone, PartialEq, Eq, PartialOrd, Ord, Debug, Hash, Default)]
+#[repr(transparent)]
+pub struct Utf8Lossy<T>(pub T);


Using this as the type-level flag for both lossy decoding and lossy deserialization means there's now a very nice equivalence between

let lossy_doc: Utf8Lossy<Document> = deserialize_from_slice(buf)?;

and

let lossy_doc: Utf8Lossy<Document> = RawDocument::decode_from_bytes(buf)?.try_into()?;

Beyond being aesthetically pleasing, I think this will help with discoverability.

isabelatkinson · 2025-07-23T21:33:33Z

src/raw/document.rs

@@ -633,7 +577,8 @@ impl ToOwned for RawDocument {
    type Owned = RawDocumentBuf;

    fn to_owned(&self) -> Self::Owned {
-        self.to_raw_document_buf()
+        // unwrap is ok here because we already verified the bytes in `RawDocumentRef::new`


I think this copied comment is outdated - this should reference RawDocument::decode_from_bytes

This also got me thinking that decode_from_bytes reads as a little confusing for these method names; "decode" implies to me that the bytes are being parsed, but we're actually just checking a few invariants and then storing the bytes as-is. I think reverting to from_bytes could be more clear. WDYT? (Sorry I missed this in #554!)

I'm of two minds on the name. You're definitely right that we're not really doing any decoding here, so this could certainly mislead users to think this is an expensive call when it's not.

The other side, though, is that one of the big goals of #554 was to make it clear where the library was directly working with BSON ("encoding/decoding") and when it was interfacing with serde ("serializing/deserializing"). Constructing a RawDocument is the entry point to the decode side of that; if we drop that then things like the lossy utf8 example becomes

let lossy_doc: Utf8Lossy<Document> = RawDocument::from_bytes(buf)?.try_into()?;

which is no longer explicit about which of decoding or deserializing it is.

I'm not sure what the right way forward is here; if we wanted to be Strictly Correct it would be something like RawDocument::from_bytes but then RawDocument::decode_i32 instead of get_i32 which gets annoying to work with and doesn't solve the issue when going via traits like try_into.

Maybe something like RawDocument[Buf]::lazy_decode? That's explicit about both which side of the API it's on and that it's not doing most of the work upfront.

one of the big goals of #554 was to make it clear where the library was directly working with BSON ("encoding/decoding") and when it was interfacing with serde ("serializing/deserializing"). Constructing a RawDocument is the entry point to the decode side of that

That makes sense - I agree that lazy_decode would probably be more explicit, but with these considerations I'm also okay with leaving the names as-is. I think it's helpful to have from_bytes somewhere in the name to align with the usually pattern of from_<type> in constructors, and lazy_decode_from_bytes might be more verbose than it's worth.

With the benefit of taking time to think, I'm now leaning towards the perspective that, now that serde is optional, encoding/decoding is the default mode of operation, and so I think we could drop the encode_/decode_ prefix without loss of clarity as long as the serde methods keep their prefixes. Specifically:

Document::encode_to_vec => to_vec
Document::encode_to_writer => to_writer
Document::decode_from_reader => from_reader
RawDocumentBuf::decode_from_bytes => from_bytes
RawDocumentBuf::decode_from_reader => from_reader
RawDocument::decode_from_bytes => from_bytes

WDYT?

isabelatkinson · 2025-07-25T14:39:30Z

src/raw/document.rs

@@ -633,7 +577,8 @@ impl ToOwned for RawDocument {
    type Owned = RawDocumentBuf;

    fn to_owned(&self) -> Self::Owned {
-        self.to_raw_document_buf()
+        // unwrap is ok here because we already verified the bytes in `RawDocumentRef::new`


one of the big goals of #554 was to make it clear where the library was directly working with BSON ("encoding/decoding") and when it was interfacing with serde ("serializing/deserializing"). Constructing a RawDocument is the entry point to the decode side of that

That makes sense - I agree that lazy_decode would probably be more explicit, but with these considerations I'm also okay with leaving the names as-is. I think it's helpful to have from_bytes somewhere in the name to align with the usually pattern of from_<type> in constructors, and lazy_decode_from_bytes might be more verbose than it's worth.

abr-egn commented Jul 23, 2025

View reviewed changes

abr-egn marked this pull request as ready for review July 23, 2025 14:47

abr-egn requested a review from a team as a code owner July 23, 2025 14:47

abr-egn requested a review from isabelatkinson July 23, 2025 14:47

isabelatkinson reviewed Jul 23, 2025

View reviewed changes

isabelatkinson previously approved these changes Jul 25, 2025

View reviewed changes

abr-egn dismissed isabelatkinson’s stale review via c6e36e7 July 28, 2025 10:46

abr-egn added 9 commits July 28, 2025 11:50

to_raw_bson

ad848af

to_raw_document_buf

e259838

utf8_lossy

ea97ffd

from_document

516c8cf

to_raw_array_buf

89cf749

document and fix doctests

ef7b3e9

better documentation

575fba5

fix subcrates

615ea57

rename

86e8f23

abr-egn force-pushed the RUST-2228/conversions branch from c6e36e7 to 86e8f23 Compare July 28, 2025 10:51

abr-egn requested a review from isabelatkinson July 28, 2025 11:03

isabelatkinson approved these changes Jul 28, 2025

View reviewed changes

abr-egn merged commit 5622ad4 into mongodb:main Jul 28, 2025
11 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

RUST-2228 Normalize conversions between raw types and base crate types #580

RUST-2228 Normalize conversions between raw types and base crate types #580

Uh oh!

abr-egn commented Jul 23, 2025

Uh oh!

abr-egn Jul 23, 2025

Uh oh!

abr-egn Jul 23, 2025 •

edited

Loading

Uh oh!

isabelatkinson Jul 23, 2025 •

edited

Loading

Uh oh!

abr-egn Jul 24, 2025

Uh oh!

isabelatkinson Jul 25, 2025

Uh oh!

abr-egn Jul 25, 2025

Uh oh!

isabelatkinson Jul 25, 2025

Uh oh!

isabelatkinson Jul 25, 2025

Uh oh!

Uh oh!

Uh oh!

RUST-2228 Normalize conversions between raw types and base crate types #580

RUST-2228 Normalize conversions between raw types and base crate types #580

Uh oh!

Conversation

abr-egn commented Jul 23, 2025

Uh oh!

abr-egn Jul 23, 2025

Choose a reason for hiding this comment

Uh oh!

abr-egn Jul 23, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

isabelatkinson Jul 23, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

abr-egn Jul 24, 2025

Choose a reason for hiding this comment

Uh oh!

isabelatkinson Jul 25, 2025

Choose a reason for hiding this comment

Uh oh!

abr-egn Jul 25, 2025

Choose a reason for hiding this comment

Uh oh!

isabelatkinson Jul 25, 2025

Choose a reason for hiding this comment

Uh oh!

isabelatkinson Jul 25, 2025

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

abr-egn Jul 23, 2025 •

edited

Loading

isabelatkinson Jul 23, 2025 •

edited

Loading