Skip to content

Internal symbols array is publicly exposed and unsound #75

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
Manishearth opened this issue Jul 5, 2023 · 6 comments · Fixed by #76
Closed

Internal symbols array is publicly exposed and unsound #75

Manishearth opened this issue Jul 5, 2023 · 6 comments · Fixed by #76

Comments

@Manishearth
Copy link

use data_encoding::Specification;

fn main() {
    let mut hex = {
        let mut spec = Specification::new();
        spec.symbols.push_str("0123456789abcdef");
        spec.encoding().unwrap()
    };


    hex.0 = (&[0xF6; 514][..]).into(); // created with non-ascii symbols
    let invalid_string = hex.encode(&[1, 2, 3, 4]);
   
    println!("created invalid string");
    println!("{:?}", invalid_string); // will panic
}

This does involve tweaking the field of the Encoding, but that's a public and not doc(hidden) field. At the very least it should be doc(hidden).

(at a meta level, this code would probably benefit from some kind of internal #[repr(transparent)] pub Ascii(u8); type so that it is very clear that the invariants are being upheld)

ia0 added a commit that referenced this issue Jul 5, 2023
The implementation itself (`InternalEncoding`) is already hidden, but the field
in the definition of `Encoding` is not.

Fixes #75
ia0 added a commit that referenced this issue Jul 5, 2023
The implementation itself (`InternalEncoding`) is already hidden, but the field
in the definition of `Encoding` is not.

Fixes #75
@ia0
Copy link
Owner

ia0 commented Jul 5, 2023

Thanks for the bug report! I didn't know it was possible to have doc(hidden) on fields. The field needs to be public for data-encoding-macro. The type was already doc(hidden) and is named InternalEncoding which are good indicators that users should not touch it. But making the field itself doc(hidden) is also another good hint. Sending #76 to fix that.

Note that eventually, when const fn support is powerful enough, data-encoding-macro would disappear and the field could become private again. Actually, it's probably also a good idea to make the field private and only expose an unsafe const fn, but that would be a breaking change because data-encoding-macro does not depend on an exact version. Adding to the wish list in #72.

@ia0 ia0 closed this as completed in #76 Jul 5, 2023
ia0 added a commit that referenced this issue Jul 5, 2023
The implementation itself (`InternalEncoding`) is already hidden, but
the field in the definition of `Encoding` is not.

Fixes #75
@Manishearth
Copy link
Author

@ia0 it may also be worth renaming the field something suitably scary, because it is relatively easy to not notice that x.0 is forbidden: it shows up in error messages and stuff.

I'd also recommend against using doc(hidden) fields when there are unsafe invariants; if possible it would be better to expose a const unsafe fn constructor and call it from the macro.

@ia0
Copy link
Owner

ia0 commented Jul 5, 2023

Yes, I think the const unsafe fn is the best option. Actually, I misremembered, I already have and use a const fn so I can make the field private (that might have been an oversight, or maybe deliberate to hide a breaking change over a few years). Making the const fn unsafe would still be a breaking change though, but I could also hide it under a few years (i.e. assuming users don't use a data-encoding version older by a year than the data-encoding-macro version).

@Manishearth
Copy link
Author

Yeah I think a doc(hidden) const fn is better than having a public-but-hidden field, you have to very explicitly find and call that API.

@anforowicz
Copy link

Are there additional follow-ups needed/expected here? This issue is closed, but it seems to me (see https://chromium-review.googlesource.com/c/chromium/src/+/6187726/comment/53544bd0_bf28981a/) that there may be some lingering soundness concerns here. Maybe I don't fully understand how the current code prevents violating UTF-8 encoding through the public API? FWIW I like the suggestion above to restrict all the symbols to ASCII (only allowing construction of Ascii wrapper via a public constructor that panics at runtime if a non-Ascii value is used).

@ia0
Copy link
Owner

ia0 commented Jan 23, 2025

Are there additional follow-ups needed/expected here?

Yes, using unsafe const fn is tracked in #72 under "Private Encoding implementation" as mentioned at the end of #75 (comment). This is unfortunately a breaking change because I did the mistake of using a too laxist version requirement for data-encoding in data-encoding-macro although the later actually depends on a private API. So this change needs to wait v3 which is under work with an ETA of 2 to 3 years.

However, if you believe this is an issue, I could see a temporary workaround. I could add an unstable-no-private-api feature to data-encoding and data-encoding-macro which does 2 things:

  • The user acknowledges that they will get a broken build if they use both data-encoding and data-encoding-macro without making sure the feature is either enabled in both crates or disabled in both crates.
  • The private API of Encoding uses unsafe const fn instead of field access.

This issue is closed, but it seems to me that there may be some lingering soundness concerns here

I agree the situation is not ideal and depends on what "public API" means. Is it what the library documents? Or is it whatever you can achieve with safe code? I believe this is the first. But I also agree that if you can make both match, it's better (not only to avoid genuine mistakes, but also to simply tracking of malicious usage).

That's why I don't consider the library unsound. If you access the "private" field (because undocumented and with an explicit type name), then you have library UB by breaking the library contract.

Making sure you can't have library UB with safe code is tracked in #72 as described above.

I like the suggestion above to restrict all the symbols to ASCII

This is not as simple because not all bytes are ASCII in the internal representation. The optional field width may be greater than 127. In v3, the internal representation may use something else than a byte slice, in which case using more precise types could be an option.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants