-
Notifications
You must be signed in to change notification settings - Fork 49
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix "stream did not contain valid UTF-8" using String::from_utf8_lossy #380
Conversation
I'm testing this and it didn't entirely fix the issue, only one case. I will keep investigating. |
Hi, what problem are you running into and trying to fix? Right now I have no idea beyond that you're evidently having trouble with non-UTF8 names for Instances? |
^ Added String::from_utf8_lossy to Name property I have .rbxl files which, when I decode with rbx_binary::from_reader(input) Throw this UTF-8 error. |
Just to clarify, is there a reason why you want/need invalid UTF-8 for the name property on Instances? We've had discussions about this before internally and it's unclear if it's actually beneficial to support names like that. If we were going to, we'd probably want to swap to using a string replacement like |
So far I have found the issue with both Name and Source property. My current project is that I'm converting the 1006 .rbxl files I downloaded off my place version history into git version history, but I've also run into this issue with map files for my actual game which I wanted to scan for scripts. So it's both reading files I created a very long time ago, and reading files that I didn't create. |
All 1006 files have decoded with these changes! I now have a git history all the way back to 2015! |
I updated the original message with some contextual information, sorry about the initial lack of clarity. |
Going to tag @kennethloeffler in so that he can share his thoughts on this. I don't want to leave you hanging on this and between us we should be able to make a final call on how we want to solve this problem. |
We've had a bit of discussion about this aspect of rbx-dom already: #313. A lossy conversion is okay, but is... well, lossy. Roblox's implementation doesn't do any conversions whatsoever on I think I'm okay with doing a lossy conversion so rbx-dom can successfully-ish decode more files, as long as we document it and print a warning when we have to do it so that there are no murky surprises. But we should really consider bringing rbx-dom's |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Import std::borrow::Cow
at the top of rbx_binary/src/deserializer/state.rs
and then apply the following suggestions. Also make sure to run rustfmt before you push - that's why the CI failed!
Do we need to do anything to rbx_xml for this? @Dekkonot |
How about making lossy string conversion an optional decoder setting for now? There are lots of maps on my game that simply don't work because of this, so I'm using this patched version. I can agree that effectively destroying strings by default can be seen as a bug that should not be relied on. |
Meh, it's better than the behavior we have now (which is just to fall over and exit without producing a result). I think it should be fine until we properly support non UTF-8 Make sure to import |
This needs an entry in the changelog but otherwise looks good to me.
Unicode in general doesn't currently work in rbx_xml so no, but only because we're in a bad spot anyway. @kennethloeffler |
Squashed and rebased |
061e32e
to
3e4be0d
Compare
Added changelog |
Does Roblox use UTF-16? I know there's a crate for that, maybe that's desirable instead of BStr. |
No, Roblox uses UTF-8 encoding. It just also allows binary data to be put into strings. What we really need to do is not pretend that Roblox strings are always UTF-8. That's outside of scope for this PR though. |
@kennethloeffler When you have a moment, can you take a look at this again? Turns out it needs your approval, haha. |
Co-authored-by: Kenneth Loeffler <[email protected]>
3e4be0d
to
048b22d
Compare
Rebased onto master to resolve merge conflicts. |
I am using
rbx_binary::from_reader(input)
to load files in my rust program, and some of the files make the function spit out the error "stream did not contain valid UTF-8".I decided that I'm proficient enough at Rust to find out how to fix this pesky issue myself, and here's what I've come up with. I found two places which would produce this error and have patched them using the
String::from_utf8_lossy
function.Here is an image of an offending section of a

Script.Source
property after it was mercilessly butchered into valid UTF-8: