leave html entities untouched, it breaks pages with weird encodings #112
Comments
Hi Lorenzo, thank you very much for reporting this. It's likely the result of html5ever (the HTML parsing library we're using) treating every document as UTF-8 by default and not automatically parsing that charset meta tag. More info on the likely cause: If there's no flag available to preserve HTML entities, it'll likely be worth substituting those manually for non-UTF-8 documents upon save.
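To illustrate the "substitute manually upon save" idea from the comment above, here is a minimal Python sketch. It is not monolith's code (monolith is written in Rust), and `escape_non_ascii` is a hypothetical helper name. It assumes the document has already been parsed into a Unicode string, and it rewrites every non-ASCII character as a numeric character reference so the saved bytes read the same under any ASCII-compatible charset.

```python
def escape_non_ascii(html_text: str) -> str:
    """Hypothetical helper: replace each non-ASCII character with a
    numeric character reference such as &#169;.

    The output is pure ASCII, so it decodes identically whether the
    page declares iso-8859-1, UTF-8, or another ASCII-compatible charset.
    """
    # 'xmlcharrefreplace' is a standard Python codec error handler that
    # emits &#NNN; for every character the target encoding cannot hold.
    return html_text.encode("ascii", errors="xmlcharrefreplace").decode("ascii")


if __name__ == "__main__":
    print(escape_non_ascii("\u00a9 example footer"))
```

Note that this produces a numeric reference rather than restoring whichever named entity the original page used, so it preserves rendering but not the exact source text.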
Ouch, a five year old bug... well, thank you for the info and the quick reply!
Is there any progress on this issue?
That's the next big patch I'm currently working on. What encoding is not working in your case, @moriakijp?
The fix is now in
Example: http://www.the-spoiler.com/RPG/New.World.Computing/might.and.magic6.3/mm6.htm
Look at the copyright: it's written as an HTML entity in the original (shown here rendered as "©"). monolith translates that to the UTF-8 copyright character, which is wrong because that page says charset=iso-8859-1, so it renders as a bogus character + the copyright sign (tested on Windows/Linux/OpenBSD). I did meet other pages like that; this is just the one where I noticed it.
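The "bogus character + the copyright sign" symptom can be reproduced in a few lines. This is a minimal sketch in Python, not taken from monolith itself: it assumes the saved file contains the UTF-8 bytes for the copyright sign while the browser follows the page's declared iso-8859-1 charset.

```python
# The character the entity stands for: U+00A9 COPYRIGHT SIGN.
copyright_sign = "\u00a9"

# Written out as UTF-8, it becomes two bytes: 0xC2 0xA9.
utf8_bytes = copyright_sign.encode("utf-8")

# A browser that trusts charset=iso-8859-1 decodes each byte separately:
# 0xC2 is "Â" (the bogus character) and 0xA9 is "©".
as_rendered = utf8_bytes.decode("iso-8859-1")

print(utf8_bytes)
print(as_rendered)
```

Leaving the entity untouched avoids the problem, because the entity is plain ASCII and means the same thing under either charset.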
Testcase: