Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Passing tags and attributes to nh3.clean() #9

Open
carltongibson opened this issue Jul 19, 2024 · 5 comments
Open

Passing tags and attributes to nh3.clean() #9

carltongibson opened this issue Jul 19, 2024 · 5 comments

Comments

@carltongibson
Copy link
Contributor

Hi @matthiask β€”Β thanks for the great package, as always 🎁

I have a question about the nh3 sanitizer:

def _nh3_sanitizer():
from nh3 import clean
return lambda x: _actually_empty(clean(x))

Here we’re using clean but I wonder if we could pass the tags and attributes arguments as well?

Deriving those from the types used to instantiate the prose editor would allow precisely limiting the allowed HTML, I’m hoping.

What do you think?

@matthiask
Copy link
Owner

Thank you :)

So, I'm not 100% sure if this would be better achieved through settings, field arguments or some other way or if maybe it shouldn't be done at all.

The argument for not making the nh3 invocation configurable is that nh3's job here isn't to clean up the HTML, it's only to protect against XSS when someone circumvents the ProseMirror widget. As long as editors do not do anything strange ProseMirror will always serialize a cleaned up version of the HTML fragment already, even if you use the raw HTML popup. (That's also the reason why I use nh3 instead of something more opinionated such as html-sanitizer.)

@carltongibson
Copy link
Contributor Author

carltongibson commented Jul 19, 2024

So @adamchainz wrote a post on nh3 in which he says:

The nh3 defaults (from Ammonia) are generally safe, allowing only general content tags and attributes, but they are still pretty wide, allowing 75 different tags. Allowing some tags on this default list, such as <article>, may lead to surprising results. In the worst case, an attacker may be able to craft official-looking content with evil instructions to users.

Typically, incoming HTML fragments come from a rich text editor on your site. In this case, use the arguments of nh3.clean() to limit allowed tags and attributes to only those your editor will create. This way, there won’t be any chance of surprise content.
(src)

I find that kind of compelling, and we do know the restricted set of tags and attributes that should be allowed (I think πŸ€”)

SanitizedProseEditorField knows the config that it will pass to the widget at __init__() so could determine the tags and attributes allowed and pass those parameters in a closure to the sanitize function (or similar). (Better than a setting no?)

I think it's safe as is, but could maybe be more so (hardened) by restricting the allowed tags.

No sure either. But just working it through with you.

@carltongibson
Copy link
Contributor Author

I guess a validator here might show if someone was trying to be naughty πŸ˜…

@matthiask
Copy link
Owner

matthiask commented Jul 19, 2024

I find that kind of compelling, and we do know the restricted set of tags and attributes that should be allowed (I think πŸ€”)

Yes, by looking at e.g. prosemirror-schema-basic, for example the link mark:
https://github.com/ProseMirror/prosemirror-schema-basic/blob/4e358f5443f15abe09406c9327e35aa9e3c1bf8c/src/schema-basic.ts#L113-L123

I think the nodes and marks definitions in prosemirror-schema-basic and prosemirror-schema-list have been unchanged (practically) since the beginning. I have had discussions in private about allowing id and name attributes, and maybe target could be useful, but that's a different issue.

SanitizedProseEditorField knows the config that it will pass to the widget at init() so could determine the tags and attributes allowed and pass those parameters in a closure to the sanitize function (or similar). (Better than a setting no?)

Deriving the list of allowed attributes and tags from the types list sounds like the way to go!

I think it's safe as is, but could maybe be more so (hardened) by restricting the allowed tags.
[...]
I guess a validator here might show if someone was trying to be naughty πŸ˜…

I totally agree with that, and I think there's value in hardening! A little more opinions on the server side would certainly be useful. Just to spell it out for everyone else, the cleaning is already more opinionated than nh3 is since e.g. the empty pararaph <p></p> is replaced by an empty string so that "no content" is actually falsy instead of truthy. So, adding this additional hardening wouldn't be introducing completely new concepts into the code.

@carltongibson
Copy link
Contributor Author

Great. Makes sense. I will try and prototype something when I get a small window.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants