fn:parse-xml: DTDs, external resources #1860
Comments
My instinct is to avoid over-complicating it, but yes, in principle, these options could be added. The difficulty is that it may be hard to make them fully interoperable: the XML parser in C#, for example, offers a rather different range of options. (In some environments it's hard to find an XML parser that does DTD validation at all.) The basic concept of DTD validation is defined in the XML spec and I don't think that gives any problems (yes, it DOES involve expanding external entities); perhaps additional options are best left vendor-defined.
My two pence: Names beginning "xml" are reserved, so I think a reasonable user expectation is that entities declared in the internal subset will be expanded unless some extra effort has been taken to prevent that. If you don't expand the entity, then that's an error. The XDM has no way to represent an unexpanded entity reference.
Thanks; your pence are worth many pounds. I believe that an option could be helpful that allows you to use Our processor comes with a system-wide Boolean Raising an error, returning |
Returning |
Xerces seems to be lenient. The following snippet outputs
String string = "<!DOCTYPE root SYSTEM 'root.dtd'><root>[&arrow;]</root>";
InputSource is = new InputSource(new StringReader(string));
SAXParserFactory sf = SAXParserFactory.newInstance();
// Do not fetch the external DTD subset, so &arrow; stays undeclared.
sf.setFeature("http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
sf.newSAXParser().parse(is, new DefaultHandler() {
  @Override public void characters(char[] ch, int start, int length) throws SAXException {
    System.out.println("! " + new String(ch, start, length));
  }
  @Override public void skippedEntity(String name) throws SAXException {
    // Called for entity references the parser did not expand.
    System.out.println("? " + name);
  }
  @Override public void warning(SAXParseException e) throws SAXException { throw e; }
  @Override public void error(SAXParseException e) throws SAXException { throw e; }
  @Override public void fatalError(SAXParseException e) throws SAXException { throw e; }
});
I think there are a few things going on here.

At the level of the handler, whether or not the parser is conformant is kind of up to you, the handler writer. And at this level of detail, the XML recommendation is actually inadequate. I think you can read the XML recommendation as requiring that a processor report only characters and processing instructions. That's clearly not what was intended, but if you're trying to work out what the spec says, not what its authors meant, that's the sort of trouble you get into. You can write a handler that changes all element names to "SPOON" or discards attributes that have names that don't begin with a Unicode code point that's an odd number, or, well, you're mostly limited by your imagination.

Users will surely appreciate having options that allow them to load documents that have undeclared entities in them, or to load or not load external subsets, to expand or not expand entity references, etc. Trouble is, which of those options an implementation can actually support may depend on the underlying libraries that are being used. Xerces has a lot of options and it's widely available. But it's not the same as .NET's System.Xml, and then there's Node.js, where things are really impoverished. I'm not really sure what the right answer is.
The solution might be to define some options that implementations are not required to support: for example "external-entity-expansion-limit" if supported limits the number of external entity references that will be expanded, with the precise effect being implementation-dependent, and processors may ignore the option if not supported by the underlying parser.
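As a rough illustration of "processors may ignore the option if not supported by the underlying parser", here is a minimal Java sketch of how an implementation might probe a JAXP parser for an optional setting. The class name, the helper `trySetFeature`, and the `http://example.org/...` feature URI are hypothetical and exist only for this example; the only real calls are `SAXParserFactory.setFeature` and `XMLConstants.FEATURE_SECURE_PROCESSING`.

```java
import javax.xml.XMLConstants;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.SAXNotRecognizedException;
import org.xml.sax.SAXNotSupportedException;

public class OptionalParserOptions {

    /**
     * Tries to set a feature on the factory.
     * Returns true if the parser accepted it, false if the option
     * is unknown or unsupported and was therefore ignored.
     */
    static boolean trySetFeature(SAXParserFactory factory, String uri, boolean value) {
        try {
            factory.setFeature(uri, value);
            return true;
        } catch (SAXNotRecognizedException
                | SAXNotSupportedException
                | ParserConfigurationException e) {
            // The underlying parser does not know this option: ignore it.
            return false;
        }
    }

    public static void main(String[] args) {
        SAXParserFactory factory = SAXParserFactory.newInstance();

        // A standard JAXP feature that parsers are expected to recognize.
        boolean secure = trySetFeature(factory, XMLConstants.FEATURE_SECURE_PROCESSING, true);

        // A made-up feature URI (assumption for illustration only).
        boolean madeUp = trySetFeature(factory, "http://example.org/features/hypothetical-option", true);

        System.out.println("secure processing accepted: " + secure);
        System.out.println("hypothetical option accepted: " + madeUp);
    }
}
```

Whether silently ignoring an option is acceptable, as opposed to raising an error, is exactly the design question under discussion; the sketch only shows that the "ignore if unsupported" variant is cheap to implement on top of JAXP.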
A classic example of an XXE attack that we may want to prevent: `<!DOCTYPE root [ <!ENTITY xxe SYSTEM "file:///etc/passwd"> ]>
<root>&xxe;</root>`
=> parse-xml()
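For comparison, one way a Java-based processor could neutralize this input is to install an `EntityResolver` that never touches the file system. This is only a sketch of one possible policy (resolve every external entity to empty content); it is not what any particular processor does, and the class and method names are invented for the example. Throwing an exception from the resolver instead would give the "raise an error" behavior.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class BlockExternalEntities {

    /**
     * Parses the given XML and returns the text content of the root element.
     * Every external entity (external DTD subset or external general entity)
     * is replaced by empty content, so no external resource is read.
     */
    static String rootTextWithoutExternalAccess(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // The resolver is consulted before the parser opens any external entity.
            builder.setEntityResolver(
                (publicId, systemId) -> new InputSource(new StringReader("")));
            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            return doc.getDocumentElement().getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String xxe = "<!DOCTYPE root [ <!ENTITY xxe SYSTEM \"file:///etc/passwd\"> ]>"
                   + "<root>&xxe;</root>";
        // The entity expands to nothing instead of the file's content.
        System.out.println("[" + rootTextWithoutExternalAccess(xxe) + "]");
    }
}
```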
It seems to me that preventing access to |
In the context of Saxon (currently looking at both Saxon 12.5 Java and SaxonC Java) I had hoped that the |
The text doesn’t say much about what DTD validation means. Is my assumption correct that it boils down to a `SAXParserFactory.setValidating` call in Java?

What about DTDs in general? Given the following snippets (using the default `false` for DTD validation)…

…should the result be `<xml/>`, `<xml>→</xml>`, or an error? In other words, should the (potentially external) `xml.dtd` resource be resolved and interpreted?

Maybe we should introduce an additional `DTD` option (or options?) to control the loading of external DTDs and the handling of entities, for example:

Thoughts are welcome.
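To make the `SAXParserFactory.setValidating` question concrete, here is a small Java sketch of what that call does with a plain JAXP parser. It assumes the JDK's default parser; the class and method names are made up for the example. With validation enabled, validity violations are reported through `ErrorHandler.error` rather than stopping the parse, so the handler has to collect them (or rethrow) itself.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

public class DtdValidationSketch {

    /** Parses the XML with DTD validation enabled and returns the reported validity errors. */
    static List<String> validityErrors(String xml) {
        List<String> errors = new ArrayList<>();
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setValidating(true); // turn on DTD validation
            factory.newSAXParser().parse(new InputSource(new StringReader(xml)),
                new DefaultHandler() {
                    // Validity errors are recoverable: record them and continue parsing.
                    @Override public void error(SAXParseException e) {
                        errors.add(e.getMessage());
                    }
                });
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return errors;
    }

    public static void main(String[] args) {
        // Valid against its internal subset: root may contain text.
        String valid = "<!DOCTYPE root [<!ELEMENT root (#PCDATA)>]><root>text</root>";
        // Invalid: root is declared EMPTY but contains text.
        String invalid = "<!DOCTYPE root [<!ELEMENT root EMPTY>]><root>text</root>";

        System.out.println("valid document errors:   " + validityErrors(valid));
        System.out.println("invalid document errors: " + validityErrors(invalid));
    }
}
```

Note that this only covers validation; whether an external DTD is loaded and whether entities are expanded when validation is off are separate switches in most parsers, which is why a distinct option for them may be worth considering.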