Message codes revamp #1092
Comments
Hi @rdeltour, we use the message codes, but not yet in an automated way, so as far as we are concerned they can be changed. We are happy with the current system, but we don't have a strong opinion about how it could be improved. The only thing that would be interesting to add is whether a message is an error or a warning. We select which problems we need to deal with; for example, we could decide to ignore all warnings. This is important because the Severity element in the XML output says error even if it is a warning, and we select on this Severity element in our process. Also, valid files (status is valid and well-formed) can have a severity of error. Thanks for all the work on EPUBCheck! Sam
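Filtering on severity, as Sam describes, could look roughly like the minimal sketch below. It assumes each message carries its own severity label; the Message record and the label strings are hypothetical, not EPUBCheck's API. It keeps only errors and fatal errors so that warnings can be ignored. If the reported severity is wrong, as Sam notes for the XML output, a filter like this will treat warnings as errors, which is why a reliable severity per message matters.

```python
from dataclasses import dataclass


# Hypothetical message record; field names are illustrative, not EPUBCheck's API.
@dataclass(frozen=True)
class Message:
    code: str
    severity: str  # assumed labels, e.g. "FATAL", "ERROR", "WARNING", "USAGE"
    text: str


# Severities treated as blocking; everything else is ignored.
BLOCKING = {"FATAL", "ERROR"}


def blocking_messages(messages):
    """Keep only messages whose per-message severity is blocking (case-insensitive)."""
    return [m for m in messages if m.severity.upper() in BLOCKING]


if __name__ == "__main__":
    report = [
        Message("RSC-005", "ERROR", "parsing error"),
        Message("RSC-017", "WARNING", "parsing warning"),
    ]
    for m in blocking_messages(report):
        print(m.code, m.severity)
```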
Don't know if this helps, but you might want to have a look at how VeraPDF (a conformance checker for the PDF/A standards) handles this. They created validation rules, where each rule contains an explicit reference to the standard it applies to, as well as the clause in the standard on which the rule is based. Below is an example:
<rule specification="ISO 19005-1:2005" clause="6.7.2" testNumber="1" status="failed" passedChecks="0" failedChecks="1">
<description>The document catalog dictionary of a conforming file shall contain the Metadata key.</description>
<object>PDDocument</object>
<test>metadata_size == 1</test>
<check status="failed">
<context>root/document[0]</context>
</check>
</rule>
The obvious advantage is that it establishes a direct link between the validator and the filespec. Perhaps it is possible to use something similar for EPUBCheck?
A possible argument against doing this is that it might complicate things if new versions of the filespec are organised differently than the current one, since that would break this link, and fixing this could turn into a major pain, especially if there are frequent updates to the spec. Also, looking at the evolution of EPUB thus far, I think changes to the format have been both more frequent and more radical than changes to the PDF/A profiles, so the situations for both formats may not be completely comparable. In any case, this would require quite a bit of coordination between the writers of the filespec and the EPUBCheck developers.
It might also be a good idea to get in touch with the VeraPDF developers at the Open Preservation Foundation (OPF). One of the other tools they're maintaining is JHOVE, and they're currently working on a JHOVE EPUB module that wraps EPUBCheck, so they will probably be both interested in this and willing to help.
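As a rough illustration of the "direct link" idea, the sketch below reads the specification and clause back out of the VeraPDF rule quoted above, using Python's standard xml.etree.ElementTree. The rule is abridged (its test element is left out). Nothing here is VeraPDF's own API, and an EPUBCheck equivalent would need its own attribute names, which so far are undefined.

```python
import xml.etree.ElementTree as ET

# The VeraPDF rule quoted above, abridged (the test element is omitted).
RULE_XML = """<rule specification="ISO 19005-1:2005" clause="6.7.2" testNumber="1"
      status="failed" passedChecks="0" failedChecks="1">
  <description>The document catalog dictionary of a conforming file shall contain the Metadata key.</description>
  <object>PDDocument</object>
  <check status="failed">
    <context>root/document[0]</context>
  </check>
</rule>"""


def spec_reference(rule_xml):
    """Return (specification, clause, status) from a VeraPDF-style rule element."""
    rule = ET.fromstring(rule_xml)
    return rule.get("specification"), rule.get("clause"), rule.get("status")


def failed_contexts(rule_xml):
    """List the context path of every failed check inside the rule."""
    rule = ET.fromstring(rule_xml)
    return [
        check.findtext("context")
        for check in rule.iter("check")
        if check.get("status") == "failed"
    ]


if __name__ == "__main__":
    print(spec_reference(RULE_XML))
    print(failed_contexts(RULE_XML))
```

The point of the design is that a consumer can go from a failed check straight to the clause of the standard, without any lookup table maintained on the side.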
The VST system uses the short-codes from EPUBCheck to interpret or discard preflight messages.
Yes, we use, and are dependent upon, these error codes. These form the basis of our ingestion whitelisting system. The existence of these codes and their immutability makes integration of each updated epubcheck version easy for Google Play. We would vote (strongly) for "stay the course."
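The VST and Google Play workflows above both key on the codes themselves. A minimal whitelist check might look like the sketch below; the accepted set is a hypothetical example, not anyone's actual configuration.

```python
# Hypothetical whitelist: codes an ingestion pipeline has decided it can accept.
# The code below is an example taken from this issue, not a recommendation.
ACCEPTED_CODES = frozenset({"RSC-017"})


def rejected(codes, accepted=ACCEPTED_CODES):
    """Return the reported codes that are not on the whitelist, in order."""
    return [c for c in codes if c not in accepted]


def is_ingestible(codes, accepted=ACCEPTED_CODES):
    """A file passes only if every reported code is whitelisted."""
    return not rejected(codes, accepted)


if __name__ == "__main__":
    print(is_ingestible(["RSC-017"]))
    print(is_ingestible(["RSC-017", "PKG-020"]))
```

Because the lookup is an exact string match, a whitelisted message that is renamed or renumbered would no longer match, so previously accepted files would start being rejected. This is why the immutability of the codes matters to integrators.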
I contributed the first iteration of the EPUB module for JHOVE mentioned above, and it's part of the current release candidate. It makes use of the severity level and the 3-letter prefix (
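The JHOVE module makes use of the 3-letter prefix. Here is a minimal sketch of how a consumer might split a code into its parts. It assumes every code is three uppercase letters, a dash, three digits, and an optional lowercase suffix (as in OPF-004a). The regex and the topic table are illustrative; the descriptions are paraphrased from the issue text.

```python
import re
from typing import NamedTuple, Optional

# Assumed shape: "RSC-005", "OPF-004a". This regex is an illustration, not a spec.
CODE_RE = re.compile(r"^([A-Z]{3})-(\d{3})([a-z]?)$")

# Topic descriptions paraphrased from the issue text; other prefixes fall through.
TOPICS = {
    "HTM": "XHTML Content Documents",
    "NAV": "Navigation Documents",
    "OPF": "Package Document",
    "PKG": "package as a collection of files",
    "MED": "media files (and sometimes media overlays)",
}


class MessageCode(NamedTuple):
    prefix: str
    number: int
    suffix: str


def parse_code(code: str) -> Optional[MessageCode]:
    """Split a code into prefix, number and suffix; return None if it doesn't match."""
    m = CODE_RE.match(code)
    if m is None:
        return None
    return MessageCode(m.group(1), int(m.group(2)), m.group(3))


def topic(code: str) -> str:
    """Describe a code's topic from its prefix alone."""
    parsed = parse_code(code)
    if parsed is None:
        return "unrecognized code"
    return TOPICS.get(parsed.prefix, "other")
```

Note that topic("HTM-048") returns "XHTML Content Documents", even though the issue text says that message is about SVG fixed-layout documents. Any prefix-based grouping inherits exactly that kind of mismatch.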
Hi @rdeltour, we use error codes for automated analysis of multiple files. It would be harder without them. Refactoring based on the specs could be a good idea.
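For automated analysis over many files, a first step is often just counting which codes show up where. The sketch assumes the codes have already been extracted per file (the input shape is hypothetical, and extracting them from EPUBCheck's output is left out). It counts each code once per file.

```python
from collections import Counter


def code_frequencies(reports):
    """Count how many files report each code (a code counts once per file).

    `reports` maps a file name to the list of message codes reported for it.
    """
    counts = Counter()
    for codes in reports.values():
        counts.update(set(codes))  # deduplicate within a single file
    return counts


if __name__ == "__main__":
    reports = {
        "a.epub": ["RSC-005", "RSC-005", "PKG-020"],
        "b.epub": ["RSC-005"],
        "c.epub": [],
    }
    for code, n in code_frequencies(reports).most_common():
        print(code, n)
```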
TL;DR: should we revamp EPUBCheck’s message code system? If so, how?
Background: EPUBCheck message codes
All the validation messages (e.g. warnings and errors) produced by EPUBCheck are associated with codes, for example RSC-005, PKG-007, etc. The first 3 letters indicate the topic the message is related to (e.g. HTM for XHTML Content Documents, PKG for package-related issues, NAV for issues related to the Navigation Documents). The second part of the code is a number, which is incremented whenever a new check is implemented and a new message is needed.
Drawbacks of the current system
These codes and their organization can be confusing, for various reasons.
First, the topic code (the first 3 letters) does not always help identify what the error is related to:
- Parsing errors are all reported as RSC-005 (parsing error), whether they are about the Package Document, an XHTML Content Document, a Navigation Document, etc. Similarly, all parsing warnings are reported as RSC-017.
- The distinction between OPF- and PKG- is not obvious. "OPF" is EPUB 2 legacy, so errors related to the Package Document (.opf extension) are OPF- and not PKG-. PKG- errors are more related to the package as a collection of files. So, for instance, when the package document is missing ("OPF file could not be found"), the error is reported as PKG-020, not OPF-something!
- Some codes sit in a surprising category: HTM-048 is about SVG fixed-layout documents, and the MED category is used for media files (video, images) but sometimes also for media overlays.
Then, the numbering scheme is a bit wonky:
- Parsing errors are RSC-005, while parsing warnings are RSC-017.
- Some codes carry letter suffixes: OPF-004, OPF-004a, OPF-004b, etc.
Possible refactoring
There are several ways to revamp the message code system, for instance:
Questions