-
Notifications
You must be signed in to change notification settings - Fork 407
TestSuiteCleanup
EpubCheck’s test suite has grown organically, it consists of both packaged and unpackaged EPUB content, with no naming convention, and barely organized.
The EpubCheck TF of the EPUB 3 CG decided to start cleaning the test suite so that it is easier to use and maintain.
It's issues like #798 which drives the maintainers crazy because they have to change ~140 files in order to change a severity level of a message where there should only be one test for EPUB 2 and maybe another for EPUB 3 but no more...
Every contribution helps… so big thanks are in order for those brave enough to roll up their sleeves and join the effort!
The followings instruction describe how to pick a test from the old test suite and clean it up to port it to the new test suite:
TBD: describe how to distribute the work among the volunteers
Whether you’re directly working on a clone of the epubcheck
repo or on a private fork, we recommend that you create a new branch, so that it will be easier to submit a pull request of the changes you make. We propose the following naming convention for the branches, so that they’re easier to track:
-
tests/<your-github-name>-batchN
for a batch of tests (for exampletests/rdeltour-batch1
). -
tests/<name-of-the-old-test>
for a single test (for exampletests/20/valid/lorem-basic
)
In order to port the old test to the new test suite, you should first have a good understanding of what the test is about. Some guess work may be involved here.
- sometimes the name of the test sample would be explicit enough.
For instance, the (expanded) EPUB named
fallbacks-circular
in the directory30/expanded/invalid
indicates that the data represent an EPUB with a circular fallback chain (which is invalid). - sometimes the test sample is named after a specific issue. Look up the issue in the tracker to see if the issue’s description gives any clue.
For instance, the sample EPUB
issue188
in the directory30/expanded/valid
refers to issue #188, which implies that the test was created to assert that EPUB containing a resource with a name including the character'+'
will not create a false-positive (i.e. report an error for a valid EPUB). - sometimes the commit history will give you a hint about what the test is about.
For instance, you can get the commit history of the sample EPUB
lorem-remote-4
in directory30/expanded/invalid
by invoking the commandgit log src/test/resources/30/expanded/invalid/lorem-remote-4
. This will notably list commit5a68cc8
, which says "Fixes Issue 202
". And issue #202 will inform you that this is about some case of remote audio/video resources being incorrectly reported as invalid. - other times, you’ll just find an obvious answer by just directly looking at the test data…
If the test in question is an "error or warning" test (i.e. a test that asserts that EpubCheck correctly reports errors or warnings), it can be informative to run it and look at the test log.
You can inspect the execution log from different places:
The easiest is simply to directly run EpubCheck with the test data as input.
For instance:
$ epubcheck src/test/resources/20/epub/invalid/issue236.epub
Which will print the following to the standard output:
ERROR(OPF-017): src/test/resources/20/epub/invalid/issue236.epub/META-INF/container.xml(-1,-1): The attribute "full-path" on element "rootfile" must not be empty.
ERROR(OPF-016): src/test/resources/20/epub/invalid/issue236.epub/META-INF/container.xml(-1,-1): The element "rootfile" is missing its required attribute "full-path".
Check finished with errors
From which you can infer what the test is about.
You can also search for the Java test method that uses this test data.
For example, searching the Java test code for the string "issue236.epub" indicates that the test data from the previous example is used in the testMissingFullpathAttributeIssue236
method, at line #236 of the file Epub20CheckTest.java
class:
@Test
public void testMissingFullpathAttributeIssue236() {
Collections.addAll(expectedErrors, MessageId.OPF_017, MessageId.OPF_016);
List<MessageId> expectedFatals = new ArrayList<MessageId>();
//container.xml missing @full-path attribute or @full-path is empty
// issue 95 / issue 236
testValidateDocument("invalid/issue236.epub");
}
The code alredy gives you interesting bits of information, notably the error (or warning) codes that are expected to be raised, and here an inline comment also describes what this test is about.
Having identified the Java test method, you can now run the unit test with Maven’s test
command, filtering the tests run only this method with the test
property. For example, running:
$ mvn test -Dtest=Epub20CheckTest#testMissingFullpathAttributeIssue236
Would run the test and print (among other Maven output lines):
-------------------------------------------------------
T E S T S
-------------------------------------------------------
Running com.adobe.epubcheck.api.Epub20CheckTest
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 2.216 sec
Results :
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0
This is not very informative however (at least when the test passes), so most of EpubCheck’s test runner methods can be used with a boolean as their last argument to indicate that they be run in verbose mode (i.e. output the execution log).
In the preceding example, you’d enable the verbose mode by replacing line 241 with:
testValidateDocument("invalid/issue236.epub", true);
and re-running the test would now print EpubCheck’s output:
-------------------------------------------------------
T E S T S
-------------------------------------------------------
Running com.adobe.epubcheck.api.Epub20CheckTest
invalid/issue236.epub: Errors: 2; Warnings: 0
ERROR: invalid/issue236.epub:META-INF/container.xml: The attribute "full-path" on element "rootfile" must not be empty.
ERROR: invalid/issue236.epub:META-INF/container.xml: The element "rootfile" is missing its required attribute "full-path".
INFO: invalid/issue236.epub:META-INF/container.xml[creation date] 2013-03-24T13:15:52Z
INFO: invalid/issue236.epub[EPUB renditions count] 2
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 1.886 sec
Results :
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0
Note that if you feel more familiar using Eclipse –or any other IDE– than the command line, you can very well run all the tests –or single test methods– directly from your IDE.
Once the scope of the test data has been identified, make sure that the test is atomic enough. In other words, make sure that the data is used to test one and only one reason for failure or sucess.
If the test data is used to test more that one case, split the test in several individual test samples to make sure each covers only one case.
Now that you have identified the test cases covered by the test data, look up the existing test data that have been already ported to the new test suite, and ensure that the case is not already covered there. If it is already, then the test data can be discarded. Else, move on to the next step.
Reimplement the test based on one of the minimal EPUB templates. The new refactored test data should only add the bare mininum to these templates, in order to reduce the risk of having to change the test data in the future, due to regression bugs or update of EpubCheck’s logic.
Now that the test data has been reduced and optimized, rename it according to the naming conventions (File Naming) and move it to the relevant directory in the new test suite, following the recommended organization (site structure).
Your branch should now contain the changes implementing the port of some test data to the new structure. Submit these changes in a new pull request against the project’s master branch!
The instructions above cover the step needed to port the existing test data to the new test suite.
Once done, several additional tasks will need to be achieved to finalize the test suite clean up:
- make sure that each error message is tested at least once (this is almost true already).
- add some missing tests if required.
- refactor the Java runner classes to simplify them and make them easier to maintain.