Standardize the notation of subtests in TAP #2
One thing that did not seem to be mentioned on the TAP mailing list was the potential for a subtest to look like this:
or
I believe most libraries would support a failure just fine for both of those; it's simpler than requiring an update to the libraries, and it forces stricter reasoning about tests, which is a bonus. |
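(The examples this comment refers to were lost in extraction; as an invented illustration only, a comment-prefixed subtest in the style being discussed might look like the following, which legacy parsers would treat as plain comments:)

```
ok 1 - parent test
# Subtest: validates input
#   ok 1 - accepts strings
#   ok 2 - rejects numbers
#   1..2
ok 2 - validates input
1..2
```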
We really want to have this. I'm not fond of the current mechanism, but given it's out there and there's already a lot of stuff outputting it, that ship may have sailed. |
Both parsers I've worked on would consider those outputs an error. Subtests really should be something legacy parsers consider comments, which fortunately still gives a lot of space. |
@jonathanKingston I'm happy with the old proposal for subtests
I'd be fine with other formats, but since there are tools in Perl, Java, Node, which output something similar to that, I'd be more keen to simply formalize it in a TAP 14 specification.
+1 I believe we have the formal TAP 13 specification, and a meta TAP 13.5, i.e. TAP 13 plus proposals/custom code. Considering the number of users in different programming languages outputting subtests, I think a proposal similar to what most tools produce (and that got consensus here and on the mailing list) would be a good approach. As a side note, there is an open issue in tap4j where the TAP producer emits the subtests first. The current state machine in tap4j is not able to consume a subtest before its parent. Quoting the example in that issue below:
A perl test that will produce this output is as follows:
It'd be nice to have the specification in cases like this to guide the developer :-) |
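(The tap4j example quoted above was lost in extraction; as a hedged sketch only, with invented names, the subtest-before-parent ordering it describes looks roughly like this:)

```
    ok 1 - first step
    ok 2 - second step
    1..2
ok 1 - parent test
1..1
```

A state machine that requires a parent test point before any indented block cannot consume this ordering.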
We should not be afraid to break old implementations; if anything, that should be the correct behaviour. Old implementations should not ever ignore subtests: this is a testing setup, and ignoring tests would be a mistake. Due to the non-standard nature of all the current uses of subtests, it is possible for subtests to fail while the parent succeeds; this should be caught or be a parse error, and either is fine by me.

I'm a little against looking to satisfy older implementations, which we might end up calling 13.5 erroneously. This is a similar situation to what Markdown has been through, and where possible I would prefer to go for breaking behaviour, so errors are spotted sooner rather than skipped over. This was my theory behind the formats above, which should be treated as safe in most parsers, since extra whitespace around the comment would be ignored. I would therefore aim for:
I'm actually a fan of the tap4j ordering, as it would aid streaming output of tests before they have completed running. The other issue with the old proposal is that there is no acknowledgement of the subtest count, which I think is a problem.
One last point, the number of main consumers and producers is still actually small, patching all of them to support a newer version shouldn't be a large piece of work. |
The way it was designed means that even if the subtests are ignored, the correct conclusion can still be made. It leads to data loss, not to incorrect operation.
In the parser I wrote a discrepancy generates a non-fatal parse error.
Do we really have that problem?
I don't follow you. They don't have extra whitespace at the start of the line, so they won't be ignored even on legacy parsers. |
The point I am making is we are expecting the producer to be error free by always returning a
That is my point: why would you want them ignored? I want a testing framework to do one of two things when it doesn't know what to do:
Older parsers may or may not be able to cope with invalid subtests. The following will not fail with some of the parsers I have seen:
However both these would:
and
This is unlikely to be true for all producers, but I take your point that the parent test should fail; however, I would prefer that the correct behaviour for parsers is to fail too.
This is unlikely to be true for all consumers; in fact the TAP philosophy (which likely needs addressing) suggests any such issue should be skipped. The only way I would support this is with the subtests coming first, like @kinow mentioned. Even then the format isn't as easy to read. |
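(As an invented illustration of the false-positive hazard under discussion: a consumer that skips indented lines sees only a clean pass here, even though a subtest failed:)

```
TAP version 13
1..1
    ok 1 - setup
    not ok 2 - assertion failed
    1..2
ok 1 - parent reported as passing
```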
Even as OKs they would fail because the numbering is incorrect.
Agreed. Parsing only the parent test should clearly be considered legacy behavior, but legacy has its uses.
My take on the philosophy would be: "report errors, but when possible make them non-fatal", as that way the user can be presented with as much information as possible. |
Very true; however, subtests could potentially be part of the same numbering. Is there other useful metadata that subtests have? Yeah, I will also raise an issue on that, as I certainly don't think it is clear enough. |
@Ovid I saw some interesting comments on the mailing list of yours. (http://www.ietf.org/mail-archive/web/tap/current/msg00530.html) Would you be able to mention your concerns here? I believe a way of demarcating subtests is needed and it should satisfy these conditions:
|
I'm not sure what you mean with that exactly. |
and
Obviously numbering itself should be optional also. |
@jonathanKingston: The approach has been suggested before and always shot down. The problem with this approach is that it is not backwards-compatible. Maintaining this backwards compatibility is a key design goal of TAP. |
@Ovid it depends how you define backwards compatibility, and therein lies my issue with the TAP philosophy. However, in this case we are transferring further tests, and we should be doing our best to make all previous consumers assume the tests are top-level tests. Subtests should, in older consumers, either:
It would not be acceptable for the following to be treated as an overall pass in older consumers:
What I think this means is that the choice between any subtest format is:
Ultimately, if backwards compatibility were a priority then the TAP version would not have been put in place in the first instance. In my opinion, truly backwards-compatible structures would not include version strings. The backwards-compatibility concern here restricts subtest numbering completely (although looking at the specification, numbering is very loosely defined itself; it doesn't even state that test numbers are incremental), and any numbering policy we add for subtests would be problematic as well. I think the benefit here is that we can define subtests without the risk of breaking older consumers; false negatives are better than false positives when it comes to tests themselves. |
This is not acceptable at all. Especially not in situations where the producer and the consumer are distributed separately. If the consequence of upgrading your producer is breakage, the most sensible decision is not to upgrade that producer. This sort of thing means people will refrain from upgrading to TAP 14.
Both of your suggestions will cause a test harness to fail despite the tests all succeeding. Again, if upgrading a producer causes the toolchain to fail when the tests pass, the sensible thing is not to upgrade the producer.
That is already the situation today. And it's not a major issue because it only happens with a faulty producer. In a green-field implementation this would make sense, but given the reality we can't enforce this for legacy consumers. |
I'm sorry, but I really am not seeing the adverse reaction to a testing framework that requires you to upgrade your consumer when you decide you want subtests. All ideas have their pitfalls so far. It's a testing framework, not a GUI tool or phone system; we should not be trying to remedy old frameworks. Testing frameworks need to report errors, not ignore them: accuracy should be the key.

It is not really greenfield at all: some have gone with the suggested wiki post / mailing list format, some have gone with the tap4j backwards format, and some have invented YAML subdocuments, and so on. The best we can do is cause errors and explain why by incrementing the version number. The assumption being made is that the TAP output is read by one consumer for the one producer, and that won't always be the case.

I'm not precious about any of the suggested structures I have given, but I am precious about allowing the safest failures for older frameworks. You mention that test harnesses will fail, and yes, ideally we should look for a structure that has the fewest issues in the most popular consumers; however, as I keep saying, we should not be worried about causing false negatives, since false positives would be far worse. Ideally I would prefer an implementation that breaks all old implementations, forcing the consumer upgrade if subtests are used. If you were upgrading a framework of any kind in a corporate setting, the best assumption is that everything interacting with the framework will break. If we make TAP 14 compelling enough then slowly people will upgrade; upgrades in all software need to be compelling enough, as likely enough there will be issues.

What I suggest is we keep looking. I really don't think it is okay that the least-bad option, indentation, is taken up; this causes YAML parsing issues for simplistic parsers. I'm back to: let's throw as many examples at consumers until we find the most that either:
Can we at least keep looking/trying for a solution here? I think it is pretty obvious why no solution was ever chosen; settling for the first suggestion we know seems a little lacking. |
@jonathanKingston: I know this must be frustrating for you and I'm sorry about that. We're very, very cautious about this because TAP is the default test protocol used in the core Perl language. Perl is distributed by default with just about every major operating system except Windows. This means that in just about every city on the planet with modern computers, Perl is there and thus TAP is there. There is no way I can estimate the entire scope of breakage (many of those default installations will never be touched), but there have been serious issues in the past with accidental toolchain breakage causing an uproar. And then there's the ripple effect of this change spreading out to other language test harnesses which produce and consume TAP, and there would be yet another uproar (and I'm pretty sure it would kill TAP or cause a fork). I see no compelling reason for this change, but the downside is that it would be a disaster, and devs in countries we've never thought of would be staring at their monitors, trying to figure out what happened. The first rule of the toolchain is that you do not break the toolchain (we Perl toolchain devs have been bitten enough times that we've learned our lessons). As for how to handle YAML diagnostic information, I thought it was to be indented four spaces from the current line, but there's a potential problem there (the following is completely made up and just used as an example):
In the above example, I could easily see a poorly-written parser choking on the YAML heredoc. That will need to be specified carefully and I don't see how the suggested changes which break the parser would mitigate the above case. To help us move forward on this, please show us concrete examples of:
Without seeing specific examples, we'll get this wrong. As an aside, is anyone reading this who was also at the Oslo QA hackathon in 2008? We hashed out a lot of this there, but more importantly, there was discussion of creating "spec tests" in YAML or XML format that parsers written in any language should be able to parse and validate. That would allow TAP consumer authors to have a standard against which they could test. Does anyone recall what happened with that? I wasn't working on the spec tests at that event. |
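(The made-up example referred to above did not survive extraction; as an invented illustration only of the kind of hazard described, here is a YAML diagnostic block whose content happens to look like TAP:)

```
not ok 1 - database connection
  ---
  message: |
    ok 1 - this line is data, not a test result
    not ok 2 - neither is this
  severity: fail
  ...
```

A poorly-written parser that scans every line for /^(not )?ok/ without tracking the YAML block boundaries would miscount tests here.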
I also recall reading it somewhere; probably tap4j expects YAMLish in an indented block of text, starting with |
Since the specification has been frozen for a while, a TAP 14 which breaks backward compatibility could be dangerous and reduce the number of adopters, IMO. Though I agree with @jonathanKingston that we need to move further and not be afraid of changes, at this time maybe it would be more sensible to release a specification that could be easily adopted by implementations, which would let us measure its level of adoption and gather new contributors and users. @jonathanKingston in the old TAP 13 website, I think there was a Proposals section. Maybe we could create a similar page in the website with sections for TAP 14, TAP 15 and Backlog (or others) and write what we would be aiming to release with each new specification. Kind of like a roadmap. What do you think? |
@kinow 👍 A public product backlog spreadsheet prioritized for the TAP 14 draft sprint looks like a useful deliverable. This need was already expressed, using different words, in Leont's comment, my comment, and Ovid's comment. Where it should be hosted (GitHub Wiki, Google Docs, GitHub Issues, Trello, ...) and how the review and voting processes would look are unknowns for me. I don't have the corresponding positive open-source experience, and The IETF Process: An Informal Guide is not a very digestible guideline. |
@Ovid to be clear, it is not me who actually wants this feature; however, it would be a useful separation of the TAP output to see categorisation of the issues. What I am trying to prevent is causing more issues for the future of TAP by adding in the simplest option we can pick from, which is whitespace indenting. Yes, YAMLish formats have the indicators, but I would prefer parsers to be able to ignore all indented text unless it spots either
The example code you gave explains my distaste for standardising the old proposal of indenting the subtests; I want to keep the parsing rules as simple as possible. I would be interested in seeing your idea around "spec tests", even if it is from memory and not exactly how it was suggested. As I said, I'm not really suggesting anything besides keeping the TAP format simple whilst allowing flexibility.
I really don't buy this statement, sorry; software can be updated carefully. Updating specs is possible even in the most widely used languages. I get that breaking changes will happen, but that is the risk of updating the producer. This is the same reason why old browsers are prevalent and PHP 4 lingers in enterprise setups. A fork has ultimately already happened with TAP-Y etc.; without progressing the format to include the latest user requirements, adoption will slowly die out. If TAP is so ingrained into the Perl toolchain that it can't be extended, then ultimately it shouldn't have been opened up outside of the libraries that consumed it. If the original tools that created TAP want to remain on the older version, that is completely fine also.

I think the only way to progress is, as @xmojmr suggested, creating guidelines for extending TAP and a framework where people can vote on these issues. @kinow I agree TAP 14 should just be a clean-up of the spec where possible; breaking backwards compatibility should be 15 (unless we opt for point increases, which isn't a bad idea either). @xmojmr yep, moved all talk of a roadmap to #10; thanks for all the previous comments on the previous thread also 👍 |
I think that is a very reasonable POV; I just don't think your suggested resolutions are workable. I'm sure there are syntaxes that would be ignored by a TAP12 consumer but are easier to parse for a TAP14 consumer than the current mechanism. This is a solvable problem. For example:
or
|
@Leont again, I am back to thinking it doesn't need to be ignored so long as there is a failure, but let's not get into that. Both of those examples require the parser to be node-based rather than a simple one-test-per-line parser matching /^(not )?ok/. If I am building a parser that complex, then I can filter out the YAMLish start and end. Prefixing with a character might be easier (despite not meeting my desire to have subtests parsed by older consumers):
I'm not a massive fan of that either but it's certainly simpler than context based parsing. |
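(The prefix example was lost in extraction; purely as a hypothetical sketch of a prefix-character form, with `>` chosen arbitrarily for illustration:)

```
ok 1 - parent test
>ok 1 - subtest assertion
>1..1
1..1
```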
I know I am late to the party, but @Leont just looped me in (kind of). Before I knew this discussion was going on (in fact, it was early in my adoption of Test-Simple), I decided to build something like this into Test::Builder as an option. As it stands, the dev versions have the following as a subtest syntax you can turn on (the default is the classic):
This doesn't break any of the harnesses/parsers I have used it on. Please do not take this available syntax as me going cowboy or trying to preempt anyone here. I did this before I knew there was an ongoing process or other people to discuss it with. The main takeaway should be that the dev releases now make it possible to support a new subtest syntax if one is decided. Also it is now possible to put the subtest line before the results from inside the subtest if that is necessary. |
It has been a long time since this thread was last pinged. I just want to add that Test2 for Perl, which is now the heart of most Perl testing tools, supports the two forms of subtests I mentioned all those years ago:
and
But when concurrency is in play, only the parenthesized form is used, to avoid subtests overlapping, which is impossible for a parser to handle. I will also add that Test2::Harness, aka yath, supports parsing both forms of subtests. |
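(For readers without the earlier examples at hand, here are hedged sketches of the two forms from recollection; test names are invented and exact formatting may differ by Test2 version. The unbuffered form streams the child results inline:)

```
# Subtest: math works
    ok 1 - addition
    ok 2 - subtraction
    1..2
ok 1 - math works
```

The buffered form holds the child events and prints them as one grouped block:

```
ok 1 - math works {
    ok 1 - addition
    ok 2 - subtraction
    1..2
}
```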
Yeah, node-tap has supported both forms for years, as well. If JS and Perl both do it, can we just call it a spec? 😂 I think the "spec" has been just "whatever the implementations do" for quite a long time now. EDIT: The node-tap docs show it as:
but in practice, it will parse as a child test if the leading comment is missing. That's just a helpful hint so we can bail out with the failed test name earlier if |
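(The snippet from the node-tap docs did not survive extraction; a hedged sketch of the documented shape, where the leading comment is only a hint and the indented block is what makes it a child test:)

```
# Subtest: child test name
    ok 1 - some assertion
    1..1
ok 1 - child test name
```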
Note that QUnit has always reported nested tests by printing the full name of a test, with parent names joined using the `>` separator. I personally also find this:

```
TAP version 13
ok 1 add > two numbers
1..1
# pass 1
# skip 0
# todo 0
# fail 0
```

We first shipped this in 2017, but didn't consider it a compromise afaik. We render our HTML UI in much the same way. I'm not proposing this as an alternative by any means. Subtests are by now established in Perl and node-tap, and it's worth spec'ing for future TAP parsers for that alone. I'm curious, though, what we gain in added value (or avoided complexity) through TAP subtest syntax. I've only used a relatively small corner of the TAP universe, so apologies if this is obvious, but I just thought I'd ask. It might inspire me to adopt the syntax in QUnit if it is applicable. Thanks ❤️ ! |
If there is a syntax for subtests that both Perl and Node.js have supported for years, then that makes sense to me to put it in the spec as those are probably the biggest TAP communities that I'm aware of. In spite of best intentions, IMO, the TAP 14 draft failed because it tried to do too much at once. Subsequently, it has gone unmerged for 7 years. I think a small change to the standard and a bump to revision 14 bringing the spec closer to real world usage would be a good thing. As the author of tappy for TAP in Python, I've mostly punted on the whole subtest problem because I was only interested in implementing what's in the spec. |
Raku currently only supports the unbuffered syntax, but I'm fine with adding support for the buffered syntax if that helps move this forward. I do wonder if it's also reusable for YAML blocks (and comments?). Can we make the block mean "this thing should be parsed as all belonging together", instead of using it exclusively for subtests? |
Node-tap uses 2-space indented yaml blocks delimited by `---` and `...` lines.
Comments/directives can exist between the test point and the yaml block.
I think it's best to just say that diagnostics must be actual yaml (not ish), indented 2 characters and delimited by `---` and `...`. The thing is, I really don't want to go back to having to count test points and update plan expectations like the bad old days before protocol-level child tests. Ie, this kind of code:

```
// using node-tap
const t = require('tap')
t.plan(2)
t.test('child 1', async t => { t.pass('this is fine'); t.pass('another one') })
t.test('child 2', async t => { t.pass('also fine') })
```

today outputs:
I don't want to have to say "ok, you did
Reporters can of course do whatever they like with it, including counting it up and flattening it:
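(The output samples referenced above were lost in extraction. As a hedged sketch only, with exact comment and banner lines possibly differing between node-tap versions, the nested output for that snippet would look roughly like:)

```
TAP version 13
# Subtest: child 1
    ok 1 - this is fine
    ok 2 - another one
    1..2
ok 1 - child 1
# Subtest: child 2
    ok 1 - also fine
    1..1
ok 2 - child 2
1..2
```

A flattening reporter could instead renumber the leaf assertions and report them as three top-level test points under a `1..3` plan.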
I agree. So if there's some appetite for doing this, should we try again? If so, I'm down. Here's my minimal list of must-haves for a spec:
We can tackle these one by one so it's not too big of a drop at once. We should have a hard line against including anything in the spec that doesn't have at least 3 "relevant" implementations (ie, the most used implementation in a major programming language community), and only accept objections if a relevant implementation doesn't support it and doesn't intend to. Between me, @exodist, and @mblayman I think we've got quorum, but it'd be nice to bring in someone from Java and/or PHP if TAP is being used widely there. |
That is, at least in the case of node-tap, we've been forced to do it by virtue of the fact that there are parsers in the wild which will crash unhelpfully on invalid yaml in diagnostic sections. (I think the TAP parser built into Jenkins does this? I could be misremembering.) |
The problem is, that means you either have two separate events (which is undesirable if you want to process them together in a sensible way), or you have to wait for the next line before you can emit the most recent test result (which is undesirable if you want to show updates as they come in). We need some kind of marker that there's more coming up if we don't want to lose TAP's streaming nature. |
Great suggestion for TAP15 ;) I mean, having written parsers that have to handle this and maintaining node-tap for 11 years now, I can say yes, you're 100% right, that sucks. And also, it's the kind of thing you do once, and isn't really that hard from then on. But we all do this now, undesirable or not, and there's value in having a spec that is minimal, descriptive, and accurate, at least to start. Then there'll be a firmer foundation upon which to build new ideas, which has been the biggest obstacle to progress in this space thus far. |
Fair enough |
For the yaml blocks: under a subtest, would a yaml block have 6-space indentation for the `---` and `...`, to indicate that the yaml block belongs to an event inside the subtest? |
@exodist Yes, when we add child tests to this specification, we'll have to specify that the YAML indentation is 2 spaces more than the indentation of the test point to which it belongs. Similarly, a child test inside of a child test would be indented 8 spaces (4 plus the parent's 4), one inside of that would be 12, a yaml block for the 12-space indented test point would be 14 spaces, etc. |
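(A hedged illustration of the indentation arithmetic just described; names are invented:)

```
ok 1 - top-level test
# Subtest: level one
    ok 1 - child assertion
      ---
      note: yaml indented 2 past its test point
      ...
    # Subtest: level two
        ok 1 - grandchild assertion
        1..1
    ok 2 - level two
    1..2
ok 2 - level one
1..2
```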
@ExE-Boss Current implementations use 4 spaces for indenting child tests. The goal of this exercise is to describe what is, so that we can make sensible progress towards what ought to be. Or we spend another 7 years debating everything and going nowhere. |
I should point out that the Test2::Harness TAP parser actually does not require 4-space indentation. Any amount of indentation counts, as long as the indented content is proper TAP, and it considers a change in indentation one way or the other to indicate an additional subtest or the end of a subtest. It uses Perl's regex '\s' wildcard to capture whitespace, which includes tabs as well as spaces. I am not likely to change this any time soon; it does work with the 4-space standard this wants to set, but it is more relaxed in what it accepts. |
@exodist Parsers rarely get in trouble by following Postel's rule, I think that's fine. ;) That being said, though, when we get to that part, we should specify "Harnesses may accept any level of indentation / should only parse subtests indented exactly 4 spaces", but also "producers must indent subtests 4 spaces", or something like that. |
@exodist I'm curious what would it do with something like this?
I seem to recall edge cases like that are why I flipped the table and just said it had to be 4 always in node-tap. |
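(The example asked about above was lost in extraction; a hedged guess at the kind of ambiguity meant, mixing 3- and 4-space indentation:)

```
ok 1 - parent
   ok 1 - three-space child
    ok 1 - is this a grandchild or a sloppy sibling?
    1..1
   1..1
1..1
```

A fixed 4-space rule sidesteps the question entirely.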
The way the parser works, last I modified it (a while ago to be sure), was that it had an array indexed by indentation level; every time indentation gets deeper it adds one to the right element (4 for 4 spaces, 3 for 3 spaces, etc.), and every time it gets shallower it closes out all deeper subtests until the shallow level is closed. So it would handle this case just fine, and I believe I may even have a unit test for this specific edge case. Something I am less sure of from memory is whether my parser considers a tab to be 4 spaces or a single space (i.e. does it convert tabs to spaces, or just count a tab as one space character?). I have to look into that. |
I should also say my parser is a couple-layered thing. Looking at this again, I see the "current" implementation does not actually index into an array; I have a feature branch somewhere that does, but this is still set up such that it should handle your edge case. Anyway, the point is that I did not find it difficult to handle edge cases like yours, and I found that parsing is much easier if the lines are first parsed into objects and a separate watcher then handles the object stream; the watcher spawns a child watcher for each subtest, etc. This simplifies the process greatly. |
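The indentation-stack idea described above can be sketched in a few lines. This is an illustrative toy only, not Test2::Harness's actual code; `parse_subtest_events` and the event tuples are invented names, and it assumes space-only indentation:

```python
# A minimal toy sketch of the indentation-stack approach: track open
# subtest levels by indentation depth, and close every deeper level
# whenever a shallower line appears. NOT Test2::Harness's real code.

def parse_subtest_events(lines):
    """Turn raw TAP-ish lines into ('open', depth), ('line', depth, text)
    and ('close', depth) events based purely on indentation changes."""
    events = []
    stack = [0]  # currently open indentation levels; 0 is the top level
    for raw in lines:
        stripped = raw.lstrip(' ')
        depth = len(raw) - len(stripped)
        if depth > stack[-1]:
            # Deeper indentation opens a new subtest level.
            stack.append(depth)
            events.append(('open', depth))
        else:
            # A shallower line closes every deeper open subtest.
            while stack[-1] > depth:
                events.append(('close', stack.pop()))
        events.append(('line', depth, stripped.rstrip('\n')))
    # End of input closes whatever is still open.
    while len(stack) > 1:
        events.append(('close', stack.pop()))
    return events
```

Because closing is driven only by "shallower than the top of the stack", a 3-space block followed by a 4-space block nests rather than erroring, matching the relaxed behaviour described above.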
oh, also my parser ignores the YAML stuff, but if we get it standardized I can add it in. |
Yeah, I can see how that'd simplify things. http://npm.im/tap-parser creates a child Parser object for each indented block, and then when it sees a test point at the parent level, it closes out that child and continues on with the parent, iirc. It has been a while since I last touched it, though; I could be completely wrong 😅 It does have to hold back each test point until the next line, in case yaml follows, which is sometimes a bit weird (that case @Leont is talking about). I don't think we'd need diagnostics to come before the test point, but it'd be nice if there was a marker on the test point line to indicate whether yaml is coming or not. Something like:
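(No concrete syntax survived here; purely as an invented illustration, not a feature of any existing implementation, a test point could flag a pending diagnostic block:)

```
not ok 1 - connection failed # +yaml
  ---
  message: timed out
  ...
ok 2 - nothing follows, safe to emit immediately
```

The `# +yaml` token is made up; the point is only that a marker on the test point line itself would tell the parser whether to wait for a diagnostic block.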
Then it could be a little less stateful and still fully interruptible. |
This issue was originally reported in TestAnything/testanything.github.io#20, and the conversation has been moved here since it is related to the TAP specification.
It concerns subtests in TAP and a formal standard to be followed by TAP consumers and producers. In the latest TAP specification (TAP 13) there is no formal definition of subtests. There were proposals and draft documents on this, and some tools like Test::More and tap4j support it with limitations.
Please feel free to chime in and suggest enhancements for this standard. A message is being written to the mailing list pointing to this issue too. The mailing list has several users that participated in the IETF draft and have lots of experience with TAP and its implementations.
/cc @Leont @xmojmr