Local execution mode #183
@boxed After our discussion, I've started some poking around with local test execution on the local-exec branch. If you want to take this for a test drive, try this:
The distributed mode of execution is still available if you instead run a worker and use
That doesn't seem good. Where is that "foo.json" file supposed to be written?
The issue appears to be that the baseline testing run is "failing". You can see this if you use verbose mode:
In short: when you use baseline timing for timeout calculation, we run your test suite once over unmutated code to figure out roughly how fast it ought to run. If that baseline run fails, then we abort the init on the assumption that subsequent mutation testing would be meaningless.

In this case, CR is exhibiting a confluence of a few bad factors. First, we aren't reporting anything useful to the user when a baseline fails. This really should be logged at the "error" level rather than "info". My bad.

Second, we're almost certainly calculating "test failure" incorrectly for pytest. The current implementation naively says that the test suite as a whole passes if and only if all tests report that they passed. However, skipped tests don't report as passed, even though we should not count them as failures. So you've found something that we need to address: the pytest runner needs to be smarter about calculating failures. This shouldn't be too difficult to fix.

In the meantime, if you delete tests/test_declarative_class.py and tests/test_evaluate.py, you should be able to test out cosmic-ray on tri.declarative. It seems to be working on my machine, at least.
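To make that last point concrete, here is a minimal sketch (not CR's actual code) of the distinction: a run should only count as failed when something genuinely failed, not merely because some tests were skipped.

```python
# Hedged sketch, not CR's implementation: decide whether a pytest run "failed"
# from per-test outcome strings, so skipped tests don't sink the baseline run.
def suite_failed(outcomes):
    """outcomes: iterable of strings such as "passed", "failed", "skipped"."""
    return any(o in ("failed", "error") for o in outcomes)

# The naive rule described above ("passes iff every test passed") treats a
# skip as a failure:
naive_passed = all(o == "passed" for o in ["passed", "skipped"])

print(suite_failed(["passed", "skipped", "passed"]))  # False: skips are fine
print(suite_failed(["passed", "failed", "skipped"]))  # True: a real failure
print(naive_passed)                                   # False: the skip wrongly counts against us
```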
FWIW, here are the summary results I've got for tri.declarative:
With the caveat, of course, that I've removed some of your tests.
Ah crap, just realized you're running from the latest off GitHub but we've got something else internally here. I just pushed the latest code to GitHub! Now the newest code on GitHub should be OK, and it survives mutmut testing, so it's very interesting to see how it compares to CR. The info about skipped being counted as failed was good to know! Now I can do the baseline successfully after deleting those three tests.
You should try out mutmut on tri.declarative btw. I think the feedback is a lot nicer than CR's, and something you should be able to copy easily. For example, it reports surviving mutants directly as it finds them, so you can start working on the first one right away if you want.
I'll definitely take it for a spin when I get a chance. The reporting from CR right now is certainly primitive - I've really only put in enough effort to get the information I need for development - but creating new reporting forms should be very simple. All of the data from a testing run is stored in a database, and reports are really just formatting the output of queries against that database.
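As a rough illustration of that idea, a custom report could be little more than a query over the session database. The table and column names below are hypothetical, not CR's actual schema, and the real storage layer may not even be SQLite.

```python
# Hypothetical sketch of a report as "formatting the output of a query".
import sqlite3

def outcome_summary(session_path):
    con = sqlite3.connect(session_path)
    try:
        rows = con.execute(
            "SELECT outcome, COUNT(*) FROM work_items GROUP BY outcome"
        ).fetchall()
    finally:
        con.close()
    return dict(rows)

# Example usage (path is illustrative):
# print(outcome_summary("session.sqlite"))  # e.g. {'killed': 120, 'survived': 4}
```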
Cool. Can I stop CR in the middle and have it retain the mutants it has found?
Yes, the database gets initialized (i.e. via "cosmic-ray init") with the tests that need to be run. As results arrive, they are attached to those work items. So a standard exec run only runs tests for the items that don't already have results.
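In other words, resuming is just a matter of filtering out the work items that already carry a result. The item structure below is made up for illustration; it is not CR's actual data model.

```python
# Hypothetical sketch of the resume behaviour: only items without a recorded
# result still need to be executed.
def pending_items(work_items):
    return [item for item in work_items if item.get("result") is None]

session = [
    {"mutation": "break -> continue", "result": "killed"},
    {"mutation": "is -> ==", "result": None},  # interrupted before this ran
]
print(pending_items(session))  # only the item that still lacks a result
```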
Cool. We have several mutants in tri.declarative that should be survivable. So for example, CR finds that it can change "break" to "continue" after "del to_be_moved_by_name[-1]". Mutmut has a feature to mark these as ignored with "# pragma: no mutate". I guess you can't implement this feature in CR? :(
To my mind this is the biggest open issue with CR for practical mutation testing. The approach you're taking with mutmut is appealing because I might be able to piggy-back off the work of some existing system (e.g. astroid may help because it's part of the larger pylint ecosystem, which has support for such pragmas). However, I also worry about how complex these sorts of inline directives could get. For example, I want users to be able to disable specific mutations, not just all of them, for a particular line of code...this might get verbose.

The leading approach in my mind is an external file which tells CR when and how to apply exceptions. This file would need to store enough data (i.e. a patch, effectively) to tell when the file has changed too much. See the last comment in #97 for a better description. This has the benefit of not polluting the code, and it also has the theoretical benefit (theoretical in the sense that I'm not entirely sure it's a benefit in practice) of allowing users to swap out exception files without changing the code itself.

So you're right: unfortunately this is not something that CR can do right now. It's a really interesting problem, and a really important one, so hopefully I (or someone) will get to it soon.
I think it's nice to have it in the code. I've modeled mutmut's feature after how it works in coverage.py.
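For readers following along, the inline exception looks roughly like this. The function below is made up for illustration; only the "# pragma: no mutate" comment syntax comes from the discussion above.

```python
import time

def wait_for(predicate, timeout=10):
    """Poll until predicate() is true or the timeout expires."""
    deadline = time.time() + timeout
    while time.time() < deadline:
        if predicate():
            return True
        time.sleep(0.1)  # pragma: no mutate  (changing the sleep only affects speed, not behaviour)
    return False
```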
That seems... harsh. If not outright wrong.
Cool, I was able to run mutmut over tri.declarative. Also, I have new CR results for the most recent code:
In my experience that's a really, really good score.
Right, it's very common in practice, and that's a big mark in its favor. Maybe we could support two forms of exceptions, inline and external. I think we need some experience to know what is best overall.
Right, this is skirting the edge of what are called "equivalent mutants". In general the "is -> ==" replacement is testable, but not in this case. We should probably disallow that mutation in general. An open issue would put it on the radar... We have a number of exceptions in the code already for this kind of thing. See, for example, here where we disallow the reverse mutation of "== -> is". An exploration of these kinds of exceptions is a really fun part of the research end of mutation testing.
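Here is a generic illustration of the kind of case being described (this is not the actual tri.declarative code): when the right-hand side is a unique sentinel with default equality, "is" and "==" can't meaningfully disagree, so the "is -> ==" mutant is effectively untestable.

```python
# Generic example of an (effectively) equivalent "is" -> "==" mutant.
_MISSING = object()  # unique sentinel with default, identity-based equality

def get_or_default(mapping, key, default):
    value = mapping.get(key, _MISSING)
    if value is _MISSING:   # mutating this to "== _MISSING" is observably identical
        return default      # (barring values with a pathological __eq__)
    return value

print(get_or_default({"a": 1}, "a", 0))  # 1
print(get_or_default({"a": 1}, "b", 0))  # 0 -- same result under either comparison
```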
Jesus... CR really creates a TON of mutations. Mutmut just has one mutation per AST node max, so that explains some of the speed difference too. The lt operator in tri.declarative has 7 (!) mutants in CR; I just test one in mutmut (and that one is whitelisted, because we know things about index).
That's a funny one. I would have expected that to be a crash. Why is type orderable? :P |
Right, CR isn't messing around ;) For relational operators, for example, it basically tries replacing each operator with every other operator (where this makes syntactic sense). There may be ways to trim these numbers down a bit, but in principle I want to at least offer the possibility of maximal mutation. As for the speed, it looks like mutmut is just plain faster on a per-test-run basis. I was watching them click by on each system, and mutmut was perhaps twice as fast. It's food for thought.
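To put a rough number on "every other operator", here is an illustrative sketch (not CR's actual operator tables) of why a single comparison yields so many mutants:

```python
# Each comparison operator is swapped for every other one, so one "<" node
# yields seven candidate mutants -- roughly the count observed for the lt
# operator mentioned above.
COMPARISON_OPS = ["<", "<=", ">", ">=", "==", "!=", "is", "is not"]

def candidate_mutations(op):
    return [other for other in COMPARISON_OPS if other != op]

print(candidate_mutations("<"))       # 7 alternatives for one comparison
print(len(candidate_mutations("<")))  # 7
```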
Strange. I would have expected the reverse, given that baron is almost certainly slower than using Python's built-in AST.
It's not ;)

```
>>> type(1) < tuple
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unorderable types: type() < type()
```

That mutant survived, so presumably your test never got to that line.
I don't see how that's possible: we have 100% test coverage, that's the first statement in the "and" chain so no short-circuiting is possible, mutmut mutates "is" into "is not" so it should have caught that, AND we had to whitelist the raise on the line below to make mutmut shut up about it.
It definitely warrants looking into. I'm not really sure what's going on. If I make the change from
I get identical results if I manually change

This does point to one possibly sticky issue for CR, which is that the pytest collector never sees the mutated code. If, as seems to be happening here, the pytest collector is dipping into the non-test code for some reason, it will have an incorrect view of the code under test. I don't know enough about pytest's internals to infer too much more.

I've got to get back to real work for a bit, unfortunately, but I'll try to check back in on this discussion later.
I've merged the local-exec branch in, so the default is now local execution. Hopefully I'll have time to address the lingering issues related to this soon.
It seems like it should be fairly straightforward to have a purely local (i.e. non-celery) execution model that people can use for test-driving CR and for small projects. Look into it.