Local execution mode #183

Closed
abingham opened this issue Dec 3, 2016 · 24 comments

abingham commented Dec 3, 2016

It seems like it should be fairly straightforward to have a purely local (i.e. non-celery) execution model that people can use for test-driving CR and for small projects. Look into it.

abingham self-assigned this Dec 3, 2016

abingham commented Dec 3, 2016

@boxed After our discussion, I've started poking around with local test execution on the local-exec branch. The basic implementation was pretty simple, but it points to the need to introduce some new abstractions (i.e. executors, which control the execution environment). That's not a huge change either, but it'll have to wait for a bit.

If you want to take this for a test drive, try this:

cosmic-ray init --test-runner=pytest --baseline=10 <session name> <module to test> -- <directory containing tests>
cosmic-ray exec <session name>
cosmic-ray report <session name>

The distributed mode of execution is still available if you instead run a worker and use cosmic-ray exec --dist.

boxed commented Dec 5, 2016

tri.declarative git:(master) cosmic-ray init --test-runner=pytest --baseline=10 foo tri.declarative -- tests/
tri.declarative git:(master) cosmic-ray exec foo
Traceback (most recent call last):
  File "/Users/andersh/Projects/tri.declarative/venv/bin/cosmic-ray", line 11, in <module>
    load_entry_point('cosmic-ray', 'console_scripts', 'cosmic-ray')()
  File "/Users/andersh/Projects/cosmic-ray/cosmic_ray/cli.py", line 380, in main
    sys.exit(handler(sub_config))
  File "/Users/andersh/Projects/cosmic-ray/cosmic_ray/cli.py", line 185, in handle_exec
    with use_db(db_name, mode=WorkDB.Mode.open) as db:
  File "/usr/local/Cellar/python3/3.5.1/Frameworks/Python.framework/Versions/3.5/lib/python3.5/contextlib.py", line 59, in __enter__
    return next(self.gen)
  File "/Users/andersh/Projects/cosmic-ray/cosmic_ray/work_db.py", line 212, in use_db
    db = WorkDB(path, mode)
  File "/Users/andersh/Projects/cosmic-ray/cosmic_ray/work_db.py", line 75, in __init__
    'Requested file {} not found'.format(path))
FileNotFoundError: Requested file foo.json not found

That doesn't seem good. Where is that "foo.json" file supposed to be written?

abingham commented Dec 5, 2016

The issue appears to be that the baseline testing run is "failing". You can see this if you use verbose mode:

cosmic-ray --verbose init . . .

In short: when you use baseline timing for timeout calculation, we run your test suite once over unmutated code to figure out how fast it ought to run in a rough sense. If that baseline run fails, then we abort the init on the assumption that subsequent mutation testing would be meaningless.
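
In sketch form, the logic is something like this (illustrative names only, not the actual cosmic-ray internals):

import subprocess
import sys
import time

def baseline_timeout(test_command, multiplier=10):
    # Time one run of the test suite over unmutated code.
    start = time.time()
    result = subprocess.run(test_command)
    elapsed = time.time() - start
    if result.returncode != 0:
        # Baseline failed: mutation results would be meaningless.
        sys.exit('Baseline test run failed; aborting init.')
    # --baseline=10 gives each mutant ten times the baseline duration.
    return elapsed * multiplier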

In this case, CR is exhibiting a confluence of a few bad factors. First, we aren't reporting anything useful to the user when a baseline fails. This really should be logged at the "error" level rather than "info". My bad.

Second, we're almost certainly calculating "test failure" incorrectly for pytest. The current implementation naively says that the test suite as a whole passes if and only if all tests report that they passed. However, skipped tests don't report as passed, so they get counted as failures even though they shouldn't be.

So you've found something that we need to address. The pytest runner needs to be smarter about calculating failures. This shouldn't be too difficult to address.
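
The fix amounts to something like this (a sketch of the intended logic, not the actual pytest-runner code):

def suite_passed(outcomes):
    # outcomes: one string per test, e.g. 'passed', 'failed',
    # 'skipped', or 'error'. Only genuine failures and errors should
    # fail the suite; skips should not.
    return not any(o in ('failed', 'error') for o in outcomes)

assert suite_passed(['passed', 'skipped'])     # skips are not failures
assert not suite_passed(['passed', 'failed'])  # a real failure fails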

In the meantime, if you delete tests/test_declarative_class.py and tests/test_evaluate.py, you should be able to test out cosmic-ray on tri.declarative. It seems to be working on my machine, at least.

abingham commented Dec 5, 2016

FWIW, here are the summary results I've got for tri.declarative:

total jobs: 570
complete: 570 (100.00%)
survival rate: 48.25%

With the caveat, of course, that I've removed some of your tests.

boxed commented Dec 5, 2016

Ah crap, I just realized you're running the latest from GitHub, but we've got something else internally here. I just pushed the latest code to GitHub! The newest code on GitHub should now be OK, and it survives mutmut testing, so it's very interesting to see how it compares to CR.

The info about skipped tests being counted as failed was good to know! Now I can run the baseline successfully if I just delete those three tests.

boxed commented Dec 5, 2016

You should try out mutmut on tri.declarative btw. I think the feedback is a lot nicer than CR's, and something you should be able to copy easily. For example, it reports surviving mutants as soon as it finds them, so you can start working on the first one right away if you want.

abingham commented Dec 5, 2016

I'll definitely take it for a spin when I get a chance. The reporting from CR right now is certainly primitive - I've really only put in enough effort to get the information I need for development - but creating new reporting forms should be very simple. All of the data from a testing run is stored in a database, and reports are really just formatted output of queries against that database.
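
To illustrate, a report boils down to something like this (the record layout is made up for the example; the real WorkDB schema differs):

def survival_report(records):
    # Each record is one mutation job; 'result' is None until it has run.
    done = [r for r in records if r.get('result') is not None]
    survived = [r for r in done if r['result'] == 'survived']
    print('total jobs: {}'.format(len(records)))
    if records:
        print('complete: {} ({:.2f}%)'.format(
            len(done), 100.0 * len(done) / len(records)))
    if done:
        print('survival rate: {:.2f}%'.format(
            100.0 * len(survived) / len(done)))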

boxed commented Dec 5, 2016

Cool. Can I stop CR in the middle and have it retain the mutants it found?

abingham commented Dec 5, 2016

Yes, the database gets initialized (i.e. via "cosmic-ray init") with the work items that need to be done. As results arrive, they are attached to those work items. So a standard exec run only runs tests for the items that don't already have results.
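
In sketch form (illustrative names, not the actual WorkDB API):

def pending_work(work_items):
    # Items without a result are what an exec run still has to do,
    # which is what makes an interrupted run resumable.
    return (item for item in work_items if item.get('result') is None)

def exec_session(work_items, run_one):
    for item in pending_work(work_items):
        item['result'] = run_one(item)  # persisted as each result arrives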

boxed commented Dec 5, 2016

Cool.

We have several mutants in tri.declarative that should be survivable. So for example, CR finds that it can change "break" to "continue" after "del to_be_moved_by_name[-1]". Mutmut has a feature to mark these as ignored with "# pragma: no mutate". I guess you can't implement this feature in CR? :(
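
The check itself is cheap, something roughly like this (a sketch, not mutmut's actual code):

import re

PRAGMA = re.compile(r'#\s*pragma:\s*no mutate')

def line_exempt(source_lines, lineno):
    # lineno is 1-based, like AST node line numbers.
    return bool(PRAGMA.search(source_lines[lineno - 1]))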

abingham commented Dec 5, 2016

I guess you can't implement this feature in CR? :(

To my mind this is the biggest open issue with CR for practical mutation testing. The approach you're taking with mutmut is appealing because I might be able to piggy-back off the work of some existing system (e.g. astroid may help because it's part of the larger pylint ecosystem which has support for such pragmas). However, I also worry about how complex these sorts of inline directives could be. For example, I want users to be able to disable specific mutations, not just all of them, for a particular line of code...this might get verbose.

The leading approach in my mind is an external file which tells CR when and how to apply exceptions. This file would need to store enough data (i.e. a patch, effectively) to tell when the file has changed too much. See the last comment in #97 for a better description. This has the benefit of not polluting the code, and it also has the theoretical benefit (theoretical in the sense that I'm not entirely sure it's a benefit in practice) of allowing users to swap out exception files without changing the code itself.
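
Concretely, I imagine an entry in such a file looking something like this (purely hypothetical; none of it exists yet, and the operator name is made up):

exception_entry = {
    'filename': 'mypackage/module.py',
    'line': 42,
    # Enough context to act as a patch: if the code drifts, the
    # exception can be flagged as stale instead of silently misapplied.
    'context': 'if node is None:',
    # Disable specific mutations on this line, not all of them.
    'disabled_mutations': ['ReplaceIsWithEq'],  # hypothetical operator name
}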

So you're right: unfortunately this is not something that CR can do right now. It's a really interesting problem, and a really important one, so hopefully I (or someone) will get to it soon.

boxed commented Dec 5, 2016

I think it's nice to have it in the code; I've modeled mutmut's feature after how it works in coverage.py.

boxed commented Dec 5, 2016

-    if (class_to_decorate is None):
+    if (class_to_decorate == None):

That seems... harsh. If not outright wrong.

abingham commented Dec 5, 2016

Cool, I was able to run mutmut over tri.declarative! It's definitely faster than CR, and that probably has to do with how CR starts a sandbox process for each test run. The database CR uses is not the fastest in the world, either.

Also, I have new CR results for the most recent tri.declarative:

total jobs: 569
complete: 569 (100.00%)
survival rate: 18.28%

In my experience that's a really, really good score.

abingham commented Dec 5, 2016

I think it's nice to have it in the code; I've modeled mutmut's feature after how it works in coverage.py.

Right, it's very common in practice, and that's a big mark in its favor. Maybe we could support two forms of exceptions, inline and external. I think we need some experience to know what is best overall.

abingham commented Dec 5, 2016

That seems... harsh. If not outright wrong.

Right, this is skirting the edge of what are called "equivalent mutants". In general the "is -> ==" replacement is testable, but not in this case. We should probably disallow that mutation in general. An open issue would put it on the radar...

We have a number of exceptions in the code already for this kind of thing. See, for example, here where we disallow the reverse mutation of "== -> is". An exploration of these kinds of exceptions is a really fun part of the research end of mutation testing.
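
Those exceptions are easy to express as a table of disallowed replacements (a sketch; the real checks live alongside the operator definitions):

# Pairs we refuse to generate as mutations.
DISALLOWED = {
    ('==', 'is'),   # the existing exception mentioned above
    ('is', '=='),   # the reverse, which this case argues for
}

def allowed(from_op, to_op):
    return from_op != to_op and (from_op, to_op) not in DISALLOWED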

boxed commented Dec 5, 2016

Jesus.. CR really creates a TON of mutations. Mutmut has at most one mutation per AST node, so that explains some of the speed difference too. The lt operator in tri.declarative has 7 (!) mutants in CR, while I test just one in mutmut (and that one is whitelisted, because we know things about index).

boxed commented Dec 5, 2016

-                if ((type(obj) is tuple) and (len(obj) == 1) and isinstance(obj[0], member_class)):
+                if ((type(obj) < tuple) and (len(obj) == 1) and isinstance(obj[0], member_class)):

That's a funny one. I would have expected that to be a crash. Why is type orderable? :P

abingham commented Dec 5, 2016

Jesus.. CR really creates a TON of mutations.

Right, CR isn't messing around ;) For relational operators, for example, it basically tries replacing each operator with every other operator (where this makes syntactic sense). There may be ways to trim these numbers down a bit, but in principle I want to at least offer the possibility of maximal mutation.
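
Roughly like this sketch (assuming a comparison-operator set like the one below): replacing "<" with each of the other seven is what produces the seven mutants you saw for the lt operator.

OPERATORS = ['==', '!=', '<', '<=', '>', '>=', 'is', 'is not']

def replacements(op):
    # Maximal mutation: try every other operator in place of this one.
    return [other for other in OPERATORS if other != op]

assert len(replacements('<')) == 7   # hence the seven mutants for lt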

As for the speed, it looks like mutmut is just plain faster on a per-test-run basis. I was watching them click by on each system, and mutmut was perhaps twice as fast. It's food for thought.

boxed commented Dec 5, 2016

and mutmut was perhaps twice as fast

Strange. I would have expected the reverse, given that baron is almost certainly slower than Python's built-in AST.

abingham commented Dec 5, 2016

Why is type orderable? :P

It's not ;)

>>> type(1) < tuple
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unorderable types: type() < type()

That mutant survived, so presumably your test never got to that line.

boxed commented Dec 5, 2016

I don't see how that's possible: we have 100% test coverage, that's the first expression in the "and" chain so no short-circuiting is possible, mutmut mutates "is" into "is not" so it should have caught that, AND we had to whitelist the raise on the line below to make mutmut shut up about it.

abingham commented Dec 5, 2016

It definitely warrants looking into. I'm not really sure what's going on. If I make the change from is to < manually and run the tests normally (outside of CR), pytest complains about something entirely different:

============================= test session starts ==============================
platform darwin -- Python 2.7.10, pytest-3.0.4, py-1.4.31, pluggy-0.4.0
rootdir: /Users/sixtynorth/sandbox/cr/tri.declarative, inifile: setup.cfg
collected 58 items / 2 errors

 generated xml file: /Users/sixtynorth/sandbox/cr/tri.declarative/testreport.xml 
=========================== short test summary info ============================
ERROR tests/test_declarative_class.py
ERROR tests/test_orm.py
==================================== ERRORS ====================================
_______________ ERROR collecting tests/test_declarative_class.py _______________
tests/test_declarative_class.py:14: in <module>
    class Declarative(object):
lib/tri/declarative/__init__.py:138: in decorator
    {k: v for k, v in class_to_decorate.__dict__.items() if k not in ['__dict__', '__weakref__']})
lib/tri/declarative/__init__.py:132: in __init__
    members = get_members(cls)
lib/tri/declarative/__init__.py:117: in get_members
    sorted_bindings = sorted(bindings, key=lambda x: sort_key(x[1]))
lib/tri/declarative/__init__.py:112: in generate_member_bindings
    if type(obj) < tuple and len(obj) == 1 and isinstance(obj[0], member_class):
E   TypeError: object of type 'getset_descriptor' has no len()
______________________ ERROR collecting tests/test_orm.py ______________________
tests/test_orm.py:23: in <module>
    class SimpleSQLModel(object):
lib/tri/declarative/__init__.py:138: in decorator
    {k: v for k, v in class_to_decorate.__dict__.items() if k not in ['__dict__', '__weakref__']})
lib/tri/declarative/__init__.py:132: in __init__
    members = get_members(cls)
lib/tri/declarative/__init__.py:117: in get_members
    sorted_bindings = sorted(bindings, key=lambda x: sort_key(x[1]))
lib/tri/declarative/__init__.py:112: in generate_member_bindings
    if type(obj) < tuple and len(obj) == 1 and isinstance(obj[0], member_class):
E   TypeError: object of type 'function' has no len()
============================ pytest-warning summary ============================
WC1 None [pytest] section in setup.cfg files is deprecated, use [tool:pytest] instead.
!!!!!!!!!!!!!!!!!!! Interrupted: 2 errors during collection !!!!!!!!!!!!!!!!!!!!
================== 1 pytest-warnings, 2 error in 0.21 seconds ==================

I get identical results if I manually change is to is not. And it's happening during test collection. Curiouser and curiouser.

This does point to one possibly sticky issue for CR, which is that the pytest collector never sees the mutated code. If, as seems to be happening here, the pytest collector is dipping into the non-test code for some reason, it will have an incorrect view of the code under test. I don't know enough about pytest's internals to infer much more.
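
Here's a minimal illustration of why collection can execute library code (not tri.declarative's actual code): decorators run at import time, so pytest hits them while merely collecting the test module.

def get_members(cls):
    # Imagine a mutated comparison in here raising TypeError...
    return {k: v for k, v in vars(cls).items() if not k.startswith('_')}

def declarative(cls):
    get_members(cls)     # executed the moment the module is imported
    return cls

@declarative             # runs during import, i.e. during collection
class Declarative(object):
    pass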

I've got to get back to real work for a bit, unfortunately, but I'll try to check back in on this discussion later.

@abingham

I've merged the local-exec branch in, so the default is now local execution. Hopefully I'll have time to address the lingering issues related to this soon.
