-
Notifications
You must be signed in to change notification settings - Fork 25
feat: heuristics #431
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
feat: heuristics #431
Conversation
Documentation build overview
Show files changed (3 files in total): 📝 3 modified | ➕ 0 added | ➖ 0 deleted
|
agitter
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If any heuristic is not met, the run will stop.
Is that temporary behavior? My understanding of our discussion of heuristics for #318 is that we would flag an output pathway as "failed" or "errored" per the current heuristics but the rest of the Snakemake workflow would continue.
We will need to add an example of the heuristics in a config file and document their usage once it is finalized.
|
|
||
| # Save the network name, number of nodes, number edges, and number of connected components | ||
| nw_name = str(file_path) | ||
| number_nodes = nw.number_of_nodes() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are these the same changes from #411 and any new design from that pull request will be merged in here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ignored this for the scope of this PR
| interval_string = f"one of the intervals ({formatted_intervals})" | ||
| return f"{name} expected {desired} in interval {interval_string}" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This text doesn't quite match up. You could get "in interval one of the intervals..."
| GraphHeuristicsError.format_failed_heuristic(heuristic) for heuristic in failed_heuristics | ||
| ] | ||
|
|
||
| formatted_heuristics = "\n".join([f"- {formatted_heuristics}" for heuristic in formatted_heuristics]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we use a different character besides - like * for the list? I'm trying to imagine whether we could ever have a leading negative here in formatted_heuristics that would be confusing.
| lower_closed: bool | ||
| upper_closed: bool | ||
|
|
||
| def mem(self, num: float) -> bool: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What does mem stand for?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If num is inside the interval: (we should swap this out with a library, and I'll check out the library from the below comment), but this should be in.
| and throws a GraphHeuristicsError if it fails the heuristics in `self`. | ||
| """ | ||
| # TODO: re-use from summary.py once we have a mixed/hypergraph library | ||
| G: nx.DiGraph = nx.read_edgelist(path, data=(('Rank', str), ('Direction', str)), create_using=nx.DiGraph) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is reading in directed edges but summary.py reads undirected edges. Those should be consistent. That is a good reason to use shared code if possible so it doesn't accidentally diverge later.
| For graph heuristics, we allow inequality intervals of the form (num) < (id)?. For example, | ||
| we can say "1500 <" for "1500 < x", or "1000 < x < 2000", etc. | ||
| [If there is ever a library that does this, we should replace this code with that library.] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't review this file yet. I looked for any existing libraries:
- https://boolean-parser.readthedocs.io/en/latest/intro.html is the best match. I'm still reading to confirm that it avoid
eval - SymPy and https://docs.sympy.org/latest/modules/parsing.html looks very powerful but uses
eval - https://github.com/AaryamanBhute/OpenExpressions may be able to do this. It's not as clear how to use it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
boolean-parser so eval, and we can use it. I was worried about that library falling into unmaintained status, though it would be better than what is here right now. I can use it here 👍
(Optimally, I would like to use the parsing logic for booleans inside some kind of SMT, though I wasn't able to find a nice isolated library for this.)
It is temporary behavior. I don't have a good base for another error to build error handling on (I would like #321 for adding |
We implement @ntalluri's graph heuristics, or
heuristicsin configurations.If any heuristic is not met, the run will stop. Future error handling will better handle these errors (#21). [I avoid handling it here, since I want to split up the Snakefile first, mostly to make variable scoping cleaner. This change would cause this PR to be a mess.]
Most of this code is just parsing interval inequality notation.