Skip to content

Conversation

@tristan-f-r
Copy link
Collaborator

@tristan-f-r tristan-f-r commented Oct 30, 2025

We implement @ntalluri's graph heuristics, or heuristics in configurations.

heuristics:
  number_of_nodes: "<1500"
  number_of_connected_components: ["10 < x < 100", "500<"]
  ...

If any heuristic is not met, the run will stop. Future error handling will better handle these errors (#21). [I avoid handling it here, since I want to split up the Snakefile first, mostly to make variable scoping cleaner. This change would cause this PR to be a mess.]

Most of this code is just parsing interval inequality notation.

@tristan-f-r tristan-f-r added enhancement New feature or request tuning Workflow-spanning algorithm tuning blocked-by-other-pr labels Oct 30, 2025
@read-the-docs-community
Copy link

read-the-docs-community bot commented Oct 30, 2025

Documentation build overview

📚 spras | 🛠️ Build #30221003 | 📁 Comparing 4844fd6 against latest (c3b02cd)


🔍 Preview build

Show files changed (3 files in total): 📝 3 modified | ➕ 0 added | ➖ 0 deleted
File Status
genindex.html 📝 modified
fordevs/spras.config.html 📝 modified
fordevs/spras.html 📝 modified

@github-actions github-actions bot added the merge-conflict This PR has merge conflicts. label Oct 30, 2025
@github-actions github-actions bot removed the merge-conflict This PR has merge conflicts. label Oct 30, 2025
@tristan-f-r tristan-f-r marked this pull request as ready for review November 6, 2025 02:23
Copy link
Collaborator

@agitter agitter left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If any heuristic is not met, the run will stop.

Is that temporary behavior? My understanding of our discussion of heuristics for #318 is that we would flag an output pathway as "failed" or "errored" per the current heuristics but the rest of the Snakemake workflow would continue.

We will need to add an example of the heuristics in a config file and document their usage once it is finalized.


# Save the network name, number of nodes, number edges, and number of connected components
nw_name = str(file_path)
number_nodes = nw.number_of_nodes()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are these the same changes from #411 and any new design from that pull request will be merged in here?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ignored this for the scope of this PR

Comment on lines +25 to +26
interval_string = f"one of the intervals ({formatted_intervals})"
return f"{name} expected {desired} in interval {interval_string}"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This text doesn't quite match up. You could get "in interval one of the intervals..."

GraphHeuristicsError.format_failed_heuristic(heuristic) for heuristic in failed_heuristics
]

formatted_heuristics = "\n".join([f"- {formatted_heuristics}" for heuristic in formatted_heuristics])
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we use a different character besides - like * for the list? I'm trying to imagine whether we could ever have a leading negative here in formatted_heuristics that would be confusing.

lower_closed: bool
upper_closed: bool

def mem(self, num: float) -> bool:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does mem stand for?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If num is inside the interval: (we should swap this out with a library, and I'll check out the library from the below comment), but this should be in.

and throws a GraphHeuristicsError if it fails the heuristics in `self`.
"""
# TODO: re-use from summary.py once we have a mixed/hypergraph library
G: nx.DiGraph = nx.read_edgelist(path, data=(('Rank', str), ('Direction', str)), create_using=nx.DiGraph)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is reading in directed edges but summary.py reads undirected edges. Those should be consistent. That is a good reason to use shared code if possible so it doesn't accidentally diverge later.

For graph heuristics, we allow inequality intervals of the form (num) < (id)?. For example,
we can say "1500 <" for "1500 < x", or "1000 < x < 2000", etc.
[If there is ever a library that does this, we should replace this code with that library.]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't review this file yet. I looked for any existing libraries:

Copy link
Collaborator Author

@tristan-f-r tristan-f-r Nov 15, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

boolean-parser so eval, and we can use it. I was worried about that library falling into unmaintained status, though it would be better than what is here right now. I can use it here 👍

(Optimally, I would like to use the parsing logic for booleans inside some kind of SMT, though I wasn't able to find a nice isolated library for this.)

@tristan-f-r
Copy link
Collaborator Author

Is that temporary behavior?

It is temporary behavior. I don't have a good base for another error to build error handling on (I would like #321 for adding timeout), though if #321 gets merged before that, I can prepare the error handling PR and make this PR depend on that, to avoid temporarily introducing this bad heuristics behavior.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

blocked-by-other-pr enhancement New feature or request tuning Workflow-spanning algorithm tuning

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants