feat: heuristics #431

tristan-f-r · 2025-10-30T01:13:48Z

Depends on refactor: separate statistic computation #411.

This implements @ntalluri's graph heuristics, specified under a heuristics key in the configuration:

heuristics:
  number_of_nodes: "<1500"
  number_of_connected_components: ["10 < x < 100", "500<"]
  ...

If any heuristic is not met, the run will stop. A future error-handling change will deal with these failures more gracefully (#21). [I avoid handling them here, since I want to split up the Snakefile first, mostly to make variable scoping cleaner. Doing that in this PR would make it a mess.]

Most of this code is just parsing interval inequality notation.
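To make the notation concrete, here is a minimal sketch of parsing strings such as "<1500", "500<", and "10 < x < 100" with a regular expression instead of eval. The names `Bounds` and `parse_interval` are hypothetical and this is not the code in spras/interval.py; accepting `<=` as well as `<` is also an assumption.

```python
import re
from typing import NamedTuple, Optional


class Bounds(NamedTuple):
    """Hypothetical result type: None means the side is unbounded."""
    lower: Optional[float]
    lower_closed: bool
    upper: Optional[float]
    upper_closed: bool


_NUM = r"[+-]?\d+(?:\.\d+)?"
_PATTERN = re.compile(
    rf"^\s*(?:(?P<lo>{_NUM})\s*(?P<lo_op><=?))?"  # optional "num <" or "num <="
    rf"\s*(?P<id>[A-Za-z_]\w*)?"                   # optional variable name, e.g. x
    rf"\s*(?:(?P<hi_op><=?)\s*(?P<hi>{_NUM}))?\s*$"  # optional "< num" or "<= num"
)


def parse_interval(text: str) -> Bounds:
    """Parse an inequality interval; raise ValueError if no bound is given."""
    match = _PATTERN.match(text)
    if match is None or (match["lo"] is None and match["hi"] is None):
        raise ValueError(f"not an inequality interval: {text!r}")
    lower = float(match["lo"]) if match["lo"] is not None else None
    upper = float(match["hi"]) if match["hi"] is not None else None
    return Bounds(lower, match["lo_op"] == "<=", upper, match["hi_op"] == "<=")
```

With this sketch, `parse_interval("500<")` yields a lower bound of 500 with no upper bound, and a string with no bound at all (such as "x") is rejected.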

we also make it lazy

read-the-docs-community · 2025-10-30T01:14:59Z

Documentation build overview

📚 spras | 🛠️ Build #30221003 | 📁 Comparing 4844fd6 against latest (c3b02cd)

🔍 Preview build

Show files changed (3 files in total): 📝 3 modified | ➕ 0 added | ➖ 0 deleted

File	Status
genindex.html	📝 modified
fordevs/spras.config.html	📝 modified
fordevs/spras.html	📝 modified

agitter

If any heuristic is not met, the run will stop.

Is that temporary behavior? My understanding of our discussion of heuristics for #318 is that we would flag an output pathway as "failed" or "errored" per the current heuristics but the rest of the Snakemake workflow would continue.

We will need to add an example of the heuristics in a config file and document their usage once it is finalized.

agitter · 2025-11-15T03:40:02Z

spras/analysis/summary.py


        # Save the network name, number of nodes, number edges, and number of connected components
        nw_name = str(file_path)
-        number_nodes = nw.number_of_nodes()


Are these the same changes as in #411, and will any new design from that pull request be merged in here?

agitter · 2025-11-15T03:41:10Z

spras/statistics.py

Ignored this for the scope of this PR

agitter · 2025-11-15T03:44:32Z

spras/config/heuristics.py

+            interval_string = f"one of the intervals ({formatted_intervals})"
+        return f"{name} expected {desired} in interval {interval_string}"


This text doesn't quite match up. You could get "in interval one of the intervals..."
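One possible way to avoid the doubled wording is to pick the whole phrase, including "in", based on how many intervals there are. The sketch below uses a hypothetical helper `describe_failure` with assumed parameter names. It does not reproduce the signature of `format_failed_heuristic`, and the message wording is only a suggestion.

```python
from typing import Sequence


def describe_failure(name: str, actual: float, intervals: Sequence[str]) -> str:
    """Hypothetical formatter for one failed heuristic."""
    if not intervals:
        raise ValueError("at least one interval is required")
    if len(intervals) == 1:
        expectation = f"in the interval {intervals[0]}"
    else:
        # The plural phrase replaces the singular one instead of being appended to it.
        expectation = f"in one of the intervals ({', '.join(intervals)})"
    return f"{name} was {actual}, expected a value {expectation}"
```

Because each branch builds the complete phrase, neither case can produce "in interval one of the intervals".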

agitter · 2025-11-15T03:46:55Z

spras/config/heuristics.py

+            GraphHeuristicsError.format_failed_heuristic(heuristic) for heuristic in failed_heuristics
+        ]
+
+        formatted_heuristics = "\n".join([f"- {formatted_heuristics}" for heuristic in formatted_heuristics])


Should we use a different character besides - like * for the list? I'm trying to imagine whether we could ever have a leading negative here in formatted_heuristics that would be confusing.

agitter · 2025-11-15T03:54:06Z

spras/interval.py

+    lower_closed: bool
+    upper_closed: bool
+
+    def mem(self, num: float) -> bool:


What does mem stand for?

It checks whether num is inside the interval. (We should swap this out for a library, and I'll check out the library from the comment below, but this should be in.)
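If the membership test stays in-house, one option is to expose it through Python's `in` operator by defining `__contains__`. The sketch below assumes `lower` and `upper` fields alongside the `lower_closed` and `upper_closed` fields visible in the diff; those two names and the infinite defaults are assumptions, not the PR's actual class.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    # lower/upper are assumed field names; only the *_closed flags appear in the diff.
    lower: float = -math.inf
    upper: float = math.inf
    lower_closed: bool = False
    upper_closed: bool = False

    def __contains__(self, num: float) -> bool:
        """Support `num in interval`, honoring open and closed endpoints."""
        above = num >= self.lower if self.lower_closed else num > self.lower
        below = num <= self.upper if self.upper_closed else num < self.upper
        return above and below
```

Call sites then read as `if count in interval:` rather than `interval.mem(count)`.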

agitter · 2025-11-15T03:59:37Z

spras/config/heuristics.py

+        and throws a GraphHeuristicsError if it fails the heuristics in `self`.
+        """
+        # TODO: re-use from summary.py once we have a mixed/hypergraph library
+        G: nx.DiGraph = nx.read_edgelist(path, data=(('Rank', str), ('Direction', str)), create_using=nx.DiGraph)


This is reading in directed edges but summary.py reads undirected edges. Those should be consistent. That is a good reason to use shared code if possible so it doesn't accidentally diverge later.
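One way to keep the two call sites from diverging is a single helper that both import and that fixes how direction is treated. The sketch below is a standard-library illustration only: it swaps networkx for a small union-find, the name `count_connected_components` is hypothetical, and it assumes whitespace-separated edge lines whose first two fields are the endpoints.

```python
from typing import Iterable


def count_connected_components(edge_lines: Iterable[str]) -> int:
    """Count components, treating every edge as undirected (hypothetical shared helper)."""
    parent: dict[str, str] = {}

    def find(node: str) -> str:
        # Register unseen nodes as their own root, then walk up with path halving.
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]
            node = parent[node]
        return node

    for line in edge_lines:
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        root_source, root_target = find(fields[0]), find(fields[1])
        if root_source != root_target:
            parent[root_source] = root_target  # union, ignoring edge direction

    return len({find(node) for node in list(parent)})
```

Because direction is ignored in one place, the summary and the heuristics check would report the same component count for the same file.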

agitter · 2025-11-15T04:20:28Z

spras/interval.py

+For graph heuristics, we allow inequality intervals of the form (num) < (id)?. For example,
+we can say "1500 <" for "1500 < x", or "1000 < x < 2000", etc.
+
+[If there is ever a library that does this, we should replace this code with that library.]


I didn't review this file yet. I looked for any existing libraries:

https://boolean-parser.readthedocs.io/en/latest/intro.html is the best match. I'm still reading to confirm that it avoids eval.

SymPy and https://docs.sympy.org/latest/modules/parsing.html look very powerful but use eval.

https://github.com/AaryamanBhute/OpenExpressions may be able to do this. It's not as clear how to use it.

boolean-parser avoids eval, so we can use it. I was worried about that library becoming unmaintained, though it would still be better than what is here right now. I can use it here 👍

(Optimally, I would like to use the parsing logic for booleans inside some kind of SMT, though I wasn't able to find a nice isolated library for this.)

tristan-f-r · 2025-11-15T09:46:21Z

Is that temporary behavior?

It is temporary behavior. I don't yet have a good base to build the error handling on (I would like #321, which adds timeouts, for that). If #321 gets merged before this PR, I can prepare the error handling PR and make this PR depend on it, to avoid temporarily introducing this bad heuristics behavior.

tristan-f-r and others added 5 commits October 10, 2025 06:32

refactor: separate statistic computation

6ec4f62

we also make it lazy

fix: correct tuple assumption

9987189

fix: stably use graph statistic values

25eef5e

style: fmt

cb373c1

feat: init intervals and heuristics

4640bc0

tristan-f-r added the enhancement (New feature or request), tuning (Workflow-spanning algorithm tuning), and blocked-by-other-pr labels Oct 30, 2025

Merge branch 'main' into lazy-stats

47a9e26

github-actions bot added the merge-conflict (This PR has merge conflicts.) label Oct 30, 2025

tristan-f-r added 2 commits October 29, 2025 18:15

style: specify zip strict

898d568

Merge branch 'lazy-stats' into heuristics

b307f84

github-actions bot removed the merge-conflict (This PR has merge conflicts.) label Oct 30, 2025

tristan-f-r and others added 9 commits October 30, 2025 01:41

refactor: use heuristic error, mv heuristics outside of main schema file

8177ed6

fix: proper tokenization

fac1108

fix(interval): correct parsing

2e0d8d0

fix(interval): correct other parsing mistakes

183c3ad

feat: integrate heuristics

0b6e01f

fix: drop random code

33e004f

fix: make undirected for determining number of connected components

c675ece

Merge branch 'lazy-stats' into heuristics

6a9a0f3

fix: specify heuristics in wrapping config object

1cdaf12

tristan-f-r marked this pull request as ready for review November 6, 2025 02:23

tristan-f-r added 2 commits November 6, 2025 08:31

feat: interval and heuristic testing

7b290dc

style: fmt

4844fd6

agitter reviewed Nov 15, 2025
