Post processing / ML Code - Aggregate per algorithm #160

ntalluri · 2024-06-10T19:57:59Z

No description provided.

ntalluri · 2024-06-10T20:18:51Z

Failing due to ML Bug #143.

The changes made in ML Bug #143 are subject to change in this pull request.

agitter

I merged the changes from #143 and left some high level comments before reviewing the code in detail.

agitter · 2024-06-14T13:55:15Z

spras/config.py

        self.analysis_include_summary = raw_config["analysis"]["summary"]["include"]
        self.analysis_include_graphspace = raw_config["analysis"]["graphspace"]["include"]
        self.analysis_include_cytoscape = raw_config["analysis"]["cytoscape"]["include"]
        self.analysis_include_ml = raw_config["analysis"]["ml"]["include"]
+
+        if 'aggregate_per_algorithm' not in self.ml_params:
+            raise ValueError("The 'aggregate_per_algorithm' parameter must be set to either true or false in ml analysis parameters.")


Should we have it use false as a default so that old config files are supported?
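A minimal sketch of the suggested default, assuming `self.ml_params` holds the parsed `raw_config["analysis"]["ml"]` section. The helper name `read_aggregate_per_algorithm` is hypothetical. It falls back to `False` when the key is missing, so older config files keep loading, and it still rejects non-boolean values.

```python
from typing import Any, Mapping


def read_aggregate_per_algorithm(ml_params: Mapping[str, Any]) -> bool:
    """Return the aggregate_per_algorithm setting, defaulting to False.

    Hypothetical helper; ml_params stands in for the ml analysis section
    of the parsed config file.
    """
    # Missing key: treat as disabled so old config files remain valid
    value = ml_params.get("aggregate_per_algorithm", False)
    if not isinstance(value, bool):
        # Reject values such as the string "yes" instead of silently coercing them
        raise ValueError(
            "The 'aggregate_per_algorithm' parameter must be set to either "
            "true or false in ml analysis parameters."
        )
    return value
```

With this approach, `read_aggregate_per_algorithm({"include": True})` returns `False` instead of raising.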

agitter · 2024-06-14T13:56:18Z

test/test_config.py


+        with pytest.raises(ValueError): #raises error if empty dataframe is used for post processing


This test would change if we use a default value as I proposed above.
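If the key defaulted to false, the `pytest.raises(ValueError)` case would become a check that a config without the key loads with aggregation off. A sketch, assuming a hypothetical `parse_ml_settings` stand-in for the real `Config` parsing; pytest collects plain `test_` functions with bare asserts, so no pytest import is needed here.

```python
from typing import Any, Dict


def parse_ml_settings(raw_config: Dict[str, Any]) -> bool:
    """Hypothetical stand-in for the config parsing under test."""
    ml_params = raw_config["analysis"]["ml"]
    return ml_params.get("aggregate_per_algorithm", False)


def test_aggregate_per_algorithm_defaults_to_false():
    # An older config file without the key should load instead of raising ValueError
    raw_config = {"analysis": {"ml": {"include": True}}}
    assert parse_ml_settings(raw_config) is False


def test_aggregate_per_algorithm_explicit_true():
    # An explicit true value should be passed through unchanged
    raw_config = {"analysis": {"ml": {"include": True, "aggregate_per_algorithm": True}}}
    assert parse_ml_settings(raw_config) is True
```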

agitter · 2024-06-14T14:00:21Z

config/config.yaml

        include: true
+        # required; adds ml analysis per algorithm output


Maybe this should be optional and default to false?

I haven't looked through the code line by line yet. Is the behavior that, if this is true, we get the overall summaries as before and then additionally compute these algorithm-specific summaries? That is what I would expect.
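The expected behavior described above can be sketched as a grouping step: the overall group over every pathway is always produced, and per-algorithm groups are added only when the flag is on. The `(algorithm, pathway_path)` input shape, the `"all"` group name, and `plan_ml_groups` are assumptions for illustration, not the PR's code.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def plan_ml_groups(
    pathways: List[Tuple[str, str]], aggregate_per_algorithm: bool
) -> Dict[str, List[str]]:
    """Map each ML analysis group name to the pathway files it covers.

    Hypothetical sketch: the "all" group mirrors the existing overall
    summaries; per-algorithm groups are added only when requested.
    """
    groups: Dict[str, List[str]] = {"all": [path for _, path in pathways]}
    if aggregate_per_algorithm:
        by_algorithm: Dict[str, List[str]] = defaultdict(list)
        for algorithm, path in pathways:
            by_algorithm[algorithm].append(path)
        # Per-algorithm groups are added alongside, not instead of, the overall group
        groups.update(by_algorithm)
    return groups
```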

agitter · 2024-06-14T14:15:39Z

We may need to think more about the errors we added in #143 when there are not enough pathways for the ML analysis. Now that we can aggregate by algorithm, that will be a common occurrence. Some algorithms have no parameters, so they will always generate a single pathway. Should we inspect the number of parameter combinations in the config file and only try running ML analysis for those algorithms with multiple parameter combinations?
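One way to inspect the config, sketched under a loud assumption: each algorithm is simplified to a hypothetical dict of runs, each run mapping parameter names to lists of values (the real SPRAS config format may differ). Combinations are deduplicated across runs, and an algorithm is kept only if it has more than one.

```python
from itertools import product
from typing import Dict, List

# Hypothetical simplified config: run name -> parameter name -> list of values
AlgorithmRuns = Dict[str, Dict[str, List]]


def count_param_combinations(runs: AlgorithmRuns) -> int:
    """Count distinct parameter combinations across all runs of one algorithm."""
    combos = set()
    for params in runs.values():
        names = sorted(params)
        # product() with no iterables yields one empty tuple,
        # so a parameterless run counts as a single combination
        for values in product(*(params[name] for name in names)):
            combos.add(tuple(zip(names, values)))
    return len(combos)


def algorithms_for_ml_aggregation(algorithms: Dict[str, AlgorithmRuns]) -> List[str]:
    """Keep only algorithms that would produce more than one pathway."""
    return [name for name, runs in algorithms.items() if count_param_combinations(runs) > 1]
```

For example, an algorithm configured with `{"run1": {}}` counts as one combination and would be skipped.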

ntalluri · 2024-06-14T16:36:43Z

Would the Snakemake command-line option --keep-going be sufficient and allow the error to be printed to the command line?

I feel like it's suppressing errors we want to see, but this could be an option.

agitter · 2024-06-14T22:26:06Z

> Would the snake make command --keep-going be sufficient enough

I would rather not require users to add another argument to Snakemake at the command line in order to successfully execute fairly common configurations. It seems like it would lead to a lot of confusion if forgetting that option causes the workflow to crash, so if we can figure out a strategy for writing default outputs or avoiding calling the per-algorithm aggregation when it won't work, that would be better.

ntalluri · 2024-06-17T15:04:52Z

I updated the Snakemake workflow so that the ml-aggregate step only runs for algorithms that use multiple parameter combinations.
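A sketch of how such a restriction could be expressed as a list of target files, assuming a hypothetical mapping from algorithm name to its number of parameter combinations. The path pattern is modeled on `data1-ml/omicsintegrator1-ensemble-pathway.txt`; `ml_aggregate_targets` is an illustrative name, not the PR's function.

```python
from typing import Dict, Iterable, List


def ml_aggregate_targets(datasets: Iterable[str], combo_counts: Dict[str, int]) -> List[str]:
    """Build per-algorithm ensemble targets only for multi-combination algorithms.

    Hypothetical sketch; combo_counts maps algorithm name -> number of
    parameter combinations found in the config.
    """
    multi = sorted(name for name, count in combo_counts.items() if count > 1)
    # Single-pathway algorithms never appear, so their aggregation is never requested
    return [f"{dataset}-ml/{algorithm}-ensemble-pathway.txt" for dataset in datasets for algorithm in multi]
```

In a Snakefile, a list like this could serve as input to the top-level target rule, so Snakemake would never schedule aggregation for an algorithm that yields only one pathway.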

agitter

There are two types of end-to-end integration tests that would be nice to have, but we aren't set up well to make those kinds of tests currently. They would require running the entire workflow. Those tests:

- Given the config file config.yaml, confirm that only the expected algorithms produce ensemble files.
- Confirm that one of the expected algorithm-specific ensembles (e.g. data1-ml/omicsintegrator1-ensemble-pathway.txt) is correct.

Are these tests too hard in our current framework?
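The first check could be a small assertion run after a full workflow run. A sketch, assuming per-algorithm ensembles follow the `<algorithm>-ensemble-pathway.txt` naming seen in `data1-ml/omicsintegrator1-ensemble-pathway.txt`; `algorithms_with_ensembles` is a hypothetical helper. Running the workflow itself is not covered here.

```python
from pathlib import Path
from typing import Set

SUFFIX = "-ensemble-pathway.txt"


def algorithms_with_ensembles(ml_dir: Path) -> Set[str]:
    """Return the algorithm names that have a per-algorithm ensemble file.

    Hypothetical helper; a file named exactly "ensemble-pathway.txt"
    (no algorithm prefix) does not match the glob and is ignored.
    """
    return {path.name[: -len(SUFFIX)] for path in ml_dir.glob(f"*{SUFFIX}")}
```

A test would then compare `algorithms_with_ensembles(Path("output/data1-ml"))` against the set of algorithms that have multiple parameter combinations in config.yaml (the output directory here is an assumption).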

This design is great. I confirmed that if I add another parameter combination to the config file for MEO and then rerun Snakemake, then an ensemble pathway and ML outputs are created for MEO.

Snakefile

config/config.yaml

changes to code to have ml agg per algo

20300d5

ntalluri and others added 6 commits June 11, 2024 11:56

removed controllable parameter for now

e31586a

removed controllable parameter for now from test_config

a2dc174

make all algortihms true

1da8c9b

added back aggregate_per_algorithm param and added testing

0d8ec19

precommit

a787a88

Merge branch 'master' into ml-aggregate

3aca4de

agitter reviewed Jun 14, 2024


ntalluri changed the title ~~Post processing / ML Codde - Aggregate per algorithm~~ Post processing / ML Code - Aggregate per algorithm Jun 14, 2024

ntalluri added 2 commits June 14, 2024 14:37

update code to make rule not required and updated ml-agg rule

90e8e11

clean up

08bc987

clean up

6c9e1f2

ntalluri mentioned this pull request Jun 21, 2024

Decouple Ensemble Step from ML Step #163

Closed

agitter requested changes Jun 23, 2024


Snakefile (outdated, resolved)

Snakefile (resolved)

Snakefile (outdated, resolved)

Snakefile (resolved)

config/config.yaml (outdated, resolved)

requested changes

122fb66

agitter approved these changes Jun 29, 2024


agitter merged commit ad4da94 into Reed-CompBio:master Jun 29, 2024
5 checks passed
