Post processing / ML Code - Aggregate per algorithm #160
Failed due to ML Bug #143; the changes made in ML Bug #143 are subject to change with this pull request.
I merged the changes from #143 and left some high-level comments before reviewing the code in detail.
spras/config.py (outdated)
self.analysis_include_summary = raw_config["analysis"]["summary"]["include"]
self.analysis_include_graphspace = raw_config["analysis"]["graphspace"]["include"]
self.analysis_include_cytoscape = raw_config["analysis"]["cytoscape"]["include"]
self.analysis_include_ml = raw_config["analysis"]["ml"]["include"]

if 'aggregate_per_algorithm' not in self.ml_params:
    raise ValueError("The 'aggregate_per_algorithm' parameter must be set to either true or false in ml analysis parameters.")
Should we have it use false as a default so that old config files are supported?
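A minimal sketch of that default, assuming self.ml_params is the parsed ml section from the snippet above (the attribute name analysis_include_ml_aggregate_algo is a placeholder):

# Default to False when the key is absent so pre-existing config files
# keep working; only an explicit true enables per-algorithm aggregation.
self.analysis_include_ml_aggregate_algo = bool(self.ml_params.get('aggregate_per_algorithm', False))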
test/test_config.py (outdated)
with pytest.raises(ValueError):  # raises error if empty dataframe is used for post processing
This test would change if we use a default value as I proposed above.
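With that default, the test might flip from expecting an error to asserting the fallback, roughly like this (get_test_config and Config are placeholders for whatever the test file already uses):

# Sketch: omitting the key should no longer raise once a default exists.
test_config = get_test_config()  # placeholder helper returning a valid raw config dict
del test_config['analysis']['ml']['aggregate_per_algorithm']
config = Config(test_config)  # previously raised ValueError
assert config.analysis_include_ml_aggregate_algo is False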
config/config.yaml (outdated)
include: true
# required; adds ml analysis per algorithm output
Maybe this should be optional and default to false?
I haven't looked through the code line by line yet. Is the behavior that if this is true, we will get the overall summaries as before and then additionally compute these algorithm-specific summaries? That is what I would expect.
We may need to think more about the errors we added in #143 when there are not enough pathways for the ML analysis. Now that we can aggregate by algorithm, that will be a common occurrence. Some algorithms don't have parameters so there will always be a single pathway generated. Should we inspect the number of parameter combinations in the config file and only try running ML analysis for those algorithms with multiple parameter combinations? A possible sketch of that inspection follows.
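One way to do that inspection (a sketch; the config layout here, a list of algorithm dicts whose params map run blocks to lists of values, is an assumption to adjust to the real schema):

from itertools import product

def num_param_combos(algo_params):
    # Sum the parameter combinations over all run blocks; each run block is
    # assumed to map parameter names to lists of candidate values.
    total = 0
    for run in algo_params.values():
        value_lists = [v if isinstance(v, list) else [v] for v in run.values()]
        total += len(list(product(*value_lists)))
    return total

def algos_with_multiple_combos(raw_config):
    # Only algorithms with more than one combination can produce the multiple
    # pathways that the per-algorithm aggregation needs.
    return [algo['name'] for algo in raw_config['algorithms']
            if num_param_combos(algo.get('params', {})) > 1]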
Would the Snakemake command-line option --keep-going be sufficient and allow the error to be printed to the command line? I feel like it's suppressing errors we want to see, but this could be an option.
I would rather not require users to add another argument to Snakemake at the command line in order to successfully execute fairly common configurations. It seems like it would lead to a lot of confusion if forgetting that option causes the workflow to crash, so if we can figure out a strategy for writing default outputs or avoiding calling the per-algorithm aggregation when it won't work, that would be better.
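The default-output idea could look roughly like this in the ML post-processing code (a sketch; ensemble_network and the header format are placeholders, not the real SPRAS function):

def ensemble_or_placeholder(pathway_files, output_file):
    # With fewer than two pathways there is nothing to aggregate, so write a
    # header-only ensemble file instead of raising; the Snakemake DAG is then
    # satisfied without needing --keep-going. The header is an assumption.
    if len(pathway_files) < 2:
        with open(output_file, 'w') as f:
            f.write('Node1\tNode2\tFrequency\n')
        return
    ensemble_network(pathway_files, output_file)  # stand-in for the real aggregation call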
I updated the Snakemake workflow so the ML aggregation only runs for algorithms that use multiple parameter combinations.
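In the Snakefile, that restriction might be applied when assembling targets, along these lines (plain Python; out_dir, dataset_labels, final_input, and the helper from the earlier sketch are assumptions, while the file pattern follows the ensemble path mentioned below):

# Request per-algorithm ensembles only for eligible algorithms so the
# aggregate rule is never scheduled where it is guaranteed to fail.
eligible = algos_with_multiple_combos(raw_config)  # helper sketched above
final_input.extend(f'{out_dir}/{dataset}-ml/{algo}-ensemble-pathway.txt'
                   for dataset in dataset_labels
                   for algo in eligible)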
There are two types of end-to-end integration tests that would be nice to have, but we aren't set up well to make those kinds of tests currently. They would require running the entire workflow. Those tests:

- given the config file config.yaml, confirm that only the expected algorithms produce ensemble files
- confirm that one of the expected algorithm-specific ensembles (e.g. data1-ml/omicsintegrator1-ensemble-pathway.txt) is correct

Are these tests too hard in our current framework?
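If we did attempt the first one, a rough pytest-plus-subprocess version could be (a sketch; the output directory and the expected file set are illustrative):

import subprocess
from pathlib import Path

def test_only_expected_ensembles():
    # Run the full workflow on the example config, then verify ensemble files
    # exist only for the algorithms expected to have multiple combinations.
    subprocess.run(['snakemake', '--cores', '1', '--configfile', 'config/config.yaml'],
                   check=True)
    ensembles = {p.name for p in Path('output/data1-ml').glob('*-ensemble-pathway.txt')}
    assert ensembles == {'omicsintegrator1-ensemble-pathway.txt'}  # illustrative expectation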
This design is great. I confirmed that if I add another parameter combination to the config file for MEO and rerun Snakemake, an ensemble pathway and ML outputs are created for MEO.