Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Jmafoster1/remove data collector #308

Draft
wants to merge 44 commits into
base: main
Choose a base branch
from

Conversation

jmafoster1
Copy link
Contributor

@jmafoster1 jmafoster1 commented Feb 17, 2025

  • DataCollector classes have been removed
  • ObservationalDataCollector has been replaced by directly passing dataframes
  • ExperimentalDataCollector has been replaced by ExperimentalEstimator class to directly run models
  • Removed test generation and execution functionality from MetamorphicRelation since this can be done via the JSON front end
  • Moved MetamorphicRelation classes into the testing package instead of specification
  • Changed estimators (except IPCWEstimator) to take a BaseTestCase instead of a treatment and outcome variable

Copy link

github-actions bot commented Feb 17, 2025

🦙 MegaLinter status: ✅ SUCCESS

Descriptor Linter Files Fixed Errors Elapsed time
✅ PYTHON black 29 0 0.85s
✅ PYTHON pylint 29 0 4.69s

See detailed report in MegaLinter reports

MegaLinter is graciously provided by OX Security

Copy link

codecov bot commented Feb 17, 2025

Codecov Report

Attention: Patch coverage is 98.93048% with 2 lines in your changes missing coverage. Please review.

Project coverage is 94.64%. Comparing base (3a6e2df) to head (28c99b7).
Report is 1 commits behind head on main.

Files with missing lines Patch % Lines
causal_testing/testing/causal_test_result.py 50.00% 2 Missing ⚠️
Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff             @@
##             main     #308      +/-   ##
==========================================
- Coverage   96.87%   94.64%   -2.23%     
==========================================
  Files          29       25       -4     
  Lines        1856     1345     -511     
==========================================
- Hits         1798     1273     -525     
- Misses         58       72      +14     
Files with missing lines Coverage Δ
causal_testing/estimation/abstract_estimator.py 100.00% <100.00%> (ø)
...esting/estimation/abstract_regression_estimator.py 96.00% <100.00%> (-1.96%) ⬇️
...ausal_testing/estimation/cubic_spline_estimator.py 96.55% <100.00%> (+0.12%) ⬆️
...ausal_testing/estimation/experimental_estimator.py 100.00% <100.00%> (ø)
...ting/estimation/instrumental_variable_estimator.py 100.00% <100.00%> (ø)
causal_testing/estimation/ipcw_estimator.py 99.25% <100.00%> (+<0.01%) ⬆️
..._testing/estimation/linear_regression_estimator.py 100.00% <100.00%> (ø)
...esting/estimation/logistic_regression_estimator.py 100.00% <100.00%> (ø)
causal_testing/specification/causal_dag.py 99.49% <100.00%> (+0.53%) ⬆️
causal_testing/specification/scenario.py 50.00% <100.00%> (-30.77%) ⬇️
... and 7 more

... and 1 file with indirect coverage changes


Continue to review full report in Codecov by Sentry.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 08ab5aa...28c99b7. Read the comment docs.

@jmafoster1 jmafoster1 marked this pull request as ready for review February 17, 2025 13:23
@jmafoster1 jmafoster1 requested a review from f-allian February 17, 2025 13:23
@jmafoster1 jmafoster1 self-assigned this Feb 17, 2025
@jmafoster1
Copy link
Contributor Author

This PR looks to drop the overall percentage code coverage, but that's because I've ripped out a bunch of code, so the proportions have changed slightly.

@jmafoster1
Copy link
Contributor Author

Resolved merging conflicts from main

Copy link
Contributor

@f-allian f-allian left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jmafoster1 Thanks Michael, this all looks good. However, we should also update our documentation to reflect these changes too. If you don't have the time let me know and I'll take it on 👍🏼

@jmafoster1
Copy link
Contributor Author

Good point. Done

@jmafoster1 jmafoster1 requested a review from f-allian February 17, 2025 15:44
@f-allian
Copy link
Contributor

This is because there's now an explicit test attribute on each node in the DAG which needs to be set to True for a test to be generated.

Where? I can't find any test attributes in metamorphic_relation.py. Also, there are a lot of type hints that don't correspond to the inputs/outputs so it'd be worth fixing that.

but I guess we can do that here. As for the generation folder, it can go if necessary, but I'm not as desperate to get rid of it as the data collector classes. It probably will go though since it never really amounted to anything.

Yeah it makes sense to do that in this PR. I'll remove these.

@jmafoster1
Copy link
Contributor Author

Where? I can't find any test attributes in metamorphic_relation.py.

Line 208 where we generate nodes_to_test. I changed the default behaviour so that only nodes which have test=False will not be tested (rather than only testing nodes with test=True).

@jmafoster1
Copy link
Contributor Author

Also, there are a lot of type hints that don't correspond to the inputs/outputs so it'd be worth fixing that.

Please could you handle that one?

Yeah it makes sense to do that in this PR. I'll remove these.

I've just been looking into this. It's actually turning out to be a bit complicated because it's quite baked into the JSON frontend. I'm thinking it might be worth taking this opportunity to rebuild the json frontend as the main entry point rather than just a utility class that needs to be instantiated.

@jmafoster1 jmafoster1 marked this pull request as draft February 18, 2025 11:35
@jmafoster1
Copy link
Contributor Author

Once I'd got rid of the causal test suite and the abstract causal test case classes, stripping out the rest of the Z3 references was actually pretty straightforward. One thing we do need to be careful of now though is that the scenario.constraints currently doesn't interact with the data at all (i.e. we're not doing df.query anywhere) because Scenario objects are separate from causal test cases. This will need to be done in the entrypoint. To be honest, I quite often skip creating them altogether, but let's not strip them out too!

@jmafoster1 jmafoster1 marked this pull request as ready for review February 18, 2025 16:05
@jmafoster1 jmafoster1 marked this pull request as draft February 19, 2025 15:14
@jmafoster1
Copy link
Contributor Author

I think this is ready for review now. Absolute monster of a PR!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants