Created base structure for adding discovery and other algorithms #68
Conversation
Signed-off-by: Amit Sharma <[email protected]>
Pull Request Overview
This PR establishes the foundational structure for a causal discovery library by creating base interfaces and utility functions. The changes introduce a protocol-based design for datasets and skeleton implementations for evaluation metrics.
- Defines a `Dataset` protocol with methods for graph access, data retrieval, and synthetic data generation
- Creates placeholder functions for standard causal discovery evaluation metrics (accuracy, precision, recall, F1)
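Based on the overview above, the protocol might look roughly like the following sketch. The method names come from the diff in this PR; the return annotations and the `n_samples` parameter are assumptions for illustration, not the PR's actual signatures.

```python
from typing import Any, Protocol, runtime_checkable


@runtime_checkable
class Dataset(Protocol):
    """Sketch of the Dataset protocol described in this PR (signatures assumed)."""

    def graph(self) -> Any:
        """Return the graph associated with this dataset."""
        ...

    def data(self) -> Any:
        """Return the dataset's observations."""
        ...

    def generate_data(self, n_samples: int = 100) -> Any:
        """Generate synthetic observations."""
        ...
```

With `@runtime_checkable`, any class that implements these three methods satisfies the protocol structurally, without inheriting from `Dataset`.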
Reviewed Changes
Copilot reviewed 2 out of 8 changed files in this pull request and generated 4 comments.
File | Description
---|---
pywhyllm/datasets/dataset.py | Defines the `Dataset` protocol interface with methods for graph, data, and synthetic data generation
pywhyllm/datasets/metrics.py | Creates skeleton functions for evaluation metrics used in causal discovery
Comments suppressed due to low confidence (1)
pywhyllm/datasets/metrics.py:21
- Function name 'F1' should follow Python naming conventions. Consider renaming to 'f1_score' or 'f1' (lowercase).
def F1(edges, true_edges):
Co-authored-by: Copilot <[email protected]> Signed-off-by: Amit Sharma <[email protected]>
This is great. I added a few questions about the structure.
Also: do we want this dataset protocol to integrate well with data handling in other pywhyllm libraries? Or is this only for self-contained pywhy-llm benchmarking for example?
```python
class Dataset(Protocol):

    def graph(self):
```
Is this supposed to return the ground truth graph, or is this function intended to execute the PyWhyLLM functions to derive a candidate graph?
```python
        """
        pass

    def generate_data(self):
```
What do you think about merging data() and generate_data() into a single function? Then some Dataset objects might be synthetic Datasets, and some might be grounded datasets (real data)?
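One way that suggestion could play out (a hypothetical sketch, not code from this PR): keep a single `data()` method on the protocol, and let each implementation decide whether it samples synthetically or returns stored records. The class names `SyntheticDataset` and `GroundedDataset` below are invented for illustration.

```python
from typing import Any, Protocol, runtime_checkable
import random


@runtime_checkable
class Dataset(Protocol):
    """Minimal protocol: one accessor for the graph, one for the data."""

    def graph(self) -> Any: ...
    def data(self) -> Any: ...


class SyntheticDataset:
    """Samples fresh data from a known generating process on each call."""

    def __init__(self, n_samples: int = 100, seed: int = 0):
        self.n_samples = n_samples
        self.seed = seed

    def graph(self):
        return [("A", "B")]  # ground-truth edge A -> B

    def data(self):
        rng = random.Random(self.seed)
        a = [rng.gauss(0, 1) for _ in range(self.n_samples)]
        b = [x + rng.gauss(0, 0.1) for x in a]
        return {"A": a, "B": b}


class GroundedDataset:
    """Wraps real, pre-collected records with a known reference graph."""

    def __init__(self, records, true_edges):
        self._records = records
        self._edges = true_edges

    def graph(self):
        return self._edges

    def data(self):
        return self._records
```

Both classes satisfy the same protocol, so downstream benchmarking code would not need to know which kind of dataset it is handling.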
```python
def accuracy(edges, true_edges):
```
Given edges and true_edges, is the calculation of accuracy, precision, and recall specific to the dataset?
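For what it's worth: if `edges` and `true_edges` are both edge sets over the same variables, these metrics can be computed in a dataset-agnostic way. A sketch, assuming edges are `(source, target)` tuples and lowercasing `F1` per the review comment above (this is not the PR's implementation, which is still a skeleton):

```python
def precision(edges, true_edges):
    """Fraction of predicted edges that appear in the true graph."""
    edges, true_edges = set(edges), set(true_edges)
    return len(edges & true_edges) / len(edges) if edges else 0.0


def recall(edges, true_edges):
    """Fraction of true edges that were predicted."""
    edges, true_edges = set(edges), set(true_edges)
    return len(edges & true_edges) / len(true_edges) if true_edges else 0.0


def f1(edges, true_edges):
    """Harmonic mean of precision and recall."""
    p, r = precision(edges, true_edges), recall(edges, true_edges)
    return 2 * p * r / (p + r) if (p + r) else 0.0
```

Whether an edge counts as correct only when its orientation matches (versus matching the undirected skeleton) is a design choice the library would still need to pin down.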
New files added that convey the directory structure.
This PR does not add any functionality, just the protocol class and structure. The goal is to enable collaborators to contribute code.