-
Notifications
You must be signed in to change notification settings - Fork 1.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(SemanticLayerSchema): Refactoring using SemanticLayerSchema all over the code instead of the dictionary #1519
Conversation
…e, removes unreachable code and adds tests for _load_from_local_source
…into release/v3 * 'release/v3' of https://github.com/scaliseraoul/pandas-ai: ci: fix lint refactor: remove workspace
Co-authored-by: ellipsis-dev[bot] <65095814+ellipsis-dev[bot]@users.noreply.github.com>
…into release/v3 * 'release/v3' of https://github.com/scaliseraoul/pandas-ai: refactor(loader): renames _load_from_source to _load_from_local_source removes unreachable code and adds tests for _load_from_local_source (Sinaptik-AI#1514)
…gical and business validation
…over the code instead of the dictionary
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
❌ Changes requested. Reviewed everything up to a43c1db in 1 minute and 44 seconds
More details
- Looked at
944
lines of code in9
files - Skipped
0
files when reviewing. - Skipped posting
5
drafted comments based on config settings.
1. pandasai/data_loader/loader.py:65
- Draft comment:
Add a validation for unsupportedsource_type
before proceeding with the else block. This will prevent potential runtime errors if thesource_type
is neither local nor remote. - Reason this comment was not posted:
Decided after close inspection that this draft comment was likely wrong and/or not actionable:
The code has been refactored to use a SemanticLayerSchema class which would handle validation during instantiation. The old _validate_source_type method was removed because this validation is now handled at schema creation. The else block is safe because invalid types would fail earlier during schema creation.
I could be wrong about what validation the SemanticLayerSchema class performs. I don't have access to that class's code.
Even without seeing the SemanticLayerSchema class, the removal of the explicit validation and the use of a strongly-typed schema class strongly suggests this validation is handled elsewhere. The change appears intentional and well-structured.
The comment should be deleted as the validation concern is already handled by the SemanticLayerSchema class during instantiation.
2. pandasai/data_loader/loader.py:231
- Draft comment:
Consider adding a check for unsupported cache formats in_cache_data
and raise an exception if encountered. This will ensure robustness against unexpected formats. - Reason this comment was not posted:
Decided after close inspection that this draft comment was likely wrong and/or not actionable:
The format validation is already handled by the schema validation and other methods. Adding another check here would be redundant. The code is already robust against unexpected formats through the schema validation layer. The comment is suggesting defensive programming that isn't necessary given the architecture.
I could be wrong about the schema validation - I don't see the SemanticLayerSchema class definition. Maybe it doesn't actually validate the format.
Even if the schema doesn't validate it, the _get_cache_file_path() method on line 91-93 enforces only parquet or csv formats, so by the time we reach _cache_data(), the format is guaranteed to be valid.
The comment should be deleted because format validation is already handled earlier in the code flow, making additional validation redundant and unnecessary.
3. pandasai/data_loader/query_builder.py:28
- Draft comment:
Converting the table name to lowercase might not be necessary or correct for all databases, as table names can be case-sensitive. Consider handling this conversion based on specific database requirements. - Reason this comment was not posted:
Comment was on unchanged code.
4. pandasai/dataframe/base.py:185
- Draft comment:
The_create_yml_template
method uses hardcoded values for source and destination types and paths. Consider parameterizing these values or providing a way to customize them for better flexibility. - Reason this comment was not posted:
Comment was on unchanged code.
5. pandasai/data_loader/loader.py:156
- Draft comment:
The error message can be improved for clarity and grammar:
raise ValueError(f"Unsupported file format: {format}.")
- Reason this comment was not posted:
Comment was on unchanged code.
Workflow ID: wflow_1NzYHgQxAg6alDnD
Want Ellipsis to fix these issues? Tag @ellipsis-dev
in a comment. You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet
mode, and more.
@field_validator("format") | ||
@classmethod | ||
def is_format_supported(cls, format: str) -> str: | ||
if format not in LOCAL_SOURCE_TYPES: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The is_format_supported
validator checks if the format is in LOCAL_SOURCE_TYPES
, which might be misleading. Consider using a more appropriate constant or list for format validation.
Important
Introduces
SemanticLayerSchema
for schema validation and updates components to utilize it, with corresponding test updates.SemanticLayerSchema
class insemantic_layer_schema.py
for schema validation using Pydantic.DatasetLoader
inloader.py
to useSemanticLayerSchema
for schema handling._apply_transformations()
to useSemanticLayerSchema
.QueryBuilder
inquery_builder.py
to useSemanticLayerSchema
.DataFrame
inbase.py
to useSemanticLayerSchema
for schema handling._create_yml_template()
to useSemanticLayerSchema
.VirtualDataFrame
invirtual_dataframe.py
to useSemanticLayerSchema
.test_loader.py
,test_query_builder.py
,test_semantic_layer_schema.py
, andtest_dataframe.py
to reflect changes in schema handling.SemanticLayerSchema
validation and functionality.This description was created by for a43c1db. It will automatically update as commits are pushed.