feat(SemanticLayerSchema): Refactoring using SemanticLayerSchema all over the code instead of the dictionary #1519

scaliseraoul · 2025-01-14T16:46:54Z

Some of the unit tests were removed as duplicates of the newly created for SemanticLayerSchema
Some checks on the scheme were removed because they are now redundant with pydantic checks

Tests added and passed if fixing a bug or adding a new feature.
All code checks passed.

Important

Introduces SemanticLayerSchema for schema validation and updates components to utilize it, with corresponding test updates.

SemanticLayerSchema:
- Introduces SemanticLayerSchema class in semantic_layer_schema.py for schema validation using Pydantic.
- Validates source type, connection, and table for remote sources.
- Validates column types and transformation types.
DatasetLoader:
- Updates DatasetLoader in loader.py to use SemanticLayerSchema for schema handling.
- Removes redundant source type validation.
- Refactors _apply_transformations() to use SemanticLayerSchema.
QueryBuilder:
- Updates QueryBuilder in query_builder.py to use SemanticLayerSchema.
- Refactors query building methods to use schema attributes directly.
DataFrame:
- Updates DataFrame in base.py to use SemanticLayerSchema for schema handling.
- Refactors _create_yml_template() to use SemanticLayerSchema.
VirtualDataFrame:
- Updates VirtualDataFrame in virtual_dataframe.py to use SemanticLayerSchema.
Tests:
- Updates unit tests in test_loader.py, test_query_builder.py, test_semantic_layer_schema.py, and test_dataframe.py to reflect changes in schema handling.
- Adds tests for SemanticLayerSchema validation and functionality.

^{This description was created by}^{for a43c1db. It will automatically update as commits are pushed.}

…e, removes unreachable code and adds tests for _load_from_local_source

…into release/v3 * 'release/v3' of https://github.com/scaliseraoul/pandas-ai: ci: fix lint refactor: remove workspace

Co-authored-by: ellipsis-dev[bot] <65095814+ellipsis-dev[bot]@users.noreply.github.com>

…into release/v3 * 'release/v3' of https://github.com/scaliseraoul/pandas-ai: refactor(loader): renames _load_from_source to _load_from_local_source removes unreachable code and adds tests for _load_from_local_source (Sinaptik-AI#1514)

…gical and business validation

…over the code instead of the dictionary

ellipsis-dev

❌ Changes requested. Reviewed everything up to a43c1db in 1 minute and 44 seconds

More details

Looked at 944 lines of code in 9 files
Skipped 0 files when reviewing.
Skipped posting 5 drafted comments based on config settings.

1. pandasai/data_loader/loader.py:65

Draft comment:
Add a validation for unsupported source_type before proceeding with the else block. This will prevent potential runtime errors if the source_type is neither local nor remote.
Reason this comment was not posted:
Decided after close inspection that this draft comment was likely wrong and/or not actionable:
The code has been refactored to use a SemanticLayerSchema class which would handle validation during instantiation. The old _validate_source_type method was removed because this validation is now handled at schema creation. The else block is safe because invalid types would fail earlier during schema creation.
I could be wrong about what validation the SemanticLayerSchema class performs. I don't have access to that class's code.
Even without seeing the SemanticLayerSchema class, the removal of the explicit validation and the use of a strongly-typed schema class strongly suggests this validation is handled elsewhere. The change appears intentional and well-structured.
The comment should be deleted as the validation concern is already handled by the SemanticLayerSchema class during instantiation.

2. pandasai/data_loader/loader.py:231

Draft comment:
Consider adding a check for unsupported cache formats in _cache_data and raise an exception if encountered. This will ensure robustness against unexpected formats.
Reason this comment was not posted:
Decided after close inspection that this draft comment was likely wrong and/or not actionable:
The format validation is already handled by the schema validation and other methods. Adding another check here would be redundant. The code is already robust against unexpected formats through the schema validation layer. The comment is suggesting defensive programming that isn't necessary given the architecture.
I could be wrong about the schema validation - I don't see the SemanticLayerSchema class definition. Maybe it doesn't actually validate the format.
Even if the schema doesn't validate it, the _get_cache_file_path() method on line 91-93 enforces only parquet or csv formats, so by the time we reach _cache_data(), the format is guaranteed to be valid.
The comment should be deleted because format validation is already handled earlier in the code flow, making additional validation redundant and unnecessary.

3. pandasai/data_loader/query_builder.py:28

Draft comment:
Converting the table name to lowercase might not be necessary or correct for all databases, as table names can be case-sensitive. Consider handling this conversion based on specific database requirements.
Reason this comment was not posted:
Comment was on unchanged code.

4. pandasai/dataframe/base.py:185

Draft comment:
The _create_yml_template method uses hardcoded values for source and destination types and paths. Consider parameterizing these values or providing a way to customize them for better flexibility.
Reason this comment was not posted:
Comment was on unchanged code.

5. pandasai/data_loader/loader.py:156

Draft comment:
The error message can be improved for clarity and grammar:

            raise ValueError(f"Unsupported file format: {format}.")

Reason this comment was not posted:
Comment was on unchanged code.

Workflow ID: wflow_1NzYHgQxAg6alDnD

Want Ellipsis to fix these issues? Tag @ellipsis-dev in a comment. You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet mode, and more.

ellipsis-dev · 2025-01-14T16:48:47Z

pandasai/data_loader/semantic_layer_schema.py

+    @field_validator("format")
+    @classmethod
+    def is_format_supported(cls, format: str) -> str:
+        if format not in LOCAL_SOURCE_TYPES:


The is_format_supported validator checks if the format is in LOCAL_SOURCE_TYPES, which might be misleading. Consider using a more appropriate constant or list for format validation.

scaliseraoul and others added 8 commits January 13, 2025 15:32

refactor(loader): renames _load_from_source to _load_from_local_sourc…

c479eee

…e, removes unreachable code and adds tests for _load_from_local_source

chore: fixing ruff formatting

5943901

Merge branch 'release/v3' of https://github.com/scaliseraoul/pandas-ai …

01c125b

…into release/v3 * 'release/v3' of https://github.com/scaliseraoul/pandas-ai: ci: fix lint refactor: remove workspace

Update pandasai/data_loader/loader.py

1b4f334

Co-authored-by: ellipsis-dev[bot] <65095814+ellipsis-dev[bot]@users.noreply.github.com>

feat(SemanticLayerSchema): Adding a SemanticLayerSchema class with lo…

c5afb38

…gical and business validation

feat(SemanticLayerSchema): Refactoring using SemanticLayerSchema all …

fc4a625

…over the code instead of the dictionary

conflict resolution

a43c1db

dosubot bot added the size:L This PR changes 100-499 lines, ignoring generated files. label Jan 14, 2025

ellipsis-dev bot reviewed Jan 14, 2025

View reviewed changes

scaliseraoul changed the title ~~feat(SemanticLayerSchema): Adding a SemanticLayerSchema class with logical and business validation~~ feat(SemanticLayerSchema): Refactoring using SemanticLayerSchema all over the code instead of the dictionary Jan 14, 2025

scaliseraoul closed this Jan 14, 2025

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(SemanticLayerSchema): Refactoring using SemanticLayerSchema all over the code instead of the dictionary #1519

feat(SemanticLayerSchema): Refactoring using SemanticLayerSchema all over the code instead of the dictionary #1519

scaliseraoul commented Jan 14, 2025 •

edited by ellipsis-dev bot

Loading

ellipsis-dev bot left a comment

ellipsis-dev bot Jan 14, 2025

feat(SemanticLayerSchema): Refactoring using SemanticLayerSchema all over the code instead of the dictionary #1519

feat(SemanticLayerSchema): Refactoring using SemanticLayerSchema all over the code instead of the dictionary #1519

Conversation

scaliseraoul commented Jan 14, 2025 • edited by ellipsis-dev bot Loading

ellipsis-dev bot left a comment

Choose a reason for hiding this comment

ellipsis-dev bot Jan 14, 2025

Choose a reason for hiding this comment

scaliseraoul commented Jan 14, 2025 •

edited by ellipsis-dev bot

Loading