Enhance Schema adapter to accommodate evolving struct #15295
Draft
kosiew wants to merge 156 commits into apache:main from kosiew:schema-adapter
Commits
c8236ed
feat: implement NestedStructSchemaAdapter for handling schema evoluti…
kosiew afbe1ed
feat: enhance NestedStructSchemaAdapter with schema mapping capabilities
kosiew c774cab
test: add schema mapping test for NestedStructSchemaAdapter
kosiew 5f5cd45
feat: implement NestedStructSchemaAdapterFactory for handling nested …
kosiew 6065bc1
test: add unit test for NestedStructSchemaAdapterFactory to validate …
kosiew 410f8d7
test: refactor test_create_appropriate_adapter for clarity and effici…
kosiew 50cf134
feat: enhance create_appropriate_adapter to support nested schema tra…
kosiew 3f52617
refactor: simplify create_appropriate_adapter logic for nested schema…
kosiew ad74d3a
refactor: remove redundant default adapter test in nested schema adapter
kosiew 134dace
feat: enhance NestedStructSchemaAdapter to support additional table s…
kosiew aa89671
refactor: simplify test_nested_struct_evolution
kosiew f361311
refactor: streamline schema creation in nested schema adapter tests
kosiew a914a6b
Fix clippy errors
kosiew d8eb3eb
test: add async test for schema evolution with compaction in NestedSt…
kosiew 1735b45
refactor: add missing imports and clean up test code in nested_schema…
kosiew 72aee85
Rollback to before adding test_datafusion_schema_evolution_with_compa…
kosiew 772fbce
feat: add nested_struct.rs to test nested schema evolution test with …
kosiew 20af2c0
chore: remove nested_struct.rs example file to streamline repository …
kosiew 3c0844c
feat: Add nested_struct.rs async function for schema evolution with c…
kosiew ad09e60
feat: Enhance logging in nested_struct.rs for better traceability 📜✨
kosiew 61f1f6e
created helper functions
kosiew 16a47d3
map batch1 to schema2
kosiew 7b7183e
feat: Enhance NestedStructSchemaAdapter with custom schema mapping fo…
kosiew 84ab195
feat: Add debug print statements to map_batch for tracing execution f…
kosiew 51dacc5
fix: Refactor nested schema mapping for improved error handling and c…
kosiew aa5128a
refactor: Remove debug print statements for cleaner code execution 🧹✨
kosiew 839bf61
nested_struct - plug adapter into ListingTableConfig
kosiew 2e99158
feat: Add optional schema adapter factory to ListingTableConfig for e…
kosiew fe7ff84
feat: Add optional schema adapter factory to FileScanConfig for enhan…
kosiew 3689140
feat: Enhance ListingTableConfig to support schema adapter factory fo…
kosiew 76fbc6f
struct NestedStructSchemaMapping - remove table_schema, file_schema
kosiew f2d6b60
refactor: Remove nested_struct.rs example for schema evolution and co…
kosiew 6b7fed9
style: Fix comment tests in ListingOptions documentation 📜✨
kosiew 2cef654
Merge branch 'main' into test-merge
kosiew 565ad5c
SchemaMapping remove table_schema, nested_schema_adapter remove map_p…
kosiew 778da1e
docs: Update comments for schema_adapter_factory in ListingTableConfi…
kosiew f066e59
refactor: Extract schema adapter preservation logic into a helper fun…
kosiew 4cc5f77
refactor: Extract schema adapter application logic into a dedicated f…
kosiew b6a828c
docs: Enhance adapt_fields documentation with performance considerati…
kosiew 41fb40c
docs: Add detailed documentation for RecordBatch mapping in NestedStr…
kosiew 3133cd7
refactor: Add missing import for FileSource in ListingTable implement…
kosiew 5ad6287
refactor: Update license documentation comments for NestedSchemaAdapt…
kosiew 8fa34da
refactor: Remove unused file_scan_exec.rs to clean up the codebase 🗑️✨
kosiew d229dd3
refactor: Remove unused file_scan_config.rs to streamline the codebas…
kosiew ff41c43
Moved the adapt_column method from NestedStructSchemaMapping to a sta…
kosiew 2df74b6
Fix Clippy errors
kosiew bb4a5de
docs: Correct the struct names in documentation for NestedStructSchem…
kosiew a8cce59
Merge branch 'main' into schema-adapter
kosiew f547355
fix: remove unnecessary clone in create_physical_plan call for Listin…
kosiew fa7c17f
refactor: rename preserve_schema_adapter_factory to preserve_conf_sch…
kosiew e9c93d6
refactor: rename create_appropriate_adapter to create_adapter for cla…
kosiew 64a4e3f
feature gate parquet
kosiew dd9f66d
Trigger CI
kosiew ca511df
refactor: mod tests, add user_infos
kosiew 54590f4
feat: expose nested schema adapter and source for improved data handl…
kosiew 50f67cb
Merge branch 'main' into schema-adapter
kosiew 18a368e
Resolve merge conflict
kosiew 42bb782
Refactor schema adapter application in ListingTable
kosiew 0f52160
Merge branch 'main' into schema-adapter
kosiew 52044da
Merge branch 'main' into schema-adapter
kosiew 81d2a25
trigger ci
kosiew 30db2d7
Merge branch 'main' into schema-adapter
kosiew 90d260b
feat: add column statistics mapping for NestedStructSchemaMapping
kosiew f07dfdc
feat: add column statistics mapping for NestedStructSchemaMapping
kosiew 0d7728f
add tests
kosiew 6314b24
test: add helper functions for readability
kosiew eee5566
refactor: simplify DataType usage in NestedStructSchemaAdapter
kosiew 5cf3a3c
fix: update timestamp array casting to include timezone metadata
kosiew 25af310
streamline the tests to ensure no duplicate
kosiew 5167825
verify_column_statistics - include expected_sum
kosiew 38e4dc5
Merge branch 'schema-adapter-2' into schema-adapter
kosiew 09d4b65
Copy license header
kosiew 81f7ea5
Merge branch 'main' into schema-adapter
kosiew bd207e6
fix clippy errors
kosiew 5257b44
Add nested_struct to test schema adaptation
kosiew c130bc6
fix: correct adapter creation method in schema evolution test
kosiew 759e678
fix: update schema references in schema evolution test
kosiew 039306e
amend create_batch to create a batch with fields as per schema withou…
kosiew b30e76f
fix: remove unnecessary Arc wrapping in create_array_for_field
kosiew c9192b5
feat: enhance logging in schema evolution test for better traceability
kosiew 9bb3a5f
refactor: rename test function and remove old schema4 definition for …
kosiew 4752b2c
feat: add logging for field names in create_batch and enhance timesta…
kosiew 64d3a56
fix: replace create_batch2 with create_batch in schema evolution test
kosiew 5243f7a
refactor: update schema adapter creation and mapping in schema evolut…
kosiew 71ae846
pass adapter_factory to listing table config
kosiew 41b6edd
refactor: streamline schema evolution test by creating a helper funct…
kosiew d4cdf2d
feat: add debug logging for column counts in PartitionColumnProjector
kosiew b6df0b3
Fix clippy errors
kosiew 3d381c1
Merge branch 'main' into schema-adapter
kosiew a71c6f0
refactor: reorder test file paths
kosiew 2834842
add jobs.parquet, nested_struct2.rs
kosiew b951294
feat: add debug logging for column mismatch in PartitionColumnProjector
kosiew 9b53a88
fix: replace debug logging with println for column mismatch in Partit…
kosiew 27a32bc
remove compacting section, test with select * query
kosiew 67e080a
refactor: remove compacted parquet file writing and update SQL query …
kosiew 65cfdc3
add jobs.parquet, amend nested_struct2 not to delete it
kosiew 63132b2
remove adapter_factory
kosiew 854f2ce
refactor: remove schema adapter factory and reorder test file paths
kosiew 087c815
add adapter_factory
kosiew ee41f12
Merge branch 'main' into schema-adapter
kosiew 8692877
fix: ListingTableConfig remove schema
kosiew 37ecc57
fix: Simplify paths in test_datafusion_schema_evolution and add resul…
kosiew efda93d
fix: Enhance schema adaptation for projection in nested struct fields…
kosiew 02f7c33
nested_struct2 use adapter_factory
kosiew 034499f
fix cargo fmt error
kosiew f241545
fix: Add NestedStructSchemaAdapterFactory import in nested_struct2 ex…
kosiew 3b50aa9
fix: amend create_schema4
kosiew 54f90a0
fix: add query results display in schema evolution test
kosiew 4ecc450
chore: remove unused nested_struct and nested_struct2 examples, and d…
kosiew df66922
fix: remove debug print statements from file_scan_config.rs
kosiew e238c10
refactor fn map_schema in schema_adapter.rs, nested_schema_adapter.rs…
kosiew ed6d2c3
fix: add missing Field import in schema_adapter.rs
kosiew c2264d3
refactor: extract can_cast_field helper function to improve code read…
kosiew 6018c24
refactor: remove unused create_schema_mapping function to clean up code
kosiew 091bb6a
test: amend create_nested_schema to include original user and timesta…
kosiew 2b7f9df
Merge branch 'main' into test-merge
kosiew 57d8671
doc: enhance documentation for with_schema_adapter_factory in Listing…
kosiew b28eb92
feat: add schema evolution support for FileSource with extension trait
kosiew 1903158
refactor: remove unused imports in nested_schema_adapter.rs
kosiew 3cb816b
refactor: remove adapt_fields function and related schema adaptation …
kosiew 792fc20
refactor: remove adapt_schema tests from NestedStructSchemaAdapter
kosiew 008e6ad
fix: correct cloning of self in with_schema_adapter_factory method
kosiew dc11478
refactor: enhance with_schema_adapter method for dynamic schema adapt…
kosiew 7eeba38
fully qualif <Arc<dyn FileSource> as FileSourceExt>
kosiew baef480
FileSourceExt-change self to source
kosiew 61226a0
refactor: use fully qualified syntax for with_schema_adapter method i…
kosiew afc87cd
refactor: simplify with_schema_adapter_factory method by using mutabl…
kosiew 581379a
refactor: update with_schema_adapter method to use self instead of so…
kosiew 5494de1
refactor: cast self to Arc<dyn FileSource> for compatibility in FileS…
kosiew 7d9d038
refactor: simplify with_schema_adapter method by removing explicit Ar…
kosiew 2b80a63
refactor: enhance ListingTableConfig by implementing Default trait an…
kosiew 1b0f83c
refactor: removing apply_schema_adapter_to_source function
kosiew 4f9aba6
refactor: simplify schema adapter factory handling in ParquetSource
kosiew 085ae46
refactor: impl FileSourceExt for dyn FileSource
kosiew 2a37983
refactor: remove apply_schema_adapter_to_source function, integrate i…
kosiew 54acd98
refactor: rename with_factory to with_schema_adapter_factory for clarity
kosiew 3018b02
refactor: update schema adapter factory methods to use Option type fo…
kosiew 7154234
refactor: remove with_schema_adapter_factory_opt method
kosiew 64c1691
refactor: enhance schema adapter factory handling in ParquetSource
kosiew 5eede31
refactor: simplify SchemaMapping instantiation in DefaultSchemaAdapter
kosiew d99556b
refactor: improve documentation for create_field_mapping and SchemaMa…
kosiew c8d642f
test: add unit tests for schema mapping happy and error paths
kosiew 3243ab7
refactor: add with_schema_adapter_factor directly to FileSource
kosiew 84f0991
refactor: add Sized constraint to with_schema_adapter_factory method …
kosiew 58dd0d9
refactor: update with_schema_adapter_factory method to indicate defau…
kosiew 3ef15c1
revert to before add with_schema_adapter to FileSource
kosiew 1804498
refactor: FileSource implement with_schema_adapter_factory
kosiew f69b80e
refactor: add schema_adapter_factory support to CsvSource
kosiew 0a5db8b
refactor: reintroduce From implementation for ParquetSource and add g…
kosiew 9aeaacc
Revert "refactor: reintroduce From implementation for ParquetSource a…
kosiew 9dc95be
refactor: add as_file_source helper function for FileSource conversion
kosiew a56b05c
refactor: implement From trait for CsvSource to use as_file_source he…
kosiew 8e744f6
refactor: enhance JsonSource with schema adapter factory support and …
kosiew cd65627
refactor: remove unused ParquetSource import from table.rs
kosiew 3aab6ec
refactor: add schema adapter factory support to ArrowSource
kosiew 248e276
refactor: update TestSource to support schema adapter factory
kosiew
At a high level, it makes a lot of sense to provide the schema adapter factory to the listing table.
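To make the idea in this comment concrete, here is a minimal, self-contained sketch of a table config that carries an optional adapter factory and applies it when reading a file whose schema is older than the table's. Every type and function in it (`TableConfig`, `NullFillingFactory`, `read`, `demo`, the `Schema` and `Batch` aliases) is a hypothetical stand-in written for illustration, not DataFusion's actual API; only the builder name `with_schema_adapter_factory` is borrowed from the commit messages above, and its real signature may differ.

```rust
use std::collections::HashMap;
use std::sync::Arc;

// Hypothetical stand-ins for an Arrow schema and a record batch.
type Schema = Vec<String>;
type Batch = HashMap<String, Vec<Option<i64>>>;

/// Maps a batch read from one file onto the table schema.
trait SchemaAdapter {
    fn map_batch(&self, batch: &Batch, rows: usize) -> Batch;
}

/// Builds an adapter for a given table schema.
trait SchemaAdapterFactory {
    fn create(&self, table_schema: Schema) -> Box<dyn SchemaAdapter>;
}

/// Illustrative adapter: columns missing from the file become all-null.
struct NullFillingAdapter {
    table_schema: Schema,
}

impl SchemaAdapter for NullFillingAdapter {
    fn map_batch(&self, batch: &Batch, rows: usize) -> Batch {
        self.table_schema
            .iter()
            .map(|name| {
                let column = batch
                    .get(name)
                    .cloned()
                    .unwrap_or_else(|| vec![None; rows]);
                (name.clone(), column)
            })
            .collect()
    }
}

struct NullFillingFactory;

impl SchemaAdapterFactory for NullFillingFactory {
    fn create(&self, table_schema: Schema) -> Box<dyn SchemaAdapter> {
        Box::new(NullFillingAdapter { table_schema })
    }
}

/// Hypothetical table config holding an optional adapter factory.
#[derive(Default)]
struct TableConfig {
    table_schema: Schema,
    schema_adapter_factory: Option<Arc<dyn SchemaAdapterFactory>>,
}

impl TableConfig {
    fn with_schema_adapter_factory(mut self, factory: Arc<dyn SchemaAdapterFactory>) -> Self {
        self.schema_adapter_factory = Some(factory);
        self
    }

    /// Adapts a file batch if a factory is configured, else passes it through.
    fn read(&self, file_batch: &Batch, rows: usize) -> Batch {
        match &self.schema_adapter_factory {
            Some(factory) => factory
                .create(self.table_schema.clone())
                .map_batch(file_batch, rows),
            None => file_batch.clone(),
        }
    }
}

/// Reads an "old" file that lacks the `added_later` column.
fn demo() -> Batch {
    let config = TableConfig {
        table_schema: vec!["id".to_string(), "added_later".to_string()],
        ..Default::default()
    }
    .with_schema_adapter_factory(Arc::new(NullFillingFactory));

    let mut old_file = Batch::new();
    old_file.insert("id".to_string(), vec![Some(1), Some(2)]);
    config.read(&old_file, 2)
}

fn main() {
    let adapted = demo();
    println!("columns after adaptation: {}", adapted.len());
}
```

In this sketch the factory lives on the table-level config rather than on each file reader, so every file scanned for the table goes through the same adaptation, which is one reading of what the comment endorses.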