Enhance Schema adapter to accommodate evolving struct #15295
Conversation
…on of nested structs
- Refactored adapt_fields method to accept Fields instead of Field arrays for better type handling.
- Added create_schema_mapper method to facilitate mapping between source and target schemas.
- Updated map_column_index and map_schema methods to improve schema adaptation and mapping logic.
- Enhanced test cases to validate nested struct evolution with new schema mappings.
…struct schema evolution
- Added NestedStructSchemaAdapterFactory to create schema adapters that manage nested struct fields.
- Introduced methods for creating appropriate schema adapters based on schema characteristics.
- Implemented checks for nested struct fields to enhance schema evolution handling.
…adapter selection and schema handling
…nsformations
- Added an optional source schema parameter to create_appropriate_adapter for better handling of nested structs.
- Updated logic to return NestedStructSchemaAdapter when adapting between schemas with different structures or when the source schema contains nested structs.
- Improved default case handling for simple schemas.
- Added a new test case to validate the adaptation from a simple schema to a nested schema, ensuring correct field mapping and structure.
This commit eliminates the test for the default adapter's failure with nested schema transformations, streamlining the test suite. The focus is now on validating the functionality of the NestedStructSchemaAdapter, which is designed to handle missing nested fields effectively.
…chema handling
- Updated the `create` method in `NestedStructSchemaAdapterFactory` to accept and utilize the full table schema.
- Modified the `NestedStructSchemaAdapter` to store both projected and full table schemas for improved schema adaptation.
- Refactored the `adapt_schema` method to use the full table schema for field adaptation.
- Added helper functions to create basic and enhanced nested schemas for testing.
- Updated tests to validate the new schema handling logic, ensuring compatibility with nested structures.
assert!(default_result.is_err());
if let Err(e) = default_result {
    assert!(
        format!("{}", e).contains("Cannot cast file schema field metadata"),
This is similar to the error mentioned in #14757
Error: Plan("Cannot cast file schema field additionalInfo of type Struct([Field { name: \"location\", data_type: Utf8, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }, Field { name: \"timestamp_utc\", data_type: Timestamp(Millisecond, None), nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }, Field { name: \"reason\", data_type: Struct([Field { name: \"_level\", data_type: Float64, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }, Field { name: \"details\", data_type: Struct([Field { name: \"rurl\", data_type: Utf8, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }, Field { name: \"s\", data_type: Float64, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }, Field { name: \"t\", data_type: Utf8, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }]), nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }]), nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }]) to table schema field of type Struct([Field { name: \"location\", data_type: Utf8, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }, Field { name: \"timestamp_utc\", data_type: Timestamp(Millisecond, None), nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }])
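For context, a minimal sketch (using Arrow's `can_cast_types` check, not this PR's code) of why the default adapter rejects this pair of schemas: the file's `additionalInfo` struct carries a nested `reason` child that the table schema lacks, and Arrow's struct-to-struct cast cannot reconcile differing child fields.

```rust
use datafusion::arrow::compute::can_cast_types;
use datafusion::arrow::datatypes::{DataType, Field, Fields};

fn main() {
    // File schema: additionalInfo with an extra nested `reason` child.
    let file_type = DataType::Struct(Fields::from(vec![
        Field::new("location", DataType::Utf8, true),
        Field::new(
            "reason",
            DataType::Struct(Fields::from(vec![Field::new(
                "_level",
                DataType::Float64,
                true,
            )])),
            true,
        ),
    ]));
    // Table schema: additionalInfo without `reason`.
    let table_type = DataType::Struct(Fields::from(vec![Field::new(
        "location",
        DataType::Utf8,
        true,
    )]));

    // The default schema adapter falls back to Arrow casting, which cannot
    // drop or add struct children, hence the "Cannot cast file schema field" error.
    println!("castable: {}", can_cast_types(&file_type, &table_type));
}
```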
…ructSchemaAdapter
- Introduced a new asynchronous test `test_datafusion_schema_evolution_with_compaction` to validate schema evolution and data compaction functionality.
- Added necessary imports for the new test, including `RecordBatch`, `SessionContext`, and various array types.
- Created two sample schemas and corresponding record batches to simulate data before and after schema evolution.
- Implemented logic to write the record batches to Parquet files and read them back to ensure data integrity.
- Verified that the results from the compacted data match the original data, ensuring the correctness of the schema evolution process.
…NestedStructSchemaAdapter
- Added a new example in nested_struct.rs to demonstrate schema evolution using NestedStructSchemaAdapter.
- Created two parquet files with different schemas: one without the 'reason' field and one with it.
- Implemented logic to read and write these parquet files, showcasing the handling of nested structures.
- Added detailed logging to track the process and results of the schema evolution test.
- Included assertions to verify the correctness of the data and schema in the compacted output.
🎉 This enhances the testing capabilities for nested schemas in DataFusion! 🚀
…ompaction in DataFusion examples 📊✨
- Implemented `test_datafusion_schema_evolution_with_compaction` to demonstrate schema evolution and data compaction using Parquet files.
- Created two schemas and corresponding record batches to simulate data processing.
- Added logic to write and read Parquet files, ensuring data integrity and compactness.
- Registered tables in the session context and executed SQL queries to validate results.
- Cleaned up temporary files after execution to maintain a tidy environment. 🗑️
I haven't completed the PR yet. Here's the interim progress; I used the NestedStructSchemaAdapter in test_datafusion_schema_evolution_with_compaction:

use datafusion::arrow::array::{
Array, Float64Array, StringArray, StructArray, TimestampMillisecondArray,
};
use datafusion::arrow::datatypes::{DataType, Field, Schema, TimeUnit};
use datafusion::arrow::record_batch::RecordBatch;
use datafusion::dataframe::DataFrameWriteOptions;
use datafusion::datasource::file_format::parquet::ParquetFormat;
use datafusion::datasource::listing::{
ListingOptions, ListingTable, ListingTableConfig, ListingTableUrl,
};
use datafusion::datasource::nested_schema_adapter::NestedStructSchemaAdapterFactory;
use datafusion::prelude::*;
use std::error::Error;
use std::fs;
use std::sync::Arc;
async fn test_datafusion_schema_evolution_with_compaction() -> Result<(), Box<dyn Error>>
{
let ctx = SessionContext::new();
let schema1 = create_schema1();
let schema2 = create_schema2();
let batch1 = create_batch1(&schema1)?;
    // adapter start: adapt batch1 (written with schema1) to the evolved schema2
let adapter = NestedStructSchemaAdapterFactory::create_appropriate_adapter(
schema2.clone(),
schema2.clone(),
);
let (mapping, _) = adapter
.map_schema(&schema1.clone())
.expect("map schema failed");
let mapped_batch = mapping.map_batch(batch1)?;
    // adapter end: mapped_batch now conforms to schema2
let path1 = "test_data1.parquet";
let _ = fs::remove_file(path1);
let df1 = ctx.read_batch(mapped_batch)?;
df1.write_parquet(
path1,
DataFrameWriteOptions::default()
.with_single_file_output(true)
.with_sort_by(vec![col("timestamp_utc").sort(true, true)]),
None,
)
.await?;
let batch2 = create_batch2(&schema2)?;
let path2 = "test_data2.parquet";
let _ = fs::remove_file(path2);
let df2 = ctx.read_batch(batch2)?;
df2.write_parquet(
path2,
DataFrameWriteOptions::default()
.with_single_file_output(true)
.with_sort_by(vec![col("timestamp_utc").sort(true, true)]),
None,
)
.await?;
let paths_str = vec![path1.to_string(), path2.to_string()];
let config = ListingTableConfig::new_with_multi_paths(
paths_str
.into_iter()
.map(|p| ListingTableUrl::parse(&p))
.collect::<Result<Vec<_>, _>>()?,
)
.with_schema(schema2.as_ref().clone().into());
let config = config.infer(&ctx.state()).await?;
let config = ListingTableConfig {
options: Some(ListingOptions {
file_sort_order: vec![vec![col("timestamp_utc").sort(true, true)]],
..config.options.unwrap_or_else(|| {
ListingOptions::new(Arc::new(ParquetFormat::default()))
})
}),
..config
};
let listing_table = ListingTable::try_new(config)?;
ctx.register_table("events", Arc::new(listing_table))?;
let df = ctx
.sql("SELECT * FROM events ORDER BY timestamp_utc")
.await?;
let results = df.clone().collect().await?;
assert_eq!(results[0].num_rows(), 2);
let compacted_path = "test_data_compacted.parquet";
let _ = fs::remove_file(compacted_path);
df.write_parquet(
compacted_path,
DataFrameWriteOptions::default()
.with_single_file_output(true)
.with_sort_by(vec![col("timestamp_utc").sort(true, true)]),
None,
)
.await?;
let new_ctx = SessionContext::new();
let config = ListingTableConfig::new_with_multi_paths(vec![ListingTableUrl::parse(
compacted_path,
)?])
.with_schema(schema2.as_ref().clone().into())
.infer(&new_ctx.state())
.await?;
let listing_table = ListingTable::try_new(config)?;
new_ctx.register_table("events", Arc::new(listing_table))?;
let df = new_ctx
.sql("SELECT * FROM events ORDER BY timestamp_utc")
.await?;
let compacted_results = df.collect().await?;
assert_eq!(compacted_results[0].num_rows(), 2);
assert_eq!(results, compacted_results);
let _ = fs::remove_file(path1);
let _ = fs::remove_file(path2);
let _ = fs::remove_file(compacted_path);
Ok(())
}
fn create_schema2() -> Arc<Schema> {
let schema2 = Arc::new(Schema::new(vec![
Field::new("component", DataType::Utf8, true),
Field::new("message", DataType::Utf8, true),
Field::new("stack", DataType::Utf8, true),
Field::new("timestamp", DataType::Utf8, true),
Field::new(
"timestamp_utc",
DataType::Timestamp(TimeUnit::Millisecond, None),
true,
),
Field::new(
"additionalInfo",
DataType::Struct(
vec![
Field::new("location", DataType::Utf8, true),
Field::new(
"timestamp_utc",
DataType::Timestamp(TimeUnit::Millisecond, None),
true,
),
Field::new(
"reason",
DataType::Struct(
vec![
Field::new("_level", DataType::Float64, true),
Field::new(
"details",
DataType::Struct(
vec![
Field::new("rurl", DataType::Utf8, true),
Field::new("s", DataType::Float64, true),
Field::new("t", DataType::Utf8, true),
]
.into(),
),
true,
),
]
.into(),
),
true,
),
]
.into(),
),
true,
),
]));
schema2
}
fn create_batch1(schema1: &Arc<Schema>) -> Result<RecordBatch, Box<dyn Error>> {
let batch1 = RecordBatch::try_new(
schema1.clone(),
vec![
Arc::new(StringArray::from(vec![Some("component1")])),
Arc::new(StringArray::from(vec![Some("message1")])),
Arc::new(StringArray::from(vec![Some("stack_trace")])),
Arc::new(StringArray::from(vec![Some("2025-02-18T00:00:00Z")])),
Arc::new(TimestampMillisecondArray::from(vec![Some(1640995200000)])),
Arc::new(StructArray::from(vec![
(
Arc::new(Field::new("location", DataType::Utf8, true)),
Arc::new(StringArray::from(vec![Some("USA")])) as Arc<dyn Array>,
),
(
Arc::new(Field::new(
"timestamp_utc",
DataType::Timestamp(TimeUnit::Millisecond, None),
true,
)),
Arc::new(TimestampMillisecondArray::from(vec![Some(1640995200000)])),
),
])),
],
)?;
Ok(batch1)
}
fn create_schema1() -> Arc<Schema> {
let schema1 = Arc::new(Schema::new(vec![
Field::new("component", DataType::Utf8, true),
Field::new("message", DataType::Utf8, true),
Field::new("stack", DataType::Utf8, true),
Field::new("timestamp", DataType::Utf8, true),
Field::new(
"timestamp_utc",
DataType::Timestamp(TimeUnit::Millisecond, None),
true,
),
Field::new(
"additionalInfo",
DataType::Struct(
vec![
Field::new("location", DataType::Utf8, true),
Field::new(
"timestamp_utc",
DataType::Timestamp(TimeUnit::Millisecond, None),
true,
),
]
.into(),
),
true,
),
]));
schema1
}
fn create_batch2(schema2: &Arc<Schema>) -> Result<RecordBatch, Box<dyn Error>> {
let batch2 = RecordBatch::try_new(
schema2.clone(),
vec![
Arc::new(StringArray::from(vec![Some("component1")])),
Arc::new(StringArray::from(vec![Some("message1")])),
Arc::new(StringArray::from(vec![Some("stack_trace")])),
Arc::new(StringArray::from(vec![Some("2025-02-18T00:00:00Z")])),
Arc::new(TimestampMillisecondArray::from(vec![Some(1640995200000)])),
Arc::new(StructArray::from(vec![
(
Arc::new(Field::new("location", DataType::Utf8, true)),
Arc::new(StringArray::from(vec![Some("USA")])) as Arc<dyn Array>,
),
(
Arc::new(Field::new(
"timestamp_utc",
DataType::Timestamp(TimeUnit::Millisecond, None),
true,
)),
Arc::new(TimestampMillisecondArray::from(vec![Some(1640995200000)])),
),
(
Arc::new(Field::new(
"reason",
DataType::Struct(
vec![
Field::new("_level", DataType::Float64, true),
Field::new(
"details",
DataType::Struct(
vec![
Field::new("rurl", DataType::Utf8, true),
Field::new("s", DataType::Float64, true),
Field::new("t", DataType::Utf8, true),
]
.into(),
),
true,
),
]
.into(),
),
true,
)),
Arc::new(StructArray::from(vec![
(
Arc::new(Field::new("_level", DataType::Float64, true)),
Arc::new(Float64Array::from(vec![Some(1.5)]))
as Arc<dyn Array>,
),
(
Arc::new(Field::new(
"details",
DataType::Struct(
vec![
Field::new("rurl", DataType::Utf8, true),
Field::new("s", DataType::Float64, true),
Field::new("t", DataType::Utf8, true),
]
.into(),
),
true,
)),
Arc::new(StructArray::from(vec![
(
Arc::new(Field::new("rurl", DataType::Utf8, true)),
Arc::new(StringArray::from(vec![Some(
"https://example.com",
)]))
as Arc<dyn Array>,
),
(
Arc::new(Field::new("s", DataType::Float64, true)),
Arc::new(Float64Array::from(vec![Some(3.14)]))
as Arc<dyn Array>,
),
(
Arc::new(Field::new("t", DataType::Utf8, true)),
Arc::new(StringArray::from(vec![Some("data")]))
as Arc<dyn Array>,
),
])),
),
])),
),
])),
],
)?;
Ok(batch2)
}
fn main() -> Result<(), Box<dyn Error>> {
// Create a Tokio runtime for running our async function
let rt = tokio::runtime::Runtime::new()?;
// Run the function in the runtime
rt.block_on(async { test_datafusion_schema_evolution_with_compaction().await })?;
println!("Example completed successfully!");
Ok(())
}
Nice! FWIW, another edge case I found recently that's probably worth testing is a List where the Struct evolves. I ended up solving it by updating list_coercion, but I'm curious if you have a better way: https://github.com/apache/datafusion/pull/15259/files
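For reference, a minimal sketch (field names hypothetical) of the schema pair that edge case describes — a `List` whose struct item gains a field between the old file schema and the new table schema:

```rust
use std::sync::Arc;
use datafusion::arrow::datatypes::{DataType, Field, Fields, Schema};

// Old file schema: a list of structs with a single child field.
fn old_schema() -> Schema {
    let item = Field::new(
        "item",
        DataType::Struct(Fields::from(vec![Field::new("a", DataType::Utf8, true)])),
        true,
    );
    Schema::new(vec![Field::new("events", DataType::List(Arc::new(item)), true)])
}

// New table schema: the struct inside the list has gained a second child field,
// so the adapter (or list coercion) must fill `b` with nulls when reading old files.
fn new_schema() -> Schema {
    let item = Field::new(
        "item",
        DataType::Struct(Fields::from(vec![
            Field::new("a", DataType::Utf8, true),
            Field::new("b", DataType::Float64, true),
        ])),
        true,
    );
    Schema::new(vec![Field::new("events", DataType::List(Arc::new(item)), true)])
}
```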
- Added log statements to indicate the start of the test function and the writing of parquet files.
- Included logs for successful creation of ListingTable and registration of the table.
- Improved visibility into the execution flow by logging SQL query execution and result collection.
This PR is large and cumbersome to review. I propose to close it and re-implement as:

PR 1: Extract and test core SchemaAdapter helpers
Description
Changed files

PR 2: Add schema_adapter_factory to ListingTableConfig (with tests)
Description
Unit tests in
Changed files

PR 3: Hook FileSourceExt & Parquet preservation (with tests)
Description
Changed files

PR 4: Nested-struct SchemaAdapter implementation & re-exports (with tests)
Description
Changed files
@kosiew that sounds good but please keep this working branch around on the
The break-up as you suggest sounds reasonable to me. @mbutrovich and @adriangb, perhaps you have some time to look at this PR to see if it makes sense.
Thank you @kosiew -- this looks like a very nice PR. Can't wait to see the smaller chunks
/// Schema adapters handle schema evolution over time, allowing the table to adapt
/// to changes in file schemas. This is particularly useful for handling nested fields
/// in formats like Parquet where the schema may evolve.
pub fn with_schema_adapter_factory(
at a high level, it makes a lot of sense to provide the schema adapter factory to the listing table
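For illustration, a sketch of how that could look from a user's perspective, assuming the `with_schema_adapter_factory` builder added in this PR and that `NestedStructSchemaAdapterFactory` is a unit struct (both taken from the diff and the example earlier in this thread; details may change before merge):

```rust
use std::sync::Arc;
use datafusion::datasource::listing::{ListingTableConfig, ListingTableUrl};
use datafusion::datasource::nested_schema_adapter::NestedStructSchemaAdapterFactory;

// Attach a nested-struct-aware schema adapter factory to a listing table config.
fn configure(url: ListingTableUrl) -> ListingTableConfig {
    ListingTableConfig::new(url)
        .with_schema_adapter_factory(Arc::new(NestedStructSchemaAdapterFactory))
}
```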
@@ -1178,6 +1207,31 @@ impl ListingTable {
    }
}

/// Extension trait for FileSource to allow schema evolution support
pub trait FileSourceExt {
this is unfortunate (that we have something here that depends on parquet). Maybe we can add a with_schema_adapter_factory directly to FileSource 🤔
I agree: to me it makes sense to just make this a method on FileSource
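A rough sketch of that alternative (illustrative only, not the PR's code): a method on the file-source trait itself with a no-op default, so formats that cannot use a schema adapter simply return themselves unchanged and no separate `FileSourceExt` trait is needed.

```rust
use std::sync::Arc;
use datafusion::datasource::schema_adapter::SchemaAdapterFactory;

// Stand-in trait for illustration; the real change would add this method to FileSource.
trait FileSourceSketch {
    fn with_schema_adapter_factory(
        self: Arc<Self>,
        _factory: Arc<dyn SchemaAdapterFactory>,
    ) -> Arc<dyn FileSourceSketch>
    where
        Self: Sized + 'static,
    {
        // Default: ignore the factory and keep the source as-is.
        self
    }
}
```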
/// A SchemaAdapter that handles schema evolution for nested struct types
#[derive(Debug, Clone)]
pub struct NestedStructSchemaAdapter {
People have requested something like this for other DataFusion operations (such as CAST
and coercing structs to other types) -- I wonder if there is some way we can make the logic more reusable 🤔
Something like separating the schema mapping structure and actual logic out of the datasource crate.
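A sketch of what such a reusable helper could look like (not this PR's code; it assumes only Arrow's public APIs): recursively adapt a `StructArray` to a target field list, filling children missing from the source with nulls and falling back to a plain cast otherwise.

```rust
use std::sync::Arc;
use datafusion::arrow::array::{new_null_array, Array, ArrayRef, StructArray};
use datafusion::arrow::compute::cast;
use datafusion::arrow::datatypes::{DataType, Fields};
use datafusion::common::Result;

/// Adapt a struct array to `target` fields: recurse into nested structs and
/// fill children missing from the source with nulls; cast everything else.
fn adapt_struct(source: &StructArray, target: &Fields) -> Result<ArrayRef> {
    let columns = target
        .iter()
        .map(|field| match source.column_by_name(field.name()) {
            // Field absent in the file: fill with nulls of the table-side type.
            None => Ok(new_null_array(field.data_type(), source.len())),
            Some(col) => match (col.data_type(), field.data_type()) {
                // Nested struct: recurse so newly added grandchildren get filled too.
                (DataType::Struct(_), DataType::Struct(children)) => {
                    // downcast is safe: we just matched on a Struct data type
                    let child = col.as_any().downcast_ref::<StructArray>().unwrap();
                    adapt_struct(child, children)
                }
                // Anything else: rely on Arrow's cast kernel.
                _ => Ok(cast(col, field.data_type())?),
            },
        })
        .collect::<Result<Vec<_>>>()?;
    Ok(Arc::new(StructArray::new(
        target.clone(),
        columns,
        source.nulls().cloned(),
    )))
}
```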
…in FileSource trait
…lt unimplemented behavior for unsupported file formats
This is a beautifully put together PR! That break out will make it much easier to review, although it is already not bad and I was able to go through it pretty easily.
…eneric From for FileSource
…nd add generic From for FileSource" This reverts commit 0a5db8b.
…conversion to FileSource
Which issue does this PR close?
Rationale for this change
arrow-rs suggests that a SchemaAdapter is a better approach for handling evolving structs.
This change introduces support for evolving nested schemas in file-based data sources, particularly for Parquet. In many real-world data ingestion pipelines, schemas evolve over time — especially in nested fields — and systems need to be able to read historical and new data seamlessly. This patch provides infrastructure to adapt such evolving schemas dynamically without breaking query execution.
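For readers unfamiliar with the mechanism, a minimal sketch of the existing SchemaAdapter flow this PR builds on (using the default factory from `datafusion::datasource::schema_adapter`; exact signatures may differ between versions): the adapter maps a file schema onto the table schema, and the resulting mapper rewrites each record batch read from that file.

```rust
use datafusion::arrow::datatypes::{Schema, SchemaRef};
use datafusion::arrow::record_batch::RecordBatch;
use datafusion::common::Result;
use datafusion::datasource::schema_adapter::{
    DefaultSchemaAdapterFactory, SchemaAdapter, SchemaAdapterFactory, SchemaMapper,
};

// Adapt a batch read with `file_schema` so it matches `table_schema`.
fn adapt_file_batch(
    table_schema: SchemaRef,
    file_schema: &Schema,
    file_batch: RecordBatch,
) -> Result<RecordBatch> {
    let adapter = DefaultSchemaAdapterFactory.create(table_schema.clone(), table_schema);
    let (mapper, _projection) = adapter.map_schema(file_schema)?;
    mapper.map_batch(file_batch)
}
```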
What changes are included in this PR?
- Added `NestedStructSchemaAdapter` and `NestedStructSchemaAdapterFactory` to handle schema evolution in nested fields.
- Extended `ListingTableConfig` and `ListingTable` to include and propagate an optional `schema_adapter_factory`.
- Propagated the schema adapter factory to `FileSource` implementations like `ParquetSource`.
- Ensured `ParquetFormat` respects and preserves schema adapter factories during physical plan creation.
- Added `preserve_schema_adapter_factory` to maintain schema adaptation context in `ParquetSource`.
Are these changes tested?
✅ Yes.
The patch includes extensive unit tests covering:
These tests ensure correct and predictable behavior when handling evolving nested schemas.
Are there any user-facing changes?
✅ Yes, but non-breaking.
- An optional `schema_adapter_factory` can now be supplied when constructing a `ListingTableConfig`.
- 🔁 If no `schema_adapter_factory` is provided, behavior remains unchanged, ensuring backward compatibility.