Releases: dlt-hub/dlt

1.24.0

19 Mar 11:59
fa36355

dlt 1.24.0 Release Notes

Breaking Changes

  1. Custom resource metrics now stored as tables (#3718 @rudolfix) — Incremental metrics in the trace are now represented in table format. This changes the location and structure of incremental metrics in the trace object.

Highlights

  • Insert-only merge strategy (#3741 @rudolfix, based on #3372 by @OnAzart) — New insert-only merge strategy that performs idempotent, key-based appending: inserts records whose primary key doesn't exist in the destination while silently skipping duplicates. No updates or deletes. Supported across all SQL destinations, Delta Lake, and Iceberg.
  • Parallelize all sources in Airflow (#3652 @JustinSobayo) — In parallel and parallel-isolated decompose modes, all source components now fan out concurrently from a shared start node. Previously the first source had to complete before others could begin, adding unnecessary wall-clock time. This release also adds basic Airflow 3 support with smoke tests.
  • ClickHouse ReplacingMergeTree support (#3366 @prevostc) — New replacing_merge_tree table engine type for ClickHouse that enables native deduplication and soft deletes via dedup_sort and hard_delete column hints.
  • Custom resource metrics as tables (#3718 @rudolfix) — Resources can now emit custom metrics that are stored as tables in the trace, enabling richer observability for pipelines.

Core Library

  • Insert-only merge strategy (#3741 @rudolfix, based on #3372 by @OnAzart) — See Highlights.
  • ClickHouse ReplacingMergeTree support (#3366 @prevostc) — See Highlights.
  • Parallelize all sources in Airflow (#3652 @JustinSobayo) — See Highlights.
  • Custom resource metrics as tables (#3718 @rudolfix) — See Highlights.
  • Configurable Arrow table concatenation promote_options (#3701 @AyushPatel101) — arrow_concat_promote_options can now be set to "default" or "permissive" instead of the hardcoded "none", enabling automatic type promotion when yielding multiple Arrow tables with slightly different inferred types.
  • Fix: CLI info/show fails on custom destinations (#3676 @anuunchin) — dlt pipeline info/show no longer crashes with UnknownDestinationModule on pipelines using @dlt.destination.
  • Fix: Primary key assignment for incremental resources (#3679 @shnhdan) — Passing primary_key=() to Incremental to disable deduplication is no longer silently overwritten by the resource's own primary key.
  • Fix: MotherDuck missing catalog validation (#3723 @YuF-9468) — Connection strings that omit the catalog/database name (e.g. bare md:) now raise a clear configuration error instead of a confusing connection failure.
  • Fix: BigQuery infinite loop on internal error (#3732 @aditypan) — BigQuery jobs that encounter an internal error no longer cause an infinite retry loop.
  • Fix: SCD2 column order mismatch in SQLAlchemy destinations (#3733 @anuunchin) — SCD2 validity column insert jobs now match the column order of existing tables in SQLAlchemy destinations.
  • Fix: Timezone mapping in SQL timestamp datatype (#3735 @aditypan) — Timezone is now correctly set for timestamp/datetime column datatypes.

Docs

  • Realistic closure-based data masking example (#3617 @veeceey) — Replaced the hardcoded example with a reusable mask_columns() function supporting all sql_database backends.
  • Redirects for removed pages (#3688 @djudjuu)
  • AI workbench license info (#3729 @lis365b)
  • Minor doc fixes (#3734 @anuunchin)

Chores

New Contributors

1.23.0

06 Mar 14:47
b981294

Breaking Changes

  1. Streamlit dashboard removed (#3674 @rudolfix) — The legacy Streamlit-based pipeline dashboard (dlt pipeline show) has been removed. It had been dead code for a long time.

  2. New sources.<name>.<key> configuration lookup path (#3626 @rudolfix) — Source configuration now supports a compact layout. When a source's section name differs from its resource/source name, dlt now also looks up sources.<name>.<key> in addition to the full sources.<section>.<name>.<key> path. For example, for a source registered under section chess_com with name chess:

    # Before (still works): fully qualified path
    [sources.chess_com.chess]
    api_key = "secret"
    
    # New (also works now): compact path using just the source name
    [sources.chess]
    api_key = "secret"
    
    # Credentials follow the same pattern:
    # Full:    sources.chess_com.chess.credentials.api_key
    # Compact: sources.chess.credentials.api_key

    This is breaking if you previously had values at sources.<name> that were unrelated to this source — they will now be resolved where they were previously ignored.
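
    As a rough sketch of the rule above, the candidate lookup paths for one key could be enumerated as follows. `candidate_lookup_paths` is a hypothetical helper, not part of dlt, and the order (full path first) is an assumption. Any further fallbacks dlt may apply are not shown.

```python
from typing import List


def candidate_lookup_paths(section: str, name: str, key: str) -> List[str]:
    """List the config paths considered for `key` (illustrative only)."""
    # The fully qualified path is always a candidate.
    paths = [f"sources.{section}.{name}.{key}"]
    if section != name:
        # Compact path, added only when the section and source names differ.
        paths.append(f"sources.{name}.{key}")
    return paths


chess_paths = candidate_lookup_paths("chess_com", "chess", "api_key")
same_name_paths = candidate_lookup_paths("chess", "chess", "api_key")
```

    For the chess example, a value stored at sources.chess.api_key becomes a candidate, which is why unrelated values already present under sources.chess can now be picked up.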

Highlights

  • AI Workbench (#3674 @rudolfix) — New dlt ai CLI command group that turns dlt workspaces into AI-assisted development environments. Includes toolkit system for installing curated skill/rule bundles, pluggable MCP server architecture with composable features (pipeline, workspace, toolkit, secrets), and multi-agent support (Claude Code, Cursor, Codex).

  • Relational normalizer optimization (#3626 @rudolfix) — Major performance improvements to JSON data normalization and schema evolution: 5x faster on flat data, ~2x on nested REST API data, ~1.8x on wide nested data. ISO timestamp parsing improved 2-3x by removing timezone conversions.

  • Iceberg table properties (#3699 @rudolfix) — Adds support for setting Iceberg table and namespace properties via the adapter and configuration.

Core Library

  • Fetch Databricks compute credentials (#3667 @aditypan) — Automatically fetches credentials from Databricks shared/job compute when running dlt in a notebook, fixing the issue of defaulting to SQL warehouse connections.
  • Add override_data_path option to DuckLake ATTACH (#3709 @udus122) — New override_data_path configuration option that appends OVERRIDE_DATA_PATH true to the ATTACH statement, allowing the current DATA_PATH to override the path stored in catalog metadata.
  • Add missing parameters in Paginator Configs (#3658 @aditypan) — Adds missing parameters to PageNumberPaginatorConfig, OffsetPaginatorConfig, and JSONResponseCursorPaginatorConfig.
  • Fix: path traversal in FileStorage (CWE-22) (#3678 @rudolfix) — Replaced os.path.commonprefix() with os.path.commonpath() in FileStorage.is_path_in_storage() to correctly validate path containment using path segments instead of characters.
  • Fix: monotonic wall clock (#3695 @rudolfix) — Improves elapsed time calculation across several places, ensuring load IDs are always monotonic even on systems with clock jitter.
  • Fix: threading issues causing potential locking (#3698 @rudolfix) — Fixes async pool shutdown in extract (now closed with timeout) and corrects synchronization sections in various tests.
  • Fix: dev mode survives attach and reset (#3662 @rudolfix) — Saves dev_mode flag in pipeline local state so it persists across dlt.attach() calls. Detects dev→non-dev transitions and resets working folder cleanly.
  • Fix: respect custom Hugging Face endpoint for dataset card operations (#3696 @jorritsandbrink) — Fixes custom endpoint support broken by subset/dataset card feature by temporarily setting HF_ENDPOINT env var for card operations.
  • Fix: explicit dataset name should be authoritative (#3700 @anuunchin) — Makes the dataset argument passed to the pipeline authoritative, always setting pipeline dataset when restoring state.
  • Fix: start_out_of_range flag with range_start="open" (#3708 @AyushPatel101) — Correctly sets start_out_of_range=True when a row's cursor value equals start_value with range_start="open", fixing delayed can_close() in descending-order pipelines.
  • Fix: LanceDB SQL view creation with dataset_name=None (#3710 @Travior) — Handles the case where dataset_name is None in LanceDBSqlClient.create_view, preventing None prefix in view names.
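
The difference behind the FileStorage path traversal fix can be reproduced with the standard library alone. The sketch below is not dlt's implementation: the two helper functions are hypothetical, `posixpath` is used so the result does not depend on the host OS, and the inputs are assumed to be already normalized absolute paths.

```python
import posixpath


def is_inside_by_prefix(storage_root: str, path: str) -> bool:
    # Flawed check: commonprefix compares character by character,
    # so a sibling directory sharing the same leading characters passes.
    return posixpath.commonprefix([storage_root, path]) == storage_root


def is_inside_by_path(storage_root: str, path: str) -> bool:
    # Segment-aware check: commonpath compares whole path components.
    return posixpath.commonpath([storage_root, path]) == storage_root


root = "/data/storage"
sibling = "/data/storage_secret/key.txt"  # outside the storage root
inside = "/data/storage/file.txt"

prefix_accepts_sibling = is_inside_by_prefix(root, sibling)
path_accepts_sibling = is_inside_by_path(root, sibling)
path_accepts_inside = is_inside_by_path(root, inside)
```

The character-based check accepts `/data/storage_secret/key.txt` because it starts with the same characters as the root. The segment-based check rejects it and still accepts files that are actually inside the root.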

Docs

  • Fix docstring typo in BigQuery factory (#3705 @dnskr)

New Contributors

1.22.2

01 Mar 18:44
259a36d

Highlights

  • Hugging Face filesystem destination (#3669 @jorritsandbrink) — Adds hf protocol support to the filesystem destination, enabling direct loading to Hugging Face datasets. Closes #1227.
  • Composable marimo widgets (#3613 @zilto) — Introduces composable widgets built with marimo for interactive pipeline exploration. Widgets can accept inputs and produce outputs, building on earlier read-only widgets with updated schema viewer, load package viewer, and a new pipeline selector.

Core Library

  • Hugging Face subsets (#3689 @jorritsandbrink) — Adds dataset cards with metadata to configure a subset for each table, enabling the Hugging Face dataset viewer to display tables properly.
  • Hugging Face filesystem destination (#3669 @jorritsandbrink) — See Highlights.
  • Composable marimo widgets (#3613 @zilto) — See Highlights.
  • Dashboard UX improvements (#3675 @sh-rp) — Collapsed sections show title and subtitle on a single line to reduce vertical space, shortened long subtitles, and improved layout for narrow viewports.

Docs

  • Hugging Face destination documentation (#3687 @AstrakhantsevaAA) — Rewrote HF destination docs and moved content to a dedicated page.
  • Remove outdated Motherduck troubleshooting (#3683 @elviskahoro) — Removed read-only database troubleshooting section for deprecated DuckDB versions.
  • Update DuckLake docs for v1.4 (#3682 @elviskahoro) — Updated DuckLake documentation to reflect Motherduck as catalog database and corrected catalog URI format.

New Contributors

1.22.1

23 Feb 20:59
75efe7d

dlt 1.22.1 Release Notes

Core Library

  • feat(workspace): add default exclude patterns for file selector (#3661 @canassa) — WorkspaceFileSelector now ships with DEFAULT_EXCLUDES (.git/, .venv/, __pycache__/, node_modules/, etc.) so well-known non-deployable paths are always excluded, even without a .gitignore.
  • feat(workspace): add ignore_file_found attribute to WorkspaceFileSelector (#3663 @canassa) — Consumers can now check whether the configured ignore file (e.g. .gitignore) was actually found.
  • Dashboard cleanup and refactor (#3660 @sh-rp) — Broke up monolithic utils.py and dlt_dashboard.py into focused modules with simplified UI across all sections.
  • Sets default MCP transport to http-stream (#3624 @rudolfix) — Swaps sse for http-stream transport for built-in MCP servers and annotates pipeline trace schema.
  • Fixes data inspection tools (#3664 @rudolfix) — Allows incomplete columns in schema converters, attaches pipeline in every command, adds new allowed layout for sources.<name>.api_key.
  • Fix: Mermaid doesn't handle incomplete columns (#3659 @anuunchin) — .to_mermaid() now handles columns missing the data_type field instead of crashing.
  • Fix: ClickHouse makes reads sequential by default (#3651 @rudolfix) — Enforces select_sequential_consistency to fix flaky tests caused by ClickHouse's eventual consistency model.
  • Fix: data quality checks component (#3647 @zilto) — Fixes silently broken data quality checks code caused by upstream dlthub changes.
  • Fix: autouse test storage dir not empty (#3648 @tetelio) — Fixes intermittent CI failure in read-only file deletion by aligning with shutil docs.
  • Fix: dashboard tests (#3672 @sh-rp) — Fixes a few broken tests in the dashboard.
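
The idea of default excludes that apply even without a .gitignore can be sketched without dlt. This is not the WorkspaceFileSelector implementation: `select_files` and `DEFAULT_EXCLUDED_DIRS` are hypothetical names, and the set holds only the four directories named in the note above.

```python
from pathlib import PurePosixPath
from typing import FrozenSet, Iterable, List

# Subset of well-known non-deployable directories named in the release note.
DEFAULT_EXCLUDED_DIRS: FrozenSet[str] = frozenset(
    {".git", ".venv", "__pycache__", "node_modules"}
)


def select_files(
    paths: Iterable[str], excluded_dirs: FrozenSet[str] = DEFAULT_EXCLUDED_DIRS
) -> List[str]:
    """Keep paths that do not live under any excluded directory."""
    selected: List[str] = []
    for raw in paths:
        # Look only at the directory components, not the file name itself.
        directories = PurePosixPath(raw).parts[:-1]
        if any(part in excluded_dirs for part in directories):
            continue
        selected.append(raw)
    return selected


candidates = [
    "pipeline.py",
    ".git/config",
    "src/__pycache__/app.cpython-311.pyc",
    "src/app.py",
    "node_modules/pkg/index.js",
]
deployable = select_files(candidates)
```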

Docs

  • Run streamlit/MCPs in runtime (#3510 @tetelio) — Adds documentation for running MCP servers and Streamlit apps in the runtime.
  • Release highlights 1.18 & 1.19 (#3654 @AstrakhantsevaAA)
  • Add dlthub metrics section; update checks (#3641 @zilto)

Chores

  • Apply all docs/ linting in one make command (#3666 @anuunchin) — Introduces an overarching lint target in the docs Makefile. Resolves #3642.
  • Tests newest dbt on dbt runner, enables fabric (#3656 @rudolfix)
  • Adjust scaffold api and vibe source tests (#3649 @djudjuu) — Tests no longer expect source.md file.
  • Add install command for make test-load-local-p (#3645 @tetelio) — Convenience make install target for local load tests on duckdb and filesystem.
  • Remove license autofixture, add selective license application (#3646 @rudolfix)
  • Setup Claude and Continue agents (#3622 @rudolfix) — Adds comprehensive AI assistant configuration for Claude Code and Continue IDE.

New Contributors

1.22.0

17 Feb 16:55
15c7b46

Breaking Changes

  1. Pydantic v1 support removed (#3572 @anuunchin) — All Pydantic v1 compatibility code has been removed. The codebase now requires Pydantic v2 only.
  2. data_type contract semantic change (#3572 @anuunchin @rudolfix) — The data_type contract now applies to the full data type (i.e. precision, nullability), not only to variant columns (data type change). Users with data_type: freeze who relied on changing nullable/precision/scale on existing columns will now be blocked.
  3. merge_columns now removes compound properties (#3431 @anuunchin) — Previously merge_columns was purely additive, which caused compound properties like merge_key to be incorrectly replaced rather than properly merged. The function now correctly removes compound properties that should be removed.

Highlights

  • Pydantic data validation overhaul (#3572 @anuunchin @rudolfix) — Major rework of Pydantic support: discriminated union RootModel types (validation of event streams with various event types), schema contracts properly separate resource-defined vs data-derived hints, Pydantic model columns bypass contract checks when authoritative. Supports Pydantic models on arrow and model items with full schema contract enforcement. Prepares for Pydantic v3.
  • Snowflake atomic table swap for replace (#3540 @Travior) — Uses ALTER TABLE ... SWAP for staging-optimized replace strategy on Snowflake, eliminating table downtime during data replacement.
  • Custom backends for sql_database (#3595 @rudolfix) — Register custom TableLoader implementations as named backends. ConnectorX backend ported as PoC; ADBC and paginated loader implemented as test cases.
  • SQLAlchemy destination dialect customization (#3600 @rudolfix) — Customize type mapping, adjust SQLAlchemy table schemas before creation, and override destination capabilities per-dialect.
  • llms.txt and Markdown docs generation (#3635 @rudolfix) — Generates llms.txt index and Markdown versions of docs pages with a "View Markdown" navigation option, making the docs LLM-friendly.

Core Library

  • rest_api: parallelized dependent resources (#3574 @Shadesfear) — Add parallelized flag to dependent resources (transformers) so child resource fetches run concurrently.
  • dlt.Relation: filter by load_id (#3547 @zilto) — Filter dataset relations by load ID (experimental).
  • dlt.Relation: flatten logic and improve typing (#3578 @zilto) — Remove dynamic methods; explicit return types for .df(), .arrow(), etc.
  • Source preprocessors on SourceFactory (#3636 @rudolfix) — Add preprocessor hooks to dlt.source factory for modifying source instances.
  • engine_kwargs for sql_database/sql_table sources (#3414 @tetelio) — Pass SQLAlchemy engine arguments directly to create_engine() for sources.
  • DECFLOAT support for Snowflake (#3513 @ivasio) — Properly handles DECFLOAT columns via the SQLAlchemy backend.
  • Athena query_result_bucket now optional (#3566 @arel) — Omit or set to None when using Athena's managed results bucket.
  • ClickHouse extra_credentials for S3 (#2888 @warje) — Adds extra_credentials config for role-based S3 authentication.
  • Fix: Snowflake sort column escaping (#3594 @rudolfix)
  • Fix: BigQuery partition clause on ALTER TABLE (#3571 @kien-truong)
  • Fix: Redshift schema existence check (#3570 @timH6502)
  • Fix: _dlt_load_id written as dict on MSSQL + ADBC (#3584 @rudolfix)
  • Fix: ClickHouse CREATE OR REPLACE for merge temp tables (#3589 @rudolfix)
  • Fix: read_csv_duckdb respects filename=True (#3606 @karlanka)
  • Fix: column order mismatch in sql_database (#3638 @rudolfix)
  • Fix: consistent UUID handling as strings (#3599 @rudolfix)
  • Fix: managed SQLAlchemy engine ref counting (#3601 @rudolfix)
  • Fix: suppress psutil warning during dlt init (#3615 @rudolfix)
  • Fix: query lifecycle cleanup (#3627 @rudolfix)
  • Fix: Pydantic model synthesis bugs (#3605 @rudolfix)
  • Detect AI agent execution context (#3628 @rudolfix)
  • Upgrade ibis-framework, remove sqlglot constraint (#3621 @Travior)
  • Vibe sources: use new scaffold API (#3512 @djudjuu)
  • Update GitHub API pipeline template (#3603 @ShreyasGS)

Docs

Chores

New Contributors

1.21.0

20 Jan 12:11
ab0459a

This release adds several interesting improvements and many bugfixes. The LanceDB destination now uses a DuckDB extension to let you query Lance tables with SQL, ibis, or sqlglot via our standard .dataset() interface. We introduced several Iceberg-related improvements (catalog support, S3 tables for Athena, advanced partitioning). There's also a new Fabric destination and additional options in clickhouse_adapter. Finally, we have a test environment for Oracle and have started fixing Oracle-related bugs.

Core Library

Bugfixes

  • pyarrow: respect resource hints before extract by @djudjuu in #3436
  • Fix: 3490 better error message in schema contract application by @anuunchin in #3498
  • Fix state file being ignored when pipeline_name includes FILENAME_SEPA… by @Travior in #3448
  • Fix/3464 sync error results in success label (workspace dashboard) by @anuunchin in #3492
  • Fix/3376 load state changes in load package that changed it by @anuunchin in #3521
  • 3353 normalize start method spawn seems to ignore environment variables by @djudjuu in #3463
  • Fix: Connectorx arrow_stream timestamp conversion issue by @louiewhw and @anuunchin in #3528
  • fix/3141 - process Oracle "table not found" exception by @ivasio in #3509
  • fix: 3514 mermaid reference label by @zilto in #3515
  • Fix cluster hint overriding partition hint on bigquery by @Travior in #3497
  • Fix: Special handling of numeric type for oracle by @ivasio in #3144
  • Fix/3159 pydantic model incorrect serialization by @tetelio in #3421

Chores

Docs

New Contributors

Full Changelog: 1.20.0...1.21.0

1.20.0

09 Dec 23:13
a7c3571

Core Library

  • feat: implement ConfigurationFileSelector by @ivasio in #3418
  • Fix: reset config in PluggableRunContext.reload_providers by @ivasio in #3409
  • add runtime CLI configs in WorkspaceRuntimeConfiguration by @ivasio in #3424
  • implements run artifacts sync to a bucket using filesystem by @ivasio in #3339
  • Fix: extensive .gitignore for dlt init by @anuunchin in #3437
  • Fix: Invisible sections are receiving border and background color in dashboard by @anuunchin in #3439
  • implements cancellation of normalize jobs by @rudolfix in #3444
  • information on pending and partially loaded packages when pipeline fails by @rudolfix in #3444
  • Fix race condition in LimitItem by @burnash in #3442
  • Add offset/limit body_path fields to OffsetPaginatorConfig by @kinghuang in #3260
  • [fix/3358] add pagination stopping to JSONResponseCursorPaginator by @segetsy in #3374
  • (feat) small dashboard improvements by @rudolfix in #3450

Chores

  • Skip doc examples requiring secrets on fork PRs by @burnash in #3438

Docs

New Contributors

Full Changelog: 1.19.1...1.20.0

1.19.1

02 Dec 22:02
10cd908

Bugfixes

Full Changelog: 1.19.0...1.19.1

1.19.0

01 Dec 15:52
7132f0b

Core Library

  • Feat: support return_type = arrow_stream for connectorx backend by @ivasio in #3218
  • Feat: visual pipeline run section in dashboard by @anuunchin in #3250
  • ingests parquet into mssql, mysql and sqlite via ADBC by @rudolfix in #3333
  • fix/3165 Athena LakeFormation permissions are required even though LakeFormation is not used by @alkaline-0 in #3271
  • feat: Schema.to_mermaid() by @zilto in #3364
  • fix/3190 Fixed the persistence issue of boundary timestamp after removing it #3367 by @alkaline-0 in #3378
  • feat: snowflake clustering key modifications by @jorritsandbrink in #3365
  • fixes athena refresh mode (iceberg data incorrectly dropped) by @rudolfix in #3313
  • override local marimo theme for dashboard, fix to 'light' by @djudjuu in #3337
  • (fix) 3346 fix trace loading: ignore trace if cannot be unpickled by @rudolfix in #3354
  • Reinitialize packages after exit() is called by @JayJai04 in #3300
  • feat: updated scaffolding template by @zilto in #3275
  • fix: dashboard no longer crashes on broken home cell by @djudjuu in #3348
  • (fix) use sparse checkout for dlt init dlthub by @rudolfix in #3356
  • fix: minor typos and redundant variable by @tahamuzammil100 in #3314
  • feat/3198 profile selection in Dashboard if enabled in workspace by @alkaline-0 in #3295
  • fix: make with_table_name and other functions available through `dlt.pip… by @hello-world-bfree in #3318
  • Redshift feature: Include STS session token in COPY CREDENTIALS. by @timH6502 in #3307
  • Fix: The child table column remains in the schema as a partial column with seen-null-first=True by @anuunchin in #3131
  • Fix: Uncalled source loses resource-level hints in pipeline.run() by @anuunchin in #3369
  • (fix) does not overwrite local file context in destination factory by @rudolfix in #3398
  • (fix) 3351 fixes default type var to allow running with old typing_extensions (i.e. old Databricks clusters) by @rudolfix in #3373
  • Fix: pipeline_drop.init() got an unexpected keyword argument 'no_pwd' by @anuunchin in #3386
  • sets ducklake fingerprint to storage fingerprint by @rudolfix in #3388
  • Fix: Section background colors and top margins in dashboard by @anuunchin in #3393

Docs

Chores

  • Enable CI run for the runtime branch by @sh-rp in #3317
  • Chore: Update docs npm dependencies and clean up docs build tooling by @sh-rp in #3247
  • fix flaky dashboard tests by @rudolfix in #3370
  • chore: add proper optional typehint to dlt/extract/hints.py module by @luqmansen in #3332

New Contributors

Full Changelog: 1.18.2...1.19.0

1.18.2

03 Nov 16:04
801bcf7

Core Library

  • resolves "default" limit via ibis options by @rudolfix in #3273
  • clones command repos in global_dir & bumps to 1.18.2 by @rudolfix in #3279

Docs

Full Changelog: 1.18.1...1.18.2