Releases · tidyverse/duckplyr

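library(duckplyr)

# Create a DuckDB database file that contains one table
db_path <- tempfile(fileext = ".duckdb")
con <- DBI::dbConnect(duckdb::duckdb(), db_path)
DBI::dbWriteTable(con, "my_table", data.frame(x = 1:5, y = letters[1:5]))
DBI::dbDisconnect(con)

# Read the table as a duckplyr frame and filter it
read_tbl_duckdb(db_path, "my_table") |>
  filter(x > 2)

# Remove the temporary database file
unlink(db_path)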

first(), last(), nth(), round(), and n() inside mutate(.by = ...) are now translated directly to DuckDB (#626, #854).

duckdb_tibble(g = c("a", "a", "b", "b", "b"), x = c(10, 20, 30, 40, 50), .prudence = "stingy") |>
  summarise(.by = g, first_x = first(x), last_x = last(x), second_x = nth(x, 2))

duckdb_tibble(g = c("a", "a", "b", "b"), x = 1:4, .prudence = "stingy") |>
  mutate(count = n(), .by = g)

compute_parquet() and compute_csv() now accept an options argument that passes format-specific settings to the underlying DuckDB operation; the same settings are also applied when reading the data back (#729, #821).

df <- duckdb_tibble(x = 1:3, y = c("a", "b", "c"), .prudence = "stingy")
path <- tempfile(fileext = ".parquet")
compute_parquet(df, path, options = list(compression = "zstd"))

compute_parquet() and compute_csv() are now generic S3 functions, making it easier to add methods for custom classes (#746, #818).
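Because compute_parquet() now dispatches on the class of its input, a package can add a method for its own class. A minimal sketch, assuming a hypothetical class "my_frame" and assuming the generic's arguments are x, path, ..., prudence and options; the method only converts the input to a plain duckplyr frame and calls the generic again:

```r
library(duckplyr)

# Hypothetical class for illustration: a data frame tagged with "my_frame"
new_my_frame <- function(df) {
  structure(df, class = c("my_frame", "data.frame"))
}

# Hypothetical method: strip the custom class, convert to a duckplyr frame,
# then call the generic again so duckplyr's own method does the writing
compute_parquet.my_frame <- function(x, path, ..., prudence = NULL, options = NULL) {
  plain <- as_duckdb_tibble(structure(x, class = "data.frame"))
  compute_parquet(plain, path, ..., prudence = prudence, options = options)
}

path <- tempfile(fileext = ".parquet")
res <- compute_parquet(new_my_frame(data.frame(x = 1:3)), path)
```

In a package, the method would be registered in the NAMESPACE instead of being defined at the top level.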

Functions with named arguments are now translated to DuckDB (#822).

duckdb_tibble(x = c(1.23, 4.56, 7.89), .prudence = "stingy") |>
  mutate(y = round(x, digits = 1L))

transmute() can now reference new variables created within the same call (#796, #819).

duckdb_tibble(x = 1:3, .prudence = "stingy") |>
  transmute(y = x * 2, z = y + 10)

Add experimental translation for filter_out() (#869, #870).

duckdb_tibble(x = 1:3, .prudence = "stingy") |>
  filter_out(x > 2)

Documentation

Document row.names incompatibility (#603, #825).
Add examples for specifying CSV column types by name (#775, #820).
Add superseded lifecycle badge to transmute() documentation (#364, #824).
Add blog post to pkgdown config (#612, #827).
Review contributing guide (#657).

Chore

Align internal tests with dplyr 1.2.0 (#863).
Migrate from deprecated qs to qs2 (#846, #847).
Format code with air.

duckplyr 1.1.3

Tag v1.1.3 (commit 22886fe), published 06 Nov 02:19 by github-actions.

Features

read_file_duckdb() wraps path in a list only if its length is not equal to one, to support read_stat().
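A minimal sketch of the single-path case, using DuckDB's read_csv_auto table function on a temporary CSV file (the file and its contents are made up for illustration, and read_stat() itself is not shown):

```r
library(duckplyr)

# Temporary CSV file for illustration
path <- tempfile(fileext = ".csv")
write.csv(data.frame(x = 1:3, y = c("a", "b", "c")), path, row.names = FALSE)

# A single path is handed to the DuckDB table function (here read_csv_auto)
res <- read_file_duckdb(path, "read_csv_auto") |>
  collect()

unlink(path)
```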

Continuous integration

Avoid an example failing on R 4.2 and older.

Documentation

Add "Supported by Posit" badge.

duckplyr 1.1.2

Tag v1.1.2 (commit 3bb5ddf), published 20 Sep 01:59 by github-actions.

Features

Fully support dd::...() syntax (#795).
The threshold for prudence = "thrifty" is reduced to 1000 cells when the data comes from a remote data source.
Support named arguments for dd::...() functions.
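A minimal sketch of the dd:: prefix, assuming duckplyr 1.1.2 or later: upper() here refers to DuckDB's string function and is passed through to DuckDB rather than looked up in R, so it also works with "stingy" prudence.

```r
library(duckplyr)

# dd::upper() is forwarded to DuckDB's upper() function
res <- duckdb_tibble(x = c("a", "b"), .prudence = "stingy") |>
  mutate(y = dd::upper(x)) |>
  collect()
```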

Performance

Generate a more balanced expression when translating %in% to avoid performance problems in duckdb v1.4.0.
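The base-R sketch below illustrates what "more balanced" means for x %in% values: it builds the equivalent chain of | comparisons once as a left-deep chain and once as a balanced tree. The helper names are hypothetical and this is not duckplyr's internal code; it only shows that nesting depth grows linearly in one shape and logarithmically in the other.

```r
# Hypothetical helpers for illustration only, not duckplyr internals.
# Both build `x == v1 | x == v2 | ...` as an R call object.

# Left-deep chain: ((x == v1 | x == v2) | x == v3) | ...
left_deep_or <- function(col, values) {
  first <- call("==", as.name(col), values[[1]])
  Reduce(function(acc, v) call("|", acc, call("==", as.name(col), v)),
         values[-1], first)
}

# Balanced tree: split the values in half and combine both halves with `|`
balanced_or <- function(col, values) {
  n <- length(values)
  if (n == 1) {
    return(call("==", as.name(col), values[[1]]))
  }
  mid <- n %/% 2
  call("|",
       balanced_or(col, values[seq_len(mid)]),
       balanced_or(col, values[(mid + 1):n]))
}

# Nesting depth of the `|` tree; a single comparison counts as depth 1
or_depth <- function(e) {
  if (is.call(e) && identical(e[[1]], as.name("|"))) {
    1 + max(or_depth(e[[2]]), or_depth(e[[3]]))
  } else {
    1
  }
}

values <- 1:64
or_depth(left_deep_or("x", values))  # 64
or_depth(balanced_or("x", values))   # 7
```

Both shapes evaluate to the same result; only the depth of the expression tree differs.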

duckplyr 1.1.1

Tag v1.1.1 (commit b9cb799), published 01 Aug 02:59 by github-actions.

Chore

Fix CRAN failure with _R_CHECK_THINGS_IN_OTHER_DIRS_=true.

duckplyr 1.1.0

Tag v1.1.0 (commit 0d11c63), published 10 May 02:14 by github-actions.

This release improves compatibility with dbplyr and DuckDB.
See vignette("duckdb") for details.

Features

Pass functions prefixed with dd$ directly to DuckDB, e.g., dd$ROW() will be translated as DuckDB's ROW() function (#658).
New as_tbl() to convert to a dbplyr tbl object (#634, #685).
Register Ark methods for Positron's "Variables" pane (@DavisVaughan, #661, #678). DuckDB tibbles are no longer displayed as data frames in the "Variables" pane due to a limitation in Positron. Use collect() to convert them to data frames if you rely on the viewer functionality.
Translate n_distinct() as macro with support for na.rm = TRUE (@joakimlinde, #572, #655).
Translate coalesce().
compute() does not have a fallback; failures are reported to the client (#637).
Implement slice_head() (#640).
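A minimal sketch of two of these translations on a small made-up data set: n_distinct() with na.rm = TRUE and coalesce() run in DuckDB, so the pipeline works with "stingy" prudence, which would otherwise refuse to fall back to dplyr.

```r
library(duckplyr)

df <- duckdb_tibble(g = c("a", "a", "b"), x = c(1, NA, 2), .prudence = "stingy")

# Count distinct non-missing values of x per group
counts <- df |>
  summarise(.by = g, n = n_distinct(x, na.rm = TRUE)) |>
  collect()

# Replace missing values of x with 0
filled <- df |>
  mutate(y = coalesce(x, 0)) |>
  collect()
```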

Bug fixes

Set functions like union() no longer trigger materialization (#654, #692).
Joins no longer materialize the input data when the package is used with methods_overwrite() or library(duckplyr) (#641).
Correct formatting for controlled fallbacks with Sys.setenv(DUCKPLYR_FALLBACK_INFO = TRUE).

Chore

Bump duckdb and pillar dependencies.
Use roxyglobals from CRAN rather than GitHub (@andreranza, #659).
Bring tools and patch up to date (@joakimlinde, #647).
Internal rel_to_df() needs prudence argument (#644).
Fix sync scripts and add reproducible code (#639).
Check loadability of extensions in test (#636).

Documentation

Document slice_head() as supported.
Add Posit's ROR ID (#592).
Add vignette("duckdb") (#690).
Add experimental badge.
Verbose conflict_prefer() (#667, #684).
Typos + clarification edits to "large" vignette (@mine-cetinkaya-rundel, #665).

Testing

Skip tests using grep() or sub() on CRAN.

Contributors

mine-cetinkaya-rundel, DavisVaughan, and 2 other contributors

duckplyr 1.0.1

Tag v1.0.1 (commit 02a2f92), published 01 Mar 02:17 by github-actions.

Bug fixes

Check if extensions can be loaded before running examples and vignettes (#620).
Show source of error if data frame cannot be converted to duck frame (#614).
Correct formatting for controlled fallbacks with Sys.setenv(DUCKPLYR_FALLBACK_INFO = TRUE).

Chore

Require duckdb >= 1.2.0 (#619).
Break this version with duckdb 2.0.0 (#623).

Documentation

Separate ?compute_parquet and ?compute_csv (#610, #622).
Italicize book title in README (@wibeasley, #607).
Fix typo in filter(.by = ...) error message (@maelle, #611).
Fix link in documentation (#600, #601).

Contributors

wibeasley and maelle

duckplyr 1.0.0

Tag v1.0.0 (commit 097102f), published 09 Feb 02:05 by github-actions.

Features

Large data

Improved support for handling large data from files and S3: ingestion with read_parquet_duckdb() and others, and materialization with as_duckdb_tibble(), compute.duckplyr_df() and compute_file(). See vignette("large") for details.
Control automatic materialization of duckplyr frames with the new prudence argument to as_duckdb_tibble(), duckdb_tibble(), compute.duckplyr_df() and compute_file(). See vignette("prudence") for details.
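A minimal sketch of the large-data workflow on a tiny temporary file: write a Parquet file, read it back lazily with "stingy" prudence, and materialize only through an explicit collect(). The file path and data are made up for illustration.

```r
library(duckplyr)

# Write a small example data set to a temporary Parquet file
path <- tempfile(fileext = ".parquet")
compute_parquet(duckdb_tibble(x = 1:5), path)

# Read the file lazily; "stingy" disables automatic materialization
df <- read_parquet_duckdb(path, prudence = "stingy")

# collect() is an explicit request and always materializes
res <- df |>
  filter(x > 2) |>
  collect()

unlink(path)
```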

New functions

read_csv_duckdb() and others, deprecating duckplyr_df_from_csv() and df_from_csv() (#210, #396, #459).
read_sql_duckdb() (experimental) to run SQL queries against the default DuckDB connection and return the result as a duckplyr frame (duckdb/duckdb-r#32, #397).
db_exec() to execute configuration queries against the default DuckDB connection (#39, #165, #227, #404, #459).
duckdb_tibble() (#382, #457).
as_duckdb_tibble(), replaces as_duckplyr_tibble() and as_duckplyr_df() (#383, #457) and supports dbplyr connections to a duckdb database (#86, #211, #226).
compute_parquet() and compute_csv(), implement compute.duckplyr_df() (#409, #430).
fallback_config() to create a configuration file for the settings that do not affect behavior (#216, #426).
is_duckdb_tibble(), deprecates is_duckplyr_df() (#391, #392).
last_rel() to retrieve the last relation object used in materialization (#209, #375).
Add "prudent_duckplyr_df" class that stops automatic materialization and requires collect() (#381, #390).
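A minimal sketch of two of the new functions: db_exec() sends a configuration statement to the default connection, and read_sql_duckdb() (experimental) returns the result of a query as a duckplyr frame. The thread count and the query are arbitrary examples.

```r
library(duckplyr)

# Configuration statement against the default connection
db_exec("SET threads TO 2")

# Experimental: run an SQL query and get a duckplyr frame back
answer <- read_sql_duckdb("SELECT 42 AS answer")
is_duckdb_tibble(answer)

res <- collect(answer)
```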

Translations

Partial support for across() in mutate() and summarise() (#296, #306, #318, @lionel-, @DavisVaughan).
Implement na.rm handling for sum(), min(), max(), any() and all(), with fallback for window functions (#205, #566).
Add support for sub() and gsub() (@toppyy, #420).
Handle dplyr::desc() (#550).
Avoid forwarding is.na() to is.nan() to support non-numeric data, avoid checking roundtrip for timestamp data (#482).
Correctly handle missing values in if_else().
Limit number of items that can be handled with %in% (#319).
duckdb_tibble() checks if columns can be represented in DuckDB (#537).
Fall back to dplyr when passing the multiple argument to joins (#323).
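A minimal sketch of the na.rm and desc() translations on made-up data; with "stingy" prudence the pipeline would error instead of falling back, so it only succeeds if DuckDB handles these calls directly.

```r
library(duckplyr)

df <- duckdb_tibble(a = 1:3, b = c(10, NA, 30), .prudence = "stingy")

# na.rm = TRUE in sum() and max() is translated to DuckDB
totals <- df |>
  summarise(total_b = sum(b, na.rm = TRUE), max_a = max(a, na.rm = TRUE)) |>
  collect()

# desc() inside arrange() is translated as well
sorted <- df |>
  arrange(desc(a)) |>
  collect()
```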

Error messages

Improve fallback error message by explicitly materializing (#432, #456).
Point to the native CSV reader if encountering data frames read with readr (#127, #469).
Improve as_duckdb_tibble() error message for invalid x (@maelle, #339).

Behavior

Depend on dplyr instead of reexporting all generics (#405). Nothing changes for users in scripts. When using duckplyr in a package, you now also need to import dplyr.
Fallback logging is now on by default and can be disabled with configuration (#422).
The default DuckDB connection is now based on a file, the location defaults to a subdirectory of tempdir() and can be controlled with the DUCKPLYR_TEMP_DIR environment variable (#439, #448, #561).
collect() returns a tibble (#438, #447).
explain() returns the input, invisibly (#331).
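A minimal sketch of the explain() and collect() behavior: because explain() returns its input invisibly, it can sit in the middle of a pipe, and collect() then returns a plain tibble.

```r
library(duckplyr)

df <- duckdb_tibble(x = 1:3)

# explain() prints the query plan and passes its input through
out <- df |>
  filter(x > 1) |>
  explain()

# collect() materializes the result as a tibble
res <- collect(out)
class(res)
```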

Bug fixes

Compute ptype only for join columns in a safe way without materialization, not for the entire data frame (#289).
Internal expr_scrub() (used for telemetry) can handle function-definitions (@toppyy, #268, #271).
Harden telemetry code against invalid arguments (#321).

Documentation

New articles: vignette("large"), vignette("prudence"), vignette("fallback"), vignette("limits"), vignette("developers"), vignette("telemetry") (#207, #504).
New flights_df() used instead of palmerpenguins::penguins (#408).
Move to the tidyverse GitHub organization, new repository URL https://github.com/tidyverse/duckplyr/ (#225).
Avoid base pipe in examples for compatibility with R 4.0.0 (#463, #466).

Performance

Comparison expressions are translated in a way that allows them to be pushed down to Parquet (@toppyy, #270).
Printing a duckplyr frame no longer materializes (#255, #378).
Prefer vctrs::new_data_frame() over tibble() (#500).

Contributors

lionel-, maelle, and 2 other contributors

duckplyr 0.4.1

Tag v0.4.1 (commit 8466ce0), published 14 Jul 00:49 by github-actions.

Features

df_from_file() and related functions support multiple files (#194, #195), show a clear error message for non-string path arguments (#182), and create a tibble by default (#177).
New as_duckplyr_tibble() to convert a data frame to a duckplyr tibble (#177).
Support descending sort for character and other non-numeric data (@toppyy, #92, #175).
Avoid setting memory limit (#193).
Check compatibility of join columns (#168, #185).
Explicitly list supported functions, add contributing guide, add analysis scripts for GitHub activity data (#179).
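In current releases, the same multi-file reading is available through read_csv_duckdb(), which replaced df_from_csv() in duckplyr 1.0.0. A minimal sketch, assuming read_csv_duckdb() accepts a character vector of paths; the two temporary files are made up for illustration:

```r
library(duckplyr)

# Two temporary CSV files with the same columns
paths <- c(tempfile(fileext = ".csv"), tempfile(fileext = ".csv"))
write.csv(data.frame(x = 1:2), paths[[1]], row.names = FALSE)
write.csv(data.frame(x = 3:4), paths[[2]], row.names = FALSE)

# Passing a character vector reads all files into one duckplyr frame
res <- read_csv_duckdb(paths) |>
  collect()

unlink(paths)
```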

Documentation

Add contributing guide (#179).
Show a startup message at package load if telemetry is not configured (#188, #198).
?df_from_file shows how to read multiple files (#181, #186) and how to specify CSV column types (#140, #189), and is shown correctly in reference index (#173, #190).
Discuss dbplyr in README (#145, #191).
Add analysis scripts for GitHub activity data (#179).

Contributors

toppyy

duckplyr 0.4.0

Tag v0.4.0 (commit 8f0a941), published 23 May 00:45 by github-actions.

Features

Use built-in rfuns extension to implement equality and inequality operators, improve translation for as.integer(), NA and %in% (#83, #154, #148, #155, #159, #160).
Reexport non-deprecated dplyr functions (#144, #163).
library(duckplyr) calls methods_overwrite() (#164).
Only allow constant patterns in grepl().
Explicitly reject calls with named arguments for now.
Reduce default memory limit to 1 GB.

Bug fixes

Stricter type checks in the set operations intersect(), setdiff(), symdiff(), union(), and union_all() (#169).
Distinguish between constant NA and those used in an expression (#157).
head(-1) forwards to the default implementation (#131, #156).
Fix cli syntax for internal error message (#151).
More careful detection of row names in data frame.
Always check roundtrip for timestamp columns.
left_join() and other join functions call auto_copy().
Only reset expression depth if it has been set before.
Require fallback if the result contains duplicate column names when ignoring case.
row_number() returns integer.
is.na(NaN) is TRUE.
summarise(count = n(), count = n()) creates only one column named count.
Correct wording in instructions for enabling fallback logging (@TimTaylor, #141).

Chore

Remove styler dependency (#137, #138).
Avoid error from stats collection.

Documentation

Mention wildcards to read multiple files in ?df_from_file (@andreranza, #133, #134).

Testing

Reenable tests that now run successfully (#166).
Synchronize tests (#153).
Test that vec_ptype() does not materialize (#149).
Improve telemetry tests.
Promote equality checks to expect_identical() to capture differences between doubles and integers.

Contributors

TimTaylor and andreranza
