diff --git a/content/blog/duckplyr-1-1-0/index.Rmd b/content/blog/duckplyr-1-1-0/index.Rmd new file mode 100644 index 000000000..d1aee1125 --- /dev/null +++ b/content/blog/duckplyr-1-1-0/index.Rmd @@ -0,0 +1,233 @@ +--- +output: hugodown::hugo_document + +slug: duckplyr-1-1-0 +title: duckplyr fully joins the tidyverse! +date: 2025-06-19 +author: Kirill Müller and Maëlle Salmon +description: > + duckplyr 1.1.0 is on CRAN! + A drop-in replacement for dplyr, powered by DuckDB for speed. + It is the most dplyr-like of dplyr backends. + +photo: + url: https://www.pexels.com/photo/a-mallard-duck-on-water-6918877/ + author: Kiril Gruev + +# one of: "deep-dive", "learn", "package", "programming", "roundup", or "other" +categories: [package] +tags: + - duckplyr + - dplyr + - tidyverse +--- + +```{r include = FALSE} +options( + pillar.min_title_chars = 20, + pillar.max_footer_lines = 7, + pillar.bold = TRUE +) +options(conflicts.policy = list(warn = FALSE)) +library(conflicted) +conflict_prefer("filter", "dplyr", quiet = TRUE) +``` + + +We're well chuffed to announce the release of [duckplyr](https://duckplyr.tidyverse.org) 1.1.0. +This is a dplyr backend powered by [DuckDB](https://duckdb.org/), a fast in-memory analytical database system[^duckdb]. +duckplyr uses the power of DuckDB for impressive performance where it can, and seamlessly falls back to R where it can't. +You can install it from CRAN with: + +[^duckdb]: If you haven't heard of it yet, watch [Hannes Mühleisen's keynote at posit::conf(2024)](https://www.youtube.com/watch?v=GELhdezYmP0&feature=youtu.be). + +```{r, eval = FALSE} +install.packages("duckplyr") +``` + +This article shows how duckplyr can be used instead of dplyr, explains how you can help improve the package, and shares a selection of further resources. 
+ +## A drop-in replacement for dplyr + +Imagine you have to wrangle a huge dataset, like this one from the [TPC-H benchmark](https://duckdb.org/2024/04/02/duckplyr.html#benchmark-tpc-h-q1), a famous database benchmarking dataset. + +```{r} +lineitem_tbl <- duckdb:::sql( + "INSTALL tpch; LOAD tpch; CALL dbgen(sf=1); FROM lineitem;" +) +lineitem_tbl <- tibble::as_tibble(lineitem_tbl) +dplyr::glimpse(lineitem_tbl) +``` + +To work with this in duckplyr instead of dplyr, all you need to do is load duckplyr: + +```{r} +library(duckplyr) +``` + +Now we can express the well-known (at least in the database community!) "TPC-H benchmark query 1" in dplyr syntax and execute it in DuckDB via duckplyr. + +```{r} +tpch_dplyr <- function(lineitem) { + lineitem |> + filter(l_shipdate <= !!as.Date("1998-09-02")) |> + summarise( + sum_qty = sum(l_quantity), + sum_base_price = sum(l_extendedprice), + sum_disc_price = sum(l_extendedprice * (1 - l_discount)), + sum_charge = sum(l_extendedprice * (1 - l_discount) * (1 + l_tax)), + avg_qty = mean(l_quantity), + avg_price = mean(l_extendedprice), + avg_disc = mean(l_discount), + count_order = n(), + .by = c(l_returnflag, l_linestatus) + ) |> + arrange(l_returnflag, l_linestatus) +} + +tpch_dplyr(lineitem_tbl) +``` + +Like other dplyr backends such as dtplyr and dbplyr, duckplyr gives you higher performance without learning a different syntax. +Unlike other dplyr backends, duckplyr does not require you to change existing code or learn specific idiosyncrasies. +Not only is the syntax the same, the semantics are too! +If an operation cannot be carried out with DuckDB, it is automatically outsourced to dplyr. +Over time, we expect fewer and fewer fallbacks to dplyr to be needed. + +## How to use duckplyr + +There are two ways to use duckplyr: + +- As above, you can call `library(duckplyr)`, which replaces all existing dplyr methods. This is safe because duckplyr is guaranteed to give exactly the same results as dplyr, unlike other backends. 
+ +- Create individual "duck frames" using _conversion functions_ like `duckplyr::duckdb_tibble()` or `duckplyr::as_duckdb_tibble()`, or _ingestion functions_ like `duckplyr::read_csv_duckdb()`. + +Here's an example of the second form: + +```{r} +out <- lineitem_tbl |> + duckplyr::as_duckdb_tibble() |> + tpch_dplyr() + +out +``` + +Note that the resulting object is indistinguishable from a regular tibble, except for the additional class. + +```{r} +typeof(out) +class(out) +out$count_order +``` + +Operations not yet supported by duckplyr are automatically outsourced to dplyr. +For instance, filtering on grouped data is not supported, but it still works thanks to the fallback mechanism. +By default, the fallback is silent, but you can make it visible by setting an environment variable. +This is useful if you want to better understand what's making your code slow. + +```{r} +Sys.setenv(DUCKPLYR_FALLBACK_INFO = TRUE) + +lineitem_tbl |> + duckplyr::as_duckdb_tibble() |> + filter(l_quantity == max(l_quantity), .by = c(l_returnflag, l_linestatus)) +``` + +You can also directly use DuckDB functions with the `dd$` qualifier. +Functions with this prefix are not translated at all and are passed through directly to DuckDB. +For example, the following code uses DuckDB's internal implementation of [Levenshtein distance](https://duckdb.org/docs/stable/sql/functions/text.html#editdist3s1-s2): + +```{r} +tibble(a = "dbplyr", b = "duckplyr") %>% + mutate(c = dd$levenshtein(a, b)) +``` + +See `vignette("duckdb")` for more information on these features. + +If you're working with dbplyr too, you can use `as_tbl()` to convert a duckplyr tibble to a dbplyr lazy table. +This allows you to seamlessly interact with existing code that might use inline SQL or other dbplyr functionality. +With `as_duckdb_tibble()`, you can convert a dbplyr lazy table to a duckplyr tibble. +Both operations work without intermediate materialization. 
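+
+The round trip between duckplyr and dbplyr could look roughly like the sketch below.
+It assumes the dbplyr package is installed; the toy data, the column names `id`, `name` and `name_len`, and the use of `dplyr::sql()` with DuckDB's `length()` function are illustrative choices, not part of the release notes.
+
+```{r, eval = FALSE}
+library(duckplyr)
+
+# A small duck frame built with one of the conversion functions (toy data).
+duck <- duckdb_tibble(id = 1:3, name = c("dplyr", "dbplyr", "duckplyr"))
+
+# Convert to a dbplyr lazy table (assumes dbplyr is installed).
+lazy <- as_tbl(duck)
+
+# Use dbplyr-only functionality, here an inline SQL expression
+# that is handed to DuckDB as written.
+lazy_len <- lazy |>
+  mutate(name_len = dplyr::sql("length(name)"))
+
+# Convert back to a duck frame and continue with duckplyr.
+back <- as_duckdb_tibble(lazy_len)
+back
+```
+
+The inline SQL step only stands in for existing dbplyr code; any other dbplyr pipeline could take its place before converting back.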
+ +## Benchmark + +duckplyr is often much faster than dplyr. +The comparison below is done in a fresh R session where dplyr is attached but duckplyr is not. + +```{r include = FALSE} +# Undo the effect of library(duckplyr) +methods_restore() +``` + +We use `tpch_dplyr()` as defined above to run the query with dplyr. +The function that runs it with duckplyr only wraps the input data in a duck frame and forwards it to the dplyr function. +The `collect()` at the end is required only for this benchmark to ensure fairness.[^collect] + +[^collect]: If omitted, the results would be unchanged but the measurements would be wrong. The computation would then be triggered by the check. See `vignette("prudence")` for details. + +```{r} +tpch_duckplyr <- function(lineitem) { + lineitem |> + duckplyr::as_duckdb_tibble() |> + tpch_dplyr() |> + collect() +} +``` + +And now we compare the two: + +```{r} +bench::mark( + tpch_dplyr(lineitem_tbl), + tpch_duckplyr(lineitem_tbl), + check = ~ all.equal(.x, .y, tolerance = 1e-10) +) +``` + +In this example, duckplyr is a lot faster than dplyr. +It also appears to use much less memory, but this is misleading: DuckDB manages the memory, not R, so the memory usage is not visible to `bench::mark()`. + +## Out-of-memory data + +As well as improved speed with in-memory datasets, duckplyr makes it easy to work with datasets that are too big to fit in memory. +In this case, you want: + +1. To work with data stored in modern formats designed for large data (e.g. Parquet). +1. To be able to store large intermediate results on disk, keeping them out of memory. +1. Fast computation! + +duckplyr provides each of these features: + +1. You can read data from disk with functions like `read_parquet_duckdb()`. +1. You can save intermediate results to disk with `compute_parquet()` and `compute_csv()`. +1. 
duckplyr takes advantage of DuckDB's query planner, which considers your entire pipeline holistically to figure out the most efficient way to get the data you need. + +See `vignette("large")` for a walkthrough and more details. + +## Help us improve duckplyr! + +Our goals for future development of duckplyr include: + +- Enabling users to provide [custom translations](https://github.com/tidyverse/duckplyr/issues/158) of dplyr functionality; +- Making it easier to contribute code to duckplyr; +- Supporting more dplyr and tidyr functionality natively in DuckDB. + +You can help! + +- Please report any issues, especially regarding unknown incompatibilities. See `vignette("limits")`. +- Contribute to the codebase after reading duckplyr's [contributing guide](https://duckplyr.tidyverse.org/CONTRIBUTING.html). +- Turn on telemetry to help us hear about the most frequent fallbacks so we can prioritize working on the corresponding missing dplyr translation. See `vignette("telemetry")` and `duckplyr::fallback_sitrep()`. + +## Additional resources + +Eager to learn more about duckplyr -- besides trying it out yourself? +The duckplyr website features several [articles](https://duckplyr.tidyverse.org/articles/). +Furthermore, the blog post ["duckplyr: dplyr Powered by DuckDB"](https://duckdb.org/2024/04/02/duckplyr.html) by Hannes Mühleisen provides some context on duckplyr including its inner workings, as also seen in a [section](https://blog.r-hub.io/2025/02/13/lazy-meanings/#duckplyr-lazy-evaluation-and-prudence) of the R-hub blog post ["Lazy introduction to laziness in R"](https://blog.r-hub.io/2025/02/13/lazy-meanings/) by Maëlle Salmon, Athanasia Mo Mowinckel and Hannah Frick. + +## Acknowledgements + +A big thanks to all folks who filed issues, created PRs and generally helped to improve duckplyr and its workhorse [duckdb](https://r.duckdb.org/)! 
+ +[@adamschwing](https://github.com/adamschwing), [@alejandrohagan](https://github.com/alejandrohagan), [@andreranza](https://github.com/andreranza), [@apalacio9502](https://github.com/apalacio9502), [@apsteinmetz](https://github.com/apsteinmetz), [@barracuda156](https://github.com/barracuda156), [@beniaminogreen](https://github.com/beniaminogreen), [@bob-rietveld](https://github.com/bob-rietveld), [@brichards920](https://github.com/brichards920), [@cboettig](https://github.com/cboettig), [@davidjayjackson](https://github.com/davidjayjackson), [@DavisVaughan](https://github.com/DavisVaughan), [@Ed2uiz](https://github.com/Ed2uiz), [@eitsupi](https://github.com/eitsupi), [@era127](https://github.com/era127), [@etiennebacher](https://github.com/etiennebacher), [@eutwt](https://github.com/eutwt), [@fmichonneau](https://github.com/fmichonneau), [@hadley](https://github.com/hadley), [@hannes](https://github.com/hannes), [@hawkfish](https://github.com/hawkfish), [@IndrajeetPatil](https://github.com/IndrajeetPatil), [@JanSulavik](https://github.com/JanSulavik), [@JavOrraca](https://github.com/JavOrraca), [@jeroen](https://github.com/jeroen), [@jhk0530](https://github.com/jhk0530), [@joakimlinde](https://github.com/joakimlinde), [@JosiahParry](https://github.com/JosiahParry), [@kevbaer](https://github.com/kevbaer), [@larry77](https://github.com/larry77), [@lnkuiper](https://github.com/lnkuiper), [@lorenzwalthert](https://github.com/lorenzwalthert), [@lschneiderbauer](https://github.com/lschneiderbauer), [@luisDVA](https://github.com/luisDVA), [@math-mcshane](https://github.com/math-mcshane), [@meersel](https://github.com/meersel), [@multimeric](https://github.com/multimeric), [@mytarmail](https://github.com/mytarmail), [@nicki-dese](https://github.com/nicki-dese), [@PMassicotte](https://github.com/PMassicotte), [@prasundutta87](https://github.com/prasundutta87), [@rafapereirabr](https://github.com/rafapereirabr), [@Robinlovelace](https://github.com/Robinlovelace), 
[@romainfrancois](https://github.com/romainfrancois), [@sparrow925](https://github.com/sparrow925), [@stefanlinner](https://github.com/stefanlinner), [@szarnyasg](https://github.com/szarnyasg), [@thomasp85](https://github.com/thomasp85), [@TimTaylor](https://github.com/TimTaylor), [@Tmonster](https://github.com/Tmonster), [@toppyy](https://github.com/toppyy), [@wibeasley](https://github.com/wibeasley), [@yjunechoe](https://github.com/yjunechoe), [@ywhcuhk](https://github.com/ywhcuhk), [@zhjx19](https://github.com/zhjx19), [@ablack3](https://github.com/ablack3), [@actuarial-lonewolf](https://github.com/actuarial-lonewolf), [@ajdamico](https://github.com/ajdamico), [@amirmazmi](https://github.com/amirmazmi), [@anderson461123](https://github.com/anderson461123), [@andrewGhazi](https://github.com/andrewGhazi), [@Antonov548](https://github.com/Antonov548), [@appiehappie999](https://github.com/appiehappie999), [@ArthurAndrews](https://github.com/ArthurAndrews), [@arthurgailes](https://github.com/arthurgailes), [@babaknaimi](https://github.com/babaknaimi), [@bcaradima](https://github.com/bcaradima), [@bdforbes](https://github.com/bdforbes), [@bergest](https://github.com/bergest), [@bill-ash](https://github.com/bill-ash), [@BorgeJorge](https://github.com/BorgeJorge), [@brianmsm](https://github.com/brianmsm), [@chainsawriot](https://github.com/chainsawriot), [@ckarnes](https://github.com/ckarnes), [@clementlefevre](https://github.com/clementlefevre), [@cregouby](https://github.com/cregouby), [@cy-james-lee](https://github.com/cy-james-lee), [@daranzolin](https://github.com/daranzolin), [@david-cortes](https://github.com/david-cortes), [@DavZim](https://github.com/DavZim), [@denis-or](https://github.com/denis-or), [@developertest1234](https://github.com/developertest1234), [@dicorynia](https://github.com/dicorynia), [@dsolito](https://github.com/dsolito), [@e-kotov](https://github.com/e-kotov), [@EAVWing](https://github.com/EAVWing), 
[@eddelbuettel](https://github.com/eddelbuettel), [@edward-burn](https://github.com/edward-burn), [@elefeint](https://github.com/elefeint), [@eli-daniels](https://github.com/eli-daniels), [@elysabethpc](https://github.com/elysabethpc), [@erikvona](https://github.com/erikvona), [@florisvdh](https://github.com/florisvdh), [@gaborcsardi](https://github.com/gaborcsardi), [@ggrothendieck](https://github.com/ggrothendieck), [@hdmm3](https://github.com/hdmm3), [@hope-data-science](https://github.com/hope-data-science), [@IoannaNika](https://github.com/IoannaNika), [@jabrown-aepenergy](https://github.com/jabrown-aepenergy), [@JamesLMacAulay](https://github.com/JamesLMacAulay), [@jangorecki](https://github.com/jangorecki), [@javierlenzi](https://github.com/javierlenzi), [@Joe-Heffer-Shef](https://github.com/Joe-Heffer-Shef), [@kalibera](https://github.com/kalibera), [@lboller-pwbm](https://github.com/lboller-pwbm), [@lgaborini](https://github.com/lgaborini), [@m-muecke](https://github.com/m-muecke), [@meztez](https://github.com/meztez), [@mgirlich](https://github.com/mgirlich), [@mtmorgan](https://github.com/mtmorgan), [@nassuphis](https://github.com/nassuphis), [@nbc](https://github.com/nbc), [@olivroy](https://github.com/olivroy), [@pdet](https://github.com/pdet), [@phdjsep](https://github.com/phdjsep), [@pierre-lamarche](https://github.com/pierre-lamarche), [@r2evans](https://github.com/r2evans), [@ran-codes](https://github.com/ran-codes), [@rplsmn](https://github.com/rplsmn), [@Saarialho](https://github.com/Saarialho), [@SimonCoulombe](https://github.com/SimonCoulombe), [@tau31](https://github.com/tau31), [@thohan88](https://github.com/thohan88), [@ThomasSoeiro](https://github.com/ThomasSoeiro), [@timothygmitchell](https://github.com/timothygmitchell), [@vincentarelbundock](https://github.com/vincentarelbundock), [@VincentGuyader](https://github.com/VincentGuyader), [@wlangera](https://github.com/wlangera), [@xbasics](https://github.com/xbasics), 
[@xiaodaigh](https://github.com/xiaodaigh), [@xtimbeau](https://github.com/xtimbeau), [@yng-me](https://github.com/yng-me), [@Yousuf28](https://github.com/Yousuf28), [@yutannihilation](https://github.com/yutannihilation), and [@zcatav](https://github.com/zcatav) + +Special thanks to Joe Thorley ([@joethorley](https://github.com/joethorley)) for help with choosing the right words. diff --git a/content/blog/duckplyr-1-1-0/index.md b/content/blog/duckplyr-1-1-0/index.md new file mode 100644 index 000000000..f375ecc80 --- /dev/null +++ b/content/blog/duckplyr-1-1-0/index.md @@ -0,0 +1,303 @@ +--- +output: hugodown::hugo_document + +slug: duckplyr-1-1-0 +title: duckplyr fully joins the tidyverse! +date: 2025-06-19 +author: Kirill Müller and Maëlle Salmon +description: > + duckplyr 1.1.0 is on CRAN! + A drop-in replacement for dplyr, powered by DuckDB for speed. + It is the most dplyr-like of dplyr backends. + +photo: + url: https://www.pexels.com/photo/a-mallard-duck-on-water-6918877/ + author: Kiril Gruev + +# one of: "deep-dive", "learn", "package", "programming", "roundup", or "other" +categories: [package] +tags: + - duckplyr + - dplyr + - tidyverse +rmd_hash: e61d2b86a57469dc + +--- + +We're well chuffed to announce the release of [duckplyr](https://duckplyr.tidyverse.org) 1.1.0. This is a dplyr backend powered by [DuckDB](https://duckdb.org/), a fast in-memory analytical database system[^1]. duckplyr uses the power of DuckDB for impressive performance where it can, and seamlessly falls back to R where it can't. You can install it from CRAN with: + +
install.packages("duckplyr")
+
+lineitem_tbl <- duckdb:::sql("INSTALL tpch; LOAD tpch; CALL dbgen(sf=1); FROM lineitem;")
+lineitem_tbl <- tibble::as_tibble(lineitem_tbl)
+dplyr::glimpse(lineitem_tbl)
+#> Rows: 6,001,215
+#> Columns: 16
+#> $ l_orderkey <dbl> 1, 1, 1, 1, 1, 1, 2, 3, 3, 3, 3, 3, 3, 4, 5, 5, 5, 6, …
+#> $ l_partkey <dbl> 155190, 67310, 63700, 2132, 24027, 15635, 106170, 4297…
+#> $ l_suppkey <dbl> 7706, 7311, 3701, 4633, 1534, 638, 1191, 1798, 6540, 3…
+#> $ l_linenumber <dbl> 1, 2, 3, 4, 5, 6, 1, 1, 2, 3, 4, 5, 6, 1, 1, 2, 3, 1, …
+#> $ l_quantity <dbl> 17, 36, 8, 28, 24, 32, 38, 45, 49, 27, 2, 28, 26, 30, …
+#> $ l_extendedprice <dbl> 21168.23, 45983.16, 13309.60, 28955.64, 22824.48, 4962…
+#> $ l_discount <dbl> 0.04, 0.09, 0.10, 0.09, 0.10, 0.07, 0.00, 0.06, 0.10, …
+#> $ l_tax <dbl> 0.02, 0.06, 0.02, 0.06, 0.04, 0.02, 0.05, 0.00, 0.00, …
+#> $ l_returnflag <chr> "N", "N", "N", "N", "N", "N", "N", "R", "R", "A", "A",…
+#> $ l_linestatus <chr> "O", "O", "O", "O", "O", "O", "O", "F", "F", "F", "F",…
+#> $ l_shipdate <date> 1996-03-13, 1996-04-12, 1996-01-29, 1996-04-21, 1996-…
+#> $ l_commitdate <date> 1996-02-12, 1996-02-28, 1996-03-05, 1996-03-30, 1996-…
+#> $ l_receiptdate <date> 1996-03-22, 1996-04-20, 1996-01-31, 1996-05-16, 1996-…
+#> $ l_shipinstruct <chr> "DELIVER IN PERSON", "TAKE BACK RETURN", "TAKE BACK RE…
+#> $ l_shipmode <chr> "TRUCK", "MAIL", "REG AIR", "AIR", "FOB", "MAIL", "RAI…
+#> $ l_comment <chr> "to beans x-ray carefull", " according to the final fo…
+
+
+library(duckplyr)
+#> Loading required package: dplyr
+#> The duckplyr package is configured to fall back to dplyr when it encounters an incompatibility.
+#> Fallback events can be collected and uploaded for analysis to guide future development. By
+#> default, data will be collected but no data will be uploaded.
+#> ℹ Automatic fallback uploading is not controlled and therefore disabled, see
+#> `?duckplyr::fallback()`.
+#> ✔ Number of reports ready for upload: 4.
+#> → Review with `duckplyr::fallback_review()`, upload with `duckplyr::fallback_upload()`.
+#> ℹ Configure automatic uploading with `duckplyr::fallback_config()`.
+#> ✔ Overwriting dplyr methods with duckplyr methods.
+#> ℹ Turn off with `duckplyr::methods_restore()`.
+
+
+tpch_dplyr <- function(lineitem) {
+ lineitem |>
+ filter(l_shipdate <= !!as.Date("1998-09-02")) |>
+ summarise(
+ sum_qty = sum(l_quantity),
+ sum_base_price = sum(l_extendedprice),
+ sum_disc_price = sum(l_extendedprice * (1 - l_discount)),
+ sum_charge = sum(l_extendedprice * (1 - l_discount) * (1 + l_tax)),
+ avg_qty = mean(l_quantity),
+ avg_price = mean(l_extendedprice),
+ avg_disc = mean(l_discount),
+ count_order = n(),
+ .by = c(l_returnflag, l_linestatus)
+ ) |>
+ arrange(l_returnflag, l_linestatus)
+}
+
+tpch_dplyr(lineitem_tbl)
+#> # A tibble: 4 × 10
+#> l_returnflag l_linestatus sum_qty sum_base_price sum_disc_price sum_charge
+#> <chr> <chr> <dbl> <dbl> <dbl> <dbl>
+#> 1 A F 37734107 56586554401. 53758257135. 55909065223.
+#> 2 N F 991417 1487504710. 1413082168. 1469649223.
+#> 3 N O 74476040 111701729698. 106118230308. 110367043872.
+#> 4 R F 37719753 56568041381. 53741292685. 55889619120.
+#> # ℹ 4 more variables: avg_qty <dbl>, avg_price <dbl>, avg_disc <dbl>,
+#> # count_order <int>
+
+
+out <- lineitem_tbl |>
+ duckplyr::as_duckdb_tibble() |>
+ tpch_dplyr()
+
+out
+#> # A duckplyr data frame: 10 variables
+#> l_returnflag l_linestatus sum_qty sum_base_price sum_disc_price sum_charge
+#> <chr> <chr> <dbl> <dbl> <dbl> <dbl>
+#> 1 A F 37734107 56586554401. 53758257135. 55909065223.
+#> 2 N F 991417 1487504710. 1413082168. 1469649223.
+#> 3 N O 74476040 111701729698. 106118230308. 110367043872.
+#> 4 R F 37719753 56568041381. 53741292685. 55889619120.
+#> # ℹ 4 more variables: avg_qty <dbl>, avg_price <dbl>, avg_disc <dbl>,
+#> # count_order <int>
+
+
+typeof(out)
+#> [1] "list"
+class(out)
+#> [1] "duckplyr_df" "tbl_df" "tbl" "data.frame"
+out$count_order
+#> [1] 1478493 38854 2920374 1478870
+
+
+Sys.setenv(DUCKPLYR_FALLBACK_INFO = TRUE)
+
+lineitem_tbl |>
+ duckplyr::as_duckdb_tibble() |>
+ filter(l_quantity == max(l_quantity), .by = c(l_returnflag, l_linestatus))
+#> Cannot process duckplyr query with DuckDB, falling back to dplyr.
+#> ℹ `filter(.by = ...)` not implemented, try `mutate(.by = ...)` followed by a simple `filter()`.
+#> # A duckplyr data frame: 16 variables
+#> l_orderkey l_partkey l_suppkey l_linenumber l_quantity l_extendedprice
+#> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
+#> 1 5 37531 35 3 50 73426.
+#> 2 131 44255 9264 2 50 59962.
+#> 3 199 132072 9612 1 50 55204.
+#> 4 231 198124 644 3 50 61106
+#> 5 260 155887 5888 1 50 97144
+#> 6 263 142891 434 3 50 96694.
+#> 7 323 163628 1177 1 50 84581
+#> 8 354 58125 8126 3 50 54156
+#> 9 484 183351 5870 3 50 71718.
+#> 10 485 149523 9524 1 50 78626
+#> # ℹ more rows
+#> # ℹ 10 more variables: l_discount <dbl>, l_tax <dbl>, l_returnflag <chr>,
+#> # l_linestatus <chr>, l_shipdate <date>, l_commitdate <date>,
+#> # l_receiptdate <date>, l_shipinstruct <chr>, l_shipmode <chr>,
+#> # l_comment <chr>
+
+
+tibble(a = "dbplyr", b = "duckplyr") %>%
+ mutate(c = dd$levenshtein(a, b))
+#> # A tibble: 1 × 3
+#> a b c
+#> <chr> <chr> <dbl>
+#> 1 dbplyr duckplyr 3
+
+
+tpch_duckplyr <- function(lineitem) {
+ lineitem |>
+ duckplyr::as_duckdb_tibble() |>
+ tpch_dplyr() |>
+ collect()
+}
+
+bench::mark(
+ tpch_dplyr(lineitem_tbl),
+ tpch_duckplyr(lineitem_tbl),
+ check = ~ all.equal(.x, .y, tolerance = 1e-10)
+)
+#> Warning: Some expressions had a GC in every iteration; so filtering is disabled.
+#> # A tibble: 2 × 6
+#> expression min median `itr/sec` mem_alloc `gc/sec`
+#> <bch:expr> <bch:tm> <bch:tm> <dbl> <bch:byt> <dbl>
+#> 1 tpch_dplyr(lineitem_tbl) 611.6ms 611.6ms 1.64 1.25GB 1.64
+#> 2 tpch_duckplyr(lineitem_tbl) 71.4ms 72.3ms 13.8 314.38KB 0
+
+
+