From 75d2def2201dc2c7b5ddbfc69f379a2d841ba9af Mon Sep 17 00:00:00 2001
From: Matthew Turner
Date: Wed, 19 Jan 2022 11:03:58 -0500
Subject: [PATCH 1/3] Add roadmap to readme

---
 README.md | 54 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 54 insertions(+)

diff --git a/README.md b/README.md
index 5e32bf7d516c..eaf402bf933a 100644
--- a/README.md
+++ b/README.md
@@ -141,6 +141,60 @@ datafusion = "6.0.0"
 
 DataFusion also includes a simple command-line interactive SQL utility. See the [CLI reference](https://arrow.apache.org/datafusion/cli/index.html) for more information.
 
+# Roadmap
+
+A quarterly roadmap will be published to give the DataFusion community visibility into the priorities of the project's contributors. This roadmap is not binding.
+
+## 2022 Q1
+
+### DataFusion Core
+
+- Publish official Arrow2 branch
+- Implementation of memory manager (i.e. to enable spilling to disk as needed)
+
+### Benchmarking
+
+- Inclusion in Db-Benchmark with all queries covered
+- All TPCH queries covered
+
+### Performance Improvements
+
+- Predicate evaluation
+- Multi-column comparisons that can't be vectorized
+- Null constant support
+
+### New Features
+
+- Read JSON as table
+- Simplify DDL with Datafusion-Cli
+- Add Decimal128 data type and the attendant features such as Arrow Kernel and UDF support
+- Add new experimental e-graph based optimizer
+
+### Ballista
+
+- Begin work on design documents and plan / priorities for development
+
+### Extensions
+
+- Stable S3 support
+- Begin design discussions and prototyping of a stream provider
+
+## Beyond 2022 Q1
+
+There is no clear timeline for the below, but community members have expressed interest in working on these topics.
+
+### DataFusion Core
+
+- Custom SQL support
+- Split DataFusion into multiple crates
+- Push based query execution and code gen
+
+### Ballista
+
+- Evolve architecture so that it can be deployed in a multi-tenant cloud native environment
+- Ensure Ballista is scalable, elastic, and stable for production usage
+- Develop distributed ML capabilities
+
 # Status
 
 ## General

From 3a98927db197e365184841ea0b62ade201087668 Mon Sep 17 00:00:00 2001
From: Matthew Turner
Date: Wed, 19 Jan 2022 11:42:32 -0500
Subject: [PATCH 2/3] Link to datafusion-contrib

---
 README.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/README.md b/README.md
index eaf402bf933a..625c2d2f8c26 100644
--- a/README.md
+++ b/README.md
@@ -174,7 +174,7 @@ A quarterly roadmap will be published to give the DataFusion community visibilit
 
 - Begin work on design documents and plan / priorities for development
 
-### Extensions
+### Extensions ([datafusion-contrib](https://github.com/datafusion-contrib))
 
 - Stable S3 support
 - Begin design discussions and prototyping of a stream provider
@@ -187,7 +187,7 @@ There is no clear timeline for the below, but community members have expressed i
 
 - Custom SQL support
 - Split DataFusion into multiple crates
-- Push based query execution and code gen
+- Push based query execution and code generation
 
 ### Ballista
 

From 9b929b6aca64578786a11eb7c6f07b93115afdda Mon Sep 17 00:00:00 2001
From: Matthew Turner
Date: Thu, 20 Jan 2022 00:31:37 -0500
Subject: [PATCH 3/3] Update multi column comparisons

---
 README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/README.md b/README.md
index 625c2d2f8c26..a2918ab22d5a 100644
--- a/README.md
+++ b/README.md
@@ -160,7 +160,7 @@ A quarterly roadmap will be published to give the DataFusion community visibilit
 ### Performance Improvements
 
 - Predicate evaluation
-- Multi-column comparisons that can't be vectorized
+- Improve multi-column comparisons (that can't be vectorized at the moment)
 - Null constant support
 
 ### New Features