From 565ee2055c9a5d9c60430767255305e5b0920fc4 Mon Sep 17 00:00:00 2001
From: "Ritesh.K"
Date: Thu, 9 Oct 2025 23:33:27 +0530
Subject: [PATCH 1/6] Add documentation for Delta Sharing node type

Signed-off-by: Ritesh.K
---
 docs/node/guides/lab21.md | 77 +++++++++++++++++++++++++++++++++++++++
 1 file changed, 77 insertions(+)
 create mode 100644 docs/node/guides/lab21.md

diff --git a/docs/node/guides/lab21.md b/docs/node/guides/lab21.md
new file mode 100644
index 0000000..6a3b867
--- /dev/null
+++ b/docs/node/guides/lab21.md
@@ -0,0 +1,77 @@
+---
+sidebar_position: 20
+---
+
+# Lab 21: Delta Sharing
+
+This lab contains an example configuration for VILLASnode's `delta_sharing` node-type.
+
+The example connects to the open Delta Sharing server at "https://sharing.delta.io/delta-sharing/".
+
+The `delta_sharing` node connects to the server specified in the share file referenced by `profile_path`.
+It then reads the table selected by `table_path`, which names the share, the schema and the table.
+
+## VILLASnode configuration file
+
+### Delta Sharing client
+
+``` url="external/node/etc/labs/lab21.conf" title="node/etc/labs/lab21.conf"
+nodes = {
+  node1 = {
+    type = "delta_sharing"
+    profile_path = "open-datasets.share"
+    table_path = "open-datasets.share#delta_sharing.default.COVID_19_NYT"
+    cache_dir = "cache"
+    op = "read"
+  },
+  node2 = {
+    type = "file"
+    uri = "delta_output.dat"
+    in = {
+      epoch_mode = "direct"
+      read_mode = "all"
+      eof = "stop"
+    }
+    out = {
+
+    }
+  }
+
+}
+paths = (
+  {
+    in = "node1"
+    out = "node2"
+  }
+)
+```
+
+### Share file
+
+``` url="external/node/etc/labs/open-datasets.share" title="node/etc/labs/open-datasets.share"
+{
+  "shareCredentialsVersion": 1,
+  "endpoint": "https://sharing.delta.io/delta-sharing/",
+  "bearerToken": "faaie590d541265bcab1f2de9813274bf233"
+}
+```
+
+The `delta_sharing` node-type reads from Delta Sharing tables using Apache Arrow/Parquet; writing to them is planned. Files downloaded from the server are cached locally.
+
+Default cache directory is cwd/cache unless specified otherwise using the cache_dir parameter.
+
+Supported keys in the configuration:
+profile_path: path to a Delta Sharing profile JSON.
+table_path: path for the table, here we mention the server, share and the schema in the format - ```server#share.schame.table```
+batch_size: batch size to be used for parsing rows in the Arrow table. Currently not implemented.
+
+The output is then piped into a .dat file using the file nodetype.
+
+To start the Delta Sharing node, run the following command in a terminal:
+
+```shell
+villas node lab21.conf
+```
+
+The data received from the remote table should then be displayed in the terminal and written to `delta_output.dat`.
+

From 0bdeb2cd90321339ca3da4330033b3c206e5c0d5 Mon Sep 17 00:00:00 2001
From: "Ritesh.K"
Date: Mon, 13 Oct 2025 12:59:41 +0530
Subject: [PATCH 2/6] Add documentation for delta sharing node type

Signed-off-by: Ritesh.K
---
 docs/node/guides/lab21.md | 2 +-
 docs/node/nodes/delta_sharing.md | 47 ++++++++++++++++++++++++++++++++
 2 files changed, 48 insertions(+), 1 deletion(-)
 create mode 100644 docs/node/nodes/delta_sharing.md

diff --git a/docs/node/guides/lab21.md b/docs/node/guides/lab21.md
index 6a3b867..0b322ae 100644
--- a/docs/node/guides/lab21.md
+++ b/docs/node/guides/lab21.md
@@ -62,7 +62,7 @@ Default cache directory is cwd/cache unless specified otherwise using the cache_
 
 Supported keys in the configuration:
 profile_path: path to a Delta Sharing profile JSON.
-table_path: path for the table, here we mention the server, share and the schema in the format - ```server#share.schame.table```
+table_path: path for the table, here we mention the server, share and the schema in the format - ```server#share.schema.table```
 batch_size: batch size to be used for parsing rows in the Arrow table. Currently not implemented.
 
 The output is then piped into a .dat file using the file nodetype.
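In the lab configuration, the part of `table_path` before `#` is the share (profile) file `open-datasets.share`, followed by the share, schema and table names. As an illustration only, the POSIX shell sketch below splits such a value into its four parts. The function `split_table_path` is hypothetical and not part of VILLASnode; it assumes the last `#` separates the profile path and that share and schema names contain no dots.

```shell
#!/bin/sh
# Hypothetical helper (not part of VILLASnode): split a table path of the
# form "profile#share.schema.table" into its components.
split_table_path() {
    path=$1
    profile=${path%#*}    # drop the last '#' and everything after it
    name=${path##*#}      # keep only what follows the last '#'
    share=${name%%.*}     # first dot-separated field
    rest=${name#*.}       # "schema.table"
    schema=${rest%%.*}    # second field
    table=${rest#*.}      # the remainder is the table name
    printf 'profile=%s share=%s schema=%s table=%s\n' \
        "$profile" "$share" "$schema" "$table"
}

split_table_path "open-datasets.share#delta_sharing.default.COVID_19_NYT"
```

For the lab's `table_path` this prints `profile=open-datasets.share share=delta_sharing schema=default table=COVID_19_NYT`. Splitting at the last `#` keeps the sketch working for profile paths that themselves contain `#`; that is a choice made for this sketch, not documented node behaviour.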
diff --git a/docs/node/nodes/delta_sharing.md b/docs/node/nodes/delta_sharing.md
new file mode 100644
index 0000000..f558b94
--- /dev/null
+++ b/docs/node/nodes/delta_sharing.md
@@ -0,0 +1,47 @@
+---
+hide_table_of_contents: true
+---
+
+# Delta Sharing
+
+The `delta_sharing` node type integrates with a Delta Sharing server to read from Delta tables using Apache Arrow.
+
+## Prerequisites
+
+- A reachable Delta Sharing server and a valid Delta Sharing profile path (`profile_path`).
+- Apache Arrow and Parquet are required at build time. They are core dependencies for this node type.
+- A local cache directory to store the downloaded Parquet files.
+
+Supported Keys:
+
+- `profile_path` (string, required): Path to a Delta Sharing profile file.
+- `cache_dir` (string, optional): Local directory for caching fetched Parquet files.
+- `table_path` (string, required for `read`/`write`): Table path in the format `profile#share.schema.table`, where `profile` is the path to the profile file.
+- `op` (string, optional): One of `read`, `write`, `noop`. Defaults to `noop`.
+- `batch_size` (integer, optional): Batch size for chunk I/O (currently not implemented).
+
+## Behaviour:
+
+- On start, the node initializes a Delta Sharing client from `profile_path` and lists the available shares, schemas and tables.
+- For `op=read`, the node parses `table_path`, populates the cache with each file of the table and loads the first file as an Arrow table. It then maps the Arrow types to the data types supported by VILLASnode.
+- For `op = write`, the node constructs an in-memory Arrow `Table` from outgoing VILLASnode samples based on the supported signal types. The current implementation does not upload to a Delta Sharing server yet.
+- Supported data types for reading are `DOUBLE`, `FLOAT`, `INT64` and `INT32`. All other types are classified as unsupported and filled with default values.
+
+## Example
+
+``` url="external/node/etc/examples/nodes/delta_sharing.conf" title="node/etc/examples/nodes/delta_sharing.conf"
+
+nodes = {
+  delta_node = {
+    type = "delta_sharing"
+
+
+    ### The following settings are specific to the delta_sharing node type ###
+
+    profile_path = "dataset.share" # Path to the profile file holding the server endpoint and credentials
+    table_path = "dataset.share#delta_sharing.default.example_table" # Format: profile#share.schema.table
+    cache_dir = "cache" # Local directory for cached Parquet files
+
+    op = "read" # One of "read", "write" or "noop"
+  }
+}
\ No newline at end of file

From efe69e8e68f1f6e3fec99e16607c9fda597d413f Mon Sep 17 00:00:00 2001
From: RiteshKarki27
Date: Mon, 20 Oct 2025 10:04:48 +0530
Subject: [PATCH 3/6] Update docs/node/nodes/delta_sharing.md

Co-authored-by: Steffen Vogel
Signed-off-by: RiteshKarki27
---
 docs/node/nodes/delta_sharing.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/node/nodes/delta_sharing.md b/docs/node/nodes/delta_sharing.md
index f558b94..4fb55bb 100644
--- a/docs/node/nodes/delta_sharing.md
+++ b/docs/node/nodes/delta_sharing.md
@@ -20,7 +20,7 @@ Supported Keys:
 - `op` (string, optional): One of `read`, `write`, `noop`. Defaults to `noop`.
 - `batch_size` (integer, optional): Batch size for chunk I/O (currently not implemented).
 
-## Behaviour:
+## Behaviour
 
 - On start, the node initializes a Delta Sharing client from `profile_path` and lists the available shares, schemas and tables.
 - For `op=read`, the node parses `table_path`, populates the cache with each file of the table and loads the first file as an Arrow table. It then maps the Arrow types to the data types supported by VILLASnode.

From 0b3dd90145ce49dd8d8a69fd2465974acbca5b1a Mon Sep 17 00:00:00 2001
From: RiteshKarki27
Date: Mon, 20 Oct 2025 10:05:02 +0530
Subject: [PATCH 4/6] Update docs/node/nodes/delta_sharing.md

Co-authored-by: Steffen Vogel
Signed-off-by: RiteshKarki27
---
 docs/node/nodes/delta_sharing.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/node/nodes/delta_sharing.md b/docs/node/nodes/delta_sharing.md
index 4fb55bb..18315b1 100644
--- a/docs/node/nodes/delta_sharing.md
+++ b/docs/node/nodes/delta_sharing.md
@@ -23,7 +23,7 @@ Supported Keys:
 ## Behaviour
 
 - On start, the node initializes a Delta Sharing client from `profile_path` and lists the available shares, schemas and tables.
-- For `op=read`, the node parses `table_path`, populates the cache with each file of the table and loads the first file as an Arrow table. It then maps the Arrow types to the data types supported by VILLASnode.
+- For `op = read`, the node parses `table_path`, populates the cache with each file of the table and loads the first file as an Arrow table. It then maps the Arrow types to the data types supported by VILLASnode.
 - For `op = write`, the node constructs an in-memory Arrow `Table` from outgoing VILLASnode samples based on the supported signal types. The current implementation does not upload to a Delta Sharing server yet.
 - Supported data types for reading are `DOUBLE`, `FLOAT`, `INT64` and `INT32`. All other types are classified as unsupported and filled with default values.
 
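The read-side type rule in the Behaviour list can be summarised as a small lookup. The shell sketch below is illustrative only: the function `villas_signal_type` is hypothetical, and the target names `float` and `integer` are an assumption about which VILLASnode signal types the node uses for these Arrow types.

```shell
#!/bin/sh
# Hypothetical illustration of the documented read-side rule:
# DOUBLE, FLOAT, INT64 and INT32 columns are supported, all others are not.
# The VILLASnode type names "float" and "integer" are assumptions.
villas_signal_type() {
    case $1 in
        DOUBLE|FLOAT) echo float ;;
        INT64|INT32) echo integer ;;
        *) echo unsupported ;;  # documented as filled with default values
    esac
}

for t in DOUBLE INT32 STRING; do
    printf '%s -> %s\n' "$t" "$(villas_signal_type "$t")"
done
```

This prints `DOUBLE -> float`, `INT32 -> integer` and `STRING -> unsupported`, one per line.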
From 3c97f5b8b76feebd9fc9b084b587a6a00329a5f2 Mon Sep 17 00:00:00 2001
From: RiteshKarki27
Date: Mon, 20 Oct 2025 10:05:28 +0530
Subject: [PATCH 5/6] Update docs/node/nodes/delta_sharing.md

Co-authored-by: Steffen Vogel
Signed-off-by: RiteshKarki27
---
 docs/node/nodes/delta_sharing.md | 1 -
 1 file changed, 1 deletion(-)

diff --git a/docs/node/nodes/delta_sharing.md b/docs/node/nodes/delta_sharing.md
index 18315b1..778842c 100644
--- a/docs/node/nodes/delta_sharing.md
+++ b/docs/node/nodes/delta_sharing.md
@@ -30,7 +30,6 @@ Supported Keys:
 ## Example
 
 ``` url="external/node/etc/examples/nodes/delta_sharing.conf" title="node/etc/examples/nodes/delta_sharing.conf"
-
 nodes = {
   delta_node = {
     type = "delta_sharing"

From f82ea865c714782401aecbe8a8adbe9aa8587e53 Mon Sep 17 00:00:00 2001
From: "Ritesh.K"
Date: Mon, 3 Nov 2025 19:22:58 +0530
Subject: [PATCH 6/6] Add installation instructions for building with delta sharing node type

Signed-off-by: Ritesh.K
---
 docs/node/installation.md | 16 ++++++++++++++--
 1 file changed, 14 insertions(+), 2 deletions(-)

diff --git a/docs/node/installation.md b/docs/node/installation.md
index 06c1aba..f5ab187 100644
--- a/docs/node/installation.md
+++ b/docs/node/installation.md
@@ -64,6 +64,7 @@ VILLASnode currently has the following list of dependencies:
 | [protobuf](https://github.com/google/protobuf) | >= 2.6.0 | for the [Protobuf format-type](formats/protobuf.md) | optional | similar to BSD |
 | [rabbitmq-c](https://github.com/alanxz/rabbitmq-c) | >= 0.8.0 | for the [AMQP node-type](nodes/amqp.md) | optional | MIT |
 | [rdkafka](https://github.com/edenhill/librdkafka) | >= 1.5.0 | for the [Kafka node-type](nodes/kafka.md) | optional | BSD |
+| [arrow](https://github.com/apache/arrow) | - | for the [Delta Sharing node-type](nodes/delta_sharing.md) | optional | Apache 2.0 |
 
 There are three ways to install these dependencies:
 
@@ -106,7 +107,12 @@ sudo apt-get install \
     uuid-dev \
     libre2-dev \
     libglib2.0-dev \
-    libcriterion-dev
+    libcriterion-dev \
+    libutf8proc-dev \
+    zlib1g-dev liblz4-dev \
+    libbrotli-dev libzstd-dev \
+    libsnappy-dev libboost-all-dev \
+    libthrift-dev rapidjson-dev libxsimd-dev
 ```
 
 or the following line for Fedora/Redhat/RockyLinux systems:
@@ -146,7 +152,13 @@ sudo dnf install \
     spdlog-devel \
     zeromq-devel \
     glib2-devel \
-    libnice-devel
+    libnice-devel \
+    re2-devel utf8proc-devel \
+    zlib-devel brotli-devel lz4-devel libzstd-devel \
+    snappy-devel boost-devel thrift thrift-devel \
+    rapidjson-devel xsimd-devel
+
+
 ```
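After installing the dependencies, one quick way to see whether Arrow and Parquet are visible to the build environment is to query `pkg-config`. The sketch below assumes Arrow's C++ libraries were installed together with their pkg-config files under the module names `arrow` and `parquet`; the helper `check_pc` is hypothetical, and the VILLASnode build may locate Arrow by other means (for example CMake package files), so a `missing` result is a hint rather than a verdict.

```shell
#!/bin/sh
# Hypothetical helper: report whether pkg-config can locate each module.
# Assumes Arrow/Parquet ship pkg-config files named "arrow" and "parquet".
check_pc() {
    for mod in "$@"; do
        if pkg-config --exists "$mod" 2>/dev/null; then
            echo "found $mod $(pkg-config --modversion "$mod")"
        else
            echo "missing $mod"
        fi
    done
}

check_pc arrow parquet
```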