mabel.data.writers.batch_writer
Justin Joyce edited this page Jul 23, 2022
·
6 revisions
The batch data writer writes data records into blobs. Batches are written into timestamped folders called Partitions.
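As a rough illustration of what a timestamped partition folder could look like, the sketch below maps a dataset name and a date to a path. The helper name `partition_path` and the `year_/month_/day_` folder layout are assumptions made for this example, not taken from the mabel code; check the source for the layout it actually uses.

```python
import datetime


def partition_path(dataset, date=None, raw_path=False):
    """Hypothetical helper: build a partition folder for a dataset and date.

    The year_/month_/day_ layout is an assumed convention for illustration.
    """
    base = dataset.rstrip("/")
    if raw_path:
        # raw_path means no date parts are appended to the dataset name
        return base
    if date is None:
        # default to today's date, as the parameter description states
        date = datetime.date.today()
    elif isinstance(date, str):
        # accept an ISO formatted string such as "2022-07-23"
        date = datetime.date.fromisoformat(date)
    return f"{base}/year_{date:%Y}/month_{date:%m}/day_{date:%d}"


if __name__ == "__main__":
    print(partition_path("logs/web", "2022-07-23"))
    print(partition_path("logs/web", raw_path=True))
```

With `raw_path=True` the dataset name is returned unchanged, which mirrors the description of that parameter.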
- dataset - string (optional)
  The name of the dataset; this is used to map to a path.
- schema - mabel.validator.Schema (optional)
  Schema used to test records for conformity. The default is no schema and therefore no validation.
- format - string (optional)
  - text: raw text lines
  - jsonl: raw json lines
  - flat: flattened json records in json lines
  - lzma: lzma compressed json lines
  - zstd: zstandard compressed json lines (default)
  - parquet: Apache Parquet
- date - date or string (optional)
  A date, or a string representation of a date, to use for creating the dataset. The default is today's date.
- blob_size - integer (optional)
  The maximum size of blobs. The default is 64MB.
- inner_writer - BaseWriter (optional)
  The component used to commit data. The default writer is the NullWriter.
- frame_id - string (optional)
- raw_path - boolean (optional)
  Don't automatically add any date parts to dataset names.
- index_on - collection (optional)
  Index on these columns. The default is to not index.
- metadata - dict (optional)
  Data to write into the frame.complete file.
Different inner_writers may take or require additional parameters.
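To make the interplay of `blob_size`, `format` and `inner_writer` more concrete, here is a minimal stand-alone sketch of the general batching idea: records are buffered, and a blob is handed to the inner writer whenever adding another record would exceed the size limit. `InMemoryWriter`, `SketchBatchWriter`, the `commit` signature and the `part-0000` blob naming are all hypothetical and exist only for this example; this is not mabel's implementation, and only two of the listed formats are covered.

```python
import json
import lzma


class InMemoryWriter:
    """Hypothetical inner writer: keeps committed blobs in a dict."""

    def __init__(self):
        self.blobs = {}

    def commit(self, byte_data, blob_name):
        self.blobs[blob_name] = byte_data


class SketchBatchWriter:
    """Illustrative batch writer; not the mabel BatchWriter."""

    def __init__(self, inner_writer, dataset, blob_size=64 * 1024 * 1024, format="jsonl"):
        if format not in ("jsonl", "lzma"):
            raise ValueError("this sketch only handles 'jsonl' and 'lzma'")
        self.inner_writer = inner_writer
        self.dataset = dataset
        self.blob_size = blob_size
        self.format = format
        self._buffer = bytearray()
        self._count = 0

    def append(self, record):
        line = json.dumps(record).encode("utf-8") + b"\n"
        # roll over to a new blob if this record would exceed the limit
        if self._buffer and len(self._buffer) + len(line) > self.blob_size:
            self._commit()
        self._buffer.extend(line)

    def _commit(self):
        data = bytes(self._buffer)
        if self.format == "lzma":
            data = lzma.compress(data)
        # assumed naming scheme, for illustration only
        blob_name = f"{self.dataset}/part-{self._count:04d}.{self.format}"
        self.inner_writer.commit(data, blob_name)
        self._count += 1
        self._buffer.clear()

    def finalize(self):
        # flush whatever is left and report how many blobs were written
        if self._buffer:
            self._commit()
        return self._count


if __name__ == "__main__":
    store = InMemoryWriter()
    writer = SketchBatchWriter(store, "demo", blob_size=40)
    for i in range(5):
        writer.append({"id": i})
    print(writer.finalize(), sorted(store.blobs))
```

Keeping the commit step behind a small inner-writer object is what lets the same batching logic target different storage back ends, which is also why different inner writers may need their own extra parameters.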
This file has been automatically generated; it is not the truth. If in doubt, the code will tell you unambiguously what it does.