
mabel.data.writers.batch_writer

Justin Joyce edited this page Jul 23, 2022 · 6 revisions

CLASS: BatchWriter ()

The batch data writer writes data records into blobs. Batches are written into timestamped folders called Partitions.
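As a rough illustration of how batches might land in date-based partition folders, the sketch below builds a folder path from a dataset name and a date. The `partition_path` helper and its `year_/month_/day_` layout are hypothetical and chosen only for illustration; the folder structure BatchWriter actually produces may differ. The `raw_path` flag mirrors the parameter of the same name on this page.

```python
import datetime
from typing import Optional, Union


def partition_path(
    dataset: str,
    date: Optional[Union[datetime.date, str]] = None,
    raw_path: bool = False,
) -> str:
    """Build a HYPOTHETICAL partition folder path for a dataset.

    The year_/month_/day_ layout is an assumption for illustration,
    not the layout mabel is guaranteed to use.
    """
    base = dataset.rstrip("/")
    if raw_path:
        # raw_path: use the dataset name as-is, without adding date parts
        return base + "/"
    if date is None:
        # default to today's date, as the date parameter does
        date = datetime.date.today()
    elif isinstance(date, str):
        # accept ISO-8601 strings such as "2022-07-23"
        date = datetime.date.fromisoformat(date)
    return f"{base}/year_{date.year:04d}/month_{date.month:02d}/day_{date.day:02d}/"
```

For example, `partition_path("sales/orders", "2022-07-23")` returns `"sales/orders/year_2022/month_07/day_23/"`, while passing `raw_path=True` returns `"sales/orders/"`.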

Parameters

  • dataset - string (optional)
    The name of the dataset - this is used to map to a path
  • schema - mabel.validator.Schema (optional)
    Schema used to test records for conformity; the default is no schema and therefore no validation
  • format - string (optional)
    One of:
      - text: raw text lines
      - jsonl: raw JSON lines
      - flat: flattened JSON records in JSON lines
      - lzma: LZMA-compressed JSON lines
      - zstd: Zstandard-compressed JSON lines (default)
      - parquet: Apache Parquet
  • date - date or string (optional)
    A date, or a string representation of a date, to use when creating the dataset. The default is today's date
  • blob_size - integer (optional)
    The maximum size of blobs; the default is 64 MB
  • inner_writer - BaseWriter (optional)
    The component used to commit data; the default writer is the NullWriter
  • frame_id - string (optional)
  • raw_path - boolean (optional)
    Don't automatically add any date parts to dataset names
  • index_on - collection (optional)
    Index on these columns; the default is to not index
  • metadata - dict (optional)
    Data to write into the frame.complete file
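The interaction between `format` and `blob_size` can be sketched as a writer that buffers serialized records and starts a new blob once the next record would push the current one past the size limit. `SketchBatchWriter` and its `commit` callback are hypothetical and are not mabel's implementation. Only the formats the Python standard library can produce (`text`, `jsonl`, `lzma`) are handled, and the size check is applied to the uncompressed bytes, which is an assumption rather than documented behavior.

```python
import json
import lzma
from typing import Any, Callable, List


class SketchBatchWriter:
    """HYPOTHETICAL size-bounded blob batching; not mabel's BatchWriter."""

    def __init__(
        self,
        commit: Callable[[bytes], None],
        format: str = "jsonl",
        blob_size: int = 64 * 1024 * 1024,  # 64 MB, matching the documented default
    ) -> None:
        if format not in ("text", "jsonl", "lzma"):
            raise ValueError(f"format not supported in this sketch: {format}")
        self.commit = commit  # called once per completed blob
        self.format = format
        self.blob_size = blob_size
        self.lines: List[bytes] = []
        self.size = 0  # uncompressed bytes buffered for the current blob

    def append(self, record: Any) -> None:
        # text writes the record verbatim; jsonl and lzma serialize it as JSON
        if self.format == "text":
            line = str(record).encode("utf-8") + b"\n"
        else:
            line = json.dumps(record).encode("utf-8") + b"\n"
        # start a new blob if this record would exceed blob_size
        if self.lines and self.size + len(line) > self.blob_size:
            self._flush()
        self.lines.append(line)
        self.size += len(line)

    def _flush(self) -> None:
        data = b"".join(self.lines)
        if self.format == "lzma":
            data = lzma.compress(data)
        self.commit(data)
        self.lines, self.size = [], 0

    def finalize(self) -> None:
        # commit any partially filled blob; an empty buffer commits nothing
        if self.lines:
            self._flush()
```

For example, records such as `{"id": 0, "name": "x"}` serialize to 23 bytes each as JSON lines, so with `blob_size=50` the third record triggers a commit of the first two, and `finalize()` commits the third as a second blob.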

Note

  • Different inner_writers may take or require additional parameters.
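The note above implies that inner writers share a common way of committing data while differing in their constructor arguments. A minimal sketch of that design, assuming a `commit(byte_data, blob_name)` method: `SketchBaseWriter`, `SketchNullWriter` and `SketchMemoryWriter` are hypothetical stand-ins, not mabel's classes, and `SketchMemoryWriter` illustrates a writer that requires an extra `store` parameter.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict


class SketchBaseWriter(ABC):
    """HYPOTHETICAL commit interface; mabel's real base class may differ."""

    def __init__(self, **kwargs: Any) -> None:
        # keep any writer-specific options that subclasses don't consume
        self.options = kwargs

    @abstractmethod
    def commit(self, byte_data: bytes, blob_name: str) -> str:
        """Persist one blob and return the name it was written under."""


class SketchNullWriter(SketchBaseWriter):
    """Stand-in for a null writer: accepts blobs and discards them."""

    def commit(self, byte_data: bytes, blob_name: str) -> str:
        return blob_name  # nothing is stored


class SketchMemoryWriter(SketchBaseWriter):
    """Hypothetical writer that REQUIRES an extra 'store' parameter."""

    def __init__(self, store: Dict[str, bytes], **kwargs: Any) -> None:
        super().__init__(**kwargs)
        self.store = store

    def commit(self, byte_data: bytes, blob_name: str) -> str:
        self.store[blob_name] = byte_data
        return blob_name
```

Because the batching logic only depends on `commit`, swapping the storage backend means changing the inner writer and supplying whatever extra arguments it needs; omitting `store` from `SketchMemoryWriter` raises a `TypeError`.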

finalize ()


This file has been automatically generated and may not be accurate. If in doubt, the code is the definitive reference for what it does.
