
help_me

Justin Joyce edited this page Apr 29, 2021 · 1 revision

Answers to Questions

Should I use a BatchWriter or StreamWriter?

BatchWriters close datasets at the end of the job, so any subsequent writes go to a new version of the dataset. StreamWriters don't close datasets, so any subsequent writes append to the existing dataset.

If you are copying a dataset from one place to another, use a BatchWriter: once the copy is done, any future copy should be a different version of the data.

If you are building up a dataset and different runs of the job are adding to it, use a StreamWriter.
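The difference can be pictured with a small in-memory sketch. This is a toy model only, not mabel's API: `ToyStore`, `batch_write` and `stream_write` are hypothetical names invented for illustration, and real writers persist to storage rather than to a dictionary.

```python
from collections import defaultdict


class ToyStore:
    """Hypothetical stand-in for storage; NOT part of mabel."""

    def __init__(self):
        # dataset name -> list of versions, each version is a list of records
        self.versions = defaultdict(list)


def batch_write(store, dataset, records):
    """Batch-style run: every run produces a new, closed version."""
    store.versions[dataset].append(list(records))


def stream_write(store, dataset, records):
    """Stream-style run: every run appends to the one open version."""
    if not store.versions[dataset]:
        store.versions[dataset].append([])
    store.versions[dataset][-1].extend(records)


store = ToyStore()

# Two runs of a copy job: each run leaves its own version behind.
batch_write(store, "copy", [1, 2])
batch_write(store, "copy", [1, 2, 3])

# Two runs of an accumulating job: both land in the same version.
stream_write(store, "log", ["a"])
stream_write(store, "log", ["b"])

print(len(store.versions["copy"]))  # 2
print(store.versions["log"])        # [['a', 'b']]
```

In this model, reading the "copy" dataset means choosing a version, while the "log" dataset has a single version that keeps growing.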

Why can't the Reader Find my File?

The Reader is designed to read datasets made up of multiple files rather than specific individual files. It works best when pointed at a folder, in which case it reads the files within that folder.

The Reader gives special meaning to some reserved names: folders prefixed with as_at_, folders called BACKOUT, and files called .ignore. If any of these appear in your structure and were not put there by mabel, they may confuse the Reader.

The Reader ignores files it doesn't understand. It keeps a list of file extensions it will read; files with extensions not on this list are ignored.
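When a file is not being picked up, it can help to check its path against the points above. The helper below is a rough diagnostic sketch, not how the Reader itself works: `check_path` is a hypothetical function, and `ASSUMED_READABLE_EXTENSIONS` is an illustrative placeholder, since the real extension list lives in mabel and is not reproduced here.

```python
from pathlib import PurePosixPath

# ASSUMPTION: placeholder values only; consult mabel for the real list.
ASSUMED_READABLE_EXTENSIONS = {".jsonl", ".parquet"}


def check_path(path):
    """Return reasons why this path might be skipped or misread."""
    p = PurePosixPath(path)
    reasons = []

    # Reserved folder names anywhere above the file.
    for folder in p.parts[:-1]:
        if folder.startswith("as_at_"):
            reasons.append(f"reserved folder prefix: {folder}")
        if folder == "BACKOUT":
            reasons.append("reserved folder name: BACKOUT")

    # Reserved file name, otherwise check the extension.
    if p.name == ".ignore":
        reasons.append("reserved file name: .ignore")
    elif p.suffix.lower() not in ASSUMED_READABLE_EXTENSIONS:
        reasons.append(f"extension not in the assumed list: {p.suffix!r}")

    return reasons


for candidate in ["data/file.jsonl", "data/notes.txt", "data/BACKOUT/file.jsonl"]:
    print(candidate, check_path(candidate))
```

An empty list means nothing in the path matches the reserved names or falls outside the assumed extension list; it does not guarantee the Reader will find the file.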

To help debug your code, set the LOGGING_LEVEL environment variable. The Reader will then provide more information about what it is doing as it tries to read data.
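One way to set the variable from within a Python script is sketched below. The value "DEBUG" is an assumption, as the accepted values are not listed on this page. Setting it before mabel is imported is also an assumption, made in case the value is read at import time.

```python
import os

# ASSUMPTION: "DEBUG" is an accepted value; check mabel for the valid levels.
# Set before importing mabel, in case the variable is read at import time.
os.environ["LOGGING_LEVEL"] = "DEBUG"

print(os.environ["LOGGING_LEVEL"])
```

The variable can equally be set in the shell or job configuration before the script starts.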
