help_me
BatchWriters close datasets at the end of the job; any subsequent writes go to a new version of the dataset. StreamWriters don't close datasets; any subsequent writes append to the existing dataset.
If you are copying a dataset from one place to another, you want the BatchWriter: when the copy is done, any future copies should be a different version of the data.
If you are building up a dataset and different runs of the job are adding to it, you should use a StreamWriter.
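The distinction above can be sketched as a toy model. The `Dataset`, `BatchWriter`, and `StreamWriter` classes here are illustrative stand-ins for the behaviour described, not mabel's actual classes or API.

```python
class Dataset:
    """Toy dataset: a list of versions, each version a list of records."""
    def __init__(self):
        self.versions = [[]]

class BatchWriter:
    """Closes the dataset when the job ends; later writes land in a new version."""
    def __init__(self, dataset):
        self.dataset = dataset
    def write(self, record):
        self.dataset.versions[-1].append(record)
    def close(self):
        # closing starts a fresh version for any subsequent writes
        self.dataset.versions.append([])

class StreamWriter:
    """Never closes the dataset; writes keep appending to the current version."""
    def __init__(self, dataset):
        self.dataset = dataset
    def write(self, record):
        self.dataset.versions[-1].append(record)

# copying a dataset twice: each run is a separate version
batch_ds = Dataset()
bw = BatchWriter(batch_ds)
bw.write("copy run 1")
bw.close()
BatchWriter(batch_ds).write("copy run 2")

# building up a dataset: each run appends to the same version
stream_ds = Dataset()
StreamWriter(stream_ds).write("job run 1")
StreamWriter(stream_ds).write("job run 2")
```

After this runs, `batch_ds` holds two versions with one record each, while `stream_ds` holds a single version with both records.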
The Reader is designed to read datasets made up of multiple files rather than specific individual files. It is much happier being directed at folders, and it will then read over the files within those folders.
There are some reserved words which the Reader uses with specific meaning, such as folders prefixed with as_at_, folders called BACKOUT, and files called .ignore. Any of these in your structure that were not put there by mabel may confuse the Reader.
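The reserved-name rules can be sketched as a small predicate. This is an illustration of the rules as described, not a reproduction of the Reader's internals.

```python
# Reserved names as described: as_at_-prefixed folders, BACKOUT folders,
# and .ignore files are treated specially by the Reader.
RESERVED_FOLDER_PREFIXES = ("as_at_",)
RESERVED_FOLDER_NAMES = {"BACKOUT"}
RESERVED_FILE_NAMES = {".ignore"}

def is_reserved(name: str, is_folder: bool) -> bool:
    """Return True if this folder or file name has reserved meaning."""
    if is_folder:
        return (name in RESERVED_FOLDER_NAMES
                or name.startswith(RESERVED_FOLDER_PREFIXES))
    return name in RESERVED_FILE_NAMES
```

For example, a folder named `as_at_20240101` or `BACKOUT` would be treated specially, while an ordinary data folder would not.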
The Reader ignores files it doesn't understand: there is a list of file extensions which it will read from, and files with extensions not on this list will be ignored.
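The extension filtering behaves roughly like the sketch below. The set of extensions shown is an assumption for illustration; the actual list the Reader supports is defined by mabel.

```python
import os

# Assumed example extensions -- the real list lives inside mabel.
READABLE_EXTENSIONS = {".jsonl", ".parquet", ".csv"}

def readable_files(filenames):
    """Keep only the files whose extension is on the readable list."""
    return [name for name in filenames
            if os.path.splitext(name)[1] in READABLE_EXTENSIONS]

files = ["part-0000.jsonl", "notes.txt", "data.parquet", "thumbs.db"]
kept = readable_files(files)
```

Here `notes.txt` and `thumbs.db` would be silently skipped rather than raising an error.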
To help debug your code, setting the LOGGING_LEVEL environment variable will make the Reader log more information about what it is doing as it tries to read data.