Conversation

@gnat42 (Contributor) commented Aug 21, 2015

Allow the Offset filter to stop the process function loop. The impetus is the
lack of detail the Result object contains, particularly when using the Offset
filter. When processing a file in chunks, the StepAggregator reports the number
of successfully imported rows and errors, but not the number skipped. This
change lets a user seek the reader, process a batch, and then stop, receiving
accurate counts of processed, skipped, and errored rows.

@gnat42 (Contributor, Author) commented Aug 21, 2015

To give some added detail on the impetus for this change:

We're building a system that allows users to build maps and import data. The backend of the import process is this library. However, the use case includes uploading files with 150,000 rows, which obviously need to be processed in chunks. When importing, I was using the OffsetFilter to move into position and then process X rows of the source file, then moving forward based on the totalProcessedCount of the Result object. However, totalProcessedCount was really only a count of imported + errors; it excluded any skipped rows. Once I realized this, I started digging.

This led to the realization that the OffsetFilter didn't affect the reader, and would still cause the reader to read through all rows unnecessarily. So I added to the OffsetFilter the ability to stop processing by throwing a specific exception. I considered adding one to skip the first set as well, but figured that was a bit odd, so in my use case I call seek() on the reader prior to processing.
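The chunked-processing pattern described above can be sketched as follows. This is a language-agnostic illustration in Python, not the library's real API: the names OffsetFilter, StopException, process, and import_row are all hypothetical stand-ins for the PHP library's concepts.

```python
class StopException(Exception):
    """Raised by a filter to halt the process loop early."""

class OffsetFilter:
    """Accept `limit` rows starting at `offset`; stop the loop afterwards."""
    def __init__(self, offset, limit):
        self.offset, self.limit, self.seen = offset, limit, 0

    def __call__(self, row_index):
        # Once the batch is consumed, stop instead of reading to EOF.
        if self.seen >= self.limit:
            raise StopException()
        if row_index < self.offset:
            return False  # skip rows before the window
        self.seen += 1
        return True

def process(rows, filter_, import_row):
    """Return (imported, skipped, errors) for one batch of rows."""
    imported = skipped = errors = 0
    for i, row in enumerate(rows):
        try:
            if not filter_(i):
                skipped += 1
                continue
            import_row(row)
            imported += 1
        except StopException:
            break  # batch consumed; leave the loop early
        except Exception:
            errors += 1  # keep looping, but the row is not imported
    return imported, skipped, errors
```

For example, processing ten rows with OffsetFilter(2, 3) and an importer that fails on one row yields two skipped rows, two imported, and one error, and the loop never touches rows past the window.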

The next issue is knowing that I've actually consumed my batch size. I would request that 400 records be processed using OffsetFilter($currentRowPosition, 400) and get a totalProcessedCount of, let's say, 365. This was because 35 rows had some kind of issue and were skipped.

With this change I can know how many rows were processed, skipped, or raised an exception, as well as the number that were imported.
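With all three counts available, the read cursor can be advanced by everything the batch actually consumed, not just imported + errors. A minimal sketch, assuming a hypothetical result structure exposing the three counters:

```python
def next_offset(current, result):
    """Advance the read position past every row the batch touched.

    `result` is a hypothetical stand-in for the library's Result object,
    assumed here to expose imported, skipped, and error counts.
    """
    return current + result["imported"] + result["skipped"] + result["errors"]
```

So a batch requested at offset 400 that imported 330 rows, skipped 35, and errored on 35 resumes at offset 800, rather than silently re-reading the skipped rows.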

I'm still a bit unsure whether the OffsetFilter should be using a SkipException to filter the first set of rows, mainly for a consistent 'API' as it were. Though really, reading through and 'hydrating' the rows just to reach the starting position is a bit weird/inefficient.

When a filter throws the StopException we need to decrement the processed
counter to match the actual count.
When an exception occurs we need to continue the loop but not increment the
imported counter.
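The counter bookkeeping described in the two notes above can be sketched like this. Again a Python illustration with hypothetical names (run, StopException), not the library's actual implementation:

```python
class StopException(Exception):
    """Raised by a filter to halt the process loop early."""

def run(items, filters, writer):
    processed = imported = errors = 0
    for item in items:
        processed += 1  # counted optimistically as soon as the row is read
        try:
            if not all(f(item) for f in filters):
                continue  # filtered out: processed but not imported
            writer(item)
            imported += 1
        except StopException:
            # The row that triggered the stop was never really handled,
            # so undo the optimistic increment before leaving the loop.
            processed -= 1
            break
        except Exception:
            errors += 1  # keep looping, but don't count it as imported
    return {"processed": processed, "imported": imported,
            "errors": errors, "skipped": processed - imported - errors}
```

With this shape, processed always equals imported + skipped + errors, which is exactly the invariant the caller needs to advance the reader correctly.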
