@pulinagrawal

Previously, the videos list was split into chunks of NUM_THREADS elements each. Now it is split into NUM_THREADS chunks of nearly equal size.
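
To make the difference concrete, here is a small sketch of the arithmetic. The sample list and the value of `NUM_THREADS` are made up for illustration, and the code is not taken from the diff; it only compares chunk sizes and says nothing about which elements end up in which chunk.

```python
# Illustrative values only; the real list comes from the video directory.
videos = list(range(10))
NUM_THREADS = 4

# Old behaviour as described: fixed-size chunks of NUM_THREADS elements,
# so the number of chunks depends on the length of the list.
old_chunks = [videos[i : i + NUM_THREADS] for i in range(0, len(videos), NUM_THREADS)]
old_sizes = [len(c) for c in old_chunks]

# New behaviour as described: exactly NUM_THREADS chunks whose sizes
# differ by at most one element.
q, r = divmod(len(videos), NUM_THREADS)
new_sizes = [q + 1] * r + [q] * (NUM_THREADS - r)

print(old_sizes)  # [4, 4, 2]
print(new_sizes)  # [3, 3, 2, 2]
```

With these numbers the old scheme yields only three chunks for four threads, while the new one always yields one chunk per thread.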

@YodaEmbedding
YodaEmbedding commented Jul 17, 2020

That's one way, but I think the following is nicer: each thread processes a contiguous slice of the list, which makes it easier to trace any errors that occur during processing:

def split(xs, n):
    """Yields n roughly even-sized contiguous chunks from xs."""
    size = len(xs)
    q, r = divmod(size, n)
    offset = 0
    # The first r chunks take one extra element each.
    for _ in range(r):
        yield xs[offset : offset + (q + 1)]
        offset += q + 1
    # The remaining n - r chunks take q elements each.
    for _ in range(n - r):
        yield xs[offset : offset + q]
        offset += q
    assert offset == size
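
As a quick usage sketch, assuming a made-up list of ten items and four threads: the function is restated here in condensed form only so the snippet runs on its own, and `videos` is a hypothetical stand-in for the real file list.

```python
def split(xs, n):
    """Condensed restatement of the function above: n contiguous chunks."""
    q, r = divmod(len(xs), n)
    offset = 0
    for i in range(n):
        # The first r chunks get one extra element.
        size = q + 1 if i < r else q
        yield xs[offset : offset + size]
        offset += size

videos = list(range(10))  # hypothetical stand-in for the video list
chunks = list(split(videos, 4))
print(chunks)  # [[0, 1, 2], [3, 4, 5], [6, 7], [8, 9]]

# With fewer items than threads, the trailing chunks are simply empty.
short = list(split([0, 1], 4))
print(short)  # [[0], [1], [], []]
```

Each chunk is a contiguous run, so a failure in thread 2 here points at items 6 and 7 and nothing else.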

One should also sort the list of files beforehand via:

import os
video_list = os.listdir(VIDEO_ROOT)
# Sort numerically by the part before the first dot, so that 2 comes before 10.
video_list.sort(key=lambda x: int(x.split(".")[0]))

EDIT: On the other hand, your method processes the dataset mostly "in order", assuming the threads stay roughly in sync.
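
The diff is not reproduced in this thread, so the following is only an assumption about what that method looks like: a strided split, where thread i takes every n-th element starting at index i. The name `split_strided` and the sample list are hypothetical.

```python
def split_strided(xs, n):
    """Hypothetical strided split: chunk i holds xs[i], xs[i + n], xs[i + 2 * n], ..."""
    return [xs[i::n] for i in range(n)]

videos = list(range(10))  # hypothetical stand-in for the sorted video list
strided = split_strided(videos, 4)
print(strided)  # [[0, 4, 8], [1, 5, 9], [2, 6], [3, 7]]

# If all threads advance at about the same pace, the k-th step of every
# thread together covers items k * n .. k * n + n - 1, i.e. roughly in order.
first_round = [chunk[0] for chunk in strided]
print(first_round)  # [0, 1, 2, 3]
```

Under that assumption the trade-off is between easy error tracing (contiguous chunks) and roughly in-order progress through the dataset (strided chunks).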

2 participants