fix chunking equal number of videos for each thread. #40

pulinagrawal · 2019-12-14T00:15:51Z

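Previously, the videos list was split into chunks of size NUM_THREADS, so the number of chunks depended on the length of the list rather than on the number of threads. Now the list is split into NUM_THREADS chunks of almost equal size.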
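To make the difference concrete, here is a minimal sketch of the two behaviours. The PR's diff is not reproduced here, so `chunks_of_size` and `balanced_sizes` are hypothetical helper names, and the list of 102 videos is made-up example data.

```python
def chunks_of_size(xs, size):
    """Old behaviour as described: consecutive chunks of at most `size` items."""
    return [xs[i:i + size] for i in range(0, len(xs), size)]


def balanced_sizes(total, n):
    """Sizes of n chunks covering `total` items, differing by at most one."""
    q, r = divmod(total, n)
    return [q + 1] * r + [q] * (n - r)


NUM_THREADS = 4
videos = list(range(102))  # stand-in for the real list of video files

old_chunks = chunks_of_size(videos, NUM_THREADS)
print(len(old_chunks))                           # 26 chunks for only 4 threads
print(balanced_sizes(len(videos), NUM_THREADS))  # [26, 26, 25, 25]
```

With chunks of size NUM_THREADS, the number of chunks grows with the dataset; with NUM_THREADS chunks, each thread gets exactly one chunk.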

YodaEmbedding · 2020-07-17T02:06:33Z

That's one way, but I think the following is nicer: each thread processes a contiguous slice of the list, which makes it easier to trace any errors that occur during processing.

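def split(xs, n):
    """Yields n roughly even-sized contiguous chunks from xs."""
    size = len(xs)
    q = size // n  # base chunk length
    r = size % n   # number of chunks that get one extra element
    offset = 0
    # The first r chunks have q + 1 elements each.
    for _ in range(r):
        yield xs[offset : offset + (q + 1)]
        offset += q + 1
    # The remaining n - r chunks have q elements each.
    for _ in range(n - r):
        yield xs[offset : offset + q]
        offset += q
    # Every element has been assigned to exactly one chunk.
    assert offset == size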
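As a usage sketch, the chunks could be handed to worker threads as shown below. The `split` here is a compact restatement of the same logic, repeated only so the snippet runs on its own; `process_chunk` and the file names are hypothetical placeholders, not part of the repository.

```python
from concurrent.futures import ThreadPoolExecutor


def split(xs, n):
    """Yields n roughly even-sized contiguous chunks from xs."""
    q, r = divmod(len(xs), n)
    offset = 0
    for i in range(n):
        step = q + 1 if i < r else q  # first r chunks get one extra item
        yield xs[offset:offset + step]
        offset += step


def process_chunk(chunk):
    """Hypothetical worker: stands in for the real per-video processing."""
    return [f"processed {name}" for name in chunk]


NUM_THREADS = 4
videos = [f"{i}.mp4" for i in range(10)]  # made-up file names
chunks = list(split(videos, NUM_THREADS))

with ThreadPoolExecutor(max_workers=NUM_THREADS) as pool:
    results = list(pool.map(process_chunk, chunks))

print([len(c) for c in chunks])  # [3, 3, 2, 2]
```

Because each chunk is a contiguous slice, a failure in one worker points to a known range of files.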

One should also sort the list of files beforehand via:

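import os

# VIDEO_ROOT is the directory containing the videos, defined elsewhere in the script.
video_list = os.listdir(VIDEO_ROOT)
# Sort numerically by the part before the first "."; assumes names like "123.mp4".
video_list.sort(key=lambda x: int(x.split(".")[0]))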
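The numeric key matters because `os.listdir` returns names in arbitrary order and a plain string sort would place "10" before "2". A small illustration, using made-up file names:

```python
names = ["10.mp4", "2.mp4", "1.mp4"]  # hypothetical file names

lexicographic = sorted(names)
numeric = sorted(names, key=lambda x: int(x.split(".")[0]))

print(lexicographic)  # ['1.mp4', '10.mp4', '2.mp4']
print(numeric)        # ['1.mp4', '2.mp4', '10.mp4']
```

Note that the key raises `ValueError` for any name whose stem is not an integer, so this only works if the directory contains nothing but numerically named files.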

EDIT: On the other hand, your method makes it so that the dataset is mostly processed "in order", assuming the threads are roughly synced.
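The trade-off can be sketched as follows. This assumes the PR distributes items round-robin (a strided split), which is a guess based on the "in order" remark and not confirmed by the diff; `strided` and `contiguous` are hypothetical names.

```python
def strided(xs, n):
    """Round-robin split: chunk i gets items i, i + n, i + 2n, ..."""
    return [xs[i::n] for i in range(n)]


def contiguous(xs, n):
    """Contiguous split into n chunks whose lengths differ by at most one."""
    q, r = divmod(len(xs), n)
    bounds = [i * q + min(i, r) for i in range(n + 1)]
    return [xs[bounds[i]:bounds[i + 1]] for i in range(n)]


videos = list(range(8))

# Each tuple lists what the 2 threads handle at the same step,
# assuming they advance at roughly the same pace.
strided_steps = list(zip(*strided(videos, 2)))
contiguous_steps = list(zip(*contiguous(videos, 2)))

print(strided_steps)     # [(0, 1), (2, 3), (4, 5), (6, 7)]
print(contiguous_steps)  # [(0, 4), (1, 5), (2, 6), (3, 7)]
```

With the strided split, the set of finished videos grows from the front of the list; with the contiguous split, each thread's failures map to one clearly bounded range.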

fix chunking equal number of videos for each thread.

8ca20b5
