Description
The shared tensor mechanism introduced in #58 gives each runner a set of output tensors for writing its RunnerModel results. Once all output tensors have been used, the runner wraps around to the first tensor (like a ring buffer) and repeats the process. To ensure that an output tensor is not overwritten before its consumer reads from it, we added a simple synchronization step: a producer cannot write to a shared tensor until the consumer releases the corresponding Event. A sketch of this scheme follows below.
Unfortunately, the current implementation can leave the writer runner blocked on one shared tensor while another shared tensor has already been released. There is no way to access an arbitrary tensor in the ring buffer: a write to tensor N must always be followed by a write to tensor N+1 (mod len(buffer)). So if the consumer runner reading tensor N+1 releases it before the consumer on tensor N finishes, the writer blocks on tensor N even though tensor N+1 is immediately available. This head-of-line blocking adds unnecessary wait latency and hurts the overall performance of the job.
A simple solution would be to add another queue (within the local process) that tracks "released" shared tensors, so that writes no longer have to follow the ring order and the writer can simply take whichever tensor is freed first.
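A minimal sketch of that idea, again with hypothetical names and using queue.Queue for the process-local free list:

```python
import queue

class SharedTensorPool:
    """Hypothetical sketch of the proposed fix: a process-local queue
    of released slots, so the writer takes whichever tensor is freed
    first instead of waiting in ring order.
    """

    def __init__(self, tensors):
        self.tensors = tensors
        self.free = queue.Queue()
        for idx in range(len(tensors)):
            self.free.put(idx)  # all slots start out released

    def acquire_for_write(self):
        # Blocks only when *no* slot is free; otherwise returns the
        # first slot any consumer has released, in release order.
        idx = self.free.get()
        return idx, self.tensors[idx]

    def release(self, idx):
        # The consumer hands the slot back once it has finished reading.
        self.free.put(idx)
```

One consequence of dropping the ring order is that the slot index would need to accompany each result, since consumers could no longer infer which tensor holds the next output.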