Description
The shared tensor mechanism introduced in #58 gives each runner a set of output tensors for writing its RunnerModel results. Once all output tensors have been used, the runner wraps around to the first tensor (like a ring buffer) and repeats the process. To ensure that an output tensor is not overwritten before its consumer reads from it, we added a simple synchronization step: a producer cannot write to a shared tensor until the consumer releases the corresponding Event. A sketch of this scheme follows below.
Unfortunately, the current implementation can leave the writer runner blocked on one shared tensor while another shared tensor has already been released. There is no way to access an arbitrary tensor in the ring buffer: a write to tensor N must always be followed by a write to tensor N+1 (mod len(buffer)). So if the consumer runner reading tensor N+1 releases it before the consumer on tensor N finishes, the writer blocks on tensor N even though tensor N+1 is immediately available. This head-of-line blocking adds unnecessary wait latency and hurts the overall performance of the job.
A simple solution would be to add another queue (within the local process) that tracks "released" shared tensors, so that writes no longer have to follow the ring order and the writer can simply take whichever tensor is freed first.
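A minimal sketch of that idea, again with hypothetical names and using queue.Queue for the process-local free list:

```python
import queue

class SharedTensorPool:
    """Hypothetical sketch of the proposed fix: a process-local queue
    of released slots, so the writer takes whichever tensor is freed
    first instead of waiting in ring order.
    """

    def __init__(self, tensors):
        self.tensors = tensors
        self.free = queue.Queue()
        for idx in range(len(tensors)):
            self.free.put(idx)  # all slots start out released

    def acquire_for_write(self):
        # Blocks only when *no* slot is free; otherwise returns the
        # first slot any consumer has released, in release order.
        idx = self.free.get()
        return idx, self.tensors[idx]

    def release(self, idx):
        # The consumer hands the slot back once it has finished reading.
        self.free.put(idx)
```

One consequence of dropping the ring order is that the slot index would need to accompany each result, since consumers could no longer infer which tensor holds the next output.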