Description
At the current stage, RedisAI keeps tensors "at rest" on the CPU and transfers them to the GPU only when needed in the context of a run (while model weights stay on the GPU).
This feature would allow DAGs to keep intermediate tensors on the GPU, if possible. The possibility and opportunity to do so (e.g. a tensor produced and later directly consumed by a TORCH model located on the same GPU) can be determined by analyzing the DAG, with no additions to the current syntax.
The advantage of limiting this to the DAG execution context is that we enable optimizations without running into GPU cache invalidation issues. Since GPU memory is precious, we want to make sure RedisAI is not caching all tensors on the GPU.
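The analysis mentioned above could look something like this. A minimal sketch (illustrative only; the op/tensor representation and function name are made up, not RedisAI internals): an intermediate tensor may stay GPU-resident when every consumer runs on the same GPU device as its producer.

```python
def gpu_resident_tensors(ops):
    """ops: list of DAG ops as dicts with 'device', 'inputs', 'outputs'.

    Returns the set of tensors that can stay on the GPU: tensors
    produced on a GPU device and consumed only on that same device.
    """
    # Map each produced tensor to the device of the op that produced it.
    producer = {}
    for op in ops:
        for t in op["outputs"]:
            producer[t] = op["device"]

    resident = set()
    for t, dev in producer.items():
        if not dev.startswith("GPU"):
            continue  # produced on CPU, nothing to keep on the GPU
        consumers = [op["device"] for op in ops if t in op["inputs"]]
        # Keep on GPU only if it is consumed there, and nowhere else.
        if consumers and all(c == dev for c in consumers):
            resident.add(t)
    return resident


# Example DAG: two models chained on GPU:0, then CPU post-processing.
dag = [
    {"device": "GPU:0", "inputs": ["img"],  "outputs": ["feat"]},  # TORCH model
    {"device": "GPU:0", "inputs": ["feat"], "outputs": ["pred"]},  # same GPU
    {"device": "CPU",   "inputs": ["pred"], "outputs": ["out"]},   # script
]
print(gpu_resident_tensors(dag))  # → {'feat'}
```

Here `feat` never leaves the GPU, while `pred` must be copied back because a CPU op consumes it.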
Another area where we could cache tensors on the GPU is optimizing access to reference data (even outside DAGs). For this we could add a flag to TENSORSET that explicitly requests copying the tensor to the GPU and keeping it there. The user would be responsible for removing it from the GPU at a later stage.
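For illustration, the flag could look like this (purely hypothetical syntax — the `DEVICE` argument does not exist in TENSORSET today, and the name is a placeholder):

```
AI.TENSORSET ref_tensor FLOAT 2 2 VALUES 1 2 3 4 DEVICE GPU:0
```

The tensor would be copied to `GPU:0` at set time and stay there until the user deletes the key or explicitly evicts the GPU copy.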
/cc @gkorland