Description
At the current stage, RedisAI keeps tensors "at rest" on the CPU and transfers them to the GPU only when needed in the context of a run (while model weights stay on the GPU).
This feature would allow DAGs to keep intermediate tensors on the GPU, if possible. The possibility and opportunity to do so (e.g. a tensor produced and later directly consumed by a TORCH model located on the same GPU) can be determined by analyzing the DAG, with no additions to the current syntax.
The advantage of limiting this to the DAG execution context is that we enable optimizations without running into GPU cache invalidation issues. Since GPU memory is precious, we want to make sure RedisAI is not caching all tensors on the GPU.
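The analysis mentioned above could look something like this. A minimal sketch (illustrative only; the op/tensor representation and function name are made up, not RedisAI internals): an intermediate tensor may stay GPU-resident when every consumer runs on the same GPU device as its producer.

```python
def gpu_resident_tensors(ops):
    """ops: list of DAG ops as dicts with 'device', 'inputs', 'outputs'.

    Returns the set of tensors that can stay on the GPU: tensors
    produced on a GPU device and consumed only on that same device.
    """
    # Map each produced tensor to the device of the op that produced it.
    producer = {}
    for op in ops:
        for t in op["outputs"]:
            producer[t] = op["device"]

    resident = set()
    for t, dev in producer.items():
        if not dev.startswith("GPU"):
            continue  # produced on CPU, nothing to keep on the GPU
        consumers = [op["device"] for op in ops if t in op["inputs"]]
        # Keep on GPU only if it is consumed there, and nowhere else.
        if consumers and all(c == dev for c in consumers):
            resident.add(t)
    return resident


# Example DAG: two models chained on GPU:0, then CPU post-processing.
dag = [
    {"device": "GPU:0", "inputs": ["img"],  "outputs": ["feat"]},  # TORCH model
    {"device": "GPU:0", "inputs": ["feat"], "outputs": ["pred"]},  # same GPU
    {"device": "CPU",   "inputs": ["pred"], "outputs": ["out"]},   # script
]
print(gpu_resident_tensors(dag))  # → {'feat'}
```

Here `feat` never leaves the GPU, while `pred` must be copied back because a CPU op consumes it.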
Another area where we could cache tensors on the GPU is optimizing access to reference data (even outside DAGs). For this we could add a flag to TENSORSET that explicitly requests copying the tensor to the GPU and keeping it there. The user would be responsible for removing it from the GPU at a later stage.
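For illustration, the flag could look like this (purely hypothetical syntax — the `DEVICE` argument does not exist in TENSORSET today, and the name is a placeholder):

```
AI.TENSORSET ref_tensor FLOAT 2 2 VALUES 1 2 3 4 DEVICE GPU:0
```

The tensor would be copied to `GPU:0` at set time and stay there until the user deletes the key or explicitly evicts the GPU copy.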
/cc @gkorland