Hello @ShaoxunZeng, I read the code and have two questions:
- I found save_cuda_graph at https://github.com/ShaoxunZeng/PyTorch-Medusa/blob/de68b8092d45893e45489b39a219d66b7897c73d/aten/src/ATen/cuda/CUDAGraph.cpp#L216. This function reads the CUDAGraph information from the files generated during offline execution, right?
- During online execution, the native capture is still present at https://github.com/ShaoxunZeng/PyTorch-Medusa/blob/de68b8092d45893e45489b39a219d66b7897c73d/aten/src/ATen/cuda/CUDAGraph.cpp#L128. Why can Medusa boot faster when it performs these additional read operations on top of the capture?
I may have misunderstood something. Could you explain this in more detail?