Chat performance #432
Replies: 1 comment
-
Hi @formfollows, thanks for opening a discussion for this and sharing performance details. What OS are you running Khoj on? If you're on Windows, can you check the Windows Task Manager for GPU usage during chat? From my current understanding of Khoj's underlying library, this indicates we could increase the number of threads. We've gotten Khoj chat to engage the GPU on M1+ Macs and are looking into how we can support Nvidia/CUDA GPUs with Llama 2 (Khoj's offline chat AI model).
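For anyone on Linux or another setup without Task Manager, one way to do the same check is to poll nvidia-smi while a chat response is generating. This is just a sketch, not part of Khoj: the gpu_utilization helper is a name I've made up, and it returns None when nvidia-smi isn't on the PATH.

```python
import shutil
import subprocess


def gpu_utilization():
    """Poll nvidia-smi once and return GPU utilization as a percentage.

    Returns None if nvidia-smi is not installed or not on the PATH.
    """
    if shutil.which("nvidia-smi") is None:
        return None
    out = subprocess.run(
        ["nvidia-smi",
         "--query-gpu=utilization.gpu",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout.strip()
    # One line per GPU; report the first device.
    return int(out.splitlines()[0])


if __name__ == "__main__":
    # Run this in a loop in a second terminal while Khoj is answering:
    # a value stuck near 0% suggests inference is not reaching the GPU.
    print("GPU utilization:", gpu_utilization())
```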
-
Hey,
I'm testing Khoj's chat feature with a single 10-page PDF containing no images, just text. The results are amazing, just as expected, but the performance is really poor (150-250 seconds per answer).
I'm using a rather powerful GPU (an RTX 6000) and upgraded the installed Torch version to work with my CUDA version
(pip install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cu121).
The check if torch.cuda.is_available(): in
\Lib\site-packages\khoj\utils\state.py
returns True, so I assume the GPU is being used?
Are there any other settings to adjust for better performance, or any other way to crosscheck whether the GPU is actually being used?
All my hardware resources were under 5% utilization during inference.
Thanks a lot,
KR
Ralf
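One caveat worth noting on the crosscheck question: torch.cuda.is_available() only tells you that PyTorch itself can see a CUDA device; it doesn't prove the chat model's inference actually runs on it. A minimal sketch of a slightly richer check (the cuda_status helper is hypothetical, and it degrades gracefully if torch or a GPU is missing):

```python
def cuda_status():
    """Return a short report on whether PyTorch can see and use a CUDA GPU."""
    try:
        import torch
    except ImportError:
        return "torch is not installed"
    if not torch.cuda.is_available():
        return "no CUDA device visible to torch"
    name = torch.cuda.get_device_name(0)
    # memory_allocated reports bytes held by torch tensors on the device;
    # ~0 MB during chat suggests the model weights never reached the GPU.
    mem_mb = torch.cuda.memory_allocated(0) / 1e6
    return f"{name}: {mem_mb:.1f} MB allocated by torch"


if __name__ == "__main__":
    print(cuda_status())
```

Running this during a chat response and seeing near-zero allocated memory would indicate the offline model is doing CPU-only inference even though CUDA is technically available.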