Chat performance #432
Replies: 1 comment
-
Hi @formfollows, thanks for opening a discussion for this and sharing performance details. What OS are you running Khoj on? If you're on Windows, can you check the Windows Task Manager for GPU usage during chat? From my current understanding of Khoj's underlying library, this indicates we could increase the number of threads. We've gotten Khoj chat to engage the GPU on M1+ Macs and are looking into how we can support Nvidia/CUDA GPUs with Llama 2 (Khoj's offline chat AI model).
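For anyone on Linux or another setup without Task Manager, one way to do the same check is to poll nvidia-smi while a chat response is generating. This is just a sketch, not part of Khoj: the gpu_utilization helper is a name I've made up, and it returns None when nvidia-smi isn't on the PATH.

```python
import shutil
import subprocess


def gpu_utilization():
    """Poll nvidia-smi once and return GPU utilization as a percentage.

    Returns None if nvidia-smi is not installed or not on the PATH.
    """
    if shutil.which("nvidia-smi") is None:
        return None
    out = subprocess.run(
        ["nvidia-smi",
         "--query-gpu=utilization.gpu",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout.strip()
    # One line per GPU; report the first device.
    return int(out.splitlines()[0])


if __name__ == "__main__":
    # Run this in a loop in a second terminal while Khoj is answering:
    # a value stuck near 0% suggests inference is not reaching the GPU.
    print("GPU utilization:", gpu_utilization())
```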
-
Hey,
I'm testing Khoj's chat feature with a single 10-page PDF containing no images, just text. The results are amazing, just as expected, but the performance is really poor (150-250 seconds per answer).
I'm using a rather powerful GPU (an RTX 6000) and upgraded the installed Torch version to work with my CUDA version
(pip install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cu121).
The check if torch.cuda.is_available(): in
\Lib\site-packages\khoj\utils\state.py
returns True, so I assume the GPU is being used?
Are there any other settings to adjust for better performance, or any other way to crosscheck whether the GPU is actually being used?
All my hardware resources were under 5% utilization during inference.
Thanks a lot,
KR
Ralf
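One caveat worth noting on the crosscheck question: torch.cuda.is_available() only tells you that PyTorch itself can see a CUDA device; it doesn't prove the chat model's inference actually runs on it. A minimal sketch of a slightly richer check (the cuda_status helper is hypothetical, and it degrades gracefully if torch or a GPU is missing):

```python
def cuda_status():
    """Return a short report on whether PyTorch can see and use a CUDA GPU."""
    try:
        import torch
    except ImportError:
        return "torch is not installed"
    if not torch.cuda.is_available():
        return "no CUDA device visible to torch"
    name = torch.cuda.get_device_name(0)
    # memory_allocated reports bytes held by torch tensors on the device;
    # ~0 MB during chat suggests the model weights never reached the GPU.
    mem_mb = torch.cuda.memory_allocated(0) / 1e6
    return f"{name}: {mem_mb:.1f} MB allocated by torch"


if __name__ == "__main__":
    print(cuda_status())
```

Running this during a chat response and seeing near-zero allocated memory would indicate the offline model is doing CPU-only inference even though CUDA is technically available.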