
Stuck at output #66

Open
fahdmirza opened this issue May 30, 2024 · 6 comments
@fahdmirza

Hi,
On Ubuntu 22.04, it just gets stuck generating output for hours:

Import the Llama class from llama-cpp-python and the LlamaCppPythonProvider from llama-cpp-agent

from llama_cpp import Llama
from llama_cpp_agent.providers import LlamaCppPythonProvider

Create an instance of the Llama class and load the model

llama_model = Llama(r"mistral-7b-instruct-v0.2.Q5_K_S.gguf", n_batch=1024, n_threads=10, n_gpu_layers=40)

Create the provider by passing the Llama class instance to the LlamaCppPythonProvider class

provider = LlamaCppPythonProvider(llama_model)

from llama_cpp_agent import LlamaCppAgent
from llama_cpp_agent import MessagesFormatterType

agent = LlamaCppAgent(provider, system_prompt="You are a helpful assistant.", predefined_messages_formatter_type=MessagesFormatterType.CHATML)

agent_output = agent.get_chat_response("Hello, World!")

It gets stuck here.

I have an NVIDIA A6000 GPU and plenty of memory. I have also tried installing llama.cpp from source, but it's still the same issue. Any ideas?
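
A rough sketch of a check that might help narrow this down (it uses llama-cpp-python's standard verbose flag and is not code from the original report): construct the model with verbose logging and confirm in the log that the expected number of layers is actually offloaded to the GPU.

from llama_cpp import Llama

# Verbose logging makes llama.cpp print, among other things, how many
# layers were offloaded to the GPU when the model is loaded.
llama_model = Llama(
    r"mistral-7b-instruct-v0.2.Q5_K_S.gguf",
    n_batch=1024,
    n_threads=10,
    n_gpu_layers=40,
    verbose=True,
)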

@Maximilian-Winter
Owner

@fahdmirza I will look into this.

@fahdmirza
Author

> @fahdmirza I will look into this.

Thank you. I am doing a review of this for my channel https://www.youtube.com/@fahdmirza, as it looks very promising.

@pabl-o-ce
Collaborator

Hello @fahdmirza, did you install llama-cpp-python with CUDA support for this? We have now tested on an A100 on HF Spaces and it works nicely; I don't understand your problem.
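
For a CUDA-enabled build, llama-cpp-python is typically reinstalled with the CUDA CMake flag; the exact flag depends on the version (newer releases use GGML_CUDA, older ones used LLAMA_CUBLAS), so treat this as a sketch rather than a command verified in this thread:

CMAKE_ARGS="-DGGML_CUDA=on" pip install llama-cpp-python --force-reinstall --no-cache-dir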

@pabl-o-ce
Collaborator

In the code you shared here, you are using a Mistral model with the CHATML message format; I think you have to use MISTRAL for the chat template.
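
As a quick way to see which chat templates are available (assuming MessagesFormatterType behaves as a plain Python Enum), you can list its members:

from llama_cpp_agent import MessagesFormatterType

# Print every predefined message formatter, e.g. MISTRAL and CHATML
for formatter in MessagesFormatterType:
    print(formatter.name)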

@pabl-o-ce
Collaborator

pabl-o-ce commented Jul 13, 2024

Nevertheless, check this out: we have set up some examples in HF Spaces, in case you want to make a review for your YouTube channel hehe https://huggingface.co/poscye

@pabl-o-ce
Collaborator

pabl-o-ce commented Jul 13, 2024

This should work for you; always use the correct chat template for the model:

from llama_cpp import Llama
from llama_cpp_agent import LlamaCppAgent
from llama_cpp_agent import MessagesFormatterType
from llama_cpp_agent.providers import LlamaCppPythonProvider

llama_model = Llama(r"mistral-7b-instruct-v0.2.Q5_K_S.gguf", n_batch=1024, n_threads=10, n_gpu_layers=40)
provider = LlamaCppPythonProvider(llama_model)

# Use the MISTRAL formatter so the prompt matches the chat template the model was trained on
agent = LlamaCppAgent(provider, system_prompt="You are a helpful assistant.", predefined_messages_formatter_type=MessagesFormatterType.MISTRAL)

agent_output = agent.get_chat_response("Hello, World!")
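
If it still hangs with the MISTRAL formatter, a rough way to tell whether the stall is in the model itself or in the agent layer is to bypass the agent and call the Llama object directly through llama-cpp-python's completion API (a sketch, not something tested in this thread):

# If this direct call also hangs, the problem is in the llama.cpp / CUDA
# setup rather than in llama-cpp-agent.
output = llama_model("[INST] Hello, World! [/INST]", max_tokens=64)
print(output["choices"][0]["text"])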
