-
Notifications
You must be signed in to change notification settings - Fork 8.3k
Description
Feature Request
Loading a model is expensive. Therefore, being able to use the same model instance for use with all users is desirable. The current API is seems to be designed around 1 user interacting with the model and so the API is a little clunky for swapping message history in and out.
At the moment, for each user session you need to do something like:
def generate(
self, session_id: str, system_prompt: str, prompt_template: str, user_input: str
) -> str:
output: Final[str]
with self.model.chat_session(system_prompt, prompt_template):
if session_id in self.id_to_chat_messages:
# Load chat messages
chat_messages = self.id_to_chat_messages[session_id]
self.model.current_chat_session = chat_messages
# Generate output
output = self.__generate(user_input)
# Save chat messages
self.id_to_chat_messages[session_id] = self.model.current_chat_session
return outputThe above works and history is successfully swapped in and out (the model seems to behave correctly at least). However, it feels wrong to have to do this and I'm not sure what consequences it has on the backend given I imagine there's some optimisations it does with a long message history.
Having a model instance per user does not scale due the resources required, so having a cleaner API to swap message histories in and out would be super useful.