-
Notifications
You must be signed in to change notification settings - Fork 432
Open
Description
In the file conversation.py, the Llama-3 chat is given by the line 107
self.tokenizer.apply_chat_template(chat_template_messages, tokenize=False, add_generation_prompt=False)
which means the token <|start_header_id|> and <|end_header_id|> will be inserted automatically by the chat template of the tokenizer. However the token <|start_header_id|> is also in the roles as well (line 353)
roles=("<|start_header_id|>user", "<|start_header_id|>assistant"),
So the token <|start_header_id|> will be duplicated in the output like this:
<|start_header_id|><|start_header_id|>user<|end_header_id|>\n\n....<|eot_id|><|start_header_id|><|start_header_id|>assistant<|end_header_id|>\n\n...
Is this the correct behavior?
bofei5675
Metadata
Metadata
Assignees
Labels
No labels