<think> tags for thinking models #513

Open

JoelNiklaus opened this issue Jan 24, 2025 · 2 comments · May be fixed by #617
Labels
feature/enhancement New feature/request

Comments

@JoelNiklaus
Contributor

Thinking models like DeepSeek-R1 emit <think> tags in their output. Is there an easy way to filter these out? Currently the reasoning makes it directly into the output and messes up the metrics.
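
For illustration, here is a rough post-processing sketch of what I mean by filtering (the strip_think helper is made up here, not something lighteval provides):

    import re

    def strip_think(text: str) -> str:
        # Drop closed <think>...</think> spans from the generation.
        text = re.sub(r"<think>.*?</think>", "", text, flags=re.DOTALL)
        # Some chat templates open <think> in the prompt, so only the closing tag
        # shows up in the generation: keep whatever follows the last closing tag.
        if "</think>" in text:
            text = text.rsplit("</think>", 1)[-1]
        # If the model ran out of tokens mid-reasoning, the tag is never closed.
        if "<think>" in text:
            text = text.split("<think>", 1)[0]
        return text.strip()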

@clefourrier
Member

Nope, not at the moment! We used to have regex parsers but they were underused so we removed them.

@mapmeld
Contributor

mapmeld commented Mar 14, 2025

I had the same question and got something working. For now it's more of a hack, but hopefully this is a starting point to get it working generally. Here is my branch and demo notebook.

Notes:

  • a flag could be added (similar to use_chat_template); I saw some code that uses add_reasoning_prompt
  • <think> already appears in the chat_template field of tokenizer_config.json, so instead of hardcoding it, it might be possible to detect common reasoning tokens there and insert them back into the chat template (see the sketch after this list)
  • I needed to change my answer options from ["A", "B"...] to ["The answer is A", ...]
  • 2048 max_new_tokens seems like the right size
  • didn't test few-shot
  • this isn't maximally efficient because it works on one doc at a time, but I don't know how well large batches would fit
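
As a sketch of the detection idea from the second bullet above (the REASONING_TAGS list and detect_reasoning_tag are just illustrative, not existing lighteval code), one could look for a known reasoning tag in the tokenizer's chat template:

    from transformers import AutoTokenizer

    # Illustrative list of common reasoning tags; only <think> is confirmed for DeepSeek-R1 here.
    REASONING_TAGS = ["<think>"]

    def detect_reasoning_tag(model_name: str) -> str | None:
        """Return the first known reasoning tag found in the model's chat template, if any."""
        tokenizer = AutoTokenizer.from_pretrained(model_name)
        template = tokenizer.chat_template or ""
        for tag in REASONING_TAGS:
            if tag in template:
                return tag
        return None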

Here's the key section in prompt_manager:

        elif use_chat_template:
            # Render the prompt with the model's chat template so the reasoning tag
            # from tokenizer_config.json is placed where the model expects it.
            chat_preview = self.model.tokenizer.apply_chat_template(
                output, tokenize=False, add_generation_prompt=True
            )
            tokenized = self.model.tokenizer(chat_preview, return_tensors="pt").to(self.model.device)
            prepared_batch = Batch(
                input_ids=tokenized["input_ids"],
                input_mask=tokenized["attention_mask"],
                input_lengths=[len(tokenized["input_ids"][0])],
                truncated=[False],
                padded=[False],
            )
            # Let the model emit its reasoning, stopping at the closing tag.
            response = self.model._generate(
                batch=prepared_batch,
                max_new_tokens=2048,
                stop_tokens=["</think>"],
            )
            # Re-append the stop tag so the returned context ends with a closed reasoning block.
            all_start = chat_preview + response[0].result[0] + "</think>"
            return all_start, num_effective_fewshots
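
To make the role of all_start concrete: since _generate stops at "</think>", the tag is re-appended so the returned context ends with a closed reasoning block, and the expanded options from the notes above ("The answer is A", ...) can then presumably be scored as continuations of it. Purely as an illustration of that idea (this is not lighteval's scoring code, and pick_option is made up), with plain transformers it would look roughly like:

    # Illustration only: score each expanded option as a continuation of the
    # reasoning-augmented context and keep the most likely one.
    import torch

    def pick_option(model, tokenizer, context: str, options: list[str]) -> str:
        scores = []
        for option in options:
            ctx_len = tokenizer(context, return_tensors="pt").input_ids.shape[1]
            full_ids = tokenizer(context + " " + option, return_tensors="pt").input_ids.to(model.device)
            with torch.no_grad():
                logits = model(full_ids).logits
            # Log-probability of each token given the ones before it.
            logprobs = torch.log_softmax(logits[0, :-1], dim=-1)
            targets = full_ids[0, 1:]
            idx = torch.arange(targets.shape[0], device=targets.device)
            token_lp = logprobs[idx, targets]
            # Sum only over the option tokens (everything after the context); tokenizing
            # context and context + option separately can shift the boundary by a token,
            # which is fine for a rough sketch.
            option_len = full_ids.shape[1] - ctx_len
            scores.append(token_lp[-option_len:].sum().item())
        return options[scores.index(max(scores))]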
