Problem
OpenAICompatClient never captures reasoning/thinking content. VLLMClient, OllamaClient, and LlamafileClient all attach reasoning to ToolCall.reasoning, but OpenAICompatClient._parse_tool_calls (src/forge/clients/openai_compat.py:153-172) takes no reasoning parameter, and neither send() nor send_stream() ever looks at a reasoning field or <think> tags. So for any reasoning model behind an OpenAI-compatible endpoint, the entire chain-of-thought is dropped on every tool call: no REASONING message is emitted downstream, which breaks reasoning replay (full / keep-last).
This is the same class of bug as #110 (fixed for vLLM + Ollama in #113), but for the generic OpenAI-compat client. It was deliberately deferred from #113 because it needs more design surface than the other two clients.
Scope / design notes (decided during the #113 parity work)
- No raw-content fallback. vLLM/Ollama/llamafile fall back to raw content as reasoning because they target local instruct models that narrate before a tool call. OpenAICompatClient is the deliberately provider-agnostic client (Groq/Together/OpenRouter/hosted instruct/etc.), where a content preamble alongside a tool call is routinely legitimate user-facing text, not chain-of-thought. Labeling it reasoning would mis-route it and, under reasoning_replay=none (the default), silently drop a real assistant turn. So capture reasoning only from (a) the canonical structured fields reasoning_content / reasoning / reasoning_text (see forge/core/reasoning.py:REASONING_MESSAGE_FIELDS) or (b) <think> tags via forge.prompts.think_tags.extract_think_tags, never from bare content.
- No think constructor flag. Unlike vLLM/Ollama, this client has no think flag today. Don't add one: the downstream reasoning_replay policy (default none) already controls whether reasoning is serialized.
- Strip <think> tags from TextResponse content (parity with the other clients).
- Attach reasoning to the first ToolCall only; make the new _parse_tool_calls reasoning param keyword-with-default to avoid breaking positional callers.
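A rough sketch of how the notes above could fit together, for discussion only. It does not import forge: ToolCall, REASONING_MESSAGE_FIELDS, and split_think_tags are simplified local stand-ins (the real extract_think_tags signature and the field order are assumptions), and extract_reasoning / parse_tool_calls are hypothetical names, not the existing methods.

```python
from __future__ import annotations

import re
from dataclasses import dataclass
from typing import Any, Optional

# Stand-in for forge/core/reasoning.py:REASONING_MESSAGE_FIELDS (order assumed).
REASONING_MESSAGE_FIELDS = ("reasoning_content", "reasoning", "reasoning_text")

_THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)


@dataclass
class ToolCall:
    """Simplified stand-in for forge's ToolCall."""
    name: str
    arguments: str
    reasoning: Optional[str] = None


def split_think_tags(content: str) -> tuple[Optional[str], str]:
    """Hypothetical stand-in for extract_think_tags: (reasoning, content without tags)."""
    parts = [p.strip() for p in _THINK_RE.findall(content)]
    reasoning = "\n".join(p for p in parts if p) or None
    return reasoning, _THINK_RE.sub("", content).strip()


def extract_reasoning(message: dict[str, Any]) -> tuple[Optional[str], str]:
    """Return (reasoning, cleaned_content). Bare content is never treated as reasoning."""
    tag_reasoning, cleaned = split_think_tags(message.get("content") or "")
    # (a) Structured fields win when present.
    for field_name in REASONING_MESSAGE_FIELDS:
        value = message.get(field_name)
        if isinstance(value, str) and value.strip():
            return value.strip(), cleaned
    # (b) Otherwise fall back to <think> tags only.
    return tag_reasoning, cleaned


def parse_tool_calls(
    raw_calls: list[dict[str, Any]], *, reasoning: Optional[str] = None
) -> list[ToolCall]:
    """Keyword-only reasoning with a default, so existing callers keep working."""
    calls = [
        ToolCall(name=c["function"]["name"], arguments=c["function"]["arguments"])
        for c in raw_calls
    ]
    if calls and reasoning:
        calls[0].reasoning = reasoning  # first ToolCall only
    return calls


# Example payload shaped like an OpenAI-compatible assistant message.
message = {
    "content": "<think>Need the weather first.</think>Checking the forecast.",
    "tool_calls": [
        {"function": {"name": "get_weather", "arguments": '{"city": "Oslo"}'}},
        {"function": {"name": "get_time", "arguments": "{}"}},
    ],
}
reasoning, content = extract_reasoning(message)
calls = parse_tool_calls(message["tool_calls"], reasoning=reasoning)
```

In this sketch the user-facing preamble ("Checking the forecast.") stays in content with the tags stripped, and a message that carries only plain content yields no reasoning at all, which is the behavior the no-raw-content-fallback note asks for.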
Files
- src/forge/clients/openai_compat.py: _parse_tool_calls (153-172), send (210), send_stream (294)
- src/forge/clients/vllm.py, src/forge/clients/ollama.py (post-#113, "fix(clients): capture inline <think> reasoning in vLLM + Ollama")
- src/forge/prompts/think_tags.py (shared helper)