diff --git a/CHANGELOG.md b/CHANGELOG.md index f6961eb..a490d62 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -2,6 +2,21 @@ All notable changes to forge are documented here. +## [0.7.6] — 2026-06-20 + +A bug-fix release for the Ollama backend and inline reasoning capture. Multi-turn tool sessions and multi-part message content no longer 400 against Ollama's native API, and chain-of-thought emitted inline in `content` is now captured on vLLM and Ollama as it already was on the structured-field path. + +### Added +- **16GB-tier MoE models** in the published eval set and dashboard (gen-3 regeneration). #107 + +### Changed +- **Think-tag parsing consolidated** into one shared helper (`forge.prompts.think_tags`), de-duplicating inline-reasoning extraction across the llamafile client and prompt templates. #112 +- **Scripted test doubles consolidated** into a shared `conftest` fixture, replacing the per-module `MockClient` stand-ins. #76 (thanks @SuperMarioYL). + +### Fixed +- **Inline `` reasoning is captured on vLLM and Ollama.** When a reasoning model emits its chain-of-thought inline in `content` (``) instead of a structured reasoning field, that reasoning is now extracted onto the first tool call — matching the behavior already present for structured reasoning fields. #110 +- **Ollama's native `/api/chat` accepts OpenAI-wire message shapes.** On the proxy's native-passthrough path the client's verbatim OpenAI messages reach Ollama's stricter native endpoint. Multi-part array `content` is now flattened to text, and assistant `tool_calls[].function.arguments` sent as a JSON string are coerced to objects — fixing 400s on multi-turn tool sessions and on clients that send array-shaped content. #111, #115 + ## [0.7.5] — 2026-06-11 Reasoning replay is now a measured, bounded policy. Reasoning-capable backends return hidden reasoning alongside tool calls, and forge previously re-serialized all of it into backend-facing history on every later turn. The new `reasoning_replay` knob bounds that — and after a full re-sweep of the published eval grid showed that dropping replayed reasoning is quality-free and token-cheaper, the default is `none`. The release also re-baselines the Claude eval tier with extended thinking enabled and adds Anthropic prompt caching with cache-aware cost accounting. diff --git a/pyproject.toml b/pyproject.toml index c42aa85..01215f4 100644 --- a/pyproject.toml +++ b/pyproject.toml @@ -4,7 +4,7 @@ build-backend = "hatchling.build" [project] name = "forge-guardrails" -version = "0.7.5" +version = "0.7.6" description = "A reliability layer for self-hosted LLM tool-calling. Guardrails, context management, and backend adapters for multi-step agentic workflows." requires-python = ">=3.12" license = "MIT"