Goal
Migrate the bookkeeping agent off the prompt-engineered tool-call envelope onto stoa's new native function-calling + response_format: json_schema protocol delivered by flarexio/stoa#46. Eliminate the "two JSON shapes" instruction block from the prompt and let find_accounts flow through the provider's native tools / tool_calls channel.
Background
Today the bookkeeping agent reaches the OpenAI adapter through stoa's ReasoningResult[Intent] envelope (agent/agent.go:85, agent/prompt.go:122-129). The prompt asks the model to "return JSON in ONE of these two shapes": a tool_calls envelope or an intent envelope. The adapter sets response_format: json_object (lenient mode) and parses whatever string the model emits.
Observed failure mode with GPT-5.4 mini: the model emits the same JSON object twice in one turn ({...}{...}), or fills both intent and tool_calls, or fills neither. JSONDecoder.Decode then rejects the payload and the run aborts. Larger models hide the same weakness behind better instruction-following; the underlying schema and protocol are the root cause.
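The three failure modes above can be reproduced with a few lines of Go. This is only a sketch: the `envelope` type and `parseEnvelope` helper are hypothetical stand-ins for the old prompt-engineered envelope, not stoa's actual types, and the sample replies are made up for illustration.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// envelope is a hypothetical stand-in for the old prompt-engineered reply:
// the model is asked to fill exactly one of the two fields.
type envelope struct {
	Intent    json.RawMessage   `json:"intent,omitempty"`
	ToolCalls []json.RawMessage `json:"tool_calls,omitempty"`
}

// parseEnvelope parses one model reply and reports the failure modes
// described above: concatenated objects, both fields set, or neither set.
func parseEnvelope(raw string) (envelope, error) {
	var env envelope
	// json.Unmarshal rejects trailing data, so "{...}{...}" fails here.
	if err := json.Unmarshal([]byte(raw), &env); err != nil {
		return envelope{}, fmt.Errorf("malformed reply: %w", err)
	}
	hasIntent := len(env.Intent) > 0 && string(env.Intent) != "null"
	hasTools := len(env.ToolCalls) > 0
	switch {
	case hasIntent && hasTools:
		return envelope{}, errors.New("reply fills both intent and tool_calls")
	case !hasIntent && !hasTools:
		return envelope{}, errors.New("reply fills neither intent nor tool_calls")
	}
	return env, nil
}

func main() {
	replies := []string{
		`{"intent":{"kind":"reject"}}`,
		`{"intent":{"kind":"reject"}}{"intent":{"kind":"reject"}}`,
		`{"intent":{"kind":"reject"},"tool_calls":[{"name":"find_accounts"}]}`,
		`{}`,
	}
	for _, r := range replies {
		_, err := parseEnvelope(r)
		fmt.Printf("ok=%t reply=%s err=%v\n", err == nil, r, err)
	}
}
```

Every check in `parseEnvelope` exists only because the lenient json_object mode does not constrain the reply; with native tool calls and a strict schema, the provider is expected to enforce these constraints instead.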
flarexio/stoa#46 fixes this at the source: ReasoningResult[TIntent] is replaced by a discriminated ReasoningOutput[TIntent] that carries either Intent or ToolCalls (never both); the OpenAI adapter switches to json_schema strict mode for intents and registers tools through params.Tools. The breaking change lands in stoa as a v1.x → v2.0 bump.
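To illustrate the "either Intent or ToolCalls, never both" contract, here is a minimal sketch of a discriminated output type in Go. The names `Output`, `ToolCall`, `FromIntent`, and `FromToolCalls` are assumptions made for this example; the real ReasoningOutput[TIntent] in stoa may be shaped differently.

```go
package main

import (
	"errors"
	"fmt"
)

// ToolCall and Output are hypothetical types; they only illustrate the
// "either an intent or tool calls, never both" contract.
type ToolCall struct {
	Name string
	Args string // raw JSON arguments as sent by the provider
}

// Output keeps its fields unexported so callers must go through the
// constructors below, which set exactly one of the two variants.
type Output[T any] struct {
	intent    *T
	toolCalls []ToolCall
}

// FromIntent builds an Output that carries a final intent.
func FromIntent[T any](intent T) Output[T] {
	return Output[T]{intent: &intent}
}

// FromToolCalls builds an Output that carries one or more tool calls.
func FromToolCalls[T any](calls ...ToolCall) (Output[T], error) {
	if len(calls) == 0 {
		return Output[T]{}, errors.New("at least one tool call is required")
	}
	return Output[T]{toolCalls: calls}, nil
}

// Intent returns the intent and true when this Output carries one.
func (o Output[T]) Intent() (T, bool) {
	if o.intent == nil {
		var zero T
		return zero, false
	}
	return *o.intent, true
}

// ToolCalls returns the tool calls, or nil when this Output carries an intent.
func (o Output[T]) ToolCalls() []ToolCall {
	return o.toolCalls
}

func main() {
	out := FromIntent("post_journal")
	if intent, ok := out.Intent(); ok {
		fmt.Println("intent:", intent)
	}

	calls, err := FromToolCalls[string](ToolCall{Name: "find_accounts", Args: `{"query":"cash"}`})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, c := range calls.ToolCalls() {
		fmt.Println("tool call:", c.Name, c.Args)
	}
}
```

In this sketch the constructors make the "both set" state unrepresentable; the zero value still carries neither variant, so a consumer would need to treat that case as an error.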
This issue tracks the downstream cutover in accounting once that stoa release is available.
Scope
Allowed changes:
Bump github.com/flarexio/stoa to the version that ships flarexio/stoa#46.
Replace agent.accountTools (map[string]loop.ToolHandler) with the new []loop.Tool shape, declaring the find_accounts args JSON schema alongside its handler.
Provide the Intent JSON schema to the OpenAI adapter (hand-written json.RawMessage per the stoa#46 v1 plan), kept next to the Intent type in bookkeeping/intent.go so it cannot drift from the Go shape.
Strip the "two shapes" instruction block from agent/prompt.go — drop toolCallJSONShape, intentEnvelopeShape, and the "Return JSON with this exact shape" text. The prompt should describe what to do, not how to format the answer.
Simplify bookkeeperSystemPrompt accordingly (no more "Output JSON only").
Update agent/tools.go and any tests that construct tool registrations.
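The tool-registration change listed above could look roughly like the following. This is a sketch under stated assumptions: the `Tool` struct is a hypothetical stand-in for stoa's loop.Tool (whose real fields are defined by flarexio/stoa#46), and the `query` argument, the `dispatch` helper, and the handler body are invented for illustration.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// Tool is a hypothetical stand-in for stoa's loop.Tool: it keeps the
// argument schema next to the handler that consumes those arguments.
type Tool struct {
	Name        string
	Description string
	Parameters  json.RawMessage // JSON Schema for the tool arguments
	Handler     func(args json.RawMessage) (string, error)
}

// findAccountsArgs mirrors findAccountsSchema; the "query" field is assumed.
type findAccountsArgs struct {
	Query string `json:"query"`
}

var findAccountsSchema = json.RawMessage(`{
  "type": "object",
  "properties": {
    "query": {"type": "string", "description": "Text to match against account names"}
  },
  "required": ["query"],
  "additionalProperties": false
}`)

// findAccounts decodes the arguments and returns a placeholder result;
// a real handler would search the chart of accounts.
func findAccounts(raw json.RawMessage) (string, error) {
	var args findAccountsArgs
	if err := json.Unmarshal(raw, &args); err != nil {
		return "", fmt.Errorf("find_accounts: bad arguments: %w", err)
	}
	if args.Query == "" {
		return "", errors.New("find_accounts: query must not be empty")
	}
	return fmt.Sprintf("searched accounts for %q", args.Query), nil
}

// accountTools replaces a name-to-handler map with a slice of
// self-describing tools.
func accountTools() []Tool {
	return []Tool{{
		Name:        "find_accounts",
		Description: "Look up ledger accounts by name",
		Parameters:  findAccountsSchema,
		Handler:     findAccounts,
	}}
}

// dispatch runs the tool named in a native tool call.
func dispatch(tools []Tool, name string, args json.RawMessage) (string, error) {
	for _, t := range tools {
		if t.Name == name {
			return t.Handler(args)
		}
	}
	return "", fmt.Errorf("unknown tool %q", name)
}

func main() {
	result, err := dispatch(accountTools(), "find_accounts", json.RawMessage(`{"query":"cash"}`))
	fmt.Println(result, err)
}
```

Keeping the schema and the argument struct in the same file makes it easier to notice when one changes without the other, which is the same drift concern the Intent schema item raises.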
Out of scope:
Changing the Intent discriminated-union shape or any domain validator behaviour.
Anthropic or other-provider adapters — track separately once stoa adds them.
Scripted-engine tests covering the old envelope; update them to the new contract but do not extend coverage.
Renderer changes beyond removing the format-shape text.
Requirements
find_accounts is registered through params.Tools end-to-end; resp.Choices[0].Message.ToolCalls is the path that drives tool execution.
post_journal, reverse_journal, reject are emitted as JSON validated by an Intent JSON schema with strict: true.
The prompt no longer contains JSON shape examples for the model's output; intent payload skeletons (postJournalArgsShape etc.) may stay if they document the domain, but not as response-format instructions.
The integration test suite under agent/integration_test.go passes against GPT-5.4 mini without the "two JSON objects" failure mode (run manually; CI may still gate only on scripted engine).
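A hand-written schema is easy to get subtly wrong for strict mode, which (as far as OpenAI's structured-outputs rules go) expects every object to set additionalProperties to false and to list all of its properties as required. The sketch below checks those two rules. The `intentSchema` shown is a heavily simplified, hypothetical stand-in for the real Intent schema, and `checkStrict` is an illustrative helper, not part of stoa.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// intentSchema is a hypothetical, simplified Intent schema. Optional
// values are modelled as nullable because strict mode requires every
// property to be listed in "required".
var intentSchema = json.RawMessage(`{
  "type": "object",
  "properties": {
    "kind":   {"type": "string", "enum": ["post_journal", "reverse_journal", "reject"]},
    "reason": {"type": ["string", "null"]}
  },
  "required": ["kind", "reason"],
  "additionalProperties": false
}`)

// checkStrict verifies two strict-mode rules on every object schema:
// additionalProperties is false and all properties are required.
func checkStrict(raw json.RawMessage) error {
	var node any
	if err := json.Unmarshal(raw, &node); err != nil {
		return fmt.Errorf("schema is not valid JSON: %w", err)
	}
	return walk(node, "$")
}

// walk is a heuristic: any JSON object with a "properties" map is
// treated as an object schema, then all children are visited.
func walk(node any, path string) error {
	switch n := node.(type) {
	case map[string]any:
		if props, ok := n["properties"].(map[string]any); ok {
			if extra, ok := n["additionalProperties"].(bool); !ok || extra {
				return fmt.Errorf("%s: additionalProperties must be false", path)
			}
			required := map[string]bool{}
			if list, ok := n["required"].([]any); ok {
				for _, r := range list {
					if name, ok := r.(string); ok {
						required[name] = true
					}
				}
			}
			for name := range props {
				if !required[name] {
					return fmt.Errorf("%s: property %q must be required", path, name)
				}
			}
		}
		for key, child := range n {
			if err := walk(child, path+"."+key); err != nil {
				return err
			}
		}
	case []any:
		for i, child := range n {
			if err := walk(child, fmt.Sprintf("%s[%d]", path, i)); err != nil {
				return err
			}
		}
	}
	return nil
}

func main() {
	fmt.Println("intent schema strict-compatible:", checkStrict(intentSchema) == nil)
}
```

A check like this could live in a unit test next to the schema so that go test ./... catches a missing required entry before a live model run does.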
Acceptance Criteria
go.mod references the stoa release containing flarexio/stoa#46.
agent/prompt.go no longer contains toolCallJSONShape or the "Return JSON with this exact shape" instruction text.
find_accounts is registered via the new loop.Tool mechanism and is invoked through OpenAI native tool-calling against a live model in manual verification.
go test ./... passes.
A manual run of the bookkeeping TUI against gpt-5.4-mini completes the canonical "現金銷售商品 / 含稅" (cash sale of goods, tax-inclusive) scenario from the issue discussion without parse errors.
PR description records which manual scenarios were run and the model used.
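The prompt-related criteria can also be guarded mechanically. The following sketch scans a rendered prompt for the format-instruction phrases quoted in this issue; `leakedFormatInstructions` and the sample prompt strings are hypothetical, and the real check would run against whatever agent/prompt.go renders.

```go
package main

import (
	"fmt"
	"strings"
)

// forbiddenFragments are the response-format phrases quoted in this issue
// that should no longer appear in the rendered prompt.
var forbiddenFragments = []string{
	"Return JSON with this exact shape",
	"Output JSON only",
}

// leakedFormatInstructions returns every forbidden fragment that still
// appears in the given prompt text.
func leakedFormatInstructions(prompt string) []string {
	var found []string
	for _, fragment := range forbiddenFragments {
		if strings.Contains(prompt, fragment) {
			found = append(found, fragment)
		}
	}
	return found
}

func main() {
	oldPrompt := "You are a bookkeeper. Output JSON only."
	newPrompt := "You are a bookkeeper. Use find_accounts before posting."
	fmt.Println("old prompt leaks:", leakedFormatInstructions(oldPrompt))
	fmt.Println("new prompt leaks:", leakedFormatInstructions(newPrompt))
}
```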
Verification
Automated:
go test ./...
Manual (record results in the PR):
Run the TUI against gpt-5.4-mini with a representative non-trivial post (e.g. taxable sales with multi-line debits/credits).
Run the same scenario against a larger model (e.g. gpt-5.4) to confirm no regression.
Trigger a deliberate validation failure (e.g. closed period) and confirm the model's reject intent flows through the new schema.
Expected Output
When complete, the worker should:
Create a branch.
Commit changes.
Push the branch.
Open exactly one PR linked to this issue for review.
Do not merge the PR.
Comment with the PR URL, summary, tests run, models verified against, and any remaining risks.
Dependencies
Blocked by flarexio/stoa#46. Do not start until the corresponding stoa release is tagged.