Cut over to stoa native tool calling and structured outputs (follow-up to flarexio/stoa#46)

## Goal

Migrate the bookkeeping agent off the prompt-engineered tool-call envelope onto stoa's new native function-calling + `response_format: json_schema` protocol delivered by flarexio/stoa#46. Eliminate the "two JSON shapes" instruction block from the prompt and let `find_accounts` flow through the provider's native `tools` / `tool_calls` channel.

## Background

Today the bookkeeping agent reaches the OpenAI adapter through stoa's `ReasoningResult[Intent]` envelope (`agent/agent.go:85`, `agent/prompt.go:122-129`). The prompt asks the model to "return JSON in ONE of these two shapes": a `tool_calls` envelope or an `intent` envelope. The adapter sets `response_format: json_object` (lenient mode) and parses whatever string the model emits.

Observed failure mode with GPT-5.4 mini: the model emits the same JSON object twice in one turn (`{...}{...}`), or fills both `intent` and `tool_calls`, or fills neither. `JSONDecoder.Decode` then rejects the payload and the run aborts. Larger models hide the same weakness behind better instruction-following; the underlying schema and protocol are the root cause.
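The three failure cases above can be reproduced without a model. The following is a minimal sketch, assuming a simplified `envelope` struct and a hypothetical `kind` field; neither is stoa's actual definition, and it uses `json.Unmarshal` rather than the real decoder path:

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// envelope is a stand-in for the prompt-engineered two-shape contract.
// The field types are placeholders, not the real stoa definitions.
type envelope struct {
	Intent    json.RawMessage   `json:"intent,omitempty"`
	ToolCalls []json.RawMessage `json:"tool_calls,omitempty"`
}

// parseEnvelope decodes one model turn and enforces "exactly one of".
func parseEnvelope(raw string) (envelope, error) {
	var env envelope
	if err := json.Unmarshal([]byte(raw), &env); err != nil {
		return envelope{}, fmt.Errorf("decode: %w", err)
	}
	hasIntent := len(env.Intent) > 0
	hasTools := len(env.ToolCalls) > 0
	switch {
	case hasIntent && hasTools:
		return envelope{}, errors.New("both intent and tool_calls set")
	case !hasIntent && !hasTools:
		return envelope{}, errors.New("neither intent nor tool_calls set")
	}
	return env, nil
}

func main() {
	turns := []string{
		`{"intent":{"kind":"reject"}}{"intent":{"kind":"reject"}}`,             // same object twice
		`{"intent":{"kind":"reject"},"tool_calls":[{"name":"find_accounts"}]}`, // both filled
		`{}`,                           // neither filled
		`{"intent":{"kind":"reject"}}`, // well-formed
	}
	for _, turn := range turns {
		_, err := parseEnvelope(turn)
		fmt.Println(err)
	}
}
```

Only the last turn parses cleanly. The first is rejected by the JSON decoder itself, while the second and third are valid JSON that violate the contract, which is why a lenient `json_object` response format cannot rule them out.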

flarexio/stoa#46 fixes this at the source: `ReasoningResult[TIntent]` is replaced by a discriminated `ReasoningOutput[TIntent]` that carries either `Intent` or `ToolCalls` (never both); the OpenAI adapter switches to `json_schema` strict mode for intents and registers tools through `params.Tools`. The breaking change lands in stoa as a `v1.x → v2.0` bump.
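One way to picture an output that carries either an intent or tool calls is a type whose fields are only reachable through constructors. This is an illustrative sketch only: the constructor and accessor names, the `ToolCall` fields, and the simplified `Intent` type are assumptions, and the type stoa actually ships may look different.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// ToolCall is a hypothetical representation of one native tool call.
type ToolCall struct {
	ID   string
	Name string
	Args json.RawMessage
}

// ReasoningOutput holds either an intent or tool calls. The fields are
// unexported so that callers go through the constructors below, which
// never set both. (The zero value carries neither.)
type ReasoningOutput[TIntent any] struct {
	intent    *TIntent
	toolCalls []ToolCall
}

// IntentOutput wraps a final intent.
func IntentOutput[TIntent any](intent TIntent) ReasoningOutput[TIntent] {
	return ReasoningOutput[TIntent]{intent: &intent}
}

// ToolCallsOutput wraps a batch of tool calls.
func ToolCallsOutput[TIntent any](calls ...ToolCall) ReasoningOutput[TIntent] {
	return ReasoningOutput[TIntent]{toolCalls: calls}
}

// Intent reports the intent and whether this output carries one.
func (o ReasoningOutput[TIntent]) Intent() (TIntent, bool) {
	if o.intent == nil {
		var zero TIntent
		return zero, false
	}
	return *o.intent, true
}

// ToolCalls returns the tool calls, or nil for an intent output.
func (o ReasoningOutput[TIntent]) ToolCalls() []ToolCall { return o.toolCalls }

// Intent is a simplified stand-in for the bookkeeping intent type.
type Intent struct{ Kind string }

// describe shows how a loop might branch on the two cases.
func describe(o ReasoningOutput[Intent]) string {
	if intent, ok := o.Intent(); ok {
		return "intent:" + intent.Kind
	}
	return fmt.Sprintf("tool_calls:%d", len(o.ToolCalls()))
}

func main() {
	fmt.Println(describe(IntentOutput(Intent{Kind: "reject"})))
	fmt.Println(describe(ToolCallsOutput[Intent](ToolCall{Name: "find_accounts"})))
}
```

With this shape the "both filled" case cannot be constructed by calling code, so the check moves from runtime parsing into the type's API.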

This issue tracks the downstream cutover in accounting once that stoa release is available.

## Scope

Allowed changes:

- Bump `github.com/flarexio/stoa` to the version that ships flarexio/stoa#46.
- Replace `agent.accountTools` (`map[string]loop.ToolHandler`) with the new `[]loop.Tool` shape, declaring `find_accounts`'s args JSON schema alongside its handler.
- Provide the `Intent` JSON schema to the OpenAI adapter (hand-written `json.RawMessage` per the stoa#46 v1 plan), kept next to the `Intent` type in `bookkeeping/intent.go` so it cannot drift from the Go shape.
- Strip the "two shapes" instruction block from `agent/prompt.go` — drop `toolCallJSONShape`, `intentEnvelopeShape`, and the "Return JSON with this exact shape" text. The prompt should describe *what* to do, not *how to format the answer*.
- Simplify `bookkeeperSystemPrompt` accordingly (no more "Output JSON only").
- Update `agent/tools.go` and any tests that construct tool registrations.

Out of scope:

- Changing the `Intent` discriminated-union shape or any domain validator behaviour.
- Anthropic or other-provider adapters — track separately once stoa adds them.
- Extending scripted-engine test coverage. Update the existing tests that cover the old envelope to the new contract, but do not add new cases.
- Renderer changes beyond removing the format-shape text.

## Requirements

- Bump the stoa dependency to the release containing flarexio/stoa#46.
- `find_accounts` is registered through `params.Tools` end-to-end; `resp.Choices[0].Message.ToolCalls` is the path that drives tool execution.
- `post_journal`, `reverse_journal`, `reject` are emitted as JSON validated by an `Intent` JSON schema with `strict: true`.
- The prompt no longer contains JSON shape examples for the model's *output*; intent payload skeletons (`postJournalArgsShape` etc.) may stay if they document the domain, but not as response-format instructions.
- The integration test suite in `agent/integration_test.go` passes against GPT-5.4 mini without the "two JSON objects" failure mode (run manually; CI may still gate only on the scripted engine).
- No regression in scripted-engine tests.

## Acceptance Criteria

- [ ] `go.mod` references the stoa release containing flarexio/stoa#46.
- [ ] `agent/prompt.go` no longer contains `toolCallJSONShape` or the "Return JSON with this exact shape" instruction text.
- [ ] `find_accounts` is registered via the new `loop.Tool` mechanism and is invoked through OpenAI native tool-calling against a live model in manual verification.
- [ ] `go test ./...` passes.
- [ ] A manual run of the bookkeeping TUI against `gpt-5.4-mini` completes the canonical "cash sale of goods / tax included" (現金銷售商品 / 含稅) scenario from the issue discussion without parse errors.
- [ ] PR description records which manual scenarios were run and the model used.

## Verification

Automated:

```bash
go test ./...
```

Manual (record results in the PR):

1. Run the TUI against `gpt-5.4-mini` with a representative non-trivial post (e.g. taxable sales with multi-line debits/credits).
2. Run the same scenario against a larger model (e.g. `gpt-5.4`) to confirm no regression.
3. Trigger a deliberate validation failure (e.g. closed period) and confirm the model's `reject` intent flows through the new schema.

## Expected Output

When complete, the worker should:

1. Create a branch.
2. Commit changes.
3. Push the branch.
4. Open exactly one PR linked to this issue for review.
5. Leave the PR unmerged.
6. Comment with the PR URL, summary, tests run, models verified against, and any remaining risks.

## Dependencies

Blocked by flarexio/stoa#46. Do not start until the corresponding stoa release is tagged.
