
fix(ui/channels): avoid UTF-8 char-boundary panics in text truncation #1

Closed
NiuBlibing wants to merge 439 commits into master from fix/utf8-char-boundary-truncation

Conversation

@NiuBlibing

Owner

Problem

dashboard::truncate sliced &first_line[..max] at a raw byte index. When max landed inside a multi-byte character it panicked:

panicked at apps/zerocode/src/dashboard.rs:2082:37:
end byte index 40 is not a char boundary; it is inside '列' (bytes 39..42)
of `用户询问桌面文件列表,助手列出了桌面上的文件夹和文件,包括名称和大小。`
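The failure mode can be reproduced in isolation. The sketch below is illustrative only: `straddled_char` is a hypothetical helper (not code from this PR) that reports which character a byte index falls inside, mirroring the information in the panic message. It relies only on `str::is_char_boundary`, `str::char_indices`, and `char::len_utf8` from the standard library.

```rust
use std::ops::Range;

/// Hypothetical helper: if `idx` falls inside a multi-byte character of `s`,
/// return that character and its byte range; return `None` when `idx` is
/// already a char boundary (or is at/after the end of the string).
fn straddled_char(s: &str, idx: usize) -> Option<(char, Range<usize>)> {
    if idx >= s.len() || s.is_char_boundary(idx) {
        return None;
    }
    s.char_indices()
        // Pair every character with the byte range it occupies.
        .map(|(start, c)| (c, start..start + c.len_utf8()))
        // Pick the one whose range contains the offending index.
        .find(|(_, range)| range.contains(&idx))
}
```

Each CJK character here occupies three bytes, so for `"用户询问"` byte index 10 lies inside `'问'` (bytes 9..12). `&s[..10]` would panic for that reason, whereas the non-panicking `s.get(..10)` returns `None`.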

A repo-wide sweep for the same unguarded byte-slice pattern found two more sites with the identical bug (all other slice sites are already guarded via is_char_boundary / char_indices().nth() / find() offsets):

  • linkedin: &text[..200] when building the image-generation prompt
  • bluesky: &message.content[..297] — additionally compared byte .len() against what the comment documents as a 300-character (grapheme) limit, so multi-byte content both mis-triggered and panicked

Fix

All three now use chars().count() / chars().take(n), which is char-boundary safe and matches the intended character-based semantics.
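As a rough illustration of the character-based approach, the sketch below shows two equivalent ways to keep at most `max` characters. Both function names are hypothetical and are not taken from the PR: `truncate_owned` follows the `chars().take(n)` shape described above, while `truncate_borrowed` is an alternative that avoids allocating by locating the byte offset of the cut with `char_indices().nth()`.

```rust
/// Hypothetical helper: collect the first `max` characters into a new String.
/// This mirrors the `chars().take(n)` pattern and can never split a character.
fn truncate_owned(s: &str, max: usize) -> String {
    s.chars().take(max).collect()
}

/// Hypothetical helper: same result without allocating. `char_indices().nth(max)`
/// yields the byte offset where character number `max` starts, which is always
/// a valid char boundary, so the slice below cannot panic.
fn truncate_borrowed(s: &str, max: usize) -> &str {
    match s.char_indices().nth(max) {
        Some((byte_idx, _)) => &s[..byte_idx],
        // Fewer than `max + 1` characters: nothing to cut.
        None => s,
    }
}
```

For example, `truncate_borrowed("用户询问桌面", 4)` yields `"用户询问"` (12 bytes), where slicing at byte index 4 would have panicked.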

Notes / scope

  • LinkedIn: the 200 is not a platform limit — it only trims text inside an internal image prompt. Character-based is the natural reading.
  • Bluesky: the canonical limit is 300 graphemes (and 3000 bytes) per the atproto app.bsky.feed.post lexicon. chars() counts code points (Unicode scalar values), which is panic-safe and exact for ASCII/CJK but slightly conservative for ZWJ/skin-tone emoji: a sequence that renders as one grapheme counts as several chars, so such posts are truncated earlier than strictly necessary. Fully exact grapheme truncation would require adding unicode-segmentation (not currently a dependency); it is left out of this panic fix.
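The Bluesky note can be made concrete with a small sketch. Everything named here is an assumption for illustration: `MAX_POST_CHARS`, `ELLIPSIS`, and `clamp_post` are hypothetical, and the `"..."` suffix is only inferred from the 297 + 3 = 300 arithmetic in the description, not confirmed by the source.

```rust
/// Assumed limit, counted in chars (code points) as a stand-in for graphemes.
const MAX_POST_CHARS: usize = 300;
/// Assumed suffix; the source only shows the 297-character cut.
const ELLIPSIS: &str = "...";

/// Hypothetical helper: return `content` unchanged when it fits, otherwise
/// keep the first 297 characters and append the ellipsis (300 chars total).
fn clamp_post(content: &str) -> String {
    // Compare a character count, not the byte `.len()`, against the limit.
    if content.chars().count() <= MAX_POST_CHARS {
        return content.to_string();
    }
    let keep = MAX_POST_CHARS - ELLIPSIS.chars().count();
    let mut out: String = content.chars().take(keep).collect();
    out.push_str(ELLIPSIS);
    out
}
```

A 100-character CJK post is 300 bytes long, so a byte-based `.len()` check against 300 sits right at the limit even though the post uses only a third of the character budget. In the other direction, the family emoji `"\u{1F468}\u{200D}\u{1F469}\u{200D}\u{1F467}"` is one grapheme but five chars, which is where the char-based count becomes conservative.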

Testing

cargo check -p zeroclaw-tools -p zeroclaw-channels -p zerocode passes.

Yyukan and others added 30 commits May 10, 2026 09:56
…es (zeroclaw-labs#6534)

- e3103c7 fix(sop): call reload() after SopEngine construction at both call sites
- 032a26d test(sop): add reload contract regression tests
- 157afdd docs(sop): clarify that sops_dir is required for runtime SOP loading
- 730eca8 fix(channels): scope session key for channel tools
- 28a59e3 fix(channels): avoid ci-all tool instruction import lint
Native tool-capable providers already receive the tool catalog through provider-native tool specs, so the system prompt should not duplicate that catalog in prose.

Thread the native-spec decision through PromptContext and keep XML/delegate prompt paths on the existing textual tool section.

Tests cover both the modular Agent prompt path and the legacy prompt builder.

Related zeroclaw-labs#6074

Co-authored-by: smallwhite <12741016+whtiehack@users.noreply.github.com>
…claw-labs#6533)

default_config_dir() now checks ZEROCLAW_CONFIG_DIR before falling back
to ~/.zeroclaw, so all seven path-field defaults (knowledge.db_path,
workspace.workspaces_dir, plugins.dir, project_intel.report_dir,
security_ops.estop_state_file, playbooks_dir, report_output_dir) point
into the active profile when a custom config dir is set.
…s#6539)

Route ACP and web dashboard direct agents through a back-channel approval mode so bare shell calls cannot bypass runtime approval by setting approved=true in tool arguments.

Keep runtime-owned approved arguments aligned with approval policy for shell and cron-style command tools, including prior Always decisions and auto-approved tools.
…law-labs#6546)

Treat empty effective tool sets as a no-tools turn across prompt assembly, provider request shape, and parser execution.

Preserve reasoning-tag stripping while avoiding execution of tool-like output when no tools are available.

Add focused regressions for native request shape, XML text preservation, prompt scaffolding, and channel protocol prompt behavior.
- aa38b8b fix(channels,deps): bump matrix-sdk 0.16 → 0.17
- 53808a3 fix(channels): raise zeroclaw-channels recursion_limit to 256 for cha…
- d18ecf9 fix(install): default to thin LTO on Linux hosts under 12 GiB RAM
…abs#6114)

- 60d1562 fix(provider): strip media markers in auxiliary chat_with_system calls
- 5ab377b fix(provider): also strip [PHOTO:] markers in auxiliary calls
- 0dd59f4 fix(runtime/context): reconcile strip-markers helper with zeroclaw-labs#6189 vision contract
Wrap OpenRouter structured chat/history system messages in a single text content block with `cache_control: {"type": "ephemeral"}` so cache-aware upstream models can receive prompt-cache breakpoints through OpenRouter.

Map non-streaming OpenRouter `usage.prompt_tokens_details.cached_tokens` into `TokenUsage.cached_input_tokens` when present, while preserving absent, empty, and zero detail handling for providers that do not report cached-token usage.

Keep user, assistant, tool, and multimodal user message shapes unchanged outside the new system-message cache marker. The live PR discussion verified the new system-message array form against `openai/gpt-4o` through OpenRouter with matching prompt-token billing to the plain-string control.

Notes:
- One-shot `chat_with_system` still uses the older system-message shape.
- Streaming sends the cached request shape but does not surface cached-token usage through `StreamEvent`.

Related zeroclaw-labs#3977
Related zeroclaw-labs#5440
…5254)

- ffc6217 fix(provider): sanitize llama.cpp gemma4 tool schemas
- 9dd5b89 fix(provider): sanitize empty llama.cpp model schemas
…abs#6513)

- 1727307 feat(providers): add atomic-chat as local provider option
- ef8d9f3 Merge branch 'master' into feat/atomic-chat-add
- b3b5189 del ProviderInfo from Cloud AI endpoints
- b7b0c0f Merge branch 'master' into feat/atomic-chat-add
- 39ace0c fix: cron table
- b3316f0 Merge branch 'master' into fix/cron-table-6504
- f86f5ee fix: css classes
- 4db1105 Merge branch 'master' into fix/cron-table-6504
- 7760baa fix: cron table
- bcd950e Merge branch 'master' into fix/cron-table-6504
…eps (zeroclaw-labs#6570)

- Update all image references from Docker Hub (zeroclawlabs/zeroclaw) to GitHub Container Registry (ghcr.io/zeroclaw-labs/zeroclaw).
- Add the missing `zeroclaw onboard` step to the Compose section.
- Add a new "Re-authenticating after logout" section explaining how to regenerate a paircode with `zeroclaw gateway get-paircode --new`.

Closes zeroclaw-labs#6393
…w-labs#6567)

The v0.7.x workspace split moved most module implementations from
src/** to crates/zeroclaw-*/src/**, but labeler.yml still only globs
src/**. PRs that only touch crate files receive no area label.

Add corresponding crates/zeroclaw-*/src/** globs alongside every
existing src/** glob. Legacy src/** patterns are preserved so any
remaining shim code still matches.

Closes zeroclaw-labs#6359
…eroclaw-labs#6568)

Two `build_channel_by_id` telegram tests run unconditionally but the
corresponding dispatch arm is `#[cfg(feature = "channel-telegram")]`-gated.
Since `channel-telegram` is not in the default feature set, these tests
always hit the "Unknown channel" path and fail.

Add `#[cfg(feature = "channel-telegram")]` to both tests, matching the
existing pattern used by the voice-call tests in the same module.

Closes zeroclaw-labs#6347
…rgs (zeroclaw-labs#6569)

rust-analyzer's clippy check already passes --all-targets by default.
Including it in extraArgs causes the argument to be duplicated, which
makes cargo clippy fail with:

  error: the argument '--all-targets' cannot be used multiple times

Remove --all-targets from extraArgs so only -- -D warnings remains.

Closes zeroclaw-labs#5687
Recover the updater asset-selection behavior from zeroclaw-labs#4337 against current master. Exact-match installable release archives for the supported target, skip unusable download URLs, and fail closed for unsupported targets.

Co-authored-by: rareba <rareba@users.noreply.github.com>
Recover zeroclaw-labs#4573 by preserving Gemini usageMetadata through Provider::chat(). Gemini already parsed usageMetadata in send_generate_content(), but the structured chat path used the trait default and returned usage: None. Route Gemini chat through a usage-preserving helper, keep prompt-guided tool instructions, and preserve wrapped OAuth usage metadata.

Supersedes:

- zeroclaw-labs#4573 by @SpectreMercury

Integrated scope:

- Gemini provider: structured chat returns parsed token usage from zeroclaw-labs#4573

Co-authored-by: ERROR404 <11926244+SpectreMercury@users.noreply.github.com>
Detect DuckDuckGo 403 responses and verification /wr.do? flows before result parsing so automated block pages surface actionable provider guidance instead of generic failures or empty results.

Add request-path coverage for blocked statuses, verification redirects, verification form HTML, and normal empty-result handling.
zeroclaw-labs#6183)

- f4d4b08 fix(multimodal): normalize image markers across agent and tool history
- 2d887db review: address PR zeroclaw-labs#6183 feedback (truncation marker preservation, ch…
- 4569b2f fix(providers/multimodal): preserve native tool-result JSON during im…
…_mode=partial (zeroclaw-labs#6588)

- ca1ea64 Merge branch 'master' into fix/6415-tts-stream-mode-partial
- b0cdae7 fix(channels): extract TTS voice reply into shared helper for stream_mode=partial
…s#6573)

- 07d3e91 fix(providers/glm): mark GLM provider as vision-capable
- 2d6a749 Merge branch 'master' into fix/glm-supports-vision
)

- 5f2714e fix(channels): close Discord media send/receive gaps
- c77c302 fix(channels): do not cache thread-lookup failures
- fba1929 fix(channels): surface dropped Discord markers; require absolute paths
- 690cef2 fix(channels): upload file in Discord MultiMessage when paragraph collapses to marker-only
- 9a9da08 fix(channels): admit attachment-only Discord messages and bound thread lookup
…fix + docs fallback, Jordan trapdoor for features (zeroclaw-labs#6554)

- c00e5d6 docs(skills/pr-review): refine Phase 3.5 milestone alignment with bre…
- 1ae30df docs(skills/pr-review): fix scope-compare step wording; add Other typ…
…eroclaw-labs#6562)

- f4270da feat(nix): add multi-instance NixOS module + test
- 171e1fb feat(nix): tighten systemd hardening (DeviceAllow, MemoryDenyWriteExecute, RemoveIPC)
- 49fc6c9 feat(nix): add PrivateUsers=true to harden the unit's user namespace
- 39e0556 fix(nix): drop ExecStartPre chown — unit already owns its dataDir
- 5d68436 refactor(nix): drop extraServiceConfig escape hatch
- 9f190c7 style(nix): apply nixfmt-rfc-style
- 9945432 fix(nix): drop unused `config` arg from instanceModule
- 502190f style(nix): drop unused `config` arg from test machine signature
- c58a8a6 docs(nix): drop stale extraServiceConfig reference from README
- c8d60f2 fix(nix): expand $VAR placeholders + accept arbitrary dataDir paths
- a89e8d9 docs(nix): add missing `config` arg to README quick-start lambda
- f82c3a6 docs(nix): clean up stale StateDirectory prose
Audacity88 and others added 12 commits June 2, 2026 23:34
…st (zeroclaw-labs#7046)

- 4e65163 feat(hardware): add dev-sim feature with /tmp/zc-sim-* serial allowlist
- 4c82b36 address review suggestion: clarify dev-sim usage with hardware featur…
…eroclaw-labs#7023)

- 4a0a3ed feat(docs): implement versioned documentation deployment and version selector
- 7364dd2 feat(docs): enhance version sorting and validation in deployment workflow
- 9615c2d fix(docs): replace hardcoded "master" with DEFAULT_TAG in build process
- a7edd5a fix(docs): added a second checkout step
- bae35ce feat(docs): implement versioned documentation deployment and shared chrome extraction
- 8f2a846 refactor(docs): format scripts
- eeed5be refactor(build): simplify conditional checks in extract_shared_chrome function
- df2242a feat(docs): migrate documentation scripts to Rust and update deployment workflow
`dashboard::truncate` sliced `&first_line[..max]` at a raw byte index,
panicking when `max` landed inside a multi-byte character (e.g. CJK
session summaries). Switch to char-based truncation.

Two other sites had the same unguarded byte-slice bug:
- linkedin: `&text[..200]` when building the image-generation prompt
- bluesky: `&message.content[..297]`, which also compared byte `.len()`
  against what the comment documents as a 300-character (grapheme) limit

All three now count and take `chars()`, which is char-boundary safe and
matches the intended character-based semantics.