chore: session bundling, example fixes, and test with prefill by EggerMarc · Pull Request #50 · EggerMarc/chat-rs

EggerMarc · 2026-06-07T20:43:35Z

Summary by CodeRabbit

Release Notes

New Features

Updated examples to interactive multi-turn chat mode with persistent conversation history
Session-based conversational flow for improved context management across turns
Prewarm capability to optimize response latency on macOS
Optional LoRA adapter support for model customization

coderabbitai · 2026-06-07T20:43:45Z

📝 Walkthrough

Walkthrough

This PR refactors the AppleFM provider from a stateless completion API to a persistent session-based architecture. The wire protocol, Swift bridge, and Rust client are unified around session lifecycle management: create a session once with configuration, reuse it across multiple turns with per-turn requests, and let the Rust client decide whether to reuse or rebuild sessions based on instruction and conversation hashes. Example applications are updated to demonstrate interactive chat workflows.
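The reuse-or-rebuild decision described above can be sketched roughly as follows. This is a minimal illustration, not the PR's code: the walkthrough names `TurnPlan` and `SessionState`, but the fields, the `plan_turn` and `advance` helpers, and the choice of `DefaultHasher` are all assumptions made for the example.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical plan type: what to do with the cached session for this turn.
#[derive(Debug, PartialEq, Eq)]
enum TurnPlan {
    /// The cached session already holds everything except the newest message.
    Reuse,
    /// Instructions or earlier history changed, so a fresh session is needed.
    Rebuild,
}

/// Hypothetical record of what the live session has already consumed.
struct SessionState {
    instructions_hash: u64,
    history_hash: u64,
}

/// Hash any hashable value (including unsized `str` and slices) to a `u64`.
fn hash_of<T: Hash + ?Sized>(value: &T) -> u64 {
    let mut hasher = DefaultHasher::new();
    value.hash(&mut hasher);
    hasher.finish()
}

/// Reuse only if the instructions match and every message before the newest
/// one is exactly what the session has already seen.
fn plan_turn(state: Option<&SessionState>, instructions: &str, conversation: &[String]) -> TurnPlan {
    let Some(state) = state else { return TurnPlan::Rebuild };
    let Some((_newest, earlier)) = conversation.split_last() else {
        return TurnPlan::Rebuild;
    };
    if state.instructions_hash == hash_of(instructions) && state.history_hash == hash_of(earlier) {
        TurnPlan::Reuse
    } else {
        TurnPlan::Rebuild
    }
}

/// After a successful turn, record the conversation plus the model's reply.
fn advance(instructions: &str, conversation: &[String], reply: &str) -> SessionState {
    let mut seen = conversation.to_vec();
    seen.push(reply.to_string());
    SessionState {
        instructions_hash: hash_of(instructions),
        history_hash: hash_of(seen.as_slice()),
    }
}
```

Under these assumptions, appending a new user message to an unchanged history yields `Reuse`, while editing an earlier message or changing the instructions yields `Rebuild`.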

Changes

AppleFM Session-Based Architecture

| Layer / File(s) | Summary |
|---|---|
| Session-based wire protocol types `providers/applefm/bridge/Sources/AppleFMBridge/WireTypes.swift`, `providers/applefm/src/api/types/mod.rs` | Wire protocol refactored from stateless `CompleteRequest`/`WireMessage` to `SessionConfig` (session initialization with optional instructions and LoRA), `TurnRequest` (per-turn message plus optional options), and `SessionCreated` (session id response). |
| Swift session storage and factory `providers/applefm/bridge/Sources/AppleFMBridge/SessionStore.swift`, `providers/applefm/bridge/Sources/AppleFMBridge/Session.swift` | `SessionStore` singleton provides locked in-memory session dictionary with handle generation, `SessionBox` wrapper enables async task safety, and `makeSession` refactored to build sessions from `SessionConfig` instead of requests. |
| Swift bridge session endpoints and exports `providers/applefm/bridge/Sources/AppleFMBridge/Complete.swift`, `providers/applefm/bridge/Sources/AppleFMBridge/Exports.swift`, `providers/applefm/bridge/Sources/AppleFMBridge/Stream.swift` | Exports C API for session lifecycle (`afm_session_create`, `afm_session_respond`, `afm_session_respond_stream`, `afm_session_free`, `afm_prewarm`) with null-input guards; replaces prior stateless `afm_complete` endpoint; implements streaming callback with per-event JSON emission. |
| Rust request/response helpers `providers/applefm/src/api/types/request.rs`, `providers/applefm/src/api/types/response.rs`, `providers/applefm/src/builder.rs` | Introduces `ConvoEntry` and `prepare()` to flatten messages into instructions and conversation list; adds `TurnPlan` enum and hashing helpers for reuse vs rebuild decisions; provides `render_full()`, `session_config_json()`, and `turn_request_json()` serializers; implements `parse_session_created()` for session reply parsing; initializes client session state in builder. |
| Rust client session state management `providers/applefm/src/client.rs` | Implements `Session` wrapper with `SessionHandle`/`SessionState` for hash-based reuse/rebuild tracking; adds public `prewarm()` for OS resource staging and `enrich_metadata()` to populate response metadata with duration, creation timestamp, prefill reuse indicator, and LoRA info; includes unit tests for session plan transitions. |
| Rust FFI layer refactoring `providers/applefm/src/ffi.rs` | Replaces stateless `complete_json`/`stream_json` exports with session-based `session_create`, `session_respond`, `session_stream`, `session_free`, and `prewarm` wrappers; updates stub module to return "unavailable" equivalents; introduces JSON error helpers for null-pointer and NUL-byte failures. |
| Completion API implementation `providers/applefm/src/api/completion.rs` | Refactors `CompletionProvider::complete` to lock session and decide reuse/rebuild via `TurnPlan`; rebuilds session config and calls `session_create` on mismatch; responds via `session_respond`; parses reply and advances session on success; enriches metadata with reuse indicator. |
| Streaming API implementation `providers/applefm/src/api/stream.rs` | Refactors `stream()` to prepare instructions/conversation, lock session, decide reuse/rebuild via `session_create`, construct turn request, stream `WireStreamEvent` via background thread, emit `TextChunk` for deltas, advance session on `Done`, and invalidate on error. |
| Interactive example applications `examples/applefm/completion.rs`, `examples/applefm/stream.rs`, `examples/completions/completion.rs` | Updates all examples from one-shot transcript-correction flows to interactive REPL-based chat: reads stdin until `exit`/EOF, maintains persistent message history, supports optional `APPLEFM_LORA` environment variable, calls `complete`/`stream` per turn, prewarms before input, and prints metadata including `prefill` reuse indicator and first-token latency. |
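The FFI row mentions JSON error helpers for null-pointer and NUL-byte failures. A rough sketch of what such helpers might look like is below; the function names (`error_json`, `to_c_request`, `check_reply`) and the `{"error": ...}` payload shape are assumptions for illustration, and only `CString::new` rejecting interior NUL bytes is standard-library behavior.

```rust
use std::ffi::CString;
use std::os::raw::c_char;

/// Hypothetical helper: wrap a message in a small JSON error object.
/// Only `"` and `\` are escaped, which is enough for the fixed messages used here.
fn error_json(message: &str) -> String {
    let escaped = message.replace('\\', "\\\\").replace('"', "\\\"");
    format!("{{\"error\":\"{escaped}\"}}")
}

/// Convert a JSON request into a C string for the bridge. `CString::new`
/// fails on an interior NUL byte, which is mapped to an error payload.
fn to_c_request(json: &str) -> Result<CString, String> {
    CString::new(json).map_err(|_| error_json("request contains an interior NUL byte"))
}

/// Check a pointer returned by the bridge; a null reply becomes an error payload.
fn check_reply(ptr: *const c_char) -> Result<*const c_char, String> {
    if ptr.is_null() {
        Err(error_json("bridge returned a null pointer"))
    } else {
        Ok(ptr)
    }
}
```

Returning the error as a JSON string keeps failures on the same channel as normal replies, so a caller can parse one format regardless of where the failure occurred.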

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Poem

🐰 Swift and Rust dance the session waltz,
Where once lived stateless single calls,
Now persistent handles trace each turn,
Reuse and rebuild plans discern,
Chat flows in loops, no longer stalling—
Sessions bloom and answers calling! 🌿✨

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title 'session bundling, example fixes, and test with prefill' covers the three main aspects of this large refactoring: the session-based architecture changes, the example file updates, and the metadata/prefill improvements. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |


coderabbitai

Actionable comments posted: 1

🧹 Nitpick comments (4)

providers/applefm/bridge/Sources/AppleFMBridge/Complete.swift (1)

88-96: 💤 Low value

Finish reason is hardcoded to "stop".

The non-streaming path always reports finish: "stop", even when the response may have been truncated by maximumResponseTokens. The FoundationModels respond(to:options:) API doesn't expose a finish reason directly, so this is a known limitation. If accurate finish reasons become important for downstream logic, you may need to track whether the response hit the token limit, for example by estimating the response's token count and comparing it against the limit, or by switching to the streaming path, which could infer truncation.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@providers/applefm/bridge/Sources/AppleFMBridge/Complete.swift` around lines
88 - 96, The finish reason is hardcoded to "stop" after calling
box.session.respond; change it to detect likely truncation using
request.options.maximumResponseTokens and a simple token-length heuristic on
response.content (e.g., words-to-tokens or utf8/4 estimate) and set
CompleteReply's finish to "length_limited" when the heuristic indicates the
response likely hit the token limit, otherwise keep "stop"; update the block
handling box.session.respond (and use generationOptions/request.options and
CompleteReply) to compute the heuristic and choose the finish value accordingly,
falling back to "stop" if no maximum is available.
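The heuristic suggested above can be illustrated with a short sketch. The real change would live in the Swift bridge; this Rust version only shows the logic, and the `estimate_tokens` and `infer_finish` names plus the four-bytes-per-token ratio are assumptions, not a real tokenizer.

```rust
/// Rough token estimate: about four UTF-8 bytes per token, rounded up.
/// This is a heuristic assumption, not how the model actually tokenizes.
fn estimate_tokens(text: &str) -> usize {
    (text.len() + 3) / 4
}

/// Report "length_limited" when the estimate reaches the configured maximum,
/// otherwise "stop". Falls back to "stop" when no maximum is set.
fn infer_finish(content: &str, maximum_response_tokens: Option<usize>) -> &'static str {
    match maximum_response_tokens {
        Some(max) if max > 0 && estimate_tokens(content) >= max => "length_limited",
        _ => "stop",
    }
}
```

Because the estimate is approximate, a response that ends naturally near the limit could be mislabeled, which is one reason the comment rates this finding as low value.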

examples/completions/completion.rs (1)

58-71: ⚡ Quick win

Wrap blocking stdin read in spawn_blocking.

std::io::stdin().read_line() is synchronous blocking I/O. On a Tokio runtime, blocking calls should be offloaded via spawn_blocking to avoid blocking async worker threads.

♻️ Proposed refactor to use spawn_blocking

-        let mut user_input = String::new();
-        print!("\nUser:\t");
-        std::io::stdout().flush()?;
-        if std::io::stdin().read_line(&mut user_input)? == 0 {
-            break; // EOF (Ctrl-D)
-        }
+        let user_input = tokio::task::spawn_blocking(|| {
+            let mut input = String::new();
+            print!("\nUser:\t");
+            std::io::stdout().flush().ok();
+            std::io::stdin().read_line(&mut input).map(|_| input)
+        })
+        .await??;
+        if user_input.is_empty() {
+            break; // EOF (Ctrl-D)
+        }

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@examples/completions/completion.rs` around lines 58 - 71, The loop currently
calls the blocking std::io::stdin().read_line(&mut user_input) directly, which
will block the Tokio runtime; wrap the blocking read in
tokio::task::spawn_blocking and await its result instead (e.g., call
spawn_blocking to perform the read_line and return the read length or error),
then handle EOF and errors from the spawned task and continue using the trimmed
user_input as before; update the code around the loop and references to
std::io::stdin().read_line so the blocking operation is executed inside
spawn_blocking and its result is awaited on the async context.
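One way to keep the blocking part small and testable is to isolate the read in a helper that works on any `BufRead`. The sketch below is an illustration under assumptions: `read_turn` is a hypothetical name, and treating the literal `exit` the same as EOF follows the walkthrough's description of the examples rather than the actual example code.

```rust
use std::io::{self, BufRead};

/// Read one user turn from `reader`.
/// Returns `Ok(None)` on EOF or on the literal `exit` command,
/// otherwise `Ok(Some(line))` with surrounding whitespace trimmed.
fn read_turn<R: BufRead>(reader: &mut R) -> io::Result<Option<String>> {
    let mut line = String::new();
    // `read_line` returns 0 only at EOF; a blank line still yields "\n".
    if reader.read_line(&mut line)? == 0 {
        return Ok(None);
    }
    let trimmed = line.trim();
    if trimmed == "exit" {
        return Ok(None);
    }
    Ok(Some(trimmed.to_string()))
}
```

In an async example, the call could then be offloaded with something like `tokio::task::spawn_blocking(|| read_turn(&mut std::io::stdin().lock())).await??`, so the worker thread is free while the user types.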

examples/applefm/completion.rs (1)

59-63: ⚡ Quick win

Wrap blocking stdin read in spawn_blocking.

std::io::stdin().read_line() is synchronous blocking I/O. On a Tokio runtime, blocking calls should be offloaded via spawn_blocking to avoid blocking async worker threads.

♻️ Proposed refactor to use spawn_blocking

-        let mut user_input = String::new();
-        print!("\nUser:\t");
-        std::io::stdout().flush()?;
-        if std::io::stdin().read_line(&mut user_input)? == 0 {
-            break; // EOF (Ctrl-D)
-        }
+        let user_input = tokio::task::spawn_blocking(|| {
+            let mut input = String::new();
+            print!("\nUser:\t");
+            std::io::stdout().flush().ok();
+            std::io::stdin().read_line(&mut input).map(|_| input)
+        })
+        .await??;
+        if user_input.is_empty() {
+            break; // EOF (Ctrl-D)
+        }

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@examples/applefm/completion.rs` around lines 59 - 63, The synchronous call to
std::io::stdin().read_line(&mut user_input) should be moved into a Tokio
blocking task so it doesn't block the async runtime; replace the direct
read_line call in the loop (the print!/stdout().flush()/read_line/EOF check
block) with tokio::task::spawn_blocking(|| std::io::stdin().read_line(&mut
user_input)).await, unwrap/propagate the spawn result and the inner io::Result,
then keep the existing EOF check (== 0) and break accordingly; ensure you await
the spawned task and handle errors from both the JoinHandle and the io::Result
so the function's ? error propagation still works.

examples/applefm/stream.rs (1)

59-71: ⚡ Quick win

Wrap blocking stdin read in spawn_blocking.

std::io::stdin().read_line() is synchronous blocking I/O. On a Tokio runtime, blocking calls should be offloaded via spawn_blocking to avoid blocking async worker threads.

♻️ Proposed refactor to use spawn_blocking

-        let mut user_input = String::new();
-        print!("\nUser:\t");
-        std::io::stdout().flush()?;
-        if std::io::stdin().read_line(&mut user_input)? == 0 {
-            break; // EOF (Ctrl-D)
-        }
+        let user_input = tokio::task::spawn_blocking(|| {
+            let mut input = String::new();
+            print!("\nUser:\t");
+            std::io::stdout().flush().ok();
+            std::io::stdin().read_line(&mut input).map(|_| input)
+        })
+        .await??;
+        if user_input.is_empty() {
+            break; // EOF (Ctrl-D)
+        }

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@examples/applefm/stream.rs` around lines 59 - 71, The loop uses synchronous
std::io::stdin().read_line(&mut user_input) (and std::io::stdout().flush())
which will block the Tokio runtime; wrap the blocking prompt/flush and read_line
inside tokio::task::spawn_blocking and await its result instead of calling
read_line directly. Concretely, move creation of user_input and the print/flush
+ stdin.read_line into a spawn_blocking closure (capturing/mutating the String),
await the JoinHandle, handle any I/O error returned, then trim/use the returned
string as before (referencing the user_input variable, read_line, and
tokio::task::spawn_blocking).



ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 4c54be08-98b1-4734-be09-f5f5d4865eaf

📥 Commits

Reviewing files that changed from the base of the PR and between c25e82a and 113fc20.

📒 Files selected for processing (17)

examples/applefm/completion.rs
examples/applefm/stream.rs
examples/completions/completion.rs
providers/applefm/bridge/Sources/AppleFMBridge/Complete.swift
providers/applefm/bridge/Sources/AppleFMBridge/Exports.swift
providers/applefm/bridge/Sources/AppleFMBridge/Session.swift
providers/applefm/bridge/Sources/AppleFMBridge/SessionStore.swift
providers/applefm/bridge/Sources/AppleFMBridge/Stream.swift
providers/applefm/bridge/Sources/AppleFMBridge/WireTypes.swift
providers/applefm/src/api/completion.rs
providers/applefm/src/api/stream.rs
providers/applefm/src/api/types/mod.rs
providers/applefm/src/api/types/request.rs
providers/applefm/src/api/types/response.rs
providers/applefm/src/builder.rs
providers/applefm/src/client.rs
providers/applefm/src/ffi.rs

coderabbitai · 2026-06-07T20:50:44Z

+    loop {
+        // Stage the model while the user types, so the turn that follows
+        // the pause doesn't pay warm-up.
+        prewarmer.prewarm();


⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Check the prewarm() implementation to determine if it's blocking
rg -nP -A10 'pub fn prewarm' --type=rust

Repository: EggerMarc/chat-rs

Length of output: 2308

🏁 Script executed:

#!/bin/bash
set -euo pipefail
# 1) Inspect the FFI prewarm wrapper + extern declaration
rg -n "afm_prewarm|prewarm\\(" providers/applefm/src/ffi.rs -n -S -g'*.rs' -C 3
# Show more context around the real prewarm wrapper and stub
sed -n '90,190p' providers/applefm/src/ffi.rs
# 2) Inspect the example REPL loop around the call site
sed -n '1,120p' examples/applefm/completion.rs
# 3) Inspect the Swift-bridge gating and whether prewarm is available on non-bridge builds
rg -n "applefm_bridge|APPLEAN?FM_SKIP_BRIDGE|cfg\\(applefm_bridge\\)" providers/applefm/src/ffi.rs -S
# 4) Inspect the client prewarm method (where the FFI call happens)
sed -n '130,210p' providers/applefm/src/client.rs

Repository: EggerMarc/chat-rs

Length of output: 9567

Avoid blocking Tokio runtime in REPL loop

prewarmer.prewarm() isn’t blocking: providers/applefm/src/ffi.rs::prewarm is documented “Returns immediately; the bridge detaches the actual work”, and the non-bridge stub is a no-op.

std::io::stdin().read_line(...) blocks the Tokio runtime thread; wrap it in tokio::task::spawn_blocking or use async stdin.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@examples/applefm/completion.rs` at line 56, The loop in completion.rs currently calls std::io::stdin().read_line(&mut user_input) directly inside the #[tokio::main] runtime which will block the reactor; keep calling prewarmer.prewarm() as-is (it's non-blocking) but move the blocking read_line off the async runtime by executing it inside tokio::task::spawn_blocking (or swap to an async stdin alternative). Locate the REPL loop where prewarmer.prewarm() and read_line are used, wrap the read_line call in spawn_blocking and await its result, and ensure the rest of the loop continues on the Tokio task (preserve user_input handling after spawn_blocking completes).

chore: session bundling, example fixes, and test with prefill

113fc20

coderabbitai Bot reviewed Jun 7, 2026


EggerMarc merged commit 691d0b8 into main Jun 7, 2026
4 checks passed

EggerMarc deleted the chore/applefm-optimizations branch June 10, 2026 17:47
