Skip to content

chore: session bundling, example fixes, and test with prefill#50

Merged
EggerMarc merged 1 commit into
mainfrom
chore/applefm-optimizations
Jun 7, 2026
Merged

chore: session bundling, example fixes, and test with prefill#50
EggerMarc merged 1 commit into
mainfrom
chore/applefm-optimizations

Conversation

@EggerMarc

@EggerMarc EggerMarc commented Jun 7, 2026

Copy link
Copy Markdown
Owner

Summary by CodeRabbit

Release Notes

New Features

  • Updated examples to interactive multi-turn chat mode with persistent conversation history
  • Session-based conversational flow for improved context management across turns
  • Prewarm capability to optimize response latency on macOS
  • Optional LoRA adapter support for model customization

@coderabbitai

coderabbitai Bot commented Jun 7, 2026

Copy link
Copy Markdown
Contributor

Review Change Stack

📝 Walkthrough

Walkthrough

This PR refactors the AppleFM provider from a stateless completion API to a persistent session-based architecture. The wire protocol, Swift bridge, and Rust client are unified around session lifecycle management: create a session once with configuration, reuse it across multiple turns with per-turn requests, and let the Rust client decide whether to reuse or rebuild sessions based on instruction and conversation hashes. Example applications are updated to demonstrate interactive chat workflows.

Changes

AppleFM Session-Based Architecture

Layer / File(s) Summary
Session-based wire protocol types
providers/applefm/bridge/Sources/AppleFMBridge/WireTypes.swift, providers/applefm/src/api/types/mod.rs
Wire protocol refactored from stateless CompleteRequest/WireMessage to SessionConfig (session initialization with optional instructions and LoRA), TurnRequest (per-turn message plus optional options), and SessionCreated (session id response).
Swift session storage and factory
providers/applefm/bridge/Sources/AppleFMBridge/SessionStore.swift, providers/applefm/bridge/Sources/AppleFMBridge/Session.swift
SessionStore singleton provides locked in-memory session dictionary with handle generation, SessionBox wrapper enables async task safety, and makeSession refactored to build sessions from SessionConfig instead of requests.
Swift bridge session endpoints and exports
providers/applefm/bridge/Sources/AppleFMBridge/Complete.swift, providers/applefm/bridge/Sources/AppleFMBridge/Exports.swift, providers/applefm/bridge/Sources/AppleFMBridge/Stream.swift
Exports C API for session lifecycle (afm_session_create, afm_session_respond, afm_session_respond_stream, afm_session_free, afm_prewarm) with null-input guards; replaces prior stateless afm_complete endpoint; implements streaming callback with per-event JSON emission.
Rust request/response helpers
providers/applefm/src/api/types/request.rs, providers/applefm/src/api/types/response.rs, providers/applefm/src/builder.rs
Introduces ConvoEntry and prepare() to flatten messages into instructions and conversation list; adds TurnPlan enum and hashing helpers for reuse vs rebuild decisions; provides render_full(), session_config_json(), and turn_request_json() serializers; implements parse_session_created() for session reply parsing; initializes client session state in builder.
Rust client session state management
providers/applefm/src/client.rs
Implements Session wrapper with SessionHandle/SessionState for hash-based reuse/rebuild tracking; adds public prewarm() for OS resource staging and enrich_metadata() to populate response metadata with duration, creation timestamp, prefill reuse indicator, and LoRA info; includes unit tests for session plan transitions.
Rust FFI layer refactoring
providers/applefm/src/ffi.rs
Replaces stateless complete_json/stream_json exports with session-based session_create, session_respond, session_stream, session_free, and prewarm wrappers; updates stub module to return "unavailable" equivalents; introduces JSON error helpers for null-pointer and NUL-byte failures.
Completion API implementation
providers/applefm/src/api/completion.rs
Refactors CompletionProvider::complete to lock session and decide reuse/rebuild via TurnPlan; rebuilds session config and calls session_create on mismatch; responds via session_respond; parses reply and advances session on success; enriches metadata with reuse indicator.
Streaming API implementation
providers/applefm/src/api/stream.rs
Refactors stream() to prepare instructions/conversation, lock session, decide reuse/rebuild via session_create, construct turn request, stream WireStreamEvent via background thread, emit TextChunk for deltas, advance session on Done, and invalidate on error.
Interactive example applications
examples/applefm/completion.rs, examples/applefm/stream.rs, examples/completions/completion.rs
Updates all examples from one-shot transcript-correction flows to interactive REPL-based chat: reads stdin until exit/EOF, maintains persistent message history, supports optional APPLEFM_LORA environment variable, calls complete/stream per turn, prewarms before input, and prints metadata including prefill reuse indicator and first-token latency.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Poem

🐰 Swift and Rust dance the session waltz,
Where once lived stateless single calls,
Now persistent handles trace each turn,
Reuse and rebuild plans discern,
Chat flows in loops, no longer stalling—
Sessions bloom and answers calling! 🌿✨

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title 'session bundling, example fixes, and test with prefill' covers the three main aspects of this large refactoring: the session-based architecture changes, the example file updates, and the metadata/prefill improvements.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch chore/applefm-optimizations

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (4)
providers/applefm/bridge/Sources/AppleFMBridge/Complete.swift (1)

88-96: 💤 Low value

Finish reason is hardcoded to "stop".

The non-streaming path always reports finish: "stop", even when the response may have been truncated due to maximumResponseTokens. The FoundationModels respond(to:options:) API doesn't expose a finish reason directly, so this is a known limitation. If accurate finish reasons become important for downstream logic, you may need to track whether the response hit the token limit (e.g., by comparing response length heuristics or switching to the streaming path which could infer truncation).

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@providers/applefm/bridge/Sources/AppleFMBridge/Complete.swift` around lines
88 - 96, The finish reason is hardcoded to "stop" after calling
box.session.respond; change it to detect likely truncation using
request.options.maximumResponseTokens and a simple token-length heuristic on
response.content (e.g., words-to-tokens or utf8/4 estimate) and set
CompleteReply's finish to "length_limited" when the heuristic indicates the
response likely hit the token limit, otherwise keep "stop"; update the block
handling box.session.respond (and use generationOptions/request.options and
CompleteReply) to compute the heuristic and choose the finish value accordingly,
falling back to "stop" if no maximum is available.
examples/completions/completion.rs (1)

58-71: ⚡ Quick win

Wrap blocking stdin read in spawn_blocking.

std::io::stdin().read_line() is synchronous blocking I/O. On a Tokio runtime, blocking calls should be offloaded via spawn_blocking to avoid blocking async worker threads.

♻️ Proposed refactor to use spawn_blocking
-        let mut user_input = String::new();
-        print!("\nUser:\t");
-        std::io::stdout().flush()?;
-        if std::io::stdin().read_line(&mut user_input)? == 0 {
-            break; // EOF (Ctrl-D)
-        }
+        let user_input = tokio::task::spawn_blocking(|| {
+            let mut input = String::new();
+            print!("\nUser:\t");
+            std::io::stdout().flush().ok();
+            std::io::stdin().read_line(&mut input).map(|_| input)
+        })
+        .await??;
+        if user_input.is_empty() {
+            break; // EOF (Ctrl-D)
+        }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@examples/completions/completion.rs` around lines 58 - 71, The loop currently
calls the blocking std::io::stdin().read_line(&mut user_input) directly, which
will block the Tokio runtime; wrap the blocking read in
tokio::task::spawn_blocking and await its result instead (e.g., call
spawn_blocking to perform the read_line and return the read length or error),
then handle EOF and errors from the spawned task and continue using the trimmed
user_input as before; update the code around the loop and references to
std::io::stdin().read_line so the blocking operation is executed inside
spawn_blocking and its result is awaited on the async context.
examples/applefm/completion.rs (1)

59-63: ⚡ Quick win

Wrap blocking stdin read in spawn_blocking.

std::io::stdin().read_line() is synchronous blocking I/O. On a Tokio runtime, blocking calls should be offloaded via spawn_blocking to avoid blocking async worker threads.

♻️ Proposed refactor to use spawn_blocking
-        let mut user_input = String::new();
-        print!("\nUser:\t");
-        std::io::stdout().flush()?;
-        if std::io::stdin().read_line(&mut user_input)? == 0 {
-            break; // EOF (Ctrl-D)
-        }
+        let user_input = tokio::task::spawn_blocking(|| {
+            let mut input = String::new();
+            print!("\nUser:\t");
+            std::io::stdout().flush().ok();
+            std::io::stdin().read_line(&mut input).map(|_| input)
+        })
+        .await??;
+        if user_input.is_empty() {
+            break; // EOF (Ctrl-D)
+        }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@examples/applefm/completion.rs` around lines 59 - 63, The synchronous call to
std::io::stdin().read_line(&mut user_input) should be moved into a Tokio
blocking task so it doesn't block the async runtime; replace the direct
read_line call in the loop (the print!/stdout().flush()/read_line/EOF check
block) with tokio::task::spawn_blocking(|| std::io::stdin().read_line(&mut
user_input)).await, unwrap/propagate the spawn result and the inner io::Result,
then keep the existing EOF check (== 0) and break accordingly; ensure you await
the spawned task and handle errors from both the JoinHandle and the io::Result
so the function's ? error propagation still works.
examples/applefm/stream.rs (1)

59-71: ⚡ Quick win

Wrap blocking stdin read in spawn_blocking.

std::io::stdin().read_line() is synchronous blocking I/O. On a Tokio runtime, blocking calls should be offloaded via spawn_blocking to avoid blocking async worker threads.

♻️ Proposed refactor to use spawn_blocking
-        let mut user_input = String::new();
-        print!("\nUser:\t");
-        std::io::stdout().flush()?;
-        if std::io::stdin().read_line(&mut user_input)? == 0 {
-            break; // EOF (Ctrl-D)
-        }
+        let user_input = tokio::task::spawn_blocking(|| {
+            let mut input = String::new();
+            print!("\nUser:\t");
+            std::io::stdout().flush().ok();
+            std::io::stdin().read_line(&mut input).map(|_| input)
+        })
+        .await??;
+        if user_input.is_empty() {
+            break; // EOF (Ctrl-D)
+        }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@examples/applefm/stream.rs` around lines 59 - 71, The loop uses synchronous
std::io::stdin().read_line(&mut user_input) (and std::io::stdout().flush())
which will block the Tokio runtime; wrap the blocking prompt/flush and read_line
inside tokio::task::spawn_blocking and await its result instead of calling
read_line directly. Concretely, move creation of user_input and the print/flush
+ stdin.read_line into a spawn_blocking closure (capturing/mutating the String),
await the JoinHandle, handle any I/O error returned, then trim/use the returned
string as before (referencing the user_input variable, read_line, and
tokio::task::spawn_blocking).
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@examples/applefm/completion.rs`:
- Line 56: The loop in completion.rs currently calls
std::io::stdin().read_line(&mut user_input) directly inside the #[tokio::main]
runtime which will block the reactor; keep calling prewarmer.prewarm() as-is
(it's non-blocking) but move the blocking read_line off the async runtime by
executing it inside tokio::task::spawn_blocking (or swap to an async stdin
alternative). Locate the REPL loop where prewarmer.prewarm() and read_line are
used, wrap the read_line call in spawn_blocking and await its result, and ensure
the rest of the loop continues on the Tokio task (preserve user_input handling
after spawn_blocking completes).

---

Nitpick comments:
In `@examples/applefm/completion.rs`:
- Around line 59-63: The synchronous call to std::io::stdin().read_line(&mut
user_input) should be moved into a Tokio blocking task so it doesn't block the
async runtime; replace the direct read_line call in the loop (the
print!/stdout().flush()/read_line/EOF check block) with
tokio::task::spawn_blocking(|| std::io::stdin().read_line(&mut
user_input)).await, unwrap/propagate the spawn result and the inner io::Result,
then keep the existing EOF check (== 0) and break accordingly; ensure you await
the spawned task and handle errors from both the JoinHandle and the io::Result
so the function's ? error propagation still works.

In `@examples/applefm/stream.rs`:
- Around line 59-71: The loop uses synchronous std::io::stdin().read_line(&mut
user_input) (and std::io::stdout().flush()) which will block the Tokio runtime;
wrap the blocking prompt/flush and read_line inside tokio::task::spawn_blocking
and await its result instead of calling read_line directly. Concretely, move
creation of user_input and the print/flush + stdin.read_line into a
spawn_blocking closure (capturing/mutating the String), await the JoinHandle,
handle any I/O error returned, then trim/use the returned string as before
(referencing the user_input variable, read_line, and
tokio::task::spawn_blocking).

In `@examples/completions/completion.rs`:
- Around line 58-71: The loop currently calls the blocking
std::io::stdin().read_line(&mut user_input) directly, which will block the Tokio
runtime; wrap the blocking read in tokio::task::spawn_blocking and await its
result instead (e.g., call spawn_blocking to perform the read_line and return
the read length or error), then handle EOF and errors from the spawned task and
continue using the trimmed user_input as before; update the code around the loop
and references to std::io::stdin().read_line so the blocking operation is
executed inside spawn_blocking and its result is awaited on the async context.

In `@providers/applefm/bridge/Sources/AppleFMBridge/Complete.swift`:
- Around line 88-96: The finish reason is hardcoded to "stop" after calling
box.session.respond; change it to detect likely truncation using
request.options.maximumResponseTokens and a simple token-length heuristic on
response.content (e.g., words-to-tokens or utf8/4 estimate) and set
CompleteReply's finish to "length_limited" when the heuristic indicates the
response likely hit the token limit, otherwise keep "stop"; update the block
handling box.session.respond (and use generationOptions/request.options and
CompleteReply) to compute the heuristic and choose the finish value accordingly,
falling back to "stop" if no maximum is available.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 4c54be08-98b1-4734-be09-f5f5d4865eaf

📥 Commits

Reviewing files that changed from the base of the PR and between c25e82a and 113fc20.

📒 Files selected for processing (17)
  • examples/applefm/completion.rs
  • examples/applefm/stream.rs
  • examples/completions/completion.rs
  • providers/applefm/bridge/Sources/AppleFMBridge/Complete.swift
  • providers/applefm/bridge/Sources/AppleFMBridge/Exports.swift
  • providers/applefm/bridge/Sources/AppleFMBridge/Session.swift
  • providers/applefm/bridge/Sources/AppleFMBridge/SessionStore.swift
  • providers/applefm/bridge/Sources/AppleFMBridge/Stream.swift
  • providers/applefm/bridge/Sources/AppleFMBridge/WireTypes.swift
  • providers/applefm/src/api/completion.rs
  • providers/applefm/src/api/stream.rs
  • providers/applefm/src/api/types/mod.rs
  • providers/applefm/src/api/types/request.rs
  • providers/applefm/src/api/types/response.rs
  • providers/applefm/src/builder.rs
  • providers/applefm/src/client.rs
  • providers/applefm/src/ffi.rs

loop {
// Stage the model while the user types, so the turn that follows
// the pause doesn't pay warm-up.
prewarmer.prewarm();

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Check the prewarm() implementation to determine if it's blocking
rg -nP -A10 'pub fn prewarm' --type=rust

Repository: EggerMarc/chat-rs

Length of output: 2308


🏁 Script executed:

#!/bin/bash
set -euo pipefail

# 1) Inspect the FFI prewarm wrapper + extern declaration
rg -n "afm_prewarm|prewarm\\(" providers/applefm/src/ffi.rs -n -S -g'*.rs' -C 3

# Show more context around the real prewarm wrapper and stub
sed -n '90,190p' providers/applefm/src/ffi.rs

# 2) Inspect the example REPL loop around the call site
sed -n '1,120p' examples/applefm/completion.rs

# 3) Inspect the Swift-bridge gating and whether prewarm is available on non-bridge builds
rg -n "applefm_bridge|APPLEAN?FM_SKIP_BRIDGE|cfg\\(applefm_bridge\\)" providers/applefm/src/ffi.rs -S

# 4) Inspect the client prewarm method (where the FFI call happens)
sed -n '130,210p' providers/applefm/src/client.rs

Repository: EggerMarc/chat-rs

Length of output: 9567


Avoid blocking Tokio runtime in REPL loop

  • prewarmer.prewarm() isn’t blocking: providers/applefm/src/ffi.rs::prewarm is documented “Returns immediately; the bridge detaches the actual work”, and the non-bridge stub is a no-op.
  • std::io::stdin().read_line(...) blocks the Tokio runtime thread; wrap it in tokio::task::spawn_blocking or use async stdin.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@examples/applefm/completion.rs` at line 56, The loop in completion.rs
currently calls std::io::stdin().read_line(&mut user_input) directly inside the
#[tokio::main] runtime which will block the reactor; keep calling
prewarmer.prewarm() as-is (it's non-blocking) but move the blocking read_line
off the async runtime by executing it inside tokio::task::spawn_blocking (or
swap to an async stdin alternative). Locate the REPL loop where
prewarmer.prewarm() and read_line are used, wrap the read_line call in
spawn_blocking and await its result, and ensure the rest of the loop continues
on the Tokio task (preserve user_input handling after spawn_blocking completes).

@EggerMarc EggerMarc merged commit 691d0b8 into main Jun 7, 2026
4 checks passed
@EggerMarc EggerMarc deleted the chore/applefm-optimizations branch June 10, 2026 17:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant