chore: session bundling, example fixes, and test with prefill#50
Conversation
📝 WalkthroughWalkthroughThis PR refactors the AppleFM provider from a stateless completion API to a persistent session-based architecture. The wire protocol, Swift bridge, and Rust client are unified around session lifecycle management: create a session once with configuration, reuse it across multiple turns with per-turn requests, and let the Rust client decide whether to reuse or rebuild sessions based on instruction and conversation hashes. Example applications are updated to demonstrate interactive chat workflows. ChangesAppleFM Session-Based Architecture
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Poem
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (4)
providers/applefm/bridge/Sources/AppleFMBridge/Complete.swift (1)
88-96: 💤 Low valueFinish reason is hardcoded to
"stop".The non-streaming path always reports
finish: "stop", even when the response may have been truncated due tomaximumResponseTokens. The FoundationModelsrespond(to:options:)API doesn't expose a finish reason directly, so this is a known limitation. If accurate finish reasons become important for downstream logic, you may need to track whether the response hit the token limit (e.g., by comparing response length heuristics or switching to the streaming path which could infer truncation).🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@providers/applefm/bridge/Sources/AppleFMBridge/Complete.swift` around lines 88 - 96, The finish reason is hardcoded to "stop" after calling box.session.respond; change it to detect likely truncation using request.options.maximumResponseTokens and a simple token-length heuristic on response.content (e.g., words-to-tokens or utf8/4 estimate) and set CompleteReply's finish to "length_limited" when the heuristic indicates the response likely hit the token limit, otherwise keep "stop"; update the block handling box.session.respond (and use generationOptions/request.options and CompleteReply) to compute the heuristic and choose the finish value accordingly, falling back to "stop" if no maximum is available.examples/completions/completion.rs (1)
58-71: ⚡ Quick winWrap blocking stdin read in
spawn_blocking.
std::io::stdin().read_line()is synchronous blocking I/O. On a Tokio runtime, blocking calls should be offloaded viaspawn_blockingto avoid blocking async worker threads.♻️ Proposed refactor to use spawn_blocking
- let mut user_input = String::new(); - print!("\nUser:\t"); - std::io::stdout().flush()?; - if std::io::stdin().read_line(&mut user_input)? == 0 { - break; // EOF (Ctrl-D) - } + let user_input = tokio::task::spawn_blocking(|| { + let mut input = String::new(); + print!("\nUser:\t"); + std::io::stdout().flush().ok(); + std::io::stdin().read_line(&mut input).map(|_| input) + }) + .await??; + if user_input.is_empty() { + break; // EOF (Ctrl-D) + }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@examples/completions/completion.rs` around lines 58 - 71, The loop currently calls the blocking std::io::stdin().read_line(&mut user_input) directly, which will block the Tokio runtime; wrap the blocking read in tokio::task::spawn_blocking and await its result instead (e.g., call spawn_blocking to perform the read_line and return the read length or error), then handle EOF and errors from the spawned task and continue using the trimmed user_input as before; update the code around the loop and references to std::io::stdin().read_line so the blocking operation is executed inside spawn_blocking and its result is awaited on the async context.examples/applefm/completion.rs (1)
59-63: ⚡ Quick winWrap blocking stdin read in
spawn_blocking.
std::io::stdin().read_line()is synchronous blocking I/O. On a Tokio runtime, blocking calls should be offloaded viaspawn_blockingto avoid blocking async worker threads.♻️ Proposed refactor to use spawn_blocking
- let mut user_input = String::new(); - print!("\nUser:\t"); - std::io::stdout().flush()?; - if std::io::stdin().read_line(&mut user_input)? == 0 { - break; // EOF (Ctrl-D) - } + let user_input = tokio::task::spawn_blocking(|| { + let mut input = String::new(); + print!("\nUser:\t"); + std::io::stdout().flush().ok(); + std::io::stdin().read_line(&mut input).map(|_| input) + }) + .await??; + if user_input.is_empty() { + break; // EOF (Ctrl-D) + }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@examples/applefm/completion.rs` around lines 59 - 63, The synchronous call to std::io::stdin().read_line(&mut user_input) should be moved into a Tokio blocking task so it doesn't block the async runtime; replace the direct read_line call in the loop (the print!/stdout().flush()/read_line/EOF check block) with tokio::task::spawn_blocking(|| std::io::stdin().read_line(&mut user_input)).await, unwrap/propagate the spawn result and the inner io::Result, then keep the existing EOF check (== 0) and break accordingly; ensure you await the spawned task and handle errors from both the JoinHandle and the io::Result so the function's ? error propagation still works.examples/applefm/stream.rs (1)
59-71: ⚡ Quick winWrap blocking stdin read in
spawn_blocking.
std::io::stdin().read_line()is synchronous blocking I/O. On a Tokio runtime, blocking calls should be offloaded viaspawn_blockingto avoid blocking async worker threads.♻️ Proposed refactor to use spawn_blocking
- let mut user_input = String::new(); - print!("\nUser:\t"); - std::io::stdout().flush()?; - if std::io::stdin().read_line(&mut user_input)? == 0 { - break; // EOF (Ctrl-D) - } + let user_input = tokio::task::spawn_blocking(|| { + let mut input = String::new(); + print!("\nUser:\t"); + std::io::stdout().flush().ok(); + std::io::stdin().read_line(&mut input).map(|_| input) + }) + .await??; + if user_input.is_empty() { + break; // EOF (Ctrl-D) + }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@examples/applefm/stream.rs` around lines 59 - 71, The loop uses synchronous std::io::stdin().read_line(&mut user_input) (and std::io::stdout().flush()) which will block the Tokio runtime; wrap the blocking prompt/flush and read_line inside tokio::task::spawn_blocking and await its result instead of calling read_line directly. Concretely, move creation of user_input and the print/flush + stdin.read_line into a spawn_blocking closure (capturing/mutating the String), await the JoinHandle, handle any I/O error returned, then trim/use the returned string as before (referencing the user_input variable, read_line, and tokio::task::spawn_blocking).
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@examples/applefm/completion.rs`:
- Line 56: The loop in completion.rs currently calls
std::io::stdin().read_line(&mut user_input) directly inside the #[tokio::main]
runtime which will block the reactor; keep calling prewarmer.prewarm() as-is
(it's non-blocking) but move the blocking read_line off the async runtime by
executing it inside tokio::task::spawn_blocking (or swap to an async stdin
alternative). Locate the REPL loop where prewarmer.prewarm() and read_line are
used, wrap the read_line call in spawn_blocking and await its result, and ensure
the rest of the loop continues on the Tokio task (preserve user_input handling
after spawn_blocking completes).
---
Nitpick comments:
In `@examples/applefm/completion.rs`:
- Around line 59-63: The synchronous call to std::io::stdin().read_line(&mut
user_input) should be moved into a Tokio blocking task so it doesn't block the
async runtime; replace the direct read_line call in the loop (the
print!/stdout().flush()/read_line/EOF check block) with
tokio::task::spawn_blocking(|| std::io::stdin().read_line(&mut
user_input)).await, unwrap/propagate the spawn result and the inner io::Result,
then keep the existing EOF check (== 0) and break accordingly; ensure you await
the spawned task and handle errors from both the JoinHandle and the io::Result
so the function's ? error propagation still works.
In `@examples/applefm/stream.rs`:
- Around line 59-71: The loop uses synchronous std::io::stdin().read_line(&mut
user_input) (and std::io::stdout().flush()) which will block the Tokio runtime;
wrap the blocking prompt/flush and read_line inside tokio::task::spawn_blocking
and await its result instead of calling read_line directly. Concretely, move
creation of user_input and the print/flush + stdin.read_line into a
spawn_blocking closure (capturing/mutating the String), await the JoinHandle,
handle any I/O error returned, then trim/use the returned string as before
(referencing the user_input variable, read_line, and
tokio::task::spawn_blocking).
In `@examples/completions/completion.rs`:
- Around line 58-71: The loop currently calls the blocking
std::io::stdin().read_line(&mut user_input) directly, which will block the Tokio
runtime; wrap the blocking read in tokio::task::spawn_blocking and await its
result instead (e.g., call spawn_blocking to perform the read_line and return
the read length or error), then handle EOF and errors from the spawned task and
continue using the trimmed user_input as before; update the code around the loop
and references to std::io::stdin().read_line so the blocking operation is
executed inside spawn_blocking and its result is awaited on the async context.
In `@providers/applefm/bridge/Sources/AppleFMBridge/Complete.swift`:
- Around line 88-96: The finish reason is hardcoded to "stop" after calling
box.session.respond; change it to detect likely truncation using
request.options.maximumResponseTokens and a simple token-length heuristic on
response.content (e.g., words-to-tokens or utf8/4 estimate) and set
CompleteReply's finish to "length_limited" when the heuristic indicates the
response likely hit the token limit, otherwise keep "stop"; update the block
handling box.session.respond (and use generationOptions/request.options and
CompleteReply) to compute the heuristic and choose the finish value accordingly,
falling back to "stop" if no maximum is available.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 4c54be08-98b1-4734-be09-f5f5d4865eaf
📒 Files selected for processing (17)
examples/applefm/completion.rsexamples/applefm/stream.rsexamples/completions/completion.rsproviders/applefm/bridge/Sources/AppleFMBridge/Complete.swiftproviders/applefm/bridge/Sources/AppleFMBridge/Exports.swiftproviders/applefm/bridge/Sources/AppleFMBridge/Session.swiftproviders/applefm/bridge/Sources/AppleFMBridge/SessionStore.swiftproviders/applefm/bridge/Sources/AppleFMBridge/Stream.swiftproviders/applefm/bridge/Sources/AppleFMBridge/WireTypes.swiftproviders/applefm/src/api/completion.rsproviders/applefm/src/api/stream.rsproviders/applefm/src/api/types/mod.rsproviders/applefm/src/api/types/request.rsproviders/applefm/src/api/types/response.rsproviders/applefm/src/builder.rsproviders/applefm/src/client.rsproviders/applefm/src/ffi.rs
| loop { | ||
| // Stage the model while the user types, so the turn that follows | ||
| // the pause doesn't pay warm-up. | ||
| prewarmer.prewarm(); |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Check the prewarm() implementation to determine if it's blocking
rg -nP -A10 'pub fn prewarm' --type=rustRepository: EggerMarc/chat-rs
Length of output: 2308
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# 1) Inspect the FFI prewarm wrapper + extern declaration
rg -n "afm_prewarm|prewarm\\(" providers/applefm/src/ffi.rs -n -S -g'*.rs' -C 3
# Show more context around the real prewarm wrapper and stub
sed -n '90,190p' providers/applefm/src/ffi.rs
# 2) Inspect the example REPL loop around the call site
sed -n '1,120p' examples/applefm/completion.rs
# 3) Inspect the Swift-bridge gating and whether prewarm is available on non-bridge builds
rg -n "applefm_bridge|APPLEAN?FM_SKIP_BRIDGE|cfg\\(applefm_bridge\\)" providers/applefm/src/ffi.rs -S
# 4) Inspect the client prewarm method (where the FFI call happens)
sed -n '130,210p' providers/applefm/src/client.rsRepository: EggerMarc/chat-rs
Length of output: 9567
Avoid blocking Tokio runtime in REPL loop
prewarmer.prewarm()isn’t blocking:providers/applefm/src/ffi.rs::prewarmis documented “Returns immediately; the bridge detaches the actual work”, and the non-bridge stub is a no-op.std::io::stdin().read_line(...)blocks the Tokio runtime thread; wrap it intokio::task::spawn_blockingor use async stdin.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@examples/applefm/completion.rs` at line 56, The loop in completion.rs
currently calls std::io::stdin().read_line(&mut user_input) directly inside the
#[tokio::main] runtime which will block the reactor; keep calling
prewarmer.prewarm() as-is (it's non-blocking) but move the blocking read_line
off the async runtime by executing it inside tokio::task::spawn_blocking (or
swap to an async stdin alternative). Locate the REPL loop where
prewarmer.prewarm() and read_line are used, wrap the read_line call in
spawn_blocking and await its result, and ensure the rest of the loop continues
on the Tokio task (preserve user_input handling after spawn_blocking completes).
Summary by CodeRabbit
Release Notes
New Features