diff --git a/docs/design/prompt-suggestion/prompt-suggestion-design.md b/docs/design/prompt-suggestion/prompt-suggestion-design.md
Lines changed: 211 additions & 0 deletions
diff --git a/docs/design/prompt-suggestion/prompt-suggestion-implementation.md b/docs/design/prompt-suggestion/prompt-suggestion-implementation.md
Lines changed: 85 additions & 0 deletions
@@ -0,0 +1,211 @@
+# Prompt Suggestion (NES) Design
+
+> Predicts what the user would naturally type next after the AI completes a response, showing it as ghost text in the input prompt.
+>
+> Implementation status: `prompt-suggestion-implementation.md`. Speculation engine: `speculation-design.md`.
+
+## Overview
+
+A **prompt suggestion** (Next-step Suggestion / NES) is a short prediction (2-12 words) of the user's next input, generated by an LLM call after each AI response. It appears as ghost text in the input prompt. The user can accept it with Tab or Right Arrow (fills the input) or Enter (fills and submits), or dismiss it by typing or pasting.
+
+## Architecture
+
+```
+┌─────────────────────────────────────────────────────────────┐
+│  AppContainer (CLI)                                         │
+│                                                             │
+│  Responding → Idle transition                               │
+│       │                                                     │
+│       ▼                                                     │
+│  ┌─────────────────────────────────────────────────────┐    │
+│  │  Guard Conditions (13 checks)                       │    │
+│  │  settings, interactive, sdk, plan mode, dialogs,    │    │
+│  │  elicitation, API error                             │    │
+│  └────────────────────┬────────────────────────────────┘    │
+│                       │                                     │
+│                       ▼                                     │
+│  ┌─────────────────────────────────────────────────────┐    │
+│  │  generatePromptSuggestion()                         │    │
+│  │                                                     │    │
+│  │  ┌─── CacheSafeParams available? ───┐               │    │
+│  │  │                                  │               │    │
+│  │  ▼ YES                         NO ▼                 │    │
+│  │  runForkedQuery()      BaseLlmClient.generateJson() │    │
+│  │  (cache-aware)         (standalone fallback)        │    │
+│  │                                                     │    │
+│  │  ──── SUGGESTION_PROMPT ────                        │    │
+│  │  ──── 12 filter rules ──────                        │    │
+│  │  ──── getFilterReason() ────                        │    │
+│  └────────────────────┬────────────────────────────────┘    │
+│                       │                                     │
+│                       ▼                                     │
+│  ┌─────────────────────────────────────────────────────┐    │
+│  │  FollowupController (framework-agnostic)            │    │
+│  │  300ms delay → show as ghost text                   │    │
+│  │                                                     │    │
+│  │  Tab    → accept (fill input)                       │    │
+│  │  Enter  → accept + submit                           │    │
+│  │  Right  → accept (fill input)                       │    │
+│  │  Type   → dismiss + abort speculation               │    │
+│  └─────────────────────────────────────────────────────┘    │
+│                                                             │
+│  ┌─────────────────────────────────────────────────────┐    │
+│  │  Telemetry (PromptSuggestionEvent)                  │    │
+│  │  outcome, accept_method, timing, similarity,        │    │
+│  │  keystroke, focus, suppression reason, prompt_id    │    │
+│  └─────────────────────────────────────────────────────┘    │
+└─────────────────────────────────────────────────────────────┘
+```
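
The branch in the diagram can be sketched in TypeScript. This is a minimal illustration under stated assumptions, not the real `generatePromptSuggestion()`: the `SuggestionDeps` interface and its callbacks are hypothetical stand-ins for `runForkedQuery()`, `BaseLlmClient.generateJson()`, and `getFilterReason()`, and treating an empty reply as "no suggestion" is assumed from the prompt's "Or nothing".

```typescript
// Hypothetical outcome of one generation attempt.
type SuggestionResult =
  | { kind: 'shown'; text: string }
  | { kind: 'suppressed'; reason: string }
  | { kind: 'skipped' };

// Hypothetical dependencies standing in for the real modules.
interface SuggestionDeps {
  modelTurns: number;
  hasCacheSafeParams: boolean;
  runForkedQuery: (prompt: string) => Promise<string>; // cache-aware path
  generateStandalone: (prompt: string) => Promise<string>; // standalone fallback
  getFilterReason: (text: string) => string | null; // null means the text passes
}

// First line of SUGGESTION_PROMPT only; the full text appears under "LLM Prompt".
const SUGGESTION_PROMPT = '[SUGGESTION MODE: Suggest what the user might naturally type next.]';

async function generatePromptSuggestionSketch(deps: SuggestionDeps): Promise<SuggestionResult> {
  // Early-conversation guard (modelTurns < 2).
  if (deps.modelTurns < 2) return { kind: 'skipped' };

  // Prefer the forked query so the request can reuse the main conversation's cache prefix.
  const raw = deps.hasCacheSafeParams
    ? await deps.runForkedQuery(SUGGESTION_PROMPT)
    : await deps.generateStandalone(SUGGESTION_PROMPT);

  const text = raw.trim();
  // Assumption: an empty reply ("Or nothing") means there is nothing to show.
  if (text === '') return { kind: 'skipped' };

  const reason = deps.getFilterReason(text);
  return reason === null ? { kind: 'shown', text } : { kind: 'suppressed', reason };
}
```

Injecting both generation paths keeps the branch testable without a model; the forked path is preferred only because it can share the main conversation's prompt cache.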
+
+## Suggestion Generation
+
+### LLM Prompt
+
+```
+[SUGGESTION MODE: Suggest what the user might naturally type next.]
+
+Your job is to predict what THEY would type - not what you think they should do.
+THE TEST: Would they think "I was just about to type that"?
+
+EXAMPLES:
+User asked "fix the bug and run tests", bug is fixed → "run the tests"
+After code written → "try it out"
+Task complete, obvious follow-up → "commit this" or "push it"
+
+Format: 2-12 words, match the user's style. Or nothing.
+Reply with ONLY the suggestion, no quotes or explanation.
+```
+
+### Filter Rules (12)
+
+| Rule               | Example blocked                                  |
+| ------------------ | ------------------------------------------------ |
+| done               | "done"                                           |
+| meta_text          | "nothing found", "no suggestion", "silence"      |
+| meta_wrapped       | "(silence)", "[no suggestion]"                   |
+| error_message      | "api error: 500"                                 |
+| prefixed_label     | "Suggestion: commit"                             |
+| too_few_words      | "hmm" (but allows "yes", "commit", "push" etc.)  |
+| too_many_words     | > 12 words                                       |
+| too_long           | >= 100 chars                                     |
+| multiple_sentences | "Run tests. Then commit."                        |
+| has_formatting     | newlines, markdown bold                          |
+| evaluative         | "looks good", "thanks" (with \b word boundaries) |
+| ai_voice           | "Let me...", "I'll...", "Here's..."              |
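
The table reads naturally as an ordered list of predicates where the first match names the rule, which is also what the `reason` telemetry field reports for suppressed suggestions. The regular expressions, the single-word allowlist, and the name `getFilterReasonSketch` below are assumptions for illustration; the actual `getFilterReason()` may use different patterns and ordering.

```typescript
type FilterRule = [name: string, blocks: (text: string) => boolean];

// Assumed allowlist: single-word replies that are still useful suggestions.
const ALLOWED_SINGLE_WORDS = new Set(['yes', 'no', 'commit', 'push', 'continue']);

const wordCount = (t: string): number => t.split(/\s+/).filter(Boolean).length;

// Checked in table order; the first matching rule wins.
const FILTER_RULES: FilterRule[] = [
  ['done', (t) => /^done\.?$/i.test(t)],
  ['meta_text', (t) => /^(nothing found|no suggestions?|silence)\.?$/i.test(t)],
  ['meta_wrapped', (t) => /^[([].*[)\]]$/.test(t)],
  ['error_message', (t) => /^(api )?error\b/i.test(t)],
  ['prefixed_label', (t) => /^\w+:\s/.test(t)],
  [
    'too_few_words',
    (t) => wordCount(t) < 2 && !ALLOWED_SINGLE_WORDS.has(t.toLowerCase().replace(/[.!]+$/, '')),
  ],
  ['too_many_words', (t) => wordCount(t) > 12],
  ['too_long', (t) => t.length >= 100],
  ['multiple_sentences', (t) => /[.!?]\s+\S/.test(t)],
  ['has_formatting', (t) => /[\r\n]|\*\*/.test(t)],
  // \b keeps "thanks" from matching inside longer words such as "thanksgiving".
  ['evaluative', (t) => /\b(looks good|thanks|thank you)\b/i.test(t)],
  ['ai_voice', (t) => /^(let me|i'll|here's)\b/i.test(t)],
];

function getFilterReasonSketch(raw: string): string | null {
  const text = raw.trim();
  for (const [name, blocks] of FILTER_RULES) {
    if (blocks(text)) return name;
  }
  return null; // passes every rule
}
```

Because evaluation stops at the first match, rule order matters: a one-word reply such as "thanks" is reported as `too_few_words` before `evaluative` is ever checked.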
+
+### Guard Conditions
+
+**AppContainer useEffect (13 checks in code):**
+
+| Guard                | Check                                               |
+| -------------------- | --------------------------------------------------- |
+| Settings toggle      | `enableFollowupSuggestions`                         |
+| Non-interactive      | `config.isInteractive()`                            |
+| SDK mode             | `!config.getSdkMode()`                              |
+| Streaming transition | `Responding → Idle` (2 checks)                      |
+| API error (history)  | `historyManager.history[last]?.type !== 'error'`    |
+| API error (pending)  | `!pendingGeminiHistoryItems.some(type === 'error')` |
+| Confirmation dialogs | shell + general + loop detection (3 checks)         |
+| Permission dialog    | `isPermissionsDialogOpen`                           |
+| Elicitation          | `settingInputRequests.length === 0`                 |
+| Plan mode            | `ApprovalMode.PLAN`                                 |
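
Taken together, the checks form a single boolean predicate evaluated on the `Responding → Idle` transition. In this sketch, `GuardInputs` is a hypothetical flattened view of the values `AppContainer` reads from `config`, settings, and UI state, and the streaming state names are assumed.

```typescript
type StreamingState = 'idle' | 'responding';
type HistoryItem = { type: string };

// Hypothetical flattened view of what AppContainer reads from config, settings, and UI state.
interface GuardInputs {
  enableFollowupSuggestions: boolean;
  isInteractive: boolean;
  sdkMode: boolean;
  prevStreamingState: StreamingState;
  streamingState: StreamingState;
  history: HistoryItem[];
  pendingItems: HistoryItem[];
  shellConfirmationOpen: boolean;
  confirmationOpen: boolean;
  loopDetectionConfirmationOpen: boolean;
  isPermissionsDialogOpen: boolean;
  inputRequestCount: number; // elicitation requests still waiting for input
  isPlanMode: boolean;
}

function shouldGenerateSuggestion(g: GuardInputs): boolean {
  return (
    g.enableFollowupSuggestions &&
    g.isInteractive &&
    !g.sdkMode &&
    // Streaming transition (2 checks): only right after a response finishes.
    g.prevStreamingState === 'responding' &&
    g.streamingState === 'idle' &&
    // An API error in committed or pending history suppresses suggestions.
    g.history[g.history.length - 1]?.type !== 'error' &&
    !g.pendingItems.some((item) => item.type === 'error') &&
    // Confirmation dialogs (3 checks).
    !g.shellConfirmationOpen &&
    !g.confirmationOpen &&
    !g.loopDetectionConfirmationOpen &&
    !g.isPermissionsDialogOpen &&
    g.inputRequestCount === 0 &&
    !g.isPlanMode
  );
}
```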
+
+**Inside generatePromptSuggestion():**
+
+| Guard              | Check            |
+| ------------------ | ---------------- |
+| Early conversation | `modelTurns < 2` |
+
+**Separate feature flags (not in guard block):**
+
+| Flag                 | Controls                                                     |
+| -------------------- | ------------------------------------------------------------ |
+| `enableCacheSharing` | Whether to use a forked query or fall back to `generateJson` |
+| `enableSpeculation`  | Whether to start speculation on suggestion display           |
+
+## State Management
+
+### FollowupState
+
+```typescript
+interface FollowupState {
+  suggestion: string | null;
+  isVisible: boolean;
+  shownAt: number; // timestamp for telemetry
+}
+```
+
+### FollowupController
+
+Framework-agnostic controller shared by CLI (Ink) and WebUI (React):
+
+- `setSuggestion(text)` — 300ms delayed show, null clears immediately
+- `accept(method)` — clears state, fires `onAccept` via microtask, 100ms debounce lock
+- `dismiss()` — clears state, logs `ignored` telemetry
+- `clear()` — hard reset all state + timers
+- `Object.freeze(INITIAL_FOLLOWUP_STATE)` prevents accidental mutation
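
The controller contract can be sketched with an injected scheduler so the 300ms show delay, the microtask-deferred `onAccept`, and the 100ms accept lock are deterministic in tests. The `FollowupScheduler` interface, the constructor signature, and the `onOutcome` callback shape are assumptions; only the method names and timings come from the list above.

```typescript
interface FollowupState {
  suggestion: string | null;
  isVisible: boolean;
  shownAt: number; // timestamp for telemetry
}

type AcceptMethod = 'tab' | 'enter' | 'right';
type Outcome = 'accepted' | 'ignored';

// Hypothetical scheduler abstraction so timers can be faked in tests.
interface FollowupScheduler {
  setTimeout(fn: () => void, ms: number): () => void; // returns a cancel function
  queueMicrotask(fn: () => void): void;
  now(): number;
}

const realScheduler: FollowupScheduler = {
  setTimeout: (fn, ms) => {
    const id = setTimeout(fn, ms);
    return () => clearTimeout(id);
  },
  queueMicrotask: (fn) => queueMicrotask(fn),
  now: () => Date.now(),
};

const INITIAL_FOLLOWUP_STATE: Readonly<FollowupState> = Object.freeze({
  suggestion: null,
  isVisible: false,
  shownAt: 0,
});

class FollowupControllerSketch {
  state: Readonly<FollowupState> = INITIAL_FOLLOWUP_STATE;
  private cancelShow: (() => void) | null = null;
  private lockedUntil = 0;
  private readonly onAccept: (text: string, method: AcceptMethod) => void;
  private readonly onOutcome: (outcome: Outcome, text: string) => void;
  private readonly scheduler: FollowupScheduler;

  constructor(
    onAccept: (text: string, method: AcceptMethod) => void,
    onOutcome: (outcome: Outcome, text: string) => void,
    scheduler: FollowupScheduler = realScheduler,
  ) {
    this.onAccept = onAccept;
    this.onOutcome = onOutcome;
    this.scheduler = scheduler;
  }

  setSuggestion(text: string | null): void {
    this.cancelTimer();
    if (text === null) {
      this.state = INITIAL_FOLLOWUP_STATE; // null clears immediately
      return;
    }
    this.cancelShow = this.scheduler.setTimeout(() => {
      this.cancelShow = null;
      this.state = { suggestion: text, isVisible: true, shownAt: this.scheduler.now() };
    }, 300);
  }

  accept(method: AcceptMethod): void {
    const { suggestion, isVisible } = this.state;
    const now = this.scheduler.now();
    if (!isVisible || suggestion === null || now < this.lockedUntil) return;
    this.lockedUntil = now + 100; // ignore duplicate key events for 100ms
    this.state = INITIAL_FOLLOWUP_STATE;
    this.onOutcome('accepted', suggestion);
    // Deferred so the key handler that called accept() finishes first.
    this.scheduler.queueMicrotask(() => this.onAccept(suggestion, method));
  }

  dismiss(): void {
    const { suggestion, isVisible } = this.state;
    if (isVisible && suggestion !== null) this.onOutcome('ignored', suggestion);
    this.cancelTimer();
    this.state = INITIAL_FOLLOWUP_STATE;
  }

  clear(): void {
    this.cancelTimer();
    this.lockedUntil = 0;
    this.state = INITIAL_FOLLOWUP_STATE;
  }

  private cancelTimer(): void {
    this.cancelShow?.();
    this.cancelShow = null;
  }
}
```

Passing a scheduler rather than calling the timer globals directly is one way to keep the controller framework-agnostic: both the Ink and React hosts can share it, and tests can advance a fake clock.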
+
+## Keyboard Interaction
+
+| Key         | CLI                         | WebUI                                |
+| ----------- | --------------------------- | ------------------------------------ |
+| Tab         | Fill input (no submit)      | Fill input (no submit)               |
+| Enter       | Fill + submit               | Fill + submit (`explicitText` param) |
+| Right Arrow | Fill input (no submit)      | Fill input (no submit)               |
+| Typing      | Dismiss + abort speculation | Dismiss                              |
+| Paste       | Dismiss + abort speculation | Dismiss                              |
+
+### Key Binding Note
+
+The Tab handler uses `key.name === 'tab'` explicitly (not the `ACCEPT_SUGGESTION` matcher) because `ACCEPT_SUGGESTION` also matches Enter, which must fall through to the SUBMIT handler.
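
The ordering problem can be shown with a simplified handler. `KeyPress`, `matchesAcceptSuggestion`, the `'return'` key name, and the result shape are hypothetical stand-ins for the CLI's key type, the `ACCEPT_SUGGESTION` matcher, and its handlers; the sketch only illustrates why a broad matcher checked first would swallow Enter.

```typescript
interface KeyPress {
  name: string; // assumed names: 'tab', 'return', 'right', or a character
}

type Result = { action: 'fill' | 'submit' | 'insert'; text: string };

// Hypothetical stand-in for the ACCEPT_SUGGESTION matcher: it matches Enter as well as Tab.
const matchesAcceptSuggestion = (key: KeyPress): boolean =>
  key.name === 'tab' || key.name === 'return';

// Problematic ordering: the broad matcher runs first, so Enter only fills and never submits.
function handleKeyWithMatcher(key: KeyPress, input: string, suggestion: string | null): Result {
  if (suggestion !== null && matchesAcceptSuggestion(key)) return { action: 'fill', text: suggestion };
  if (key.name === 'return') return { action: 'submit', text: input };
  return { action: 'insert', text: input };
}

// Ordering described above: Tab and Right are tested by name, so Enter reaches SUBMIT.
function handleKey(key: KeyPress, input: string, suggestion: string | null): Result {
  if (suggestion !== null && (key.name === 'tab' || key.name === 'right')) {
    return { action: 'fill', text: suggestion };
  }
  if (key.name === 'return') {
    // A visible suggestion implies an empty input (typing dismisses it), so submit the suggestion.
    return { action: 'submit', text: suggestion ?? input };
  }
  return { action: 'insert', text: input };
}
```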
+
+## Telemetry
+
+### PromptSuggestionEvent
+
+| Field                      | Type                        | Description                         |
+| -------------------------- | --------------------------- | ----------------------------------- |
+| outcome                    | accepted/ignored/suppressed | Final outcome                       |
+| prompt_id                  | string                      | Default: 'user_intent'              |
+| accept_method              | tab/enter/right             | How user accepted                   |
+| time_to_accept_ms          | number                      | Time from shown to accept           |
+| time_to_ignore_ms          | number                      | Time from shown to dismiss          |
+| time_to_first_keystroke_ms | number                      | Time to first keystroke while shown |
+| suggestion_length          | number                      | Character count                     |
+| similarity                 | number                      | 1.0 for accept, 0.0 for ignore      |
+| was_focused_when_shown     | boolean                     | Terminal had focus                  |
+| reason                     | string                      | For suppressed: filter rule name    |
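
The timing and similarity fields can be derived from `FollowupState.shownAt` at the moment an outcome is recorded. `buildPromptSuggestionEvent` and `OutcomeInput` are hypothetical, and omitting timing fields for suppressed suggestions (which were never shown) is an assumption.

```typescript
type Outcome = 'accepted' | 'ignored' | 'suppressed';
type AcceptMethod = 'tab' | 'enter' | 'right';

interface PromptSuggestionEventFields {
  outcome: Outcome;
  prompt_id: string;
  accept_method?: AcceptMethod;
  time_to_accept_ms?: number;
  time_to_ignore_ms?: number;
  time_to_first_keystroke_ms?: number;
  suggestion_length?: number;
  similarity?: number;
  was_focused_when_shown?: boolean;
  reason?: string;
}

// Hypothetical input gathered by the UI when an outcome is known.
interface OutcomeInput {
  outcome: Outcome;
  suggestion: string;
  shownAt: number; // FollowupState.shownAt
  now: number;
  firstKeystrokeAt?: number;
  acceptMethod?: AcceptMethod;
  wasFocusedWhenShown?: boolean;
  reason?: string; // filter rule name for suppressed suggestions
}

function buildPromptSuggestionEvent(input: OutcomeInput): PromptSuggestionEventFields {
  const event: PromptSuggestionEventFields = {
    outcome: input.outcome,
    prompt_id: 'user_intent', // default from the table
    suggestion_length: input.suggestion.length,
  };
  if (input.outcome === 'suppressed') {
    event.reason = input.reason; // never shown, so no timing fields
    return event;
  }
  const elapsed = input.now - input.shownAt;
  event.was_focused_when_shown = input.wasFocusedWhenShown;
  if (input.firstKeystrokeAt !== undefined) {
    event.time_to_first_keystroke_ms = input.firstKeystrokeAt - input.shownAt;
  }
  if (input.outcome === 'accepted') {
    event.accept_method = input.acceptMethod;
    event.time_to_accept_ms = elapsed;
    event.similarity = 1.0;
  } else {
    event.time_to_ignore_ms = elapsed;
    event.similarity = 0.0;
  }
  return event;
}
```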
+
+### SpeculationEvent
+
+| Field                    | Type                    | Description               |
+| ------------------------ | ----------------------- | ------------------------- |
+| outcome                  | accepted/aborted/failed | Speculation result        |
+| turns_used               | number                  | API round-trips           |
+| files_written            | number                  | Files in overlay          |
+| tool_use_count           | number                  | Tools executed            |
+| duration_ms              | number                  | Wall-clock time           |
+| boundary_type            | string                  | What stopped speculation  |
+| had_pipelined_suggestion | boolean                 | Next suggestion generated |
+
+## Feature Flags and Settings
+
+| Setting                     | Type    | Default | Description                                                                      |
+| --------------------------- | ------- | ------- | -------------------------------------------------------------------------------- |
+| `enableFollowupSuggestions` | boolean | true    | Master toggle for prompt suggestions                                             |
+| `enableCacheSharing`        | boolean | true    | Use cache-aware forked queries                                                   |
+| `enableSpeculation`         | boolean | false   | Predictive execution engine                                                      |
+| `fastModel` (top-level)     | string  | ""      | Model for all background tasks (empty = use main model). Set via `/model --fast` |
+
+### Thinking Mode
+
+Thinking/reasoning is explicitly disabled (`thinkingConfig: { includeThoughts: false }`) for all background task paths:
+
+- **Forked query path** (`createForkedChat`) — overrides `thinkingConfig` in the cloned `generationConfig`, covering both suggestion generation and speculation
+- **BaseLlm fallback path** (`generateViaBaseLlm`) — per-request config overrides the base content generator's thinking settings
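
The forked-path override can be sketched as a deep copy of the parent's generation config with only `thinkingConfig` replaced, so the main conversation's config is never mutated. `GenerationConfigLike` and `forkGenerationConfig` are hypothetical names; the real `createForkedChat` may clone differently.

```typescript
interface ThinkingConfig {
  includeThoughts?: boolean;
  thinkingBudget?: number;
}

// Hypothetical subset of a generation config; real configs carry more fields.
interface GenerationConfigLike {
  temperature?: number;
  systemInstruction?: string;
  thinkingConfig?: ThinkingConfig;
}

function forkGenerationConfig(parent: GenerationConfigLike): GenerationConfigLike {
  // structuredClone prevents nested objects from being shared with the main conversation.
  const cloned = structuredClone(parent);
  // Only thinkingConfig is replaced; every other field is copied unchanged.
  return { ...cloned, thinkingConfig: { includeThoughts: false } };
}
```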
+
+This is safe because:
+
+- Cache prefix is determined by systemInstruction + tools + history, not `thinkingConfig` — cache hits are unaffected
+- All backends (Gemini, OpenAI-compatible, Anthropic) handle `includeThoughts: false` by omitting the thinking field — no API errors on models without thinking support
+- Suggestion generation and speculation don't benefit from reasoning tokens
@@ -0,0 +1,85 @@
+# Prompt Suggestion Implementation Status
+
+> Tracks the implementation status of the prompt suggestion (NES) feature across all packages.
+
+## Core Module (`packages/core/src/followup/`)
+
+| Component                | Status  | Lines | Description                                                   |
+| ------------------------ | ------- | ----- | ------------------------------------------------------------- |
+| `followupState.ts`       | ✅ Done | ~230  | Framework-agnostic controller with timer/debounce             |
+| `suggestionGenerator.ts` | ✅ Done | ~260  | LLM generation + 12 filter rules + forked query support       |
+| `forkedQuery.ts`         | ✅ Done | ~240  | CacheSafeParams + createForkedChat + runForkedQuery           |
+| `overlayFs.ts`           | ✅ Done | ~140  | Copy-on-write overlay filesystem                              |
+| `speculationToolGate.ts` | ✅ Done | ~150  | Tool boundary enforcement with AST shell parser               |
+| `speculation.ts`         | ✅ Done | ~540  | Speculation engine with pipelined suggestion + model override |
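
The copy-on-write idea behind `overlayFs.ts` can be illustrated in memory: speculative writes land in an overlay map, reads prefer the overlay and fall back to the base, and nothing reaches the base until the overlay is applied. `MemoryOverlay`, its method names, the `Map`-backed base, and the POSIX-style path check are hypothetical; the real module operates on the filesystem and its API may differ.

```typescript
class MemoryOverlay {
  private readonly writes = new Map<string, string>();
  private readonly base: Map<string, string>;

  constructor(base: Map<string, string>) {
    this.base = base;
  }

  // Reject absolute paths and ".." segments so writes cannot escape the workspace.
  private normalize(p: string): string {
    const parts = p.split('/');
    if (p.startsWith('/') || parts.includes('..')) {
      throw new Error(`path escapes workspace: ${p}`);
    }
    return parts.filter((s) => s !== '' && s !== '.').join('/');
  }

  write(p: string, content: string): void {
    this.writes.set(this.normalize(p), content); // the base is never touched here
  }

  read(p: string): string | undefined {
    const key = this.normalize(p);
    return this.writes.has(key) ? this.writes.get(key) : this.base.get(key);
  }

  // Commit speculative writes, e.g. when the user accepts the suggestion.
  apply(): string[] {
    for (const [p, content] of this.writes) this.base.set(p, content);
    const applied = [...this.writes.keys()];
    this.writes.clear();
    return applied;
  }

  // Drop speculative writes, e.g. when speculation is aborted.
  discard(): void {
    this.writes.clear();
  }
}
```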
+
+## CLI Integration (`packages/cli/`)
+
+| Component                    | Status  | Description                                                |
+| ---------------------------- | ------- | ---------------------------------------------------------- |
+| `AppContainer.tsx`           | ✅ Done | Suggestion generation, speculation lifecycle, UI rendering |
+| `InputPrompt.tsx`            | ✅ Done | Tab/Enter/Right Arrow acceptance, dismiss + abort          |
+| `Composer.tsx`               | ✅ Done | Props threading                                            |
+| `UIStateContext.tsx`         | ✅ Done | promptSuggestion + dismissPromptSuggestion                 |
+| `useFollowupSuggestions.tsx` | ✅ Done | React hook with telemetry + keystroke tracking             |
+| `settingsSchema.ts`          | ✅ Done | 3 feature flags + fastModel setting                        |
+| `settings.schema.json`       | ✅ Done | VSCode settings schema                                     |
+
+## WebUI Integration (`packages/webui/`)
+
+| Component                   | Status  | Description                                 |
+| --------------------------- | ------- | ------------------------------------------- |
+| `InputForm.tsx`             | ✅ Done | Tab/Enter/Right Arrow + explicitText submit |
+| `useFollowupSuggestions.ts` | ✅ Done | React hook with onOutcome support           |
+| `followup.ts`               | ✅ Done | Subpath entry                               |
+| `components.css`            | ✅ Done | Ghost text styling                          |
+| `vite.config.followup.ts`   | ✅ Done | Separate build config                       |
+
+## Telemetry (`packages/core/src/telemetry/`)
+
+| Component               | Status  | Description          |
+| ----------------------- | ------- | -------------------- |
+| `PromptSuggestionEvent` | ✅ Done | 10 fields            |
+| `SpeculationEvent`      | ✅ Done | 7 fields             |
+| `logPromptSuggestion()` | ✅ Done | OpenTelemetry logger |
+| `logSpeculation()`      | ✅ Done | OpenTelemetry logger |
+
+## Test Coverage
+
+| Test File                     | Tests | Description                                                     |
+| ----------------------------- | ----- | --------------------------------------------------------------- |
+| `followupState.test.ts`       | 14    | Controller timer, debounce, accept callback, onOutcome, clear   |
+| `suggestionGenerator.test.ts` | 16    | All 12 filter rules + edge cases + false positives              |
+| `overlayFs.test.ts`           | 15    | COW write, read resolution, apply, cleanup, path traversal      |
+| `speculationToolGate.test.ts` | 27    | Tool categories, approval mode, shell AST, path rewrite         |
+| `forkedQuery.test.ts`         | 6     | Cache params save/get/clear, deep clone, version detection      |
+| `speculation.test.ts`         | 7     | ensureToolResultPairing edge cases                              |
+| `smoke.test.ts`               | 21    | Cross-module E2E: filter + overlay + toolGate + cache + pairing |
+| `InputPrompt.test.tsx`        | 4     | Tab, Enter+submit, Right Arrow, completion guard                |
+
+## Audit History
+
+| Round           | Issues Found | Issues Fixed                                             |
+| --------------- | ------------ | -------------------------------------------------------- |
+| R1-R4           | 10           | 10 (rule engine → LLM, state simplification)             |
+| R5-R6           | 2            | 2 (Enter keybinding conflict, Right Arrow telemetry)     |
+| R7-R8           | 3            | 3 (WebUI telemetry, dead type, test coverage)            |
+| R9              | 0            | — (convergence)                                          |
+| R10-R11         | 1            | 1 (historyManager dep)                                   |
+| R12-R13         | 1            | 1 (evaluative regex word boundaries)                     |
+| Phase 1+2 R1-R4 | 20+          | 20+ (permission bypass, overlay safety, race conditions) |
+| **Total**       | **37+**      | **37+**                                                  |
+
+## Claude Code Alignment
+
+| Feature                          | Alignment | Notes                                 |
+| -------------------------------- | --------- | ------------------------------------- |
+| Prompt text                      | 100%      | Identical except for the brand name   |
+| 12 filter rules                  | 100%+     | Improved with `\b` word boundaries    |
+| UI interaction (Tab/Enter/Right) | 100%      |                                       |
+| Guard conditions                 | 100%      | 13 checks                             |
+| Telemetry                        | 100%      | 10+7 fields                           |
+| Cache sharing                    | ✅        | DashScope cache_control               |
+| Speculation                      | ✅        | COW overlay + tool gating             |
+| Pipelined suggestion             | ✅        | Generated after speculation completes |
+| State management                 | 100%+     | Controller pattern, Object.freeze     |