Feature: Browser Action Recording & Intelligent Replay #285
Description
Summary
This issue proposes a behavior-recording and intelligent-replay feature for the Page Agent Extension. It would let users:
- Record browser interactions (clicks, typing, navigation, tab management, etc.) as structured step sequences
- Replay recorded workflows with LLM-guided adaptive execution: the agent reads the recorded plan, observes the actual page state, and adapts its actions to UI changes
- Parameterize recorded values (e.g., search keywords, form inputs) for reuse with different data
- Modify workflows via natural language instructions before replay
Motivation
Currently, Page Agent requires users to describe tasks from scratch every time via natural language. For repetitive workflows (e.g., "search Bilibili for a keyword and play the first video", "fill out a daily report form"), this is inefficient.
A recording capability lets users demonstrate a workflow once and replay it many times. Because replay is LLM-guided, it can adapt to page changes, unlike brittle DOM-index-based replay.
Design Highlights
Recording Format (~10 tokens/step for LLM consumption)
- Semantic element descriptors: text > ariaLabel > role > placeholder > selector (multi-signal, not fragile DOM indices)
- Compact LLM format: `[1] click "搜索" (input role="search") @bilibili.com`
- Parameterization: `[PARAM:searchKeyword]` markers on editable values
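As a rough illustration of the format above, the sketch below renders a recorded step as one compact line. The `ElementDescriptor` and `RecordedStep` shapes and both function names are assumptions made for this example, not the contents of `recording-types.ts`. For brevity it emits only the strongest available signal, so it omits the secondary hints (such as `(input role="search")`) shown in the issue's own format.

```typescript
// Hypothetical shapes for illustration; the real types may differ.
interface ElementDescriptor {
  text?: string;
  ariaLabel?: string;
  role?: string;
  placeholder?: string;
  selector?: string;
}

interface RecordedStep {
  action: "click" | "type" | "press";
  target: ElementDescriptor;
  value?: string; // typed text or key name, if any
  param?: string; // parameter name when the value is editable
  host: string;
}

// Pick the strongest signal in the order text > ariaLabel > role > placeholder > selector.
function describeTarget(t: ElementDescriptor): string {
  if (t.text) return `"${t.text}"`;
  if (t.ariaLabel) return `(aria-label="${t.ariaLabel}")`;
  if (t.role) return `(role="${t.role}")`;
  if (t.placeholder) return `(placeholder="${t.placeholder}")`;
  return t.selector !== undefined ? t.selector : "(unknown element)";
}

// Render one step as a single compact line for the LLM.
function formatStep(index: number, s: RecordedStep): string {
  const value = s.value !== undefined ? ` "${s.value}" →` : "";
  const param = s.param ? ` [PARAM:${s.param}]` : "";
  return `[${index}] ${s.action}${value} ${describeTarget(s.target)}${param} @${s.host}`;
}
```

Falling back through the descriptor fields means a step can still be described when the visible text is missing, which is the point of recording several signals rather than a single DOM index.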
Replay Architecture (Zero changes to PageAgentCore)
- Recording → compact plan injected via `config.systemInstruction`
- LLM reads plan + observes actual page → adaptive execution
- Handles UI changes, dynamic content, timing differences automatically
- Parameter overrides + natural language modification before replay
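The replay flow above could be sketched roughly as follows: the compact plan, any parameter overrides, and an optional natural-language modification are assembled into one string that would be passed as `config.systemInstruction`. The `ReplayRequest` shape, the function name, and the prompt wording are all assumptions for illustration; the actual `RecordingReplayAgent.ts` and `replay_prompt.md` may be organized differently.

```typescript
// Hypothetical request shape; only `systemInstruction` is named in the issue.
interface ReplayRequest {
  planLines: string[]; // compact step lines, one per recorded step
  paramOverrides?: { [name: string]: string };
  modification?: string; // natural-language change requested before replay
}

function buildSystemInstruction(req: ReplayRequest): string {
  const sections: string[] = [
    "You are replaying a recorded workflow. Treat the plan as guidance",
    "and adapt to the page you actually observe.",
    "",
    "Recorded plan:",
  ];
  for (const line of req.planLines) sections.push(line);

  const overrides = req.paramOverrides || {};
  const names = Object.keys(overrides);
  if (names.length > 0) {
    sections.push("", "Parameter values for this run:");
    for (const name of names) {
      sections.push(`- ${name} = ${JSON.stringify(overrides[name])}`);
    }
  }

  const modification = (req.modification || "").trim();
  if (modification) {
    sections.push("", "User modification:", modification);
  }
  return sections.join("\n");
}

// The resulting string would be supplied as the agent's system instruction.
const replayConfig = {
  systemInstruction: buildSystemInstruction({
    planLines: ['[1] type "TypeScript教程" → search input [PARAM:searchKeyword]'],
    paramOverrides: { searchKeyword: "Rust教程" },
    modification: "skip the last step",
  }),
};
```

Keeping the plan as plain text in the system instruction is what allows replay without changes to the core agent: the agent is given extra context rather than a new execution path.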
Recording Engine
- Content script captures: click, input (debounced), select, scroll, keypress, navigation
- Background script aggregates multi-tab events (tab create/activate/close/navigate)
- Shadow DOM, contenteditable, file upload support
- Service Worker restart recovery via `chrome.storage.session`
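To illustrate why input capture is debounced, the sketch below collapses a burst of keystroke events on the same field into a single step that holds the final value. It is a timestamp-based coalescing pass over already-captured events rather than a timer-based debounce, and the `InputEventRecord` shape, the function name, and the 500 ms window are assumptions, not values taken from the implementation.

```typescript
// Hypothetical event shape for illustration.
interface InputEventRecord {
  targetId: string; // stable identifier for the edited field
  value: string; // field value after this event
  timestamp: number; // milliseconds
}

// Merge consecutive input events on the same field when each arrives
// within `windowMs` of the previous one, keeping only the latest value.
function coalesceInputs(
  events: InputEventRecord[],
  windowMs: number = 500
): InputEventRecord[] {
  const out: InputEventRecord[] = [];
  for (const ev of events) {
    const last = out.length > 0 ? out[out.length - 1] : undefined;
    if (
      last !== undefined &&
      last.targetId === ev.targetId &&
      ev.timestamp - last.timestamp <= windowMs
    ) {
      out[out.length - 1] = ev; // same typing burst: replace with newer value
    } else {
      out.push(ev); // different field or a long pause: start a new step
    }
  }
  return out;
}
```

Without this kind of merging, typing a ten-character keyword would produce ten recorded steps, which would inflate the plan that is later handed to the LLM.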
UI Components
- ⏺ Record button in header — start/stop recording
- 🎬 Recording list — browse, delete, replay saved recordings
- Recording detail — edit name/description, parameter editor, NL modification input, step viewer, JSON export, replay trigger
- Recording indicator — live step count during recording
- i18n — Chinese (zh-CN) and English (en-US) support
Files Changed
| Category | Files | Description |
|---|---|---|
| Types & Storage | `recording-types.ts`, `db.ts` | Core types, IndexedDB v2 with recordings store |
| Recording Engine | `EventRecorder.content.ts`, `EventRecorder.background.ts` | Content script event capture + background aggregation |
| Recording Hook | `useRecording.ts` | React hook for recording state management |
| Replay Engine | `RecordingReplayAgent.ts`, `replay_prompt.md` | Recording → task + systemInstruction conversion |
| UI Components | `RecordingIndicator.tsx`, `RecordingStepCard.tsx`, `RecordingList.tsx`, `RecordingDetail.tsx`, `ParamEditor.tsx` | Full recording/replay UI |
| Integration | `App.tsx`, `content.ts`, `background.ts`, `wxt.config.js` | Routing, initialization, permissions |
| i18n | `i18n.ts` | Lightweight zh-CN/en-US translation system |

Total: 12 new files, 5 modified files, +2483 lines
Example: Recording a Bilibili Video Search
```
Recording: "Search and play video on Bilibili" (5 steps, ~150 tokens)
[1] click search input (placeholder="搜索") @bilibili.com
[2] type "TypeScript教程" → search input [PARAM:searchKeyword]
[3] press Enter → search input
[4] click "TypeScript 从零开始完整教程" (link) in search results
[5] click "播放" button (aria-label="播放") in video player controls
```
Users can then replay with different searchKeyword values, or add NL modifications like "skip step 5, I don't need to click play".
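One possible way to apply a different `searchKeyword` is to rewrite the quoted value on any plan line that carries a matching `[PARAM:…]` marker before the plan is handed to the agent. The sketch below assumes the line layout shown in the example (the first quoted string on a marked line is the recorded value); the function name and that assumption are illustrative only and are not taken from the PR.

```typescript
// Replace the recorded value on lines tagged with [PARAM:name] when an
// override for `name` is supplied. Lines without a marker are untouched.
function applyParamOverrides(
  planLines: string[],
  overrides: { [name: string]: string }
): string[] {
  return planLines.map((line) => {
    const match = /\[PARAM:(\w+)\]/.exec(line);
    const name = match ? match[1] : undefined;
    if (!name) return line;
    const value = overrides[name];
    if (value === undefined) return line;
    // Assumption: the first quoted string on the line is the recorded value.
    // A replacer function avoids special handling of "$" in the new value.
    return line.replace(/"[^"]*"/, () => `"${value}"`);
  });
}

const recordedPlan = [
  '[1] click search input (placeholder="搜索") @bilibili.com',
  '[2] type "TypeScript教程" → search input [PARAM:searchKeyword]',
];
const rewrittenPlan = applyParamOverrides(recordedPlan, { searchKeyword: "Rust教程" });
// rewrittenPlan[1] is '[2] type "Rust教程" → search input [PARAM:searchKeyword]'
```

Natural-language modifications such as skipping a step do not fit this kind of mechanical substitution, which is presumably why the proposal leaves them to the LLM at replay time.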
Related PR
A complete implementation is provided in the accompanying PR.