Feature: Browser Action Recording & Intelligent Replay #285

@mango766

Summary

This issue proposes a browser action recording and intelligent replay feature for the Page Agent Extension. It would let users:

  1. Record browser interactions (clicks, typing, navigation, tab management, etc.) as structured step sequences
  2. Replay recorded workflows with LLM-guided adaptive execution: the agent reads the recorded plan, observes the actual page state, and adapts its actions to UI changes
  3. Parameterize recorded values (e.g., search keywords, form inputs) for reuse with different data
  4. Modify workflows via natural language instructions before replay

Motivation

Currently, Page Agent requires users to describe tasks from scratch every time via natural language. For repetitive workflows (e.g., "search Bilibili for a keyword and play the first video", "fill out a daily report form"), this is inefficient.

A recording capability lets users demonstrate a workflow once and replay it many times. Because execution is LLM-guided, replay adapts to page changes, unlike brittle replay based on DOM indices.

Design Highlights

Recording Format (~10 tokens/step for LLM consumption)

  • Semantic element descriptors: text > ariaLabel > role > placeholder > selector (multi-signal, not fragile DOM indices)
  • Compact LLM format: [1] click "搜索" (input role="search") @bilibili.com
  • Parameterization: [PARAM:searchKeyword] markers on editable values
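
The descriptor priority and compact line format above can be sketched roughly as follows. This is only an illustration: the `ElementDescriptor` and `RecordedStep` shapes and the `describeTarget` and `formatStep` helpers are hypothetical names, not taken from the PR, and the exact output layout in the real implementation may differ.

```typescript
// Hypothetical shapes: the proposal lists the signals but not the real types.
interface ElementDescriptor {
  text?: string;
  ariaLabel?: string;
  role?: string;
  placeholder?: string;
  selector?: string;
}

interface RecordedStep {
  action: "click" | "type" | "press";
  target: ElementDescriptor;
  value?: string;     // typed text or key name, if any
  paramName?: string; // set when the value is parameterized
  host?: string;      // e.g. "bilibili.com"
}

// Pick the strongest available signal, in the order the proposal gives:
// text > ariaLabel > role > placeholder > selector.
function describeTarget(t: ElementDescriptor): string {
  if (t.text) return `"${t.text}"`;
  if (t.ariaLabel) return `[aria-label="${t.ariaLabel}"]`;
  if (t.role) return `[role="${t.role}"]`;
  if (t.placeholder) return `[placeholder="${t.placeholder}"]`;
  return t.selector ?? "(unknown element)";
}

// Render one step as a single compact line for the LLM.
function formatStep(index: number, step: RecordedStep): string {
  const parts: string[] = [`[${index}]`, step.action];
  if (step.value !== undefined) parts.push(`"${step.value}"`, "→");
  parts.push(describeTarget(step.target));
  if (step.paramName) parts.push(`[PARAM:${step.paramName}]`);
  if (step.host) parts.push(`@${step.host}`);
  return parts.join(" ");
}

console.log(
  formatStep(1, { action: "click", target: { text: "搜索" }, host: "bilibili.com" })
);
```

Falling back through several signals means a step can still be described when the preferred signal (visible text) is missing, which is the point of avoiding DOM indices.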

Replay Architecture (Zero changes to PageAgentCore)

  • Recording → compact plan injected via config.systemInstruction
  • LLM reads plan + observes actual page → adaptive execution
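  • Because the LLM chooses each action from the live page rather than following a fixed script, replay is intended to tolerate UI changes, dynamic content, and timing differences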
  • Parameter overrides + natural language modification before replay
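
As a rough sketch of how a recording could be turned into a `config.systemInstruction` string without touching PageAgentCore: the `ReplayRequest` shape, the `buildSystemInstruction` helper, and the prompt wording below are all assumptions made for illustration. The actual prompt lives in `replay_prompt.md` and may be structured differently.

```typescript
// Hypothetical input: compact plan lines plus optional per-run adjustments.
interface ReplayRequest {
  planLines: string[];
  paramOverrides?: Record<string, string>;
  modification?: string; // free-form natural language instruction
}

// Assemble one instruction string: plan first, then overrides, then the
// user's modification, so later sections can refine earlier ones.
function buildSystemInstruction(req: ReplayRequest): string {
  const sections: string[] = [
    "You are replaying a recorded workflow. Treat the plan as guidance and adapt to the page you actually observe.",
    "Recorded plan:",
    ...req.planLines,
  ];

  const overrides = Object.entries(req.paramOverrides ?? {});
  if (overrides.length > 0) {
    sections.push("Parameter values for this run:");
    for (const [name, value] of overrides) {
      sections.push(`- ${name} = ${JSON.stringify(value)}`);
    }
  }

  const modification = req.modification?.trim();
  if (modification) {
    sections.push("User modification:", modification);
  }
  return sections.join("\n");
}

// The resulting string would be passed as config.systemInstruction.
const config = {
  systemInstruction: buildSystemInstruction({
    planLines: ['[1] type "TypeScript教程" → search input [PARAM:searchKeyword]'],
    paramOverrides: { searchKeyword: "Rust教程" },
    modification: "skip step 5, I don't need to click play",
  }),
};
console.log(config.systemInstruction);
```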

Recording Engine

  • Content script captures: click, input (debounced), select, scroll, keypress, navigation
  • Background script aggregates multi-tab events (tab create/activate/close/navigate)
  • Supports Shadow DOM, contenteditable elements, and file uploads
  • Service Worker restart recovery via chrome.storage.session
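
To illustrate why input capture is debounced: without it, typing a keyword would produce one step per keystroke. The sketch below shows the same idea as an offline pass over already-captured events rather than a timer-based debounce. The `RawInputEvent` shape, the `coalesceInputs` helper, and the 500 ms window are assumptions, not details from the PR.

```typescript
// Hypothetical raw event as a content script might capture it.
interface RawInputEvent {
  targetId: string;  // assumed stable identifier for the edited element
  value: string;     // field value after this keystroke
  timestamp: number; // milliseconds
}

// Collapse bursts of input on the same element into a single event that
// keeps only the final value. A burst ends when the target changes or the
// gap between consecutive events exceeds windowMs.
function coalesceInputs(events: RawInputEvent[], windowMs = 500): RawInputEvent[] {
  const out: RawInputEvent[] = [];
  for (const ev of events) {
    const last = out[out.length - 1];
    const sameBurst =
      last !== undefined &&
      last.targetId === ev.targetId &&
      ev.timestamp - last.timestamp <= windowMs;
    if (sameBurst) {
      out[out.length - 1] = ev; // replace with the newer value
    } else {
      out.push(ev);
    }
  }
  return out;
}

const steps = coalesceInputs([
  { targetId: "search", value: "T", timestamp: 0 },
  { targetId: "search", value: "Ty", timestamp: 100 },
  { targetId: "search", value: "Typ", timestamp: 200 },
]);
console.log(steps.length, steps[0]?.value);
```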

UI Components

  • Record button in header — start/stop recording
  • 🎬 Recording list — browse, delete, replay saved recordings
  • Recording detail — edit name/description, parameter editor, NL modification input, step viewer, JSON export, replay trigger
  • Recording indicator — live step count during recording
  • i18n — Chinese (zh-CN) and English (en-US) support

Files Changed

| Category | Files | Description |
| --- | --- | --- |
| Types & Storage | recording-types.ts, db.ts | Core types, IndexedDB v2 with recordings store |
| Recording Engine | EventRecorder.content.ts, EventRecorder.background.ts | Content script event capture + background aggregation |
| Recording Hook | useRecording.ts | React hook for recording state management |
| Replay Engine | RecordingReplayAgent.ts, replay_prompt.md | Recording → task+systemInstruction conversion |
| UI Components | RecordingIndicator.tsx, RecordingStepCard.tsx, RecordingList.tsx, RecordingDetail.tsx, ParamEditor.tsx | Full recording/replay UI |
| Integration | App.tsx, content.ts, background.ts, wxt.config.js | Routing, initialization, permissions |
| i18n | i18n.ts | Lightweight zh-CN/en-US translation system |

Total: 12 new files, 5 modified files, +2483 lines

Example: Recording a Bilibili Video Search

Recording: "Search and play video on Bilibili" (5 steps, ~150 tokens)

[1] click search input (placeholder="搜索") @bilibili.com
[2] type "TypeScript教程" → search input [PARAM:searchKeyword]
[3] press Enter → search input
[4] click "TypeScript 从零开始完整教程" (link) in search results
[5] click "播放" button (aria-label="播放") in video player controls

Users can then replay with different searchKeyword values, or add NL modifications like "skip step 5, I don't need to click play".
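
One possible way to apply a different `searchKeyword` before replay is to substitute values on the structured steps, leaving the original recording untouched. The `ParamStep` shape and `applyParams` helper here are hypothetical. Natural language modifications such as skipping a step are not handled by this sketch and would be left to the LLM.

```typescript
// Hypothetical minimal step shape: only the fields needed for substitution.
interface ParamStep {
  action: string;
  value?: string;
  paramName?: string; // present when the recorded value is a parameter
}

// Return a copy of the steps with parameterized values replaced by the
// supplied overrides. Steps without a matching override are kept as recorded.
function applyParams(
  steps: ParamStep[],
  overrides: Record<string, string>
): ParamStep[] {
  return steps.map((step) => {
    const name = step.paramName;
    if (name === undefined) return step;
    if (!Object.prototype.hasOwnProperty.call(overrides, name)) return step;
    return { ...step, value: overrides[name] };
  });
}

const recorded: ParamStep[] = [
  { action: "click" },
  { action: "type", value: "TypeScript教程", paramName: "searchKeyword" },
  { action: "press", value: "Enter" },
];
const replayed = applyParams(recorded, { searchKeyword: "Rust教程" });
console.log(replayed[1]?.value);
```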

Related PR

A complete implementation is provided in the accompanying PR.
