ScreenMuse

AI Agent Recorder

Capture what AI sees and does.

ScreenMuse is the first screen recorder purpose-built for AI agents. When your AI agent runs a Playwright script, fixes a bug, or automates a workflow, ScreenMuse captures video proof of what happened.

Built for:

🤖 AI coding agents that record PRs and demonstrate fixes
🎭 Playwright & browser automation with video on test failure
🔄 Agentic workflows that prove task completion
👀 Computer Use models for observing agent behavior

Not built for:

❌ Manual screen recording (use Loom)
❌ Video editing (use ScreenFlow)
❌ Live streaming (use OBS)

Demo

$ curl -X POST localhost:7823/start -d '{"name":"demo"}'
{"session_id":"...","status":"recording"}

$ curl -X POST localhost:7823/chapter -d '{"name":"Step 1"}'
{"status":"ok"}

$ curl -X POST localhost:7823/stop
{"path":"/Users/.../demo.mp4","duration":8.2}

$ curl -X POST localhost:7823/export -d '{"format":"gif"}'
{"path":"/Users/.../demo.gif","frames":82}

Want to generate the real GIF? Run vhs docs/demo.tape with the server running on macOS. See docs/CONTRIBUTING-DEMO.md for full instructions.

Why ScreenMuse?

The Problem:

Your AI agent just spent 10 minutes automating a task. It says "Done!"

But did it work? What did it actually do? Can you share proof?

The Solution:

# Before agent runs
curl -X POST http://localhost:7823/start -d '{"name": "agent-task"}'

# Agent does its thing (Playwright, CLI tools, whatever)
# ...

# After agent finishes
curl -X POST http://localhost:7823/stop
# Returns video path: /Users/you/Movies/ScreenMuse/agent-task.mp4

Now you have a timestamped video of exactly what the agent did.
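The start/stop pair above can also be wrapped so that the recording is always stopped, even when the agent crashes. The following is a minimal sketch rather than part of ScreenMuse: `post_json` and `recording` are hypothetical helper names, only the `/start` and `/stop` endpoints and the `name` field come from this README, and the standard library's `urllib` is used so nothing needs installing.

```python
import json
import urllib.request
from contextlib import contextmanager

BASE_URL = "http://localhost:7823"  # default port from this README


def post_json(path, payload=None):
    """POST an optional JSON body to the server and return the decoded reply."""
    data = json.dumps(payload).encode("utf-8") if payload is not None else None
    req = urllib.request.Request(
        BASE_URL + path,
        data=data,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.loads(resp.read().decode("utf-8"))


@contextmanager
def recording(name, post=post_json):
    """Start a recording on entry and always stop it on exit.

    `post` is injectable so the wrapper can be exercised without a server.
    """
    post("/start", {"name": name})
    result = {}
    try:
        yield result
    finally:
        # /stop runs even if the agent raises, so a partial video is still saved.
        result.update(post("/stop"))
```

Used as `with recording("agent-task") as result: ...`, the dictionary `result` holds whatever `/stop` returned once the block exits.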

Quick Start

git clone https://github.com/hnshah/screenmuse
cd screenmuse
./scripts/dev-run.sh

Grant Screen Recording permission when prompted, then relaunch.

Use dev-run.sh or Xcode, not swift build. Ad-hoc signed binaries get a new code signature hash on every rebuild, and macOS TCC identifies such apps by that hash, so Screen Recording permission must be re-granted after each swift build. The script uses xcodebuild to keep the signature consistent. If permissions get stuck, run ./scripts/reset-permissions.sh.

Troubleshooting

Common failure modes and how to fix them:

| Symptom | Cause | Fix |
| --- | --- | --- |
| All curl requests time out | Port 7823 already in use | Run `lsof -i :7823` to find the conflicting process, then kill it or set `SCREENMUSE_PORT` to a different port |
| `/start` returns 403 | Screen Recording permission not granted | System Settings → Privacy & Security → Screen Recording → enable ScreenMuse → relaunch the app (required after granting permission) |
| Output file is 0 bytes or has no video track | TCC timing race: permission granted but not yet active | Run `./scripts/reset-permissions.sh`, then relaunch |
| Permissions loop on every rebuild | Code signature changes with each `swift build` | Always use `./scripts/dev-run.sh` (not `swift build`); see the note in Quick Start |

Quick diagnostic checklist

# 1. Is the server running?
curl http://localhost:7823/health

# 2. Is port 7823 in use by something else?
lsof -i :7823

# 3. Reset stuck permissions
./scripts/reset-permissions.sh

# 4. Check server logs (if running via dev-run.sh)
# Logs appear in the terminal where you ran dev-run.sh
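The first two checks can be scripted as well. This is a rough sketch under stated assumptions: `port_in_use` and `server_healthy` are hypothetical helper names, and since the body returned by `/health` is not documented here, only the HTTP status is inspected.

```python
import http.client
import socket
import urllib.request


def port_in_use(port, host="127.0.0.1", timeout=0.5):
    """Return True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        # connect_ex returns 0 on success instead of raising an exception
        return sock.connect_ex((host, port)) == 0


def server_healthy(base_url="http://localhost:7823", timeout=2.0):
    """Return True if GET /health answers with HTTP 200."""
    try:
        with urllib.request.urlopen(base_url + "/health", timeout=timeout) as resp:
            return resp.status == 200
    except (OSError, http.client.HTTPException):
        # Connection refused, timeouts, and non-2xx replies all land here
        return False
```

If `port_in_use(7823)` is true but `server_healthy()` is false, another process is probably holding the port, which matches the first row of the troubleshooting table.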

Screen Recording permission

After granting Screen Recording permission in System Settings, you must relaunch ScreenMuse. The permission takes effect on next launch — POST /start will return 403 until you do this.

If the app keeps asking for permission after each swift build, switch to ./scripts/dev-run.sh. The build script uses a consistent code signature so macOS only needs to grant permission once.
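A client can turn these failure modes into actionable messages. The sketch below assumes the client uses Python's `urllib`, which raises `HTTPError` for a 403 reply; `describe_start_error` and `PERMISSION_HINT` are hypothetical names, not part of ScreenMuse.

```python
import urllib.error

PERMISSION_HINT = (
    "Screen Recording permission is not active. Enable ScreenMuse under "
    "System Settings > Privacy & Security > Screen Recording, then relaunch the app."
)


def describe_start_error(err):
    """Map an exception raised by a POST /start call to a troubleshooting hint."""
    # HTTPError must be checked first because it is a subclass of URLError.
    if isinstance(err, urllib.error.HTTPError):
        if err.code == 403:
            return PERMISSION_HINT
        return f"/start returned HTTP {err.code}"
    if isinstance(err, OSError):
        # Covers URLError, refused connections, and timeouts
        return "Server unreachable: check that ScreenMuse is running and port 7823 is free."
    return f"Unexpected error from /start: {err}"
```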

Core Features

🎯 API First Design

40+ HTTP endpoints on localhost:7823
OpenAPI spec at /openapi
Zero UI required: everything is controllable via HTTP
Designed for code, not humans

🤖 Agent Aware

Activity detection knows when the agent is idle
Click tracking captures cursor events with timestamps
Keystroke overlay shows what the agent typed
Chapter markers structure long recordings
Highlight mode automatically zooms in on important moments

📤 Export Pipeline

GIF with custom encoder (10fps default)
WebP that's smaller than GIF with better quality
Trim in frame-accurate mode or fast stream-copy mode
Speed ramp automatically speeds up idle sections
Crop, thumbnail, concatenate for post-processing

👁️ Vision/OCR

On Device OCR using Apple Vision
Fast mode for real-time processing
Accurate mode for quality processing
No API key required

🪟 Window Management

Focus, position, and hide-others using native macOS APIs
Multi Window PiP records 2 windows simultaneously
Works where Playwright can't via Accessibility API

📡 Real Time Streaming

SSE frame stream in JPEG or PNG
Configurable FPS and scale
Multiple clients supported
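On the client side, a Server-Sent Events stream is a sequence of `field: value` lines in which a blank line ends each event. The parser below is a generic sketch of that wire format, not ScreenMuse code. `iter_sse_events` is a hypothetical name, and because this README does not say how a JPEG or PNG frame is encoded inside the `data` field, the payload is returned as an undecoded string.

```python
def iter_sse_events(lines):
    """Yield {"event", "data"} dicts from an iterable of SSE text lines."""
    event, data = "message", []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line dispatches the event collected so far
            if data:
                yield {"event": event, "data": "\n".join(data)}
            event, data = "message", []
            continue
        if line.startswith(":"):
            continue  # comment line, often used as a keep-alive
        field, _, value = line.partition(":")
        if value.startswith(" "):
            value = value[1:]  # a single leading space is not part of the value
        if field == "event":
            event = value
        elif field == "data":
            data.append(value)
```

The function accepts any iterable of decoded lines, so it can be fed from an HTTP response wrapped in a text reader or from a saved capture of the stream.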

Example: Playwright Integration

The screenmuse-playwright package makes recording Playwright runs zero-config:

cd packages/screenmuse-playwright
npm install

const { ScreenMuse } = require('screenmuse-playwright');

async function main() {
  const sm = new ScreenMuse();

  // Wrap any async function for automatic recording
  const result = await sm.record(async (page) => {
    await page.goto('https://example.com');
    await page.click('button');
    // ScreenMuse is capturing everything
  });

  console.log(result.video_path);  // .../recording.mp4
  console.log(result.gif_path);    // .../recording.gif (if enabled)
}

// CommonJS has no top-level await, so run inside an async function
main().catch(console.error);

Playwright Test fixture for automatic video on failure:

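// `test` and `expect` are assumed to come from the package's fixture-extended
// Playwright Test exports; see the package docs for the exact import
test('my test', async ({ page, screenMuse }) => {
  await page.goto('https://example.com');
  await expect(page.locator('h1')).toBeVisible();
  // If the test fails, the video is saved automatically
});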

See packages/screenmuse-playwright/ for full docs and examples.

Example: AI Coding Agent

Record a coding agent's work:

import subprocess, requests

# Start recording
requests.post("http://localhost:7823/start", json={"name": "fix-bug-123"})

# Mark chapter for each step
requests.post("http://localhost:7823/chapter", json={"name": "Reading code"})

# Agent does its work (--message passes the instruction; bare arguments are treated as file names)
subprocess.run(["aider", "--yes", "--message", "Fix the authentication bug"])

requests.post("http://localhost:7823/chapter", json={"name": "Running tests"})
subprocess.run(["pytest", "tests/test_auth.py"])

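# Stop and get video
response = requests.post("http://localhost:7823/stop").json()
# The Demo output above reports the file under "path"; accept either key
video_path = response.get("video_path") or response.get("path")
print(f"Recording saved: {video_path}")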

# Export as GIF for sharing
requests.post("http://localhost:7823/export", json={
    "format": "gif",
    "fps": 10,
    "scale": 800
})

Now you can attach the video to your PR showing exactly what the agent did.

Example: Computer Use Model

Track what a computer-use AI model sees and does:

# Agent uses computer-use API (Anthropic, OpenAI, etc.)
import anthropic, requests

client = anthropic.Anthropic()

# Start recording
requests.post("http://localhost:7823/start", json={"name": "computer-use-session"})

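messages = [{"role": "user", "content": "Go to example.com and click the sign-up button"}]

# Placeholder: fill in your computer-use tool definitions
# (check Anthropic's docs for the API variant your model and tools require)
computer_use_tools = []

while True:
    response = client.messages.create(
        model="claude-3-5-sonnet-20241022",
        messages=messages,
        tools=computer_use_tools,
        max_tokens=4096
    )
    # Keep the assistant turn in the conversation history
    messages.append({"role": "assistant", "content": response.content})

    # Agent takes actions (mouse, keyboard, etc.)
    tool_results = []
    for block in response.content:
        if block.type == "tool_use":
            # Mark significant moments
            requests.post("http://localhost:7823/chapter",
                         json={"name": f"Tool: {block.name}"})
            # execute_action is a hypothetical helper that performs the action
            # and returns its output (text or screenshot content)
            tool_results.append({
                "type": "tool_result",
                "tool_use_id": block.id,
                "content": execute_action(block),
            })

    # No tool calls means the model has finished (or cannot continue)
    if not tool_results:
        break
    # Feed tool results back so the next turn can make progress
    messages.append({"role": "user", "content": tool_results})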

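# Stop and get recording
result = requests.post("http://localhost:7823/stop").json()
# The Demo output above reports the file under "path"; accept either key
video_path = result.get("video_path") or result.get("path")
print(f"Session recorded: {video_path}")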

Result is a complete video documentation of the agent's session.

Architecture

Native macOS with zero dependencies:

ScreenCaptureKit for screen capture (ScreenMuse requires macOS 14+)
AVFoundation for video encoding
Metal for GPU accelerated effects (click ripples, zoom)
Vision for on-device OCR
Swift 6 with modern concurrency (actors, async/await)

11,980 lines of Swift, all in tree, no external frameworks.

Sources/
├── ScreenMuseCore/
│   ├── AgentAPI/        # HTTP server (40+ endpoints)
│   ├── Recording/       # ScreenCaptureKit integration
│   ├── Effects/         # Click ripples, zoom, keystroke overlay
│   ├── Export/          # GIF, WebP, trim, speedramp
│   ├── Capture/         # Screenshot manager
│   ├── Streaming/       # SSE frame stream
│   ├── System/          # Window management, clipboard
│   ├── Timeline/        # Chapter markers, event log
│   └── Permissions/     # TCC permission management
├── ScreenMuseApp/       # macOS app (menu bar + viewer)
└── ScreenMuseCLI/       # Command line tool

API Reference

Full OpenAPI spec:

curl http://localhost:7823/openapi > screenmuse-api.json

Load into Postman, Cursor, Claude Desktop, or any OpenAPI compatible tool.
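The downloaded spec can also be inspected from a script. The sketch below assumes the file follows the usual OpenAPI layout with a top-level `paths` object keyed by path and then by HTTP method; `list_operations` is a hypothetical helper name.

```python
import json

HTTP_METHODS = {"get", "post", "put", "patch", "delete", "head", "options", "trace"}


def list_operations(spec):
    """Return (METHOD, path, summary) tuples for every operation in the spec."""
    operations = []
    for path, item in sorted(spec.get("paths", {}).items()):
        for method, operation in item.items():
            # Path items may also hold non-method keys such as "parameters"
            if method.lower() in HTTP_METHODS:
                operations.append((method.upper(), path, operation.get("summary", "")))
    return operations


def load_spec(filename="screenmuse-api.json"):
    """Read the spec file saved by the curl command above."""
    with open(filename, encoding="utf-8") as handle:
        return json.load(handle)
```

Calling `list_operations(load_spec())` then gives a quick inventory of the endpoints to compare against the Quick Reference.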

Quick Reference

Recording:

POST /start starts recording (name, region, audio, webhook)
POST /stop stops and finalizes video
POST /pause and POST /resume pause and resume recording
POST /chapter marks a named chapter
POST /highlight flags next click for zoom
POST /note drops timestamped note
POST /screenshot captures a frame (no recording)

Export:

POST /export creates GIF or WebP (fps, scale, quality, range)
POST /trim trims to time range
POST /speedramp automatically speeds up idle sections
POST /crop crops rectangular region
POST /thumbnail extracts frame at timecode
POST /concat combines recordings
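A GIF export's frame count can be estimated before calling `/export`. This sketch assumes frames equal duration times fps, which is consistent with the Demo output (8.2 s at the 10 fps default gives 82 frames). `export_payload` and `expected_frames` are hypothetical helpers, the `format`, `fps`, and `scale` keys come from the AI Coding Agent example, and the `"webp"` format value is an assumption.

```python
def expected_frames(duration_s, fps=10):
    """Estimate how many frames an export of the given duration will contain."""
    return round(duration_s * fps)


def export_payload(fmt="gif", fps=10, scale=800):
    """Build the JSON body for POST /export."""
    if fmt not in ("gif", "webp"):  # "webp" as a value is an assumption
        raise ValueError(f"unsupported export format: {fmt}")
    if fps <= 0 or scale <= 0:
        raise ValueError("fps and scale must be positive")
    return {"format": fmt, "fps": fps, "scale": scale}
```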

Multi Window:

POST /start/pip records 2 windows (PiP or side-by-side)

Window Management:

POST /window/focus brings app to front
POST /window/position sets size and position (requires Accessibility)
POST /window/hide-others hides all other apps

System:

GET /system/clipboard reads clipboard
GET /system/active-window shows which window has focus
GET /system/running-apps lists running apps

Vision/OCR:

POST /ocr performs on-device OCR (fast or accurate mode)

Streaming:

GET /stream provides SSE frame stream (JPEG/PNG)
GET /stream/status shows active client count

Metadata:

GET /status shows recording status
GET /version shows version info
GET /openapi provides full API spec

Use Cases

1. AI Coding Agents

Record the agent's IDE session including file edits, terminal commands, and browser tests.

Attach video to PR with caption "Here's what the agent did to fix the bug."

2. Playwright/Selenium Tests

Get automatic video recording on test failure.

Debug test flakes by seeing exactly what happened before the failure.

3. Agentic Workflows

Record multi-step autonomous tasks.

Create audit trail with video proof of what the agent accomplished.

4. Computer Use Models

Document AI's interaction with desktop apps.

Safety monitoring provides visual log of agent actions.

5. RPA (Robotic Process Automation)

Capture automated business workflows.

Compliance requires video evidence of process execution.

6. API Demos

Programmatically generate demo videos.

Marketing automation creates consistent, repeatable demos.

Comparison: ScreenMuse vs. Traditional Recorders

| Feature | ScreenMuse | Loom | ScreenFlow | QuickTime |
| --- | --- | --- | --- | --- |
| API Control | ✅ 40+ endpoints | ❌ | ❌ | ❌ |
| Zero UI | ✅ | ❌ | ❌ | ❌ |
| Agent Aware | ✅ Activity detection | ❌ | ❌ | ❌ |
| Programmatic Export | ✅ GIF, WebP, trim | ❌ | ⚠️ Manual | ❌ |
| Real Time Streaming | ✅ SSE | ❌ | ❌ | ❌ |
| Multi Window PiP | ✅ | ❌ | ⚠️ Manual | ❌ |
| On Device OCR | ✅ Vision | ❌ | ❌ | ❌ |
| Chapter Markers | ✅ API | ⚠️ Manual | ⚠️ Manual | ❌ |
| Dependencies | ✅ Zero | ? | ? | ✅ Zero |
| Open Source | ✅ | ❌ | ❌ | ❌ |

ScreenMuse combines Loom's recording quality with Playwright's automation and Apple's Vision framework.

Pairing with Other Tools

ScreenMuse + Peekaboo

Peekaboo provides screenshots and OCR for reading what's on screen. ScreenMuse records what happens next.

# Peekaboo reads, then ScreenMuse records response
peekaboo image --mode screen --analyze "Is login form visible?"
curl -X POST http://localhost:7823/start -d '{"name": "login-attempt"}'
# ... agent fills form ...
curl -X POST http://localhost:7823/stop

ScreenMuse + Anthropic Computer Use

Anthropic provides AI that controls the computer. ScreenMuse records what it did.

Perfect for safety monitoring and debugging computer-use agents.

ScreenMuse + MCP (Model Context Protocol)

MCP Server exposes ScreenMuse to Claude Desktop and Cursor. Claude controls recording via tool calls.

Quickest setup (npx, no install):

Add to ~/Library/Application Support/Claude/claude_desktop_config.json:

{
  "mcpServers": {
    "screenmuse": {
      "command": "npx",
      "args": ["screenmuse-mcp"],
      "env": {
        "SCREENMUSE_API_KEY": "your-key-here"
      }
    }
  }
}

Find your API key: cat ~/.screenmuse/api_key

See mcp-server/INSTALL.md for global install and other options.

Requirements

macOS 14 (Sonoma) or later
Screen Recording permission (System Settings → Privacy & Security)
Accessibility permission (optional, for window positioning)
Swift 6 (included with Xcode 16+)

No external dependencies required.

Installation

Option 1: Homebrew (coming soon)

brew install hnshah/screenmuse/screenmuse

Option 2: Download from GitHub Releases (coming soon)

Download the latest universal binary (arm64 + x86_64) from GitHub Releases:

# Download and unzip (replace <version> with the release version)
curl -LO https://github.com/hnshah/screenmuse/releases/latest/download/screenmuse-<version>.zip
unzip screenmuse-<version>.zip -d /usr/local/bin/

# Verify
screenmuse --help

Option 3: Build from Source

git clone https://github.com/hnshah/screenmuse
cd screenmuse
./scripts/dev-run.sh

Option 4: Build CLI

swift build -c release
.build/release/screenmuse --help

Option 5: Xcode

open Package.swift
# Build and run ScreenMuseApp target

Configuration

Default settings:

Port is 7823
Output goes to ~/Movies/ScreenMuse/
Quality is Medium (10 Mbps)
Format is H.264 MP4

Environment variables:

export SCREENMUSE_PORT=7823
export SCREENMUSE_OUTPUT_DIR=~/Desktop/recordings
export SCREENMUSE_QUALITY=high

Or configure via ~/.screenmuse/config.json:

{
  "port": 7823,
  "output_dir": "~/Movies/ScreenMuse",
  "default_quality": "medium",
  "auto_start": false
}
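The README does not say which source wins when both an environment variable and the config file set the same option. The sketch below shows one common convention, assumed here rather than confirmed: environment variables override the file, which overrides the defaults. `resolve_settings` and `load_config_file` are hypothetical names, and `~` in paths is left unexpanded.

```python
import json
import os
from pathlib import Path

DEFAULTS = {
    "port": 7823,
    "output_dir": "~/Movies/ScreenMuse",
    "default_quality": "medium",
    "auto_start": False,
}

# Environment variable -> config.json key (variable names from this README)
ENV_TO_KEY = {
    "SCREENMUSE_PORT": "port",
    "SCREENMUSE_OUTPUT_DIR": "output_dir",
    "SCREENMUSE_QUALITY": "default_quality",
}


def load_config_file(path="~/.screenmuse/config.json"):
    """Return the parsed config file, or an empty dict if it does not exist."""
    config_path = Path(path).expanduser()
    return json.loads(config_path.read_text()) if config_path.exists() else {}


def resolve_settings(file_config=None, env=None):
    """Merge defaults, file values, and environment overrides (assumed order)."""
    env = os.environ if env is None else env
    settings = dict(DEFAULTS)
    settings.update(file_config or {})
    for variable, key in ENV_TO_KEY.items():
        if variable in env:
            # Ports arrive as strings in the environment
            settings[key] = int(env[variable]) if key == "port" else env[variable]
    return settings
```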

Development

# Run tests
swift test

# Build release
swift build -c release

# Run linter (if installed)
swiftlint

# Reset permissions (if stuck)
./scripts/reset-permissions.sh

CI/CD runs via GitHub Actions on every push (Build + Test).

FAQ

Q: Why not just use OBS or Loom?
A: Those are built for humans. ScreenMuse is built for code with 40+ API endpoints, zero UI, and agent-aware features.

Q: Can I use this for YouTube videos?
A: You could, but there are better tools like ScreenFlow or Camtasia. ScreenMuse is optimized for programmatic recording, not manual editing.

Q: Does it work on Windows/Linux?
A: Not yet. ScreenMuse is currently macOS-only because it requires ScreenCaptureKit. Cross-platform support is on the roadmap.

Q: How big are the video files?
A: Approximately 5-10 MB per minute at medium quality (10 Mbps). This is configurable via the quality setting.

Q: Can I record without the menu bar app?
A: Yes! Use the CLI (screenmuse) or control via HTTP API directly.

Q: Is there a cloud/SaaS version?
A: No. ScreenMuse is local-first by design. Your recordings never leave your machine unless you explicitly upload them.

Q: Can I contribute?
A: Yes! See CONTRIBUTING.md for guidelines.
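For the file-size question above, the arithmetic linking a bitrate to size on disk is worth keeping at hand. The helper below is only a unit conversion (`megabytes_per_minute` is a hypothetical name). It shows that a stream sustaining the full 10 Mbps would come to about 75 MB per minute, so the 5-10 MB estimate suggests that typical recordings average well below that bitrate.

```python
def megabytes_per_minute(bitrate_mbps):
    """Size of one minute of video at a constant bitrate, in MB (10^6 bytes)."""
    # megabits per second * 60 seconds / 8 bits per byte
    return bitrate_mbps * 60 / 8


print(megabytes_per_minute(10))  # 75.0
```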

Security

ScreenMuse's HTTP API is local-only by default (127.0.0.1:7823) and supports API key authentication via X-ScreenMuse-Key. The /script endpoint runs recording pipeline commands only — it does not execute shell commands or arbitrary code. See SECURITY.md for the full security model and recommendations.
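A request that carries the API key might be built as follows. This is a sketch under assumptions: `read_api_key` and `build_request` are hypothetical helpers, the key location comes from the MCP section (`~/.screenmuse/api_key`), and whether the header is mandatory for every endpoint is not stated in this README.

```python
import json
import urllib.request
from pathlib import Path


def read_api_key(path="~/.screenmuse/api_key"):
    """Read the key file and strip the trailing newline."""
    return Path(path).expanduser().read_text().strip()


def build_request(path, payload, api_key, base_url="http://127.0.0.1:7823"):
    """Build a POST request that authenticates with the X-ScreenMuse-Key header."""
    return urllib.request.Request(
        base_url + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "X-ScreenMuse-Key": api_key,
        },
        method="POST",
    )
```

The request is sent with `urllib.request.urlopen(build_request("/start", {"name": "demo"}, read_api_key()))`.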

License

MIT License. See LICENSE for details.

Credits

Built by @hnshah with contributions from the open-source community.

Powered by:

Apple ScreenCaptureKit
AVFoundation
Metal
Vision Framework
Swift 6

Links

GitHub: https://github.com/hnshah/screenmuse
Issues: https://github.com/hnshah/screenmuse/issues
Discussions: https://github.com/hnshah/screenmuse/discussions
Twitter: @hnshah

ScreenMuse. Because AI agents need screen recorders too. 🎬🤖
