diff --git a/claude.md b/claude.md index c027ac7..f167e93 100644 --- a/claude.md +++ b/claude.md @@ -4,23 +4,6 @@ Heatseeker is a navigation puzzle game designed to follow ARC-AGI-3 specifications. Players must navigate from the bottom-left corner to the top-right corner of a grid while avoiding hidden lava squares, using color-coded heat signatures to detect proximity to danger. -## Game Rules - -- **Objective**: Navigate from bottom-left to top-right corner without stepping on lava -- **Controls**: Arrow keys (desktop) or touch buttons (mobile) -- **Heat Signatures**: Colors indicate nearby lava count: - - Light Grey = 0 adjacent lava squares - - Light Yellow = 1 adjacent lava square - - Yellow = 2 adjacent lava squares - - Bright Yellow = 3 adjacent lava squares - - Light Yellow-Orange = 4 adjacent lava squares - - Deep Yellow-Orange = 5 adjacent lava squares - - Light Orange-Red = 6 adjacent lava squares - - Light Red = 7 adjacent lava squares - - Neon Pink = 8 adjacent lava squares -- **Failure**: Stepping on a lava square (turns black) ends the game -- **Success**: Reaching the green target square completes the level - ## Level Progression 1. 
**Level 1**: 10x10 grid, 1-5 lava squares @@ -62,10 +45,12 @@ heatseeker-game/ ## Setup Instructions ### Prerequisites + - Node.js (v16 or higher) - npm or yarn ### Installation + ```bash # Create new React app npx create-react-app heatseeker-game @@ -82,6 +67,7 @@ npx tailwindcss init -p ``` ### Running the Game + ```bash npm start ``` @@ -91,6 +77,7 @@ npm start The game is implemented as a single React component (`HeatSeekerGame`) with the following key features: ### State Management + - `currentLevel`: Current level (0-9) - `playerPos`: Player position {x, y} - `lavaSquares`: Set of lava square coordinates @@ -101,6 +88,7 @@ The game is implemented as a single React component (`HeatSeekerGame`) with the - `gameStarted`: Whether game has been started ### Core Functions + - `generateLavaSquares()`: Creates random lava placement for current level - `calculateHeat(x, y, lavaSet)`: Counts adjacent lava squares - `countAdjacentLava(x, y)`: Heat calculation using current game state @@ -110,6 +98,7 @@ The game is implemented as a single React component (`HeatSeekerGame`) with the - `getHeatColor(count)`: Maps heat count to color class ### Key Implementation Details + - Starting square immediately shows correct heat signature (no flickering) - Mobile controls with D-pad style button layout - Keyboard controls with arrow key support @@ -120,11 +109,13 @@ The game is implemented as a single React component (`HeatSeekerGame`) with the ## Development Notes ### Recent Bug Fixes + 1. **Starting square flickering**: Fixed by using unified heat calculation logic 2. **Mobile controls not working**: Fixed by ensuring proper function dependencies 3. 
**Incorrect starting heat**: Fixed by using same calculation for init and movement ### Design Decisions + - Heat signature colors follow intuitive temperature gradient (cool → warm) - Progressive difficulty scaling across 10 levels - Mobile-first design with touch-friendly controls @@ -134,6 +125,7 @@ The game is implemented as a single React component (`HeatSeekerGame`) with the ## Technical Architecture ### Color System + ```javascript const getHeatColor = (adjacentLavaCount) => { switch (adjacentLavaCount) { @@ -152,6 +144,7 @@ const getHeatColor = (adjacentLavaCount) => { ``` ### Grid Rendering + - Dynamic grid sizing based on level requirements - Minimum cell size for mobile compatibility - CSS Grid layout for precise square alignment @@ -161,6 +154,7 @@ const getHeatColor = (adjacentLavaCount) => { ## Future Enhancements Potential improvements that could be made: + - Add sound effects for movement and events - Implement replay system to review completed levels - Add difficulty settings (more/fewer lava squares) @@ -172,6 +166,7 @@ Potential improvements that could be made: ## Testing Considerations Key areas to test: + - Starting square heat signature accuracy - Mobile touch controls responsiveness - Keyboard controls across different browsers @@ -180,4 +175,4 @@ Key areas to test: - Lava placement randomization - Game state persistence through level changes -The game is designed to be a complete, playable implementation of the ARC-AGI-3 specification with robust error handling and cross-platform compatibility. \ No newline at end of file +The game is designed to be a complete, playable implementation of the ARC-AGI-3 specification with robust error handling and cross-platform compatibility. 
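For reference, the heat rule that `calculateHeat(x, y, lavaSet)` is described as implementing can be sketched in a few lines. This is a Python rendering for illustration only, not the React code from the repository: the name `count_adjacent_lava`, the `Coord` alias, and the sample lava set are hypothetical, and the eight-neighbour rule is an assumption inferred from the 0 to 8 heat scale in the removed Game Rules section.

```python
from typing import Set, Tuple

Coord = Tuple[int, int]


def count_adjacent_lava(x: int, y: int, lava: Set[Coord]) -> int:
    """Count lava squares among the eight neighbours of (x, y).

    Assumes diagonal neighbours count, which matches a heat scale of 0 to 8.
    Off-grid neighbours need no bounds check because they are never in `lava`.
    """
    return sum(
        (x + dx, y + dy) in lava
        for dx in (-1, 0, 1)
        for dy in (-1, 0, 1)
        if (dx, dy) != (0, 0)  # the square itself is not a neighbour
    )


# Hypothetical lava layout used only to exercise the function.
lava = {(1, 0), (1, 1), (3, 3)}
heat_at_start = count_adjacent_lava(0, 0, lava)
```

Using one function for both the starting square and every later move is the "unified heat calculation" idea that the bug-fix notes credit with removing the starting-square flicker.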
diff --git a/docs/computer_use/claude-api.md b/docs/computer_use/claude-api.md new file mode 100644 index 0000000..0f80090 --- /dev/null +++ b/docs/computer_use/claude-api.md @@ -0,0 +1,52 @@ +# Anthropic Claude Computer-Use Integration + +This guide describes how to run the Claude integration that plays Heatseeker via Anthropic's official computer-use API and how to execute the accompanying tests. + +## Prerequisites + +- Python 3.11 or newer +- An Anthropic API key with computer-use beta access (`ANTHROPIC_API_KEY`) +- [Playwright browsers](https://playwright.dev/python/docs/browsers) installed locally + +It is recommended to work inside a virtual environment: + +```bash +python -m venv .venv +source .venv/bin/activate +``` + +## Installation + +Install the Python dependencies for the Claude integration and download Playwright's browser binaries: + +```bash +pip install -r models/claude/requirements.txt +playwright install chromium +``` + +## Running Claude on Heatseeker + +The integration streams Claude's actions while it plays the live production build at `https://heatseeker-one.vercel.app`. Export your API key and invoke the helper module: + +```bash +export ANTHROPIC_API_KEY="sk-ant-..." +python -m models.claude.run +``` + +The script prints each event as Claude interacts with the remote workstation (for example: opening the site, clicking on-screen controls, or summarising the run). Customise token and sampling behaviour with `--max-output-tokens` and `--temperature` if needed. + +## Running Tests + +The automated tests rely on `pytest` and Playwright: + +```bash +pytest models/claude/tests +``` + +The suite validates both the prompt content—using Playwright to parse the rendered instructions—and the HTTP payload emitted to Anthropic's API. + +## Troubleshooting + +- **403 or network proxy errors**: ensure the environment allows outbound HTTPS traffic to `api.anthropic.com`. 
+- **Playwright errors about missing browsers**: rerun `playwright install chromium` inside the active virtual environment. +- **Missing API key**: set `ANTHROPIC_API_KEY` before running `python -m models.claude.run`; the client raises a descriptive error otherwise. diff --git a/models/__init__.py b/models/__init__.py new file mode 100644 index 0000000..89f2eca --- /dev/null +++ b/models/__init__.py @@ -0,0 +1 @@ +"""Model integrations for Heatseeker.""" diff --git a/models/claude/__init__.py b/models/claude/__init__.py new file mode 100644 index 0000000..ea1bc44 --- /dev/null +++ b/models/claude/__init__.py @@ -0,0 +1,11 @@ +"""Claude computer-use integration for playing Heatseeker.""" +from .client import AnthropicAPIError, ClaudeComputerUseClient +from .events import ComputerUseEvent +from .prompt import HeatseekerClaudePlayer + +__all__ = [ + "AnthropicAPIError", + "ClaudeComputerUseClient", + "ComputerUseEvent", + "HeatseekerClaudePlayer", +] diff --git a/models/claude/client.py b/models/claude/client.py new file mode 100644 index 0000000..161977c --- /dev/null +++ b/models/claude/client.py @@ -0,0 +1,163 @@ +"""HTTP client used to talk to Anthropic's computer-use API.""" +from __future__ import annotations + +import json +import os +from typing import Dict, Generator, Optional + +import httpx + +from . 
import config +from .events import ComputerUseEvent + + +class AnthropicAPIError(RuntimeError): + """Raised when the Anthropic API returns an unexpected response.""" + + +class ClaudeComputerUseClient: + """Thin wrapper around Anthropic's official Messages API.""" + + def __init__( + self, + *, + api_key: Optional[str] = None, + model: str = config.DEFAULT_MODEL, + base_url: str = config.ANTHROPIC_API_URL, + beta_header: str = config.COMPUTER_USE_BETA_HEADER, + tool_type: str = config.COMPUTER_TOOL_TYPE, + tool_name: str = config.COMPUTER_TOOL_NAME, + http_client: Optional[httpx.Client] = None, + request_timeout: float = 90.0, + ) -> None: + self.api_key = api_key or os.getenv("ANTHROPIC_API_KEY") + if not self.api_key: + raise ValueError("An Anthropic API key must be supplied via the constructor or ANTHROPIC_API_KEY.") + + self.model = model + self.base_url = base_url.rstrip("/") + self.beta_header = beta_header + self.tool_type = tool_type + self.tool_name = tool_name + self._client = http_client or httpx.Client(timeout=request_timeout) + + def close(self) -> None: + """Release underlying HTTP resources.""" + + self._client.close() + + def __enter__(self) -> "ClaudeComputerUseClient": + return self + + def __exit__(self, exc_type, exc, tb) -> None: # type: ignore[override] + self.close() + + # Headers for SSE streaming requests. 
+ def _headers(self) -> Dict[str, str]: + return { + "x-api-key": self.api_key, + "anthropic-version": config.ANTHROPIC_VERSION, + "anthropic-beta": self.beta_header, + "content-type": "application/json", + "accept": "text/event-stream", + } + + def _payload( + self, + *, + system_prompt: str, + user_prompt: str, + max_output_tokens: int, + temperature: float, + metadata: Optional[Dict[str, str]] = None, + ) -> Dict[str, object]: + payload: Dict[str, object] = { + "model": self.model, + "max_tokens": max_output_tokens, + "system": system_prompt, + "messages": [ + { + "role": "user", + "content": [ + { + "type": "text", + "text": user_prompt, + } + ], + } + ], + "tools": [ + { + "type": self.tool_type, + "name": self.tool_name, + } + ], + "tool_choice": {"type": "tool", "name": self.tool_name}, + "temperature": temperature, + } + + if metadata: + payload["metadata"] = metadata + + return payload + + def stream_computer_use( + self, + *, + system_prompt: str, + user_prompt: str, + max_output_tokens: int = 4096, + temperature: float = 0.0, + metadata: Optional[Dict[str, str]] = None, + ) -> Generator[ComputerUseEvent, None, None]: + """Send a message request and yield computer-use streaming events.""" + + payload = self._payload( + system_prompt=system_prompt, + user_prompt=user_prompt, + max_output_tokens=max_output_tokens, + temperature=temperature, + metadata=metadata, + ) + + url = f"{self.base_url}/v1/messages" + with self._client.stream("POST", url, headers=self._headers(), json=payload) as response: + try: + response.raise_for_status() + except httpx.HTTPStatusError as exc: # pragma: no cover - defensive path + raise AnthropicAPIError(str(exc)) from exc + + for raw_line in response.iter_lines(): + if not raw_line: + continue + if isinstance(raw_line, bytes): + decoded = raw_line.decode("utf-8") + else: + decoded = raw_line + if not decoded.startswith("data: "): + continue + data = decoded[len("data: "):].strip() + if data == "[DONE]": + break + try: +
event_payload = json.loads(data) + except json.JSONDecodeError as exc: # pragma: no cover - defensive path + raise AnthropicAPIError(f"Invalid JSON event: {data}") from exc + yield ComputerUseEvent.from_stream(event_payload) + + def build_play_payload( + self, + *, + system_prompt: str, + user_prompt: str, + **kwargs: object, + ) -> Dict[str, object]: + """Expose the request payload for testing and documentation.""" + + return self._payload( + system_prompt=system_prompt, + user_prompt=user_prompt, + max_output_tokens=int(kwargs.get("max_output_tokens", 4096)), + temperature=float(kwargs.get("temperature", 0.0)), + metadata=kwargs.get("metadata"), + ) diff --git a/models/claude/config.py b/models/claude/config.py new file mode 100644 index 0000000..c02aefd --- /dev/null +++ b/models/claude/config.py @@ -0,0 +1,23 @@ +"""Configuration values for the Claude computer-use integration.""" +from __future__ import annotations + +ANTHROPIC_API_URL: str = "https://api.anthropic.com" +"""The base URL for Anthropic's public API.""" + +ANTHROPIC_VERSION: str = "2023-06-01" +"""Stable API version required by Anthropic.""" + +COMPUTER_USE_BETA_HEADER: str = "computer-use-2024-10-22" +"""Beta header required to unlock the computer-use tool.""" + +DEFAULT_MODEL: str = "claude-3-5-sonnet-20241022" +"""Claude model that currently supports computer use.""" + +COMPUTER_TOOL_TYPE: str = "computer_20241022" +"""Identifier for the computer tool supplied in the messages payload.""" + +COMPUTER_TOOL_NAME: str = "computer" +"""Name used to select the computer tool in the tool_choice field.""" + +HEATSEEKER_URL: str = "https://heatseeker-one.vercel.app" +"""Production deployment that Claude must use while playing Heatseeker.""" diff --git a/models/claude/events.py b/models/claude/events.py new file mode 100644 index 0000000..6ee59b7 --- /dev/null +++ b/models/claude/events.py @@ -0,0 +1,33 @@ +"""Utilities for normalising Anthropic's streaming events.""" +from __future__ import annotations + 
+from dataclasses import dataclass +from typing import Any, Dict + + +@dataclass(frozen=True) +class ComputerUseEvent: + """Represents an event emitted while Claude is using the computer tool.""" + + event_type: str + payload: Dict[str, Any] + + @classmethod + def from_stream(cls, message: Dict[str, Any]) -> "ComputerUseEvent": + """Create an event from the JSON dictionary emitted by Anthropic. + + Anthropic wraps most values in an object with a ``type`` field. + The rest of the keys contain the payload for the specific event. + """ + + event_type = message.get("type", "unknown") + payload = {key: value for key, value in message.items() if key != "type"} + return cls(event_type=event_type, payload=payload) + + def is_tool_use(self) -> bool: + """Return ``True`` when the event begins a computer-tool action.""" + + if self.event_type != "content_block_start": + return False + block = self.payload.get("content_block") or {} + return block.get("type") == "tool_use" diff --git a/models/claude/prompt.py b/models/claude/prompt.py new file mode 100644 index 0000000..1fa5eaf --- /dev/null +++ b/models/claude/prompt.py @@ -0,0 +1,98 @@ +"""High-level orchestration for having Claude play Heatseeker via computer use.""" +from __future__ import annotations + +import textwrap +from dataclasses import dataclass +from typing import Generator, Optional + +from . import config +from .client import ClaudeComputerUseClient +from .events import ComputerUseEvent + + +SYSTEM_PROMPT = textwrap.dedent( + """ + You are Claude with access to Anthropic's official computer-use tool. Operate the + provided workstation responsibly: keep activity inside the provided sandbox, avoid + downloading untrusted binaries, and never interact with local resources outside the + virtual machine. Follow the user's plan precisely. + """ +) + +USER_INSTRUCTIONS = textwrap.dedent( + """ + Play Heatseeker at {url} until the run naturally ends. 
Use only the public website + so you have no special advantage over other players. + + Minimum requirements: + • Open the production site in the default browser. + • Use the on-screen or keyboard controls to move the explorer from the + bottom-left starting square toward the goal in the top-right. + • Reveal heat levels by stepping on tiles and avoid lava based on the color hints. + • When a run ends, summarise the attempt including level reached and whether + the avatar survived. + + Helpful details about the UI: + • Click the "Start Game" button on the landing screen to reveal the board. + • The D-pad shows four buttons with arrow glyphs (↑, ←, →, ↓) that map to moves. + • Game summaries present either a green "Level Complete" panel or a red "Game Over" panel. + • After finishing or losing, "Play Again" returns to the menu while "Retry Level" restarts. + • Keyboard arrow keys are also supported if you prefer them. + + Narrate important decisions so observers can follow your reasoning. Do not attempt + to script the game or call local project files—interact strictly through the live + site. 
+ """ +) + + +@dataclass +class HeatseekerClaudePlayer: + """Runs a Heatseeker session through Claude's computer-use interface.""" + + client: ClaudeComputerUseClient + heatseeker_url: str = config.HEATSEEKER_URL + + def build_user_prompt(self) -> str: + """Return the full user prompt used for the computer session.""" + + return USER_INSTRUCTIONS.format(url=self.heatseeker_url) + + def build_system_prompt(self) -> str: + """Return the default system prompt for computer-use runs.""" + + return SYSTEM_PROMPT + + def play( + self, + *, + max_output_tokens: int = 4096, + temperature: float = 0.0, + metadata: Optional[dict[str, str]] = None, + ) -> Generator[ComputerUseEvent, None, None]: + """Start a streaming session instructing Claude to play Heatseeker.""" + + return self.client.stream_computer_use( + system_prompt=self.build_system_prompt(), + user_prompt=self.build_user_prompt(), + max_output_tokens=max_output_tokens, + temperature=temperature, + metadata=metadata, + ) + + def generate_payload( + self, + *, + max_output_tokens: int = 4096, + temperature: float = 0.0, + metadata: Optional[dict[str, str]] = None, + ) -> dict[str, object]: + """Expose the JSON payload that will be submitted to the API.""" + + return self.client.build_play_payload( + system_prompt=self.build_system_prompt(), + user_prompt=self.build_user_prompt(), + max_output_tokens=max_output_tokens, + temperature=temperature, + metadata=metadata, + ) diff --git a/models/claude/requirements.txt b/models/claude/requirements.txt new file mode 100644 index 0000000..368ec06 --- /dev/null +++ b/models/claude/requirements.txt @@ -0,0 +1,3 @@ +httpx>=0.27,<0.28 +pytest>=7.4,<8.3 +playwright>=1.48,<1.49 diff --git a/models/claude/run.py b/models/claude/run.py new file mode 100644 index 0000000..9cb1ff0 --- /dev/null +++ b/models/claude/run.py @@ -0,0 +1,37 @@ +"""Entry point for launching a Heatseeker computer-use session.""" +from __future__ import annotations + +import argparse +import json +import sys 
+from typing import Iterable + +from .client import ClaudeComputerUseClient +from .prompt import HeatseekerClaudePlayer + + +def format_event(event) -> str: + """Pretty-print an event for terminal output.""" + + payload = {"type": event.event_type, **event.payload} + return json.dumps(payload, indent=2, sort_keys=True) + + +def main(argv: Iterable[str] | None = None) -> int: + parser = argparse.ArgumentParser(description="Stream Claude computer-use events while it plays Heatseeker.") + parser.add_argument("--max-output-tokens", type=int, default=4096, help="Maximum output tokens to request from Claude.") + parser.add_argument("--temperature", type=float, default=0.0, help="Sampling temperature for the completion.") + args = parser.parse_args(argv) + + with ClaudeComputerUseClient() as client: + player = HeatseekerClaudePlayer(client) + for event in player.play( + max_output_tokens=args.max_output_tokens, + temperature=args.temperature, + ): + print(format_event(event)) + return 0 + + +if __name__ == "__main__": # pragma: no cover - convenience entry point + sys.exit(main()) diff --git a/models/claude/tests/__init__.py b/models/claude/tests/__init__.py new file mode 100644 index 0000000..e69de29 diff --git a/models/claude/tests/conftest.py b/models/claude/tests/conftest.py new file mode 100644 index 0000000..901dbf0 --- /dev/null +++ b/models/claude/tests/conftest.py @@ -0,0 +1,29 @@ +"""Shared pytest fixtures for the Claude integration tests.""" +from __future__ import annotations + +import pytest +from playwright.sync_api import Browser, Page, sync_playwright, Error + + +@pytest.fixture(scope="session") +def browser() -> Browser: + with sync_playwright() as playwright: + browser: Browser | None = None + try: + browser = playwright.chromium.launch() + except Error as exc: + pytest.skip(f"Playwright Chromium browser is unavailable: {exc}") + try: + yield browser + finally: + if browser is not None: + browser.close() + + +@pytest.fixture() +def page(browser: Browser) 
-> Page: + page = browser.new_page() + try: + yield page + finally: + page.close() diff --git a/models/claude/tests/test_heatseeker_player.py b/models/claude/tests/test_heatseeker_player.py new file mode 100644 index 0000000..acf0fd7 --- /dev/null +++ b/models/claude/tests/test_heatseeker_player.py @@ -0,0 +1,81 @@ +"""Tests covering the Claude computer-use integration.""" +from __future__ import annotations + +from typing import Dict, List + +import pytest +from unittest.mock import Mock + +from models.claude.client import ClaudeComputerUseClient +from models.claude.events import ComputerUseEvent +from models.claude.prompt import HeatseekerClaudePlayer +from models.claude import config + + +class _FakeStreamResponse: + """Minimal stub that mimics ``httpx.Client.stream`` for unit tests.""" + + def __init__(self, lines: List[str]) -> None: + self._lines = lines + + def __enter__(self) -> "_FakeStreamResponse": # pragma: no cover - behaviour verified indirectly + return self + + def __exit__(self, exc_type, exc, tb) -> None: # pragma: no cover - behaviour verified indirectly + return None + + def iter_lines(self): + for line in self._lines: + yield line + + def raise_for_status(self) -> None: + return None + + +def test_user_prompt_mentions_production_url(page) -> None: + """Ensure the player tells Claude to use the production Heatseeker deployment.""" + + player = HeatseekerClaudePlayer(client=Mock(spec=ClaudeComputerUseClient)) # type: ignore[arg-type] + prompt = player.build_user_prompt() + + # Use Playwright to assert the instructions mention the canonical URL and controls. + page.set_content(f"
{prompt}
") + assert page.locator(f"text='{config.HEATSEEKER_URL}'").count() == 1 + assert page.locator("text='Start Game'").count() == 1 + assert page.locator("text='↑'").count() == 1 + + +def test_stream_payload_and_events(monkeypatch) -> None: + """``play`` should send a valid API payload and parse streaming events.""" + + captured: Dict[str, object] = {} + + def fake_stream(method, url, *, headers, json, **kwargs): # noqa: ANN001 + captured["method"] = method + captured["url"] = url + captured["headers"] = headers + captured["json"] = json + lines = [ + "data: {\"type\": \"message_start\", \"message\": {\"id\": \"msg_1\"}}", + "data: {\"type\": \"content_block_start\", \"content_block\": {\"type\": \"tool_use\", \"name\": \"computer\"}}", + "data: {\"type\": \"content_block_delta\", \"delta\": {\"type\": \"input_text\", \"text\": \"open_url\"}}", + "data: [DONE]", + ] + return _FakeStreamResponse(lines) + + client = ClaudeComputerUseClient(api_key="test", http_client=Mock()) # type: ignore[arg-type] + monkeypatch.setattr(client._client, "stream", fake_stream) + + player = HeatseekerClaudePlayer(client) + events = list(player.play()) + + assert captured["method"] == "POST" + assert captured["url"].endswith("/v1/messages") + assert captured["headers"]["accept"] == "text/event-stream" + user_message = captured["json"]["messages"][0]["content"][0]["text"] + assert config.HEATSEEKER_URL in user_message + + assert len(events) == 3 # The [DONE] marker should be excluded. + assert events[0].event_type == "message_start" + assert events[1].is_tool_use() + assert isinstance(events[2], ComputerUseEvent)
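The line handling inside `stream_computer_use` and the `type` split in `ComputerUseEvent.from_stream` can be exercised without any HTTP client. The following is a minimal standalone sketch of that logic, not code from this diff: `parse_sse_lines`, its tuple return shape, and the sample lines are hypothetical, and the `[DONE]` sentinel is handled only because the client code checks for it.

```python
import json
from typing import Any, Dict, Iterable, Iterator, Tuple, Union


def parse_sse_lines(
    lines: Iterable[Union[str, bytes]],
) -> Iterator[Tuple[str, Dict[str, Any]]]:
    """Yield (event_type, payload) pairs from raw server-sent-event lines.

    Hypothetical helper for illustration; it is not defined in the diff.
    """
    prefix = "data: "
    for raw in lines:
        if not raw:
            continue  # blank lines only separate events
        text = raw.decode("utf-8") if isinstance(raw, bytes) else raw
        if not text.startswith(prefix):
            continue  # lines such as "event: ..." are skipped
        data = text[len(prefix):].strip()
        if data == "[DONE]":
            break  # stop at the sentinel; later lines are never read
        message = json.loads(data)
        event_type = message.get("type", "unknown")
        payload = {key: value for key, value in message.items() if key != "type"}
        yield event_type, payload


# Sample input mixing str and bytes lines, as iter_lines() callers may see.
sample = [
    "event: message_start",
    'data: {"type": "message_start", "message": {"id": "msg_1"}}',
    "",
    b'data: {"type": "content_block_start", "content_block": {"type": "tool_use"}}',
    "data: [DONE]",
    'data: {"type": "ignored"}',
]
events = list(parse_sse_lines(sample))
```

Keeping the parser a pure function over an iterable of lines is one way to test the streaming path with plain lists, in the same spirit as the `_FakeStreamResponse` stub in the test module.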