Prerequisites: Understand the agent architecture in doc/main/README.md (start with
agent/loop.md and agent/overview.md).
Visual debugging approach for ClosePaw's ReAct loop using screenshots + logs.
| Category | Symptoms |
|---|---|
| Perception | Agent doesn't see visible elements, wrong element indices |
| Reasoning | LLM chooses wrong action despite correct perception |
| Execution | Action fails or targets wrong element |
| Observation | Post-action state not captured correctly |

    ./scripts/debug-run.sh "Open Chrome"

Output in debug-output/run_<timestamp>/:

    debug-output/run_<timestamp>/
    ├── turn_NNN_n<turn>.png      # Screenshot at each captured turn-start
    ├── turn_NNN_n<turn>_log.txt  # Log excerpt around that turn
    ├── logcat_full.log           # Raw logcat stream
    ├── agent.log                 # Filtered: Agent|Turn|LLMClient|ToolRouter|SessionServices
    ├── system.log                # Filtered: AgentService|AccessibilityPlatform|AgentSession
    └── trace/                    # JSONL trace + replay artifacts (compiled by replay_compiler.py)

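
Before reading individual turns, it can help to confirm that a run actually produced the files listed above. The following is a minimal sketch: the artifact names are taken from the layout above, while `check_run_artifacts` is a hypothetical helper name and not part of the repository.

```shell
# Report which expected artifacts are present in a debug run directory.
# Artifact names come from the documented layout; the function name is hypothetical.
check_run_artifacts() {
  local run_dir="$1"
  local missing=0
  local name
  for name in logcat_full.log agent.log system.log trace; do
    if [ -e "$run_dir/$name" ]; then
      echo "ok       $name"
    else
      echo "MISSING  $name"
      missing=$((missing + 1))
    fi
  done
  # Succeed only when nothing is missing.
  [ "$missing" -eq 0 ]
}
```

Called as `check_run_artifacts debug-output/run_<timestamp>`, it returns a non-zero status when any artifact is absent, so it can gate later analysis steps.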
For each turn, compare:
| Check | Source | Look For |
|---|---|---|
| Actual screen | turn_NNN_n<turn>.png | What's visible, is target there? |
| Perceived elements | turn_NNN_n<turn>_log.txt | Element indices, missing elements |
| Action chosen | Log: ACTION | Does action match goal? |
| Result | Log: ActionResult | Success/failure, observation |
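
The per-turn comparison needs both the screenshot and its log excerpt. As a rough sketch, the helper below lists every captured turn and flags those without a matching log file. It relies only on the `turn_NNN_n<turn>.png` / `turn_NNN_n<turn>_log.txt` naming shown earlier; `list_turns` is a hypothetical name.

```shell
# List each captured turn in a run directory and whether its log excerpt exists.
# list_turns is a hypothetical helper based on the documented file naming.
list_turns() {
  local run_dir="$1"
  local png
  local base
  for png in "$run_dir"/turn_*.png; do
    [ -e "$png" ] || continue   # the glob matched nothing
    base="${png%.png}"
    if [ -f "${base}_log.txt" ]; then
      echo "$(basename "$base") log=yes"
    else
      echo "$(basename "$base") log=no"
    fi
  done
}
```

Turns reported with `log=no` can still be checked visually, but the perceived elements and chosen action have to be looked up in agent.log instead.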
Symptoms: Same action repeating, no progress
Debug:
- Compare consecutive screenshots - is screen actually changing?
- Check observation after action - did agent see the change?
- Check LLM reasoning - is history providing correct context?
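
A repeating action can be spotted mechanically before opening any screenshots. The sketch below assumes each chosen action is logged on a single line as `ACTION: <json>`; that format is inferred from the examples in this guide and may differ in the real logs. `find_repeated_actions` is a hypothetical helper name.

```shell
# Print runs of identical consecutive actions that repeat at least min_repeats times.
# Assumes one `ACTION: <json>` line per action; adjust the pattern if the log differs.
find_repeated_actions() {
  local log_file="$1"
  local min_repeats="${2:-3}"
  grep 'ACTION: ' "$log_file" \
    | sed 's/.*ACTION: //' \
    | uniq -c \
    | awk -v min="$min_repeats" '$1 >= min { count = $1; sub(/^ *[0-9]+ /, ""); print count "x " $0 }'
}
```

For example, `find_repeated_actions "$RUN/agent.log" 3` prints each action that occurred three or more times in a row, which narrows down the turns whose screenshots are worth comparing.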
Symptoms: Target visible but agent does something else
Debug:
- Check Perceptor output - is target element in the list?
- Check element index - does it match what agent referenced?
- Check LLM prompt - does system prompt have enough guidance?
Example:

    Screen shows Chrome at element_index=10
    Agent action: {"tool": "back"}    ← WRONG
    Should be:    {"tool": "click", "element_index": 10}

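
Whether a wrong action is a perception or a reasoning problem depends on whether the target was in the perceived element list at all. The sketch below checks a turn's log excerpt for the target text. It assumes perceived elements are logged with a `text="..."` field, as in the example lines in this guide; the real Perceptor output may use a different shape. `target_perceived` is a hypothetical name.

```shell
# Check whether a target's text appears among the perceived elements of one turn.
# Assumes a text="..." field in the turn log; target_perceived is hypothetical.
target_perceived() {
  local turn_log="$1"
  local target_text="$2"
  if grep -F -q "text=\"$target_text\"" "$turn_log"; then
    echo "perceived: check reasoning (prompt, history)"
  else
    echo "not perceived: check Perceptor output"
    return 1
  fi
}
```

Running `target_perceived debug-output/run_<timestamp>/turn_002_n2_log.txt "Chrome"` on the example above would point at reasoning, since Chrome is present in the element list.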
Symptoms: Tool returns error or no effect
Debug:
- Check element bounds - is target actually clickable?
- Check timing - did UI change during action?
- Check ActionResult - what error was returned?
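
One way to cross-check element bounds independently of the agent is Android's `uiautomator dump`, which writes the current view hierarchy as XML. This is not part of ClosePaw's tooling. The sketch below parses such a dump file and assumes each node's attributes sit inside a single `<node ...>` tag with `clickable` preceding `bounds`. `node_bounds` is a hypothetical helper, and the target text is treated as a regular expression.

```shell
# Print the clickable flag and bounds of every node whose text attribute matches,
# given an XML file produced by `adb shell uiautomator dump`.
# node_bounds is a hypothetical helper; target_text is interpreted as a regex.
node_bounds() {
  local dump_file="$1"
  local target_text="$2"
  grep -o "<node[^>]*text=\"$target_text\"[^>]*>" "$dump_file" \
    | sed -E 's/.* clickable="([^"]*)".* bounds="([^"]*)".*/clickable=\1 bounds=\2/'
}
```

To use it, create the dump with `adb shell uiautomator dump /sdcard/window_dump.xml`, copy it over with `adb pull /sdcard/window_dump.xml /tmp/window_dump.xml`, then run `node_bounds /tmp/window_dump.xml "Chrome"`. Empty output means no node with that text was found; zero-area or off-screen bounds suggest the target is not actually clickable.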
In Agent.kt or Turn.kt:

    // Log perception before LLM call
    private fun logPerception(snapshot: ScreenSnapshot) {
        Log.d(TAG, "=== PERCEPTION ===")
        Log.d(TAG, Perceptor.toPromptJson(snapshot))
    }

    // Log LLM input/output
    private fun logTurn(input: String, result: TurnResult) {
        Log.d(TAG, "=== LLM INPUT ===\n$input")
        Log.d(TAG, "=== LLM OUTPUT ===\n${result.content}")
        Log.d(TAG, "=== TOOL CALLS ===\n${result.toolCalls}")
    }

To check the live screen manually:

    # Capture current screen
    adb exec-out screencap -p > /tmp/check.png
    open /tmp/check.png   # macOS; use xdg-open on Linux
    # Compare with what Perceptor saw

To analyze the logs of a run:

    # Pick the latest run dir
    RUN=$(ls -1dt debug-output/run_* | head -n 1)
    # Actions taken
    grep -E "click|type|scroll|swipe|back|home" "$RUN/agent.log"
    # Tool results
    grep -E "ActionResult|ToolCallResult" "$RUN/agent.log"
    # Errors
    grep -E "ERROR|Exception" "$RUN/agent.log" "$RUN/system.log"
    # Turn markers
    grep "=== TURN" "$RUN/agent.log"

Problem: Agent keeps pressing back instead of clicking Chrome

    # 1. Run debug
    ./scripts/debug-run.sh "Open Chrome"
    # 2. Check turn_002_n2.png - Chrome icon visible
    # 3. Check turn_002_n2_log.txt
    #    PERCEPTION: element_index=10, text="Chrome", clickable=true
    #    ACTION: {"tool": "back"}  ← BUG
    # 4. Issue: LLM reasoning wrong despite correct perception
    #    Fix: Update system prompt in agent/definition/DefaultAgentDef.kt
    # 5. Verify
    ./scripts/debug-run.sh "Open Chrome"

See doc/main/README.md for full architecture (entry to
agent/, infra/, protocol/, ui/ docs).

| File | Purpose |
|---|---|
| app/src/main/kotlin/ai/closepaw/agent/Agent.kt | ReAct loop (Perceive → Think → Act → Observe) |
| app/src/main/kotlin/ai/closepaw/agent/Turn.kt | Single LLM call with streaming |
| app/src/main/kotlin/ai/closepaw/perception/Perceptor.kt | Accessibility tree → ScreenSnapshot |
| app/src/main/kotlin/ai/closepaw/tool/ToolRouter.kt | Tool execution state machine |
| app/src/main/kotlin/ai/closepaw/platform/AccessibilityPlatform.kt | Screen capture and actions |