From 90d50c954166f4263a9735f515726046b0656736 Mon Sep 17 00:00:00 2001 From: Mariozada Date: Mon, 23 Mar 2026 20:36:19 +1100 Subject: [PATCH 1/2] fix: prevent ui-state hallucination from truncated a11y tree responses --- mobilerun/SKILL.md | 45 +++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 45 insertions(+) diff --git a/mobilerun/SKILL.md b/mobilerun/SKILL.md index 364b6fc..09342d0 100644 --- a/mobilerun/SKILL.md +++ b/mobilerun/SKILL.md @@ -291,6 +291,46 @@ To tap "Chrome": bounds are (533,1222,706,1422), so tap at x=(533+706)/2=619, y= Use `filter=true` for a cleaner tree focused on actionable elements (filters out non-interactive containers). +**Never dump the full UI state with `jq '.'`** — the a11y tree is deeply nested and can be tens of thousands of lines on screens with lists, grids, or feeds. Output will be truncated from the beginning, and you will silently lose data without knowing it. + +**Always use targeted jq filters:** + +```bash +# Get all clickable/interactive elements +curl -s "https://api.mobilerun.ai/v1/devices/{id}/ui-state?filter=true" \ + -H "Authorization: Bearer $MOBILERUN_API_KEY" | \ + jq '[.a11y_tree | recurse(.children[]?) | + select(.isClickable==true or .isEditable==true) | + {text: (.text // .contentDescription), bounds: .boundsInScreen, isEditable}]' + +# Get elements with content descriptions (media items, icons, etc.) +curl -s "https://api.mobilerun.ai/v1/devices/{id}/ui-state?filter=true" \ + -H "Authorization: Bearer $MOBILERUN_API_KEY" | \ + jq '[.a11y_tree | recurse(.children[]?) | + select(.contentDescription != null and .contentDescription != "") | + {desc: .contentDescription, bounds: .boundsInScreen, isClickable}]' + +# Get editable fields only +curl -s "https://api.mobilerun.ai/v1/devices/{id}/ui-state?filter=true" \ + -H "Authorization: Bearer $MOBILERUN_API_KEY" | \ + jq '[.a11y_tree | recurse(.children[]?) | + select(.isEditable==true) | {text, resourceId, bounds: .boundsInScreen}]' + +# Get phone state + screen bounds only (lightweight, always safe) +curl -s "https://api.mobilerun.ai/v1/devices/{id}/ui-state?filter=true" \ + -H "Authorization: Bearer $MOBILERUN_API_KEY" | \ + jq '{phone_state, device_context}' +``` + +**Truncation detection:** If a ui-state response does not begin with `{` or appears to start mid-JSON, the output was truncated. Do not attempt to infer or reconstruct missing data — re-fetch with a targeted jq filter. + +**After extracting elements, sanity-check the result:** +- **Count** — does the number of items make sense for what's on screen? A 3-column grid with 4 visible rows should yield ~12 items, not 8. +- **Bounds continuity** — items in a grid should tile from x=0. If your first item starts at x=361 instead of x=0, you are missing the left column. +- **Sort order** — if items should be chronological, verify dates are sequential. A gap or jump suggests missing entries. + +If the count or bounds look wrong, re-fetch with a more specific filter before proceeding. + --- ## Device Actions @@ -606,6 +646,8 @@ You don't see the phone screen -- the agent on the device does. Write prompts th - Good: `"Open Settings in the Chrome app and enable Dark Mode"` - You don't know what the screen looks like. The on-device agent can see it -- let it handle the navigation. +**Don't pass your own UI observations into a task instruction.** If you've already read the ui-state and identified an exact element, tap it directly — don't describe it to the agent. If you're submitting a task, describe the *goal* and let the agent do its own observation. Mixing the two — observing yourself, then handing off a screen-state description — means the agent acts on your interpretation of the screen rather than its own, and any error in your observation (truncated data, wrong coordinates) gets baked in silently. + **Be specific about the important details:** - Name the exact app (not "the browser" -- say "Chrome") - Specify exact text to type or send @@ -824,6 +866,9 @@ Content-Type: application/json ## Common Patterns +**Data integrity — never fill gaps:** +If extracted data is incomplete (truncated output, missing items, suspicious count), re-query with a better jq filter. Never infer or fabricate missing entries. If you cannot get a complete result after two attempts, tell the user what's missing and why before proceeding. + **Observe-Act Loop:** Most phone control tasks follow this cycle: 1. Take a screenshot and/or read the UI state From 8242773cc22642c47ff27e5402d2200d19617713 Mon Sep 17 00:00:00 2001 From: Mariozada Date: Mon, 23 Mar 2026 20:56:15 +1100 Subject: [PATCH 2/2] add jq as required binary --- mobilerun/SKILL.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/mobilerun/SKILL.md b/mobilerun/SKILL.md index 09342d0..6bd716a 100644 --- a/mobilerun/SKILL.md +++ b/mobilerun/SKILL.md @@ -6,7 +6,7 @@ description: > the screen, managing apps, running AI agent tasks on a phone, or any form of phone/mobile automation. Also load when the user mentions Mobilerun, Droidrun, or phone control. Requires a Mobilerun API key (prefixed dr_sk_) and a connected device. -metadata: { "openclaw": { "emoji": "📱", "primaryEnv": "MOBILERUN_API_KEY", "requires": { "env": ["MOBILERUN_API_KEY"], "bins": ["curl"] } } } +metadata: { "openclaw": { "emoji": "📱", "primaryEnv": "MOBILERUN_API_KEY", "requires": { "env": ["MOBILERUN_API_KEY"], "bins": ["curl", "jq"] } } } --- # Mobilerun