Commit f71a975

Add new tasks and agents
1 parent f582817 commit f71a975

70 files changed

+1501
-147
lines changed

crab-benchmark-v0/README.md

Lines changed: 4 additions & 0 deletions
@@ -29,3 +29,7 @@ After setting up the environment, you can start the experiment. A brief overview
 2. Start the CRAB server in the Ubuntu environment and get its IP address and port. Let's say they are `192.168.122.72` and `8000`.
 3. Choose a task. As an example, we take the task with ID `a3476778-e512-40ca-b1c0-d7aab0c7f18b` from [handmade_tasks](./dataset/handmade_tasks.py). The task is: "Open the 'Tasks' app on Android, check the first incomplete task, then perform the task according to its description."
 4. Run [main.py](./main.py) with the command `poetry run python -m crab-benchmark-v0.main --model gpt4o --policy single --remote-url http://192.168.122.72:8000 --task-id a3476778-e512-40ca-b1c0-d7aab0c7f18b`. In this command, `--model gpt4o` and `--policy single` determine the agent system, `--remote-url` specifies the Ubuntu environment interface, and `--task-id` indicates the task to be performed.
+
+#### Model
+
+For open source models, we use [VLLM](https://github.com/vllm-project/vllm) to host Pixtral model, check [here](https://docs.vllm.ai/en/latest/models/vlm.html#online-inference) for the setup commands; [SGLang](https://github.com/sgl-project/sglang) to host LLaVa-OneVision model, check [here](https://github.com/sgl-project/sglang?tab=readme-ov-file#supported-models) for the setup commands.
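
The four flags in step 4 of the README hunk above can be pictured with a small argument parser. This is a minimal sketch, not the repository's actual `main.py`: the `build_parser` helper and the help strings are assumptions for illustration; only the flag names and the sample values come from the README text.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the four flags named in the README."""
    parser = argparse.ArgumentParser(prog="crab-benchmark-v0.main")
    # --model and --policy together select the agent system.
    parser.add_argument("--model", required=True, help="model name, e.g. gpt4o")
    parser.add_argument("--policy", required=True, help="agent policy, e.g. single")
    # --remote-url points at the CRAB server running in the Ubuntu environment.
    parser.add_argument("--remote-url", required=True, help="CRAB server URL")
    # --task-id picks the task to run by its UUID.
    parser.add_argument("--task-id", required=True, help="task UUID")
    return parser


# Parse the exact values used in the README example.
args = build_parser().parse_args(
    [
        "--model", "gpt4o",
        "--policy", "single",
        "--remote-url", "http://192.168.122.72:8000",
        "--task-id", "a3476778-e512-40ca-b1c0-d7aab0c7f18b",
    ]
)
print(args.model, args.policy, args.remote_url, args.task_id)
```

`argparse` converts the dashes in `--remote-url` and `--task-id` to underscores, so the values are read back as `args.remote_url` and `args.task_id`.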

crab-benchmark-v0/android_env.py

Lines changed: 2 additions & 1 deletion
@@ -14,6 +14,7 @@
 from crab import EnvironmentConfig
 from crab.actions.android_actions import (
     key_press,
+    long_tap,
     open_app_drawer,
     screenshot,
     setup,
@@ -24,7 +25,7 @@
 
 ANDROID_ENV = EnvironmentConfig(
     name="android",
-    action_space=[tap, key_press, write_text, swipe, open_app_drawer],
+    action_space=[tap, key_press, long_tap, write_text, swipe, open_app_drawer],
     observation_space=[screenshot],
     description="""A Google Pixel smartphone runs on the Android operating system. \
 The interface displays a current screenshot at each step and primarily \
Lines changed: 23 additions & 0 deletions
@@ -0,0 +1,23 @@
+{
+    "description": "In the Android operating system, use the \"Google Map\" app to find the city name corresponding to the postal code \"63002\" in South Korea, then use the \"Calendar\" app to add a new all-day event for 1 January 2025 with the text of the found city name.",
+    "tasks": [
+        {
+            "task": "51b2463c-9904-4a32-81ba-507bfb89d61f",
+            "attribute": {
+                "number": "63002",
+                "country": "South Korea"
+            },
+            "output": "Jeju"
+        },
+        {
+            "task": "a3d11574-2acf-4b26-a569-a5dbc9d548ac",
+            "attribute": {
+                "content": "Jeju",
+                "date": "1 January 2025"
+            },
+            "output": null
+        }
+    ],
+    "adjlist": "0 1\n1",
+    "id": "1005c437-50d1-465a-b3fc-833098b22bfc"
+}
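
The `adjlist` field in these task files appears to encode sub-task dependencies as an adjacency list: one line per node, where the first token is an index into `tasks` and any remaining tokens are its successors. That reading is an assumption based on the values in this commit (`"0 1\n1"` for two chained sub-tasks, `"0"` for a single one), and `parse_adjlist` below is a hypothetical helper, not the benchmark's own loader.

```python
def parse_adjlist(adjlist: str) -> dict[int, list[int]]:
    """Turn an adjacency-list string into {node: [successor, ...]}.

    Assumed format: each line is "source target target ...", with
    whitespace-separated integer indices into the "tasks" array.
    """
    graph: dict[int, list[int]] = {}
    for line in adjlist.splitlines():
        tokens = line.split()
        if not tokens:
            continue  # skip blank lines
        source, *targets = (int(token) for token in tokens)
        graph.setdefault(source, []).extend(targets)
        # Make sure nodes that only appear as targets are present too.
        for target in targets:
            graph.setdefault(target, [])
    return graph


# Values copied from the task files in this commit.
chained = parse_adjlist("0 1\n1")
single = parse_adjlist("0")

# List each dependency edge: the sub-task at index src feeds index dst.
for src, dsts in chained.items():
    for dst in dsts:
        print(f"sub-task {src} -> sub-task {dst}")
```

Under this reading, `"0 1\n1"` means the output of the first sub-task (for example `"Jeju"`) is consumed by the second one, which matches the `content` attribute repeated in the second entry.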
Lines changed: 22 additions & 0 deletions
@@ -0,0 +1,22 @@
+{
+    "description": "In Android, use the \"Google Map\" app to find the city name for the postal code \"2770885\" in Japan, and then, using the \"Keep Notes\" app, create a new note without a title to record the city name you found.",
+    "tasks": [
+        {
+            "task": "51b2463c-9904-4a32-81ba-507bfb89d61f",
+            "attribute": {
+                "number": "2770885",
+                "country": "Japan"
+            },
+            "output": "Chiba"
+        },
+        {
+            "task": "eb92a1e6-4c86-4d56-baac-95fc8397732e",
+            "attribute": {
+                "content": "Chiba"
+            },
+            "output": null
+        }
+    ],
+    "adjlist": "0 1\n1",
+    "id": "12333aa0-e76d-4a5c-8657-9f897f62f62d"
+}
Lines changed: 22 additions & 0 deletions
@@ -0,0 +1,22 @@
+{
+    "description": "In Android, using the \"Contacts\" app, find the email of the contact named John Lauphin, then using the \"Gmail\" app, send an email to that contact with the subject \"Hello John.\"",
+    "tasks": [
+        {
+            "task": "a3d11574-2acf-4b26-a569-a5dbc9d548ap",
+            "attribute": {
+                "name": "John Lauphin"
+            },
+            "output": "[email protected]"
+        },
+        {
+            "task": "0090f116-e02b-4562-a20d-b5df38be963a",
+            "attribute": {
+                "content": "Hello John",
+
+            },
+            "output": null
+        }
+    ],
+    "adjlist": "0 1\n1",
+    "id": "2ade6a13-c7a6-4df7-8c62-77382687369e"
+}
Lines changed: 15 additions & 0 deletions
@@ -0,0 +1,15 @@
+{
+    "description": "In Android, Using Google Map app, Find the city name of corresponding post code \"1010021\" in the country \"Japan\".",
+    "tasks": [
+        {
+            "task": "51b2463c-9904-4a32-81ba-507bfb89d61f",
+            "attribute": {
+                "country": "Japan",
+                "number": "101-0021"
+            },
+            "output": "Tokyo"
+        }
+    ],
+    "adjlist": "0",
+    "id": "4190c90c-b28c-4bb3-ab5c-af3c4fde0a3d"
+}
Lines changed: 21 additions & 0 deletions
@@ -0,0 +1,21 @@
+{
+    "description": "Open the calendar app in the Android system and find the title of an event on the date \"17 August 2024,\" then using the \"Google Drive\" app on the same Android device, create a new folder with the founded name",
+    "tasks": [
+        {
+            "task": "2394b768-2ca7-45e9-b41e-2aa4e9573192",
+            "attribute": {
+                "date": "17 August 2024"
+            },
+            "output": "Travel to Paris"
+        },
+        {
+            "task": "a3d11574-2acf-4b26-a569-a5dbc9d548ar",
+            "attribute": {
+                "content": "Travel to Paris"
+            },
+            "output": null
+        }
+    ],
+    "adjlist": "0 1\n1",
+    "id": "483fbf9c-dc78-4ac2-9264-53c4f617f6cc"
+}
Lines changed: 14 additions & 0 deletions
@@ -0,0 +1,14 @@
+{
+    "description": "In the Android system, use the calendar app to find the title of an event on the date \"16 July 2024,\".",
+    "tasks": [
+        {
+            "task": "2394b768-2ca7-45e9-b41e-2aa4e9573192",
+            "attribute": {
+                "date": "16 July 2024"
+            },
+            "output": "Japan"
+        }
+    ],
+    "adjlist": "0",
+    "id": "4893a9b0-6477-495d-a73c-32503326e24a"
+}
Lines changed: 22 additions & 0 deletions
@@ -0,0 +1,22 @@
+{
+    "description": "In the Android system, use the calendar app to find the title of an event on the date \"16 July 2024,\" then, using the Google Map app, find the city name of the corresponding post code \"113-8654\" in the country with same name as title.",
+    "tasks": [
+        {
+            "task": "2394b768-2ca7-45e9-b41e-2aa4e9573192",
+            "attribute": {
+                "date": "16 July 2024"
+            },
+            "output": "Japan"
+        },
+        {
+            "task": "51b2463c-9904-4a32-81ba-507bfb89d61f",
+            "attribute": {
+                "number": "113-8654",
+                "country": "Japan"
+            },
+            "output": null
+        }
+    ],
+    "adjlist": "0 1\n1",
+    "id": "53010c40-dce4-4d72-a856-842c21059e2b"
+}
Lines changed: 23 additions & 0 deletions
@@ -0,0 +1,23 @@
+{
+    "description": "Using the \"Google Map\" app on Android, find the distance of the shortest route from \"National University of Singapore\" to \"Nanyang Technology University,\" then using the \"Calendar\" app, add a new event with the text representing the found distance on the date 21 June 2024 as an all-day event.",
+    "tasks": [
+        {
+            "task": "1a1b72d7-78c9-4027-8278-86083ae01045",
+            "attribute": {
+                "place_name_1": "National University of Singapore",
+                "place_name_2": "Nanyang Technology University"
+            },
+            "output": "13km"
+        },
+        {
+            "task": "a3d11574-2acf-4b26-a569-a5dbc9d548ac",
+            "attribute": {
+                "content": "13km",
+                "date": "21 June 2024"
+            },
+            "output": null
+        }
+    ],
+    "adjlist": "0 1\n1",
+    "id": "71ef7fd2-0ae3-49c8-8238-06b7aa985d25"
+}
