Commit b139235

Update ADK doc according to issue #796 - 2 (#800)
* docs: Add documentation for ComputerUseToolset
* docs: update Computer Use Toolset docs draft
* further updates

---------

Co-authored-by: Joe Fernandez <[email protected]>
1 parent 0db5a9a commit b139235

2 files changed: +103 -0 lines changed

Lines changed: 101 additions & 0 deletions
@@ -0,0 +1,101 @@
# Computer Use Toolset with Gemini

The Computer Use Toolset allows an agent to operate a computer's user
interface, such as a browser, to complete tasks. This tool uses
a specific Gemini model and the [Playwright](https://playwright.dev/)
testing tool to control a Chromium browser, and it can interact with
web pages by taking screenshots, clicking, typing, and navigating.

For more information about the computer use model, see the
Gemini API [Computer use](https://ai.google.dev/gemini-api/docs/computer-use)
documentation or the Google Cloud Vertex AI API
[Computer use](https://cloud.google.com/vertex-ai/generative-ai/docs/computer-use)
documentation.

!!! example "Preview release"
    The Computer Use model and tool are a Preview release. For
    more information, see the
    [launch stage descriptions](https://cloud.google.com/products#product-launch-stages).

## Setup

You must install Playwright and its dependencies, including Chromium,
to use the Computer Use Toolset.

??? tip "Recommended: create and activate a Python virtual environment"

    Create a Python virtual environment:

    ```shell
    python -m venv .venv
    ```

    Activate the Python virtual environment:

    === "Windows CMD"

        ```console
        .venv\Scripts\activate.bat
        ```

    === "Windows PowerShell"

        ```console
        .venv\Scripts\Activate.ps1
        ```

    === "macOS / Linux"

        ```bash
        source .venv/bin/activate
        ```

To set up the required software libraries for the Computer Use Toolset:

1. Install Python dependencies:
    ```console
    pip install termcolor==3.1.0
    pip install playwright==1.52.0
    pip install browserbase==1.3.0
    pip install rich
    ```
2. Install the Playwright dependencies, including the Chromium browser:
    ```console
    playwright install-deps chromium
    playwright install chromium
    ```
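
After running the install steps above, it can help to confirm that the pinned packages actually landed in the active environment. The following is a minimal sketch, not part of the ADK or Playwright: the `PINNED` table and the `check_packages` helper are hypothetical names, and the version pins are copied from the `pip install` commands above (`rich` is unpinned there, so any version is accepted). It relies only on the standard-library `importlib.metadata` module and does not check whether the Chromium browser itself was downloaded.

```python
from importlib import metadata

# Pins copied from the install step; None means "any installed version is fine".
PINNED = {
    "termcolor": "3.1.0",
    "playwright": "1.52.0",
    "browserbase": "1.3.0",
    "rich": None,
}


def check_packages(pins, get_version=metadata.version):
    """Return a {package: status} report for the given version pins.

    `get_version` is injectable so the check can be exercised without
    touching the real environment (an assumption made for testability).
    """
    report = {}
    for name, wanted in pins.items():
        try:
            found = get_version(name)
        except metadata.PackageNotFoundError:
            report[name] = "missing"
            continue
        if wanted is None or found == wanted:
            report[name] = "ok"
        else:
            report[name] = f"mismatch (found {found}, wanted {wanted})"
    return report


if __name__ == "__main__":
    for package, status in check_packages(PINNED).items():
        print(f"{package}: {status}")
```

Run the script inside the same virtual environment you installed into. A `missing` or `mismatch` entry usually means the `pip install` commands ran against a different Python interpreter.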

## Use the tool

Use the Computer Use Toolset by adding it as a tool to your agent. When you
configure the tool, you must provide an implementation of the `BaseComputer`
class, which defines an interface for an agent to use a computer. The
following example uses the `PlaywrightComputer` class for this purpose.
You can find the code for this implementation in the `playwright.py` file of the
[computer_use](https://github.com/google/adk-python/blob/main/contributing/samples/computer_use/playwright.py)
agent sample project.

```python
from google.adk import Agent
from google.adk.tools.computer_use.computer_use_toolset import ComputerUseToolset

# PlaywrightComputer comes from playwright.py in the computer_use sample project;
# the relative import assumes this module lives in the same package.
from .playwright import PlaywrightComputer

root_agent = Agent(
    # Computer use requires this specific Gemini model.
    model='gemini-2.5-computer-use-preview-10-2025',
    name='hello_world_agent',
    description=(
        'computer use agent that can operate a browser on a computer to finish'
        ' user tasks'
    ),
    instruction='you are a computer use agent',
    tools=[
        # The toolset drives the browser through the BaseComputer implementation.
        ComputerUseToolset(computer=PlaywrightComputer(screen_size=(1280, 936)))
    ],
)
```

For a complete code example, see the
[computer_use](https://github.com/google/adk-python/tree/main/contributing/samples/computer_use)
agent sample project.
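
To picture what "an interface for an agent to use a computer" means, the toy sketch below separates an abstract set of actions from a concrete backend. It is deliberately not the ADK's `BaseComputer` API: `ToyComputer`, its four methods, and `FakeBrowser` are hypothetical names invented for illustration, and the real method set and signatures should be taken from the ADK source and the `playwright.py` sample. The only detail borrowed from the example above is the `(1280, 936)` screen size.

```python
from abc import ABC, abstractmethod


class ToyComputer(ABC):
    """Hypothetical stand-in for a computer interface; NOT the ADK BaseComputer API."""

    @abstractmethod
    def navigate(self, url: str) -> None: ...

    @abstractmethod
    def click_at(self, x: int, y: int) -> None: ...

    @abstractmethod
    def type_text(self, text: str) -> None: ...

    @abstractmethod
    def screenshot(self) -> bytes: ...


class FakeBrowser(ToyComputer):
    """In-memory backend that records actions instead of driving a real browser."""

    def __init__(self, screen_size=(1280, 936)):
        self.screen_size = screen_size
        self.url = "about:blank"
        self.log = []

    def navigate(self, url):
        self.url = url
        self.log.append(("navigate", url))

    def click_at(self, x, y):
        # Reject clicks that fall outside the configured viewport.
        width, height = self.screen_size
        if not (0 <= x < width and 0 <= y < height):
            raise ValueError(f"({x}, {y}) is outside the {width}x{height} screen")
        self.log.append(("click", x, y))

    def type_text(self, text):
        self.log.append(("type", text))

    def screenshot(self):
        # A real backend would return image bytes; this returns a text stub.
        return f"screenshot of {self.url}".encode()


browser = FakeBrowser()
browser.navigate("https://example.com")
browser.click_at(640, 468)
browser.type_text("hello")
```

Because the caller depends only on the abstract actions, a backend such as the Playwright-based one could be swapped for a different browser driver without changing the agent definition.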

mkdocs.yml

Lines changed: 2 additions & 0 deletions
@@ -154,6 +154,8 @@ nav:
   - Tools for Agents:
     - tools/index.md
     - Built-in tools: tools/built-in-tools.md
+    - Gemini API tools:
+      - Computer use: tools/gemini-api/computer-use.md
     - Google Cloud tools:
       - Overview: tools/google-cloud-tools.md
       - Code Execution with Agent Engine: tools/google-cloud/code-exec-agent-engine.md
