| 1 | +# Computer Use Toolset with Gemini |
| 2 | + |
| 3 | +The Computer Use Toolset allows an agent to operate the user interface |
| 4 | +of a computer, such as a browser, to complete tasks. The toolset uses |
| 5 | +a specialized Gemini computer use model and the [Playwright](https://playwright.dev/) |
| 6 | +testing tool to control a Chromium browser, which the agent operates |
| 7 | +by taking screenshots, clicking, typing, and navigating between web pages. |
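Conceptually, the agent works in a loop: capture a screenshot, let the model choose the next UI action, execute that action, and repeat. The toy sketch below illustrates this loop with hypothetical names (`Action`, `FakeBrowser`, `run_loop`) and a scripted policy standing in for the model. It is not the ADK implementation, which delegates these steps to the toolset and a real browser.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Action:
    """A hypothetical UI action requested by the model."""
    name: str  # for example "navigate", "click_at", or "done"
    args: dict = field(default_factory=dict)


@dataclass
class FakeBrowser:
    """Stand-in for a real browser that records the actions it receives."""
    url: str = "about:blank"
    log: list = field(default_factory=list)

    def screenshot(self) -> str:
        # A real implementation would capture an image of the viewport.
        return f"<screenshot of {self.url}>"

    def execute(self, action: Action) -> None:
        if action.name == "navigate":
            self.url = action.args["url"]
        self.log.append(action.name)


def run_loop(browser: FakeBrowser, decide: Callable[[str], Action],
             max_steps: int = 10) -> list:
    """Observe, decide, act; stop when the policy returns "done"."""
    for _ in range(max_steps):
        observation = browser.screenshot()
        action = decide(observation)
        if action.name == "done":
            break
        browser.execute(action)
    return browser.log


# A scripted stand-in for the model: it ignores the screenshot and
# replays a fixed sequence of actions.
script = iter([
    Action("navigate", {"url": "https://example.com"}),
    Action("click_at", {"x": 500, "y": 300}),
    Action("done"),
])
browser = FakeBrowser()
print(run_loop(browser, lambda observation: next(script)))  # ['navigate', 'click_at']
```

The `max_steps` cap is a simple safeguard so that a policy that never says "done" cannot loop forever.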
| 8 | + |
| 9 | +For more information about the computer use model, see the |
| 10 | +Gemini API [Computer use](https://ai.google.dev/gemini-api/docs/computer-use) documentation |
| 11 | +or the Google Cloud Vertex AI |
| 12 | +[Computer use](https://cloud.google.com/vertex-ai/generative-ai/docs/computer-use) documentation. |
| 13 | + |
| 14 | +!!! example "Preview release" |
| 15 | + The Computer Use model and toolset are Preview releases. For |
| 16 | + more information, see the |
| 17 | + [launch stage descriptions](https://cloud.google.com/products#product-launch-stages). |
| 18 | + |
| 19 | +## Setup |
| 20 | + |
| 21 | +To use the Computer Use Toolset, you must install Playwright and its |
| 22 | +dependencies, including the Chromium browser. |
| 23 | + |
| 24 | +??? tip "Recommended: create and activate a Python virtual environment" |
| 25 | + |
| 26 | + Create a Python virtual environment: |
| 27 | + |
| 28 | + ```shell |
| 29 | + python -m venv .venv |
| 30 | + ``` |
| 31 | + |
| 32 | + Activate the Python virtual environment: |
| 33 | + |
| 34 | + === "Windows CMD" |
| 35 | + |
| 36 | + ```console |
| 37 | + .venv\Scripts\activate.bat |
| 38 | + ``` |
| 39 | + |
| 40 | + === "Windows PowerShell" |
| 41 | + |
| 42 | + ```console |
| 43 | + .venv\Scripts\Activate.ps1 |
| 44 | + ``` |
| 45 | + |
| 46 | + === "macOS / Linux" |
| 47 | + |
| 48 | + ```bash |
| 49 | + source .venv/bin/activate |
| 50 | + ``` |
| 51 | + |
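If you are unsure whether the virtual environment is active, one quick check from Python is to compare `sys.prefix` with `sys.base_prefix`; the two differ inside an environment created with `python -m venv`. A minimal sketch (the function name `in_virtualenv` is illustrative):

```python
import sys


def in_virtualenv() -> bool:
    """Return True when running inside a venv-created virtual environment.

    Inside a venv, sys.prefix points at the environment directory while
    sys.base_prefix still points at the base Python installation.
    """
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print("virtual environment active:", in_virtualenv())
```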
| 52 | +To set up the required software libraries for the Computer Use Toolset: |
| 53 | + |
| 54 | +1. Install Python dependencies: |
| 55 | + ```console |
| 56 | + pip install termcolor==3.1.0 |
| 57 | + pip install playwright==1.52.0 |
| 58 | + pip install browserbase==1.3.0 |
| 59 | + pip install rich |
| 60 | + ``` |
| 61 | +2. Install the Playwright dependencies, including the Chromium browser: |
| 62 | + ```console |
| 63 | + playwright install-deps chromium |
| 64 | + playwright install chromium |
| 65 | + ``` |
| 66 | + |
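As an optional sanity check, the following sketch reports whether the Python packages pinned above are installed, using the standard library `importlib.metadata` module. The `REQUIRED` mapping and the `check_requirements` helper are illustrative and not part of ADK, and the script does not verify the Chromium download itself.

```python
from importlib.metadata import PackageNotFoundError, version

# Pins copied from the install step above; None means "any version".
REQUIRED = {
    "termcolor": "3.1.0",
    "playwright": "1.52.0",
    "browserbase": "1.3.0",
    "rich": None,
}


def check_requirements(required: dict) -> dict:
    """Map each distribution name to 'ok', 'missing', or a mismatch note."""
    report = {}
    for name, pinned in required.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            report[name] = "missing"
            continue
        if pinned is None or installed == pinned:
            report[name] = "ok"
        else:
            report[name] = f"found {installed}, expected {pinned}"
    return report


if __name__ == "__main__":
    for name, status in check_requirements(REQUIRED).items():
        print(f"{name}: {status}")
```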
| 67 | +## Use the tool |
| 68 | + |
| 69 | +Use the Computer Use Toolset by adding it as a tool to your agent. When you |
| 70 | +configure the tool, you must provide an implementation of the `BaseComputer` |
| 71 | +class, which defines the interface an agent uses to operate a computer. In the |
| 72 | +following example, the `PlaywrightComputer` class, imported from a local |
| 73 | +`playwright` module, fills this role. You can find its code in the `playwright.py` file of the |
| 74 | +[computer_use](https://github.com/google/adk-python/blob/main/contributing/samples/computer_use/playwright.py) |
| 75 | +agent sample project. |
| 76 | + |
| 77 | +```python |
| 78 | +from google.adk import Agent |
| 79 | +from google.adk.tools.computer_use.computer_use_toolset import ComputerUseToolset |
| 80 | + |
| 81 | +# PlaywrightComputer implements BaseComputer; its source is playwright.py |
| 82 | +# in the computer_use sample project. |
| 83 | +from .playwright import PlaywrightComputer |
| 84 | + |
| 85 | +root_agent = Agent( |
| 86 | + model='gemini-2.5-computer-use-preview-10-2025', |
| 87 | + name='hello_world_agent', |
| 88 | + description=( |
| 89 | + 'computer use agent that can operate a browser on a computer to finish' |
| 90 | + ' user tasks' |
| 91 | + ), |
| 92 | + instruction='you are a computer use agent', |
| 93 | + tools=[ |
| 94 | + ComputerUseToolset(computer=PlaywrightComputer(screen_size=(1280, 936))) |
| 95 | + ], |
| 96 | +) |
| 97 | +``` |
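The `BaseComputer` requirement follows a familiar pattern: an abstract interface that the toolset calls, plus a concrete class that does the actual work. The sketch below shows that pattern with a hypothetical, simplified `ToyComputer` base class and a `RecordingComputer` fake that logs calls instead of driving a browser. ADK's actual `BaseComputer` defines its own methods and signatures, so treat this only as an illustration of the idea.

```python
from abc import ABC, abstractmethod


class ToyComputer(ABC):
    """Hypothetical, simplified stand-in for an interface like BaseComputer.

    ADK's real BaseComputer defines its own methods and signatures;
    this toy only illustrates the interface-plus-implementation pattern.
    """

    @abstractmethod
    def screen_size(self) -> tuple[int, int]: ...

    @abstractmethod
    def navigate(self, url: str) -> None: ...

    @abstractmethod
    def click_at(self, x: int, y: int) -> None: ...

    @abstractmethod
    def screenshot(self) -> bytes: ...


class RecordingComputer(ToyComputer):
    """Fake implementation that records calls instead of driving a browser."""

    def __init__(self, size: tuple[int, int] = (1280, 936)):
        self._size = size
        self.calls: list[tuple] = []

    def screen_size(self) -> tuple[int, int]:
        return self._size

    def navigate(self, url: str) -> None:
        self.calls.append(("navigate", url))

    def click_at(self, x: int, y: int) -> None:
        self.calls.append(("click_at", x, y))

    def screenshot(self) -> bytes:
        return b""  # a real implementation would return image bytes


computer = RecordingComputer()
computer.navigate("https://example.com")
computer.click_at(640, 468)
print(computer.calls)
```

A fake like this can be useful for unit-testing code that depends on such an interface without launching a browser.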
| 98 | + |
| 99 | +For a complete code example, see the |
| 100 | +[computer_use](https://github.com/google/adk-python/tree/main/contributing/samples/computer_use) |
| 101 | +agent sample project. |
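The `screen_size=(1280, 936)` argument passed to `PlaywrightComputer` in the example above matters because, according to the Gemini API computer use documentation, the model expresses positions on a normalized 1000x1000 grid (values 0 to 999) rather than in pixels. The sketch below shows such a conversion with a hypothetical `denormalize` helper. It is an assumption that the sample performs an equivalent mapping; check `playwright.py` and the toolset source for the exact behavior.

```python
def denormalize(x: int, y: int, screen_size: tuple[int, int]) -> tuple[int, int]:
    """Map a point on a 1000x1000 normalized grid to screen pixels.

    Assumes the model emits x and y in the range 0-999, as described in
    the Gemini API computer use documentation.
    """
    width, height = screen_size
    return int(x / 1000 * width), int(y / 1000 * height)


print(denormalize(500, 500, (1280, 936)))  # (640, 468)
print(denormalize(999, 999, (1280, 936)))  # (1278, 935)
```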