PC-Agent: A Hierarchical Multi-Agent Collaboration Framework for Complex Task Automation on PC

📢News

🔥[2025-03-12] The code has been updated.

🔥[2025-02-21] We have released an updated version of PC-Agent. Check the paper for details. The code will be updated soon.

🔥[2024-08-23] We have released the code of PC-Agent, supporting both Mac and Windows platforms.

📺Demo

Download.paper.from.Chorme.mp4
Search.NBA.FMVP.and.send.to.friend.mp4
Write.an.introduction.of.Alibaba.in.Word.mp4

📋Introduction

  • PC-Agent is a multi-agent collaboration system that automates productivity applications (e.g., Chrome, Word, and WeChat) based on user instructions.
  • An active perception module, designed for dense and diverse interactive elements, is better adapted to the PC platform.
  • The hierarchical multi-agent cooperative structure improves the success rate of more complex task sequences.

🔧Getting Started

Installation

Both Windows and Mac are supported.

conda create --name pcagent python=3.10
conda activate pcagent

# For Windows
pip install -r requirements.txt

# For Mac
pip install -r requirements_mac.txt

git clone https://github.com/Topdu/OpenOCR.git
pip install openocr-python

Configuration

Edit config.json to add your API key and customize settings. Replace the token value with your actual API key:

{
  "vl_model_name": "GPT-4o",
  "llm_model_name": "GPT-4o",
  "token": "sk-...",
  "url": "https://api.openai.com/v1"
}
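Because JSON does not allow `#` comments, a config.json that contains them will fail to parse. The following is a minimal standalone sketch for checking the file before a run. It assumes the four keys shown above are all required; `load_config` and `REQUIRED_KEYS` are hypothetical names used for illustration and are not part of PC-Agent, and how run.py actually reads the file is not described here.

```python
import json
from pathlib import Path

# Keys taken from the config.json example above; treating all four as
# required is an assumption made for this sketch.
REQUIRED_KEYS = ("vl_model_name", "llm_model_name", "token", "url")


def load_config(path="config.json"):
    """Parse the config file and return it as a dict, raising ValueError on problems."""
    text = Path(path).read_text(encoding="utf-8")
    # json.loads raises json.JSONDecodeError (a ValueError subclass) on
    # comments, trailing commas, or other invalid JSON.
    config = json.loads(text)
    if not isinstance(config, dict):
        raise ValueError("config must be a JSON object")
    missing = [key for key in REQUIRED_KEYS if not config.get(key)]
    if missing:
        raise ValueError("config is missing values for: " + ", ".join(missing))
    return config
```

Calling `load_config()` from the repository root either returns the parsed settings or raises an error naming the problem, which is usually quicker to diagnose than a failure partway through a task.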

Test on your computer

  1. Run run.py with your instruction and your GPT-4o API token. For example:
# For Windows
python run.py --instruction="Open Chrome and search the PC-Agent paper." --Mac 0

# For Mac
python run.py --instruction="Open Chrome and search the PC-Agent paper." --Mac 1
  2. Optionally, add task-specific operational knowledge via the --add_info option to help PC-Agent operate more accurately.

  3. To speed up PC-Agent, set --disable_reflection to skip the reflection process. Note that this may reduce the success rate.

  4. If the task is not very complex, set --simple 1 to skip task decomposition.
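The options above can be combined into a single invocation. The sketch below assembles the argument list in Python and selects the --Mac value from the detected operating system. `build_command` is a hypothetical helper, not part of PC-Agent. It assumes that --add_info takes a single string value and that --disable_reflection is a bare switch; neither is specified above, so check the argument definitions in run.py before relying on this.

```python
import platform
import sys


def build_command(instruction, add_info=None, disable_reflection=False,
                  simple=False, system=None):
    """Return the argument list for one run.py invocation (nothing is executed)."""
    # platform.system() returns "Darwin" on macOS and "Windows" on Windows.
    system = system or platform.system()
    mac_flag = "1" if system == "Darwin" else "0"
    command = [sys.executable, "run.py",
               f"--instruction={instruction}", "--Mac", mac_flag]
    if add_info:
        # Assumption: --add_info accepts one string of extra knowledge.
        command += ["--add_info", add_info]
    if disable_reflection:
        # Assumption: --disable_reflection is a switch that takes no value.
        command.append("--disable_reflection")
    if simple:
        command += ["--simple", "1"]
    return command
```

Passing the returned list to `subprocess.run(..., check=True)` from the repository root would launch the task. Using a list rather than a single shell string avoids quoting problems when the instruction contains spaces or quotation marks.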