A comprehensive robotic dishwashing system combining computer vision, 3D geometry reconstruction, and manipulation planning.
DishBot is a complete pipeline for autonomous dishwashing robots that includes:
- Vision Module: Semantic dish detection using Qwen2-VL vision-language model
- 3D Reconstruction: RGBD to point cloud conversion with Open3D
- Grasp Planning: Multiple grasp strategies (top-down, side, rim, pinch)
- Simulation: NVIDIA Isaac Sim integration with domain randomization
- Robot Control: Franka Panda arm control with inverse kinematics
- Training Pipeline: ML-based grasp success prediction
Pipeline Flow:
Vision (Qwen2-VL) → 3D Reconstruction → Grasp Planning → Sim-to-Real → Robot Control
dishbot/
├── src/dishbot/
│   ├── __init__.py            # Package initialization
│   ├── config.py              # Configuration management
│   ├── vision_module.py       # Qwen2-VL + 3D reconstruction
│   ├── grasp_planning.py      # Grasp pose generation
│   ├── isaac_sim_env.py       # Isaac Sim environment
│   ├── robot_controller.py    # Robot arm control
│   ├── training_pipeline.py   # ML training
│   └── main.py                # Entry point
├── configs/                   # Configuration files
├── data/                      # Training data
├── checkpoints/               # Model checkpoints
├── tests/                     # Unit tests
├── pyproject.toml             # Project configuration
└── README.md                  # This file
- CPython 3.10+ (PyPy is NOT supported - PyTorch requires CPython)
- CUDA 11.8+ (for GPU acceleration on Linux/Windows)
- NVIDIA Isaac Sim (optional, for real simulation)
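The interpreter requirement above can also be checked from Python itself. The following is a minimal sketch, not part of DishBot; the helper name `check_interpreter` is hypothetical, and it relies only on the standard library's `platform.python_implementation()` and `sys.version_info`.

```python
import platform
import sys


def check_interpreter(min_version=(3, 10)):
    """Return a list of problems with the running interpreter (empty if OK)."""
    problems = []
    implementation = platform.python_implementation()
    if implementation != "CPython":
        problems.append(f"expected CPython, found {implementation}")
    if sys.version_info[:2] < min_version:
        found = ".".join(str(part) for part in sys.version_info[:2])
        wanted = ".".join(str(part) for part in min_version)
        problems.append(f"expected Python {wanted}+, found {found}")
    return problems


if __name__ == "__main__":
    # Print one warning per problem; print nothing when the interpreter is fine.
    for problem in check_interpreter():
        print(f"WARNING: {problem}")
```

Running this inside the activated virtual environment should print nothing when the environment uses CPython 3.10 or newer.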
# Clone the repository
git clone https://github.com/dishbot/dishbot.git
cd dishbot
# Create virtual environment with CPython (not PyPy!)
# If you have multiple Python versions, specify the path explicitly:
# /usr/local/bin/python3.10 -m venv .venv
python3 -m venv .venv
source .venv/bin/activate # On Windows: .venv\Scripts\activate
# Verify you're using CPython (should NOT say PyPy)
python --version
# Install PyTorch FIRST (required before installing dishbot)
# See https://pytorch.org/get-started/locally/ for other configurations
# macOS (CPU only):
pip install torch torchvision --index-url https://download.pytorch.org/whl/cpu
# Linux/Windows with CUDA 11.8:
# pip install torch torchvision --index-url https://download.pytorch.org/whl/cu118
# Linux/Windows with CUDA 12.1:
# pip install torch torchvision --index-url https://download.pytorch.org/whl/cu121
# Install the package
pip install -e .
# PyTorch must be installed first (see above)
pip install -e ".[dev]"
# PyTorch must be installed first (see above)
pip install -e ".[visualization]"
# PyTorch must be installed first (see above)
pip install -e ".[all]"
Isaac Sim requires Linux with an NVIDIA GPU. There are two options:
Use our Docker setup to run Isaac Sim in a container. This is the recommended approach for:
- macOS users (Isaac Sim doesn't support macOS natively)
- Windows users without WSL2 GPU support
- Clean isolation of the simulation environment
- CI/CD pipelines
See Docker Setup below for detailed instructions.
For native Linux installation with NVIDIA GPU:
- Download and install NVIDIA Omniverse
- Install Isaac Sim from the Omniverse Launcher
- Follow the Isaac Sim Python setup guide
The Docker setup allows running Isaac Sim on any machine with access to an NVIDIA GPU, including remote Linux servers.
- NVIDIA GPU: A CUDA-capable GPU (RTX 2070 or better recommended)
- NVIDIA Driver: Version 525.60 or higher
- Docker: Version 19.03 or higher
- NVIDIA Container Toolkit: For GPU access in containers
# Add NVIDIA package repository
distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -
curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list
# Install nvidia-container-toolkit
sudo apt-get update
sudo apt-get install -y nvidia-container-toolkit
# Restart Docker
sudo systemctl restart docker
# Verify GPU access
docker run --rm --gpus all nvidia/cuda:12.0-base nvidia-smi
# Build the image
docker compose build
# Or build directly with docker
docker build -t dishbot:latest .
# Show available commands
docker compose run dishbot --help
# Run demo with Isaac Sim
docker compose run dishbot demo
# Generate training data
docker compose run dishbot train --generate-data --num-samples 1000
# Train the model
docker compose run training
# Interactive development shell
docker compose run dev
# Start with WebRTC livestream (view at http://localhost:8211)
docker compose up livestream
# Run demo
docker run --gpus all -v $(pwd)/data:/workspace/data dishbot:latest demo
# Interactive shell
docker run --gpus all -it dishbot:latest shell
# Run custom Python script
docker run --gpus all -v $(pwd):/workspace dishbot:latest python my_script.py
# With livestream enabled
docker run --gpus all -p 8211:8211 dishbot:latest livestream
If you're on macOS and have access to a Linux server with an NVIDIA GPU:
- On the Linux server, clone the repository and build the Docker image:
  git clone https://github.com/dishbot/dishbot.git
  cd dishbot
  docker compose build
- Open an SSH tunnel for livestream access:
  # On your Mac
  ssh -L 8211:localhost:8211 user@linux-server
- Run with livestream on the server:
  docker compose up livestream
- View in your browser at http://localhost:8211 on your Mac
| Variable | Description | Default |
|---|---|---|
| DISHBOT_HEADLESS | Run without display | 1 |
| DISHBOT_LIVESTREAM | Enable WebRTC streaming | 0 |
| DISHBOT_CONFIG | Path to config file | - |
The following directories are mounted by default:
| Host Path | Container Path | Purpose |
|---|---|---|
| ./src | /workspace/src | Source code |
| ./configs | /workspace/configs | Configuration files |
| ./data | /workspace/data | Training data |
| ./checkpoints | /workspace/checkpoints | Model weights |
| ./outputs | /workspace/outputs | Generated outputs |
GPU not detected:
# Verify NVIDIA driver
nvidia-smi
# Check Docker GPU access
docker run --rm --gpus all nvidia/cuda:12.0-base nvidia-smi
Permission issues (Linux):
# Ensure your user can run Docker
sudo usermod -aG docker $USER
# Log out and back in
# Fix volume permissions (optional; UID/GID match host)
export UID=$(id -u)
export GID=$(id -g)
docker compose build
Windows – "group id / GID already exists":
The Dockerfile is set up so that if the base image already has a group with the given GID (e.g. 1000), the build reuses that group instead of creating a new one. You can build without setting UID/GID; defaults (1000) work. If you still see the error, set a different GID before building (e.g. in PowerShell: $env:GID=10000 then docker compose build).
Out of memory:
# Increase shared memory for training
docker compose run --shm-size=16gb training
dishbot demo
dishbot demo --real-sim
dishbot train --generate-data --num-samples 10000
dishbot train --num-epochs 100
dishbot evaluate --checkpoint checkpoints/checkpoint_best.pt
dishbot run
DishBot uses a hierarchical configuration system. You can customize settings via:
- YAML files: Pass --config path/to/config.yaml
- Environment variables: Use the DISHBOT_ prefix (e.g., DISHBOT_VISION__MODEL_NAME)
- Command line arguments: Override specific settings
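To illustrate how prefixed environment variables could map onto nested settings, here is a minimal sketch. It assumes that a double underscore separates nesting levels (as the DISHBOT_VISION__MODEL_NAME example suggests) and that overrides are merged on top of defaults. The helpers `env_overrides` and `deep_merge` are hypothetical and are not DishBot's actual loader, and values stay as strings here.

```python
import os


def env_overrides(environ=None, prefix="DISHBOT_", delimiter="__"):
    """Collect prefixed environment variables into a nested dict of overrides."""
    environ = os.environ if environ is None else environ
    overrides = {}
    for name, value in environ.items():
        if not name.startswith(prefix):
            continue  # Unrelated variable such as PATH or HOME.
        # DISHBOT_VISION__MODEL_NAME -> ["vision", "model_name"]
        path = name[len(prefix):].lower().split(delimiter)
        node = overrides
        for key in path[:-1]:
            node = node.setdefault(key, {})
        node[path[-1]] = value  # Values are kept as strings in this sketch.
    return overrides


def deep_merge(base, overrides):
    """Return a copy of base with overrides applied recursively."""
    merged = dict(base)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged


defaults = {"vision": {"model_name": "Qwen/Qwen2-VL-7B-Instruct", "device": "auto"}}
env = {"DISHBOT_VISION__DEVICE": "cpu", "PATH": "/usr/bin"}
config = deep_merge(defaults, env_overrides(env))
print(config["vision"]["device"])  # cpu
```

A real loader would also need to convert strings to the target types (for example "0.005" to a float), which this sketch leaves out.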
# configs/default.yaml
vision:
  model_name: "Qwen/Qwen2-VL-7B-Instruct"
  device: "auto"
  torch_dtype: "float16"
  voxel_size: 0.005
  dbscan_eps: 0.02
grasp:
  gripper_width: 0.08
  approach_distance: 0.1
  stability_weight: 0.4
simulation:
  headless: false
  physics_dt: 0.00833
  enable_domain_randomization: true
training:
  batch_size: 32
  learning_rate: 0.0001
  num_epochs: 100
from dishbot import (
    DishBotConfig,
    DishVisionSystem,
    GraspPlanner,
    IsaacSimDishwashingEnv,
    DishwashingRobotController,
)
# Initialize configuration
config = DishBotConfig()
# Create vision system
vision = DishVisionSystem(
    vision_config=config.vision,
    camera_config=config.camera,
)
# Reconstruct 3D from RGBD
point_cloud = vision.reconstruct_3d_geometry(rgb_image, depth_image)
# Segment dishes
dishes = vision.segment_individual_dishes(point_cloud)
# Plan grasps
planner = GraspPlanner(config=config.grasp)
for dish in dishes:
    grasps = planner.compute_grasp_candidates(dish)
    best_grasp = planner.select_best_grasp(grasps)
from dishbot import GraspTrainingPipeline
# Load trained model
pipeline = GraspTrainingPipeline()
pipeline.load_model("checkpoints/checkpoint_best.pt")
# Predict grasp success
success_prob = pipeline.predict(
    grasp_pose,
    object_center,
    object_extent,
    object_type_id,
)
pytest tests/ -v
black src/
ruff check src/ --fix
mypy src/dishbot/
The DishVisionSystem class provides:
- Semantic dish detection using Qwen2-VL
- RGBD to point cloud conversion
- DBSCAN clustering for dish segmentation
- Dish type classification from geometry
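The RGBD to point cloud step listed above rests on the pinhole camera model: a pixel (u, v) with depth z maps to x = (u - cx) * z / fx and y = (v - cy) * z / fy. The sketch below shows that back-projection in plain Python for illustration only. DishBot performs this step with Open3D, and the function name `depth_to_points` and the intrinsics used here are hypothetical.

```python
def depth_to_points(depth, fx, fy, cx, cy, depth_scale=1.0):
    """Back-project a depth image (rows of values) into camera-frame XYZ points.

    Pixels with zero or negative depth are treated as invalid and skipped.
    """
    points = []
    for v, row in enumerate(depth):        # v indexes image rows
        for u, raw in enumerate(row):      # u indexes image columns
            z = raw * depth_scale          # convert raw depth to metres
            if z <= 0:
                continue
            x = (u - cx) * z / fx
            y = (v - cy) * z / fy
            points.append((x, y, z))
    return points


# Tiny 2x2 depth image in metres, with made-up intrinsics.
depth = [
    [0.0, 1.0],
    [2.0, 0.0],
]
points = depth_to_points(depth, fx=2.0, fy=2.0, cx=0.5, cy=0.5)
print(points)  # [(0.25, -0.25, 1.0), (-0.5, 0.5, 2.0)]
```

On real images a vectorized implementation is preferable, since looping over every pixel in Python is slow.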
The GraspPlanner supports multiple strategies:
- Top-down: For flat objects (plates)
- Side grasp: For tall objects (cups, glasses)
- Rim grasp: For containers (bowls)
- Pinch grasp: For thin objects (utensils)
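The pairing of strategies and object types above can be written as a simple lookup. This is a hedged sketch: the enum, the dish type strings, and the `choose_strategy` helper are hypothetical names chosen for illustration, and the actual GraspPlanner may select strategies differently (for example from geometry rather than a label).

```python
from enum import Enum


class GraspStrategy(Enum):
    TOP_DOWN = "top_down"
    SIDE = "side"
    RIM = "rim"
    PINCH = "pinch"


# Hypothetical mapping, taken directly from the strategy list above.
STRATEGY_BY_DISH_TYPE = {
    "plate": GraspStrategy.TOP_DOWN,   # flat objects
    "cup": GraspStrategy.SIDE,         # tall objects
    "glass": GraspStrategy.SIDE,
    "bowl": GraspStrategy.RIM,         # containers
    "utensil": GraspStrategy.PINCH,    # thin objects
}


def choose_strategy(dish_type, default=GraspStrategy.TOP_DOWN):
    """Return the strategy for a dish type, falling back to a default."""
    return STRATEGY_BY_DISH_TYPE.get(dish_type.lower(), default)


print(choose_strategy("Bowl"))  # GraspStrategy.RIM
```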
The IsaacSimDishwashingEnv provides:
- Configurable sink scene
- Random dish spawning
- RGBD camera observations
- Domain randomization
- Mock environment for development
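Domain randomization generally means sampling scene parameters from ranges at every episode so that a policy does not overfit to one rendering or physics setup. The sketch below illustrates the idea with the standard library only. The parameter names and ranges are assumptions made for this example and are not taken from IsaacSimDishwashingEnv.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class SceneParams:
    light_intensity: float   # relative brightness multiplier (assumed)
    dish_friction: float     # friction coefficient (assumed)
    camera_jitter_m: float   # camera position offset in metres (assumed)
    num_dishes: int          # number of dishes to spawn (assumed)


def sample_scene(rng):
    """Draw one randomized scene configuration from illustrative ranges."""
    return SceneParams(
        light_intensity=rng.uniform(0.5, 1.5),
        dish_friction=rng.uniform(0.3, 0.9),
        camera_jitter_m=rng.uniform(-0.01, 0.01),
        num_dishes=rng.randint(1, 6),  # inclusive on both ends
    )


# A seeded generator makes each episode's scene reproducible.
rng = random.Random(42)
for episode in range(3):
    print(episode, sample_scene(rng))
```

Passing an explicit `random.Random` instance, rather than using the module-level functions, keeps randomization reproducible per run without touching global state.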
The DishwashingRobotController offers:
- Forward/inverse kinematics
- Trajectory generation
- Grasp execution
- Pick and place operations
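As an illustration of trajectory generation, the sketch below interpolates between two joint configurations using quintic time scaling, s(t) = 10t^3 - 15t^4 + 6t^5, which starts and ends with zero velocity and acceleration. This is one common technique and not necessarily the one DishwashingRobotController uses. The function names and joint values are hypothetical.

```python
def quintic_scaling(t):
    """Smooth 0-to-1 time scaling with zero velocity and acceleration at both ends."""
    return 10 * t**3 - 15 * t**4 + 6 * t**5


def joint_trajectory(start, goal, num_points):
    """Return num_points joint configurations moving smoothly from start to goal."""
    if len(start) != len(goal):
        raise ValueError("start and goal must have the same number of joints")
    if num_points < 2:
        raise ValueError("num_points must be at least 2")
    trajectory = []
    for i in range(num_points):
        s = quintic_scaling(i / (num_points - 1))
        trajectory.append([a + s * (b - a) for a, b in zip(start, goal)])
    return trajectory


# Two joints for brevity; the Franka Panda arm has seven.
waypoints = joint_trajectory([0.0, 1.0], [1.0, -1.0], num_points=3)
print(waypoints)  # [[0.0, 1.0], [0.5, 0.0], [1.0, -1.0]]
```

The sketch ignores joint limits and velocity limits, which a real controller would need to enforce.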
- Fork the repository
- Create a feature branch (git checkout -b feature/amazing-feature)
- Commit your changes (git commit -m 'Add amazing feature')
- Push to the branch (git push origin feature/amazing-feature)
- Open a Pull Request
This project is licensed under the MIT License - see the LICENSE file for details.
If you use DishBot in your research, please cite:
@software{dishbot2026,
  title={DishBot: Robotic Dishwashing with Vision and 3D Reconstruction},
  year={2026},
  url={https://github.com/dishbot/dishbot}
}
- Qwen2-VL for vision-language understanding
- Open3D for 3D geometry processing
- NVIDIA Isaac Sim for robot simulation
- Franka Emika for the Panda robot model