DishBot: Robotic Dishwashing with Vision and 3D Reconstruction

A comprehensive robotic dishwashing system combining computer vision, 3D geometry reconstruction, and manipulation planning.

Overview

DishBot is a complete pipeline for autonomous dishwashing robots that includes:

Vision Module: Semantic dish detection using Qwen2-VL vision-language model
3D Reconstruction: RGBD to point cloud conversion with Open3D
Grasp Planning: Multiple grasp strategies (top-down, side, rim, pinch)
Simulation: NVIDIA Isaac Sim integration with domain randomization
Robot Control: Franka Panda arm control with inverse kinematics
Training Pipeline: ML-based grasp success prediction

Architecture

Pipeline Flow:
Vision (Qwen2-VL) → 3D Reconstruction → Grasp Planning → Sim-to-Real → Robot Control

Components

dishbot/
├── src/dishbot/
│   ├── __init__.py              # Package initialization
│   ├── config.py                # Configuration management
│   ├── vision_module.py         # Qwen2-VL + 3D reconstruction
│   ├── grasp_planning.py        # Grasp pose generation
│   ├── isaac_sim_env.py         # Isaac Sim environment
│   ├── robot_controller.py      # Robot arm control
│   ├── training_pipeline.py     # ML training
│   └── main.py                  # Entry point
├── configs/                      # Configuration files
├── data/                         # Training data
├── checkpoints/                  # Model checkpoints
├── tests/                        # Unit tests
├── pyproject.toml               # Project configuration
└── README.md                    # This file

Installation

Prerequisites

CPython 3.10+ (PyPy is NOT supported - PyTorch requires CPython)
CUDA 11.8+ (for GPU acceleration on Linux/Windows)
NVIDIA Isaac Sim (optional, for real simulation)

Basic Installation

# Clone the repository
git clone https://github.com/dishbot/dishbot.git
cd dishbot

# Create virtual environment with CPython (not PyPy!)
# If you have multiple Python versions, specify the path explicitly:
#   /usr/local/bin/python3.10 -m venv .venv
python3 -m venv .venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate

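# Verify you're using CPython (the second command should print "CPython", not "PyPy")
python --version
python -c "import platform; print(platform.python_implementation())"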

# Install PyTorch FIRST (required before installing dishbot)
# See https://pytorch.org/get-started/locally/ for other configurations

# macOS (CPU only):
pip install torch torchvision --index-url https://download.pytorch.org/whl/cpu

# Linux/Windows with CUDA 11.8:
# pip install torch torchvision --index-url https://download.pytorch.org/whl/cu118

# Linux/Windows with CUDA 12.1:
# pip install torch torchvision --index-url https://download.pytorch.org/whl/cu121

# Install the package
pip install -e .

Install with Development Tools

# PyTorch must be installed first (see above)
pip install -e ".[dev]"

Install with Visualization

# PyTorch must be installed first (see above)
pip install -e ".[visualization]"

Install All Optional Dependencies

# PyTorch must be installed first (see above)
pip install -e ".[all]"

Isaac Sim Installation

Isaac Sim requires Linux with an NVIDIA GPU. There are two options:

Option 1: Docker (Recommended for macOS/Windows or Linux without Omniverse)

Use our Docker setup to run Isaac Sim in a container. This is the recommended approach for:

macOS users (Isaac Sim doesn't support macOS natively; run the container on a remote Linux host with an NVIDIA GPU, see Remote Development below)
Windows users without WSL2 GPU support
Clean isolation of the simulation environment
CI/CD pipelines

See Docker Setup below for detailed instructions.

Option 2: Native Installation (Linux only)

For native Linux installation with NVIDIA GPU:

Download and install NVIDIA Omniverse
Install Isaac Sim from the Omniverse Launcher
Follow the Isaac Sim Python setup guide

Docker Setup

The Docker setup allows running Isaac Sim on any machine with access to an NVIDIA GPU, including remote Linux servers.

Prerequisites

NVIDIA GPU: A CUDA-capable GPU (RTX 2070 or better recommended)
NVIDIA Driver: Version 525.60 or higher
Docker: Version 19.03 or higher (needed for --gpus), with the Compose v2 plugin for the docker compose commands
NVIDIA Container Toolkit: For GPU access in containers

Installing NVIDIA Container Toolkit (Linux)

# Add NVIDIA package repository
distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -
curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list

# Install nvidia-container-toolkit
sudo apt-get update
sudo apt-get install -y nvidia-container-toolkit

# Restart Docker
sudo systemctl restart docker

# Verify GPU access
# (CUDA image tags include the full version and OS, e.g. 12.0.0-base-ubuntu22.04)
docker run --rm --gpus all nvidia/cuda:12.0.0-base-ubuntu22.04 nvidia-smi

Building the Docker Image

# Build the image
docker compose build

# Or build directly with docker
docker build -t dishbot:latest .

Running with Docker Compose

# Show available commands
docker compose run dishbot --help

# Run demo with Isaac Sim
docker compose run dishbot demo

# Generate training data
docker compose run dishbot train --generate-data --num-samples 1000

# Train the model
docker compose run training

# Interactive development shell
docker compose run dev

# Start with WebRTC livestream (view at http://localhost:8211)
docker compose up livestream

Running with Docker Directly

# Run demo
docker run --gpus all -v "$(pwd)/data:/workspace/data" dishbot:latest demo

# Interactive shell
docker run --gpus all -it dishbot:latest shell

# Run custom Python script
docker run --gpus all -v "$(pwd):/workspace" dishbot:latest python my_script.py

# With livestream enabled
docker run --gpus all -p 8211:8211 dishbot:latest livestream

Remote Development (macOS → Linux Server)

If you're on macOS and have access to a Linux server with an NVIDIA GPU:

On the Linux server, clone the repository and build the Docker image:

git clone https://github.com/dishbot/dishbot.git
cd dishbot
docker compose build

SSH tunnel for livestream access:

# On your Mac
ssh -L 8211:localhost:8211 user@linux-server

Run with livestream on the server:
```
docker compose up livestream
```
View in browser at http://localhost:8211 on your Mac

Environment Variables

| Variable | Description | Default |
| --- | --- | --- |
| `DISHBOT_HEADLESS` | Run without display | `1` |
| `DISHBOT_LIVESTREAM` | Enable WebRTC streaming | `0` |
| `DISHBOT_CONFIG` | Path to config file | - |

Volumes

The following directories are mounted by default:

| Host Path | Container Path | Purpose |
| --- | --- | --- |
| `./src` | `/workspace/src` | Source code |
| `./configs` | `/workspace/configs` | Configuration files |
| `./data` | `/workspace/data` | Training data |
| `./checkpoints` | `/workspace/checkpoints` | Model weights |
| `./outputs` | `/workspace/outputs` | Generated outputs |

Troubleshooting Docker

GPU not detected:

# Verify NVIDIA driver
nvidia-smi

# Check Docker GPU access
docker run --rm --gpus all nvidia/cuda:12.0.0-base-ubuntu22.04 nvidia-smi

Permission issues (Linux):

# Ensure your user can run Docker
sudo usermod -aG docker $USER
# Log out and back in

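# Fix volume permissions (optional; makes the container UID/GID match the host)
# In bash, UID is a read-only shell variable, so export it without reassigning it.
export UID
export GID="$(id -g)"
docker compose build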

Windows – "group id / GID already exists":
The Dockerfile is set up so that if the base image already has a group with the given GID (e.g. 1000), the build reuses that group instead of creating a new one. You can build without setting UID/GID; defaults (1000) work. If you still see the error, set a different GID before building (e.g. in PowerShell: $env:GID=10000 then docker compose build).

Out of memory:

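# Increase shared memory for training.
# docker compose run has no --shm-size flag; set it on the service in docker-compose.yml:
#   services:
#     training:
#       shm_size: "16gb"
docker compose run training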

Quick Start

Run Demo (Mock Simulation)

dishbot demo

Run Demo (Real Isaac Sim)

dishbot demo --real-sim

Generate Training Data

dishbot train --generate-data --num-samples 10000

Train Grasp Predictor

dishbot train --num-epochs 100

Evaluate Model

dishbot evaluate --checkpoint checkpoints/checkpoint_best.pt

Run Full Pipeline

dishbot run

Configuration

DishBot uses a hierarchical configuration system. You can customize settings via:

YAML files: Pass --config path/to/config.yaml
Environment variables: Use the DISHBOT_ prefix (e.g., DISHBOT_VISION__MODEL_NAME, where the double underscore separates the section, vision, from the key, model_name)
Command line arguments: Override specific settings
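
As a rough illustration of the environment-variable convention above, the following sketch maps `DISHBOT_<SECTION>__<KEY>` names onto a nested dictionary. It is not DishBot's actual loader: the function name `apply_env_overrides`, the lower-casing of names, and the choice to leave values as strings are all assumptions made for the example. The precedence between YAML files, environment variables, and command line arguments is not stated here, so the sketch does not model it.

```python
from typing import Any, Mapping

PREFIX = "DISHBOT_"  # prefix named in the list above
NESTING = "__"       # assumed delimiter, inferred from DISHBOT_VISION__MODEL_NAME


def apply_env_overrides(
    config: Mapping[str, Mapping[str, Any]],
    environ: Mapping[str, str],
) -> dict[str, dict[str, Any]]:
    """Return a copy of `config` with DISHBOT_<SECTION>__<KEY> values applied."""
    merged = {section: dict(values) for section, values in config.items()}
    for name, value in environ.items():
        if not name.startswith(PREFIX):
            continue  # unrelated variable
        rest = name[len(PREFIX):].lower()
        if NESTING not in rest:
            continue  # flat setting such as DISHBOT_HEADLESS, not a nested key
        section, key = rest.split(NESTING, 1)
        merged.setdefault(section, {})[key] = value  # values stay strings here
    return merged


defaults = {"vision": {"model_name": "Qwen/Qwen2-VL-7B-Instruct", "device": "auto"}}
env = {"DISHBOT_VISION__DEVICE": "cpu", "DISHBOT_HEADLESS": "1"}
print(apply_env_overrides(defaults, env))
```

In a real script the second argument would typically be `os.environ`; a plain dictionary is used here so the example is reproducible.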

Example Configuration

# configs/default.yaml
vision:
  model_name: "Qwen/Qwen2-VL-7B-Instruct"
  device: "auto"
  torch_dtype: "float16"
  voxel_size: 0.005
  dbscan_eps: 0.02

grasp:
  gripper_width: 0.08
  approach_distance: 0.1
  stability_weight: 0.4

simulation:
  headless: false
  physics_dt: 0.00833
  enable_domain_randomization: true

training:
  batch_size: 32
  learning_rate: 0.0001
  num_epochs: 100

Usage Examples

Python API

from dishbot import (
    DishBotConfig,
    DishVisionSystem,
    GraspPlanner,
    IsaacSimDishwashingEnv,
    DishwashingRobotController,
)

# Initialize configuration
config = DishBotConfig()

# Create vision system
vision = DishVisionSystem(
    vision_config=config.vision,
    camera_config=config.camera,
)

# rgb_image and depth_image are an aligned RGBD frame supplied by the caller,
# e.g. from a camera driver or the simulation environment.

# Reconstruct 3D from RGBD
point_cloud = vision.reconstruct_3d_geometry(rgb_image, depth_image)

# Segment dishes
dishes = vision.segment_individual_dishes(point_cloud)

# Plan grasps, keeping the best grasp for each dish
planner = GraspPlanner(config=config.grasp)
best_grasps = []
for dish in dishes:
    grasps = planner.compute_grasp_candidates(dish)
    best_grasps.append(planner.select_best_grasp(grasps))
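
To make the RGBD-to-point-cloud step more concrete, here is a minimal sketch of pinhole back-projection in plain Python. DishBot delegates this conversion to Open3D, so this is not the project's code; the function name `backproject_depth`, the intrinsics `fx`, `fy`, `cx`, `cy`, and the assumption that depth is given in metres are illustrative only.

```python
from typing import Sequence

Point = tuple[float, float, float]


def backproject_depth(
    depth: Sequence[Sequence[float]],
    fx: float,
    fy: float,
    cx: float,
    cy: float,
) -> list[Point]:
    """Convert a row-major depth image into camera-frame 3D points."""
    points: list[Point] = []
    for v, row in enumerate(depth):      # v: pixel row
        for u, z in enumerate(row):      # u: pixel column
            if z <= 0.0:                 # skip invalid or missing depth readings
                continue
            x = (u - cx) * z / fx        # pinhole model: u = fx * x / z + cx
            y = (v - cy) * z / fy        # pinhole model: v = fy * y / z + cy
            points.append((x, y, z))
    return points


# Tiny 2x2 depth image with one missing reading; the intrinsics are made-up values.
depth = [[1.0, 1.0],
         [0.0, 2.0]]
cloud = backproject_depth(depth, fx=2.0, fy=2.0, cx=0.5, cy=0.5)
print(cloud)  # three points; the zero-depth pixel is dropped
```

A library implementation vectorizes the same arithmetic over the whole image and can attach colour from the RGB frame to each point.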

Using the Trained Predictor

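from dishbot import GraspTrainingPipeline

# Load trained model
pipeline = GraspTrainingPipeline()
pipeline.load_model("checkpoints/checkpoint_best.pt")

# grasp_pose, object_center, object_extent, and object_type_id are supplied
# by the caller, e.g. from the vision and grasp planning steps.

# Predict grasp success
success_prob = pipeline.predict(
    grasp_pose,
    object_center,
    object_extent,
    object_type_id,
)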

Development

Running Tests

pytest tests/ -v

Code Formatting

black src/
ruff check src/ --fix

Type Checking

mypy src/dishbot/

Project Structure Details

Vision Module

The DishVisionSystem class provides:

Semantic dish detection using Qwen2-VL
RGBD to point cloud conversion
DBSCAN clustering for dish segmentation
Dish type classification from geometry
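
The DBSCAN segmentation step can be sketched in a few lines of plain Python. This is a deliberately simple O(n²) version for illustration, not the code in `vision_module.py`; the function name `dbscan`, the `min_points` value, and the reading of `eps` as metres (matching `dbscan_eps: 0.02` in the example configuration) are assumptions.

```python
import math

Point = tuple[float, float, float]
NOISE = -1
UNVISITED = -2


def dbscan(points: list[Point], eps: float, min_points: int) -> list[int]:
    """Label each point with a cluster id (0, 1, ...) or NOISE."""

    def neighbours(i: int) -> list[int]:
        # All points within eps of point i, including i itself.
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    labels = [UNVISITED] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] != UNVISITED:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_points:
            labels[i] = NOISE              # may later be claimed as a border point
            continue
        cluster += 1                       # i is a core point: start a new cluster
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster        # border point: reachable but not core
            if labels[j] != UNVISITED:
                continue
            labels[j] = cluster
            reach = neighbours(j)
            if len(reach) >= min_points:   # j is also a core point: keep expanding
                queue.extend(reach)
    return labels


points = [
    (0.0, 0.0, 0.0), (0.015, 0.0, 0.0), (0.03, 0.0, 0.0),  # first dish
    (0.5, 0.0, 0.0), (0.51, 0.0, 0.0), (0.5, 0.01, 0.0),   # second dish
    (1.0, 1.0, 1.0),                                        # stray point
]
print(dbscan(points, eps=0.02, min_points=3))
```

Each resulting cluster would correspond to one candidate dish, with noise points discarded. On real point clouds a library implementation backed by a spatial index is the practical choice.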

Grasp Planning

The GraspPlanner supports multiple strategies:

Top-down: For flat objects (plates)
Side grasp: For tall objects (cups, glasses)
Rim grasp: For containers (bowls)
Pinch grasp: For thin objects (utensils)
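
One way to picture the mapping from object shape to strategy is a rule over the object's bounding-box extent. The sketch below is hypothetical: `GraspStrategy`, `choose_strategy`, and every threshold are invented for illustration and are not taken from `GraspPlanner`.

```python
from enum import Enum


class GraspStrategy(Enum):
    TOP_DOWN = "top_down"
    SIDE = "side"
    RIM = "rim"
    PINCH = "pinch"


def choose_strategy(extent: tuple[float, float, float]) -> GraspStrategy:
    """Pick a strategy from an axis-aligned bounding-box extent (x, y, z) in metres.

    All thresholds are illustrative guesses, not values from DishBot.
    """
    width, depth, height = extent
    footprint = max(width, depth)
    narrow = min(width, depth)
    if narrow < 0.03 and height < 0.03:
        return GraspStrategy.PINCH      # thin object, e.g. a utensil lying flat
    if height < 0.25 * footprint:
        return GraspStrategy.TOP_DOWN   # flat object, e.g. a plate
    if height > footprint:
        return GraspStrategy.SIDE       # tall object, e.g. a cup or glass
    return GraspStrategy.RIM            # open container, e.g. a bowl


for name, extent in {
    "plate": (0.25, 0.25, 0.02),
    "cup": (0.08, 0.08, 0.12),
    "bowl": (0.16, 0.16, 0.07),
    "fork": (0.19, 0.025, 0.015),
}.items():
    print(name, choose_strategy(extent).value)
```

A geometric rule like this is easy to debug; a learned success predictor can then rank the candidates that each strategy produces.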

Simulation

The IsaacSimDishwashingEnv provides:

Configurable sink scene
Random dish spawning
RGBD camera observations
Domain randomization
Mock environment for development
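
Domain randomization amounts to drawing fresh scene parameters for every episode. The following sketch shows the idea with the standard library only; `SceneParams`, `sample_scene`, and all parameter names and ranges are hypothetical and do not reflect what `IsaacSimDishwashingEnv` actually randomizes.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class SceneParams:
    light_intensity: float   # relative brightness multiplier
    dish_friction: float     # friction coefficient of spawned dishes
    camera_jitter_m: float   # camera position offset in metres
    num_dishes: int          # number of dishes spawned in the sink


def sample_scene(rng: random.Random) -> SceneParams:
    """Draw one randomized scene; every range here is an illustrative guess."""
    return SceneParams(
        light_intensity=rng.uniform(0.5, 1.5),
        dish_friction=rng.uniform(0.3, 0.9),
        camera_jitter_m=rng.uniform(-0.02, 0.02),
        num_dishes=rng.randint(1, 6),
    )


rng = random.Random(0)  # a fixed seed makes the episode sequence reproducible
for episode in range(3):
    print(episode, sample_scene(rng))
```

Passing an explicit `random.Random` instance, rather than using the module-level functions, keeps data generation reproducible and independent of other random draws in the program.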

Robot Control

The DishwashingRobotController offers:

Forward/inverse kinematics
Trajectory generation
Grasp execution
Pick and place operations
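
For trajectory generation, a common textbook approach is to interpolate between two joint configurations with a quintic time scaling, which starts and ends with zero velocity and acceleration. The sketch below illustrates that technique only; it is not necessarily how `DishwashingRobotController` generates trajectories, and the function names and example joint values are assumptions.

```python
def quintic_scale(tau: float) -> float:
    """Map normalized time tau in [0, 1] to progress s in [0, 1].

    s(tau) = 10 tau^3 - 15 tau^4 + 6 tau^5 has zero first and second
    derivatives at both ends, so the motion starts and stops smoothly.
    """
    return 10 * tau**3 - 15 * tau**4 + 6 * tau**5


def joint_trajectory(
    start: list[float], goal: list[float], steps: int
) -> list[list[float]]:
    """Return `steps` joint-space waypoints from `start` to `goal` (radians)."""
    if len(start) != len(goal):
        raise ValueError("start and goal must have the same number of joints")
    if steps < 2:
        raise ValueError("need at least two waypoints")
    waypoints = []
    for k in range(steps):
        s = quintic_scale(k / (steps - 1))
        waypoints.append([a + s * (b - a) for a, b in zip(start, goal)])
    return waypoints


# The Franka Panda has 7 joints; these joint angles are arbitrary example values.
start = [0.0] * 7
goal = [0.4, -0.6, 0.0, -2.0, 0.0, 1.6, 0.8]
for waypoint in joint_trajectory(start, goal, steps=5):
    print([round(q, 3) for q in waypoint])
```

Waypoints near the start and goal are spaced more closely than those in the middle, which is what gives the arm a gentle acceleration and deceleration.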

Contributing

Fork the repository
Create a feature branch (git checkout -b feature/amazing-feature)
Commit your changes (git commit -m 'Add amazing feature')
Push to the branch (git push origin feature/amazing-feature)
Open a Pull Request

License

This project is licensed under the MIT License - see the LICENSE file for details.

Citation

If you use DishBot in your research, please cite:

@software{dishbot2026,
  title={DishBot: Robotic Dishwashing with Vision and 3D Reconstruction},
  year={2026},
  url={https://github.com/dishbot/dishbot}
}

Acknowledgments

Qwen2-VL for vision-language understanding
Open3D for 3D geometry processing
NVIDIA Isaac Sim for robot simulation
Franka Emika for the Panda robot model
