A ROS1 package for perception-driven intent recognition and workspace layout optimization to support human-robot collaboration with a Sawyer robot.
- Operating Systems: Ubuntu 20.04 LTS (tested)
- Software Dependencies:
- Programming language: Python 3.8+
- ROS: ROS Noetic (catkin workspace)
- Major ROS packages: rospy, tf, tf2_ros, cv_bridge, message_filters, vision_msgs, sensor_msgs, geometry_msgs, visualization_msgs
- Other ROS stacks used at runtime:
- realsense2_camera (Intel RealSense depth camera)
- relaxed_ik_ros1 (inverse kinematics teleop/planning)
- intera_interface (Sawyer robot SDK)
 
 
- Python libraries (installed via requirements.txt): numpy, opencv-python, mediapipe (optional), ultralytics (YOLO), matplotlib, pandas, shapely, tqdm, torch, pyribs, rospkg, catkin_pkg
- Hardware:
- Rethink Robotics Sawyer with electric gripper
- Intel RealSense depth camera (e.g., D435 series)
- NVIDIA GPU recommended for YOLO inference
 
- Set up ROS Noetic and a catkin workspace
```bash
sudo apt update

# Install ROS Noetic (see ROS docs if not installed)
# sudo apt install ros-noetic-desktop-full

# ROS dependencies used by this package
sudo apt install -y \
  ros-noetic-vision-msgs \
  ros-noetic-cv-bridge \
  ros-noetic-tf \
  ros-noetic-tf2-ros \
  ros-noetic-message-filters \
  ros-noetic-joy

# Optional dependencies from other stacks used in launches
sudo apt install -y ros-noetic-realsense2-camera
```

- Clone this package into your catkin workspace and build
```bash
cd /catkin_ws
# If not already present, clone dependent stacks (examples):
# git clone https://github.com/IntelRealSense/realsense-ros src/realsense-ros
# git clone https://github.com/uwgraphics/relaxed_ik_ros1 src/relaxed_ik_ros1
# intera SDK should be installed per Sawyer docs
catkin_make
source devel/setup.bash
```

- Create and activate a Python virtual environment, then install Python deps

```bash
python3 -m venv .venv
source .venv/bin/activate
pip install --upgrade pip
pip install -r src/tabletop_workspace_opt/requirements.txt
```

Notes:
- ROS Python packages (e.g., rospy, cv_bridge, tf) are provided by apt/ROS, not pip.
- For GPU acceleration, install the correct CUDA/cuDNN prior to installing torch.
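For example, a CUDA-enabled torch build can usually be installed from the PyTorch wheel index before the remaining requirements; the cu121 index below is only an example, so pick the index matching your CUDA version (see pytorch.org):

```bash
# Example only: choose the wheel index that matches your installed CUDA version.
pip install torch --index-url https://download.pytorch.org/whl/cu121
pip install -r src/tabletop_workspace_opt/requirements.txt
```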
- Getting Tea and Snacks
Please see assets/baseline_tea_task.mov and assets/workspace_optimized_teaa_tas.mov.
- Data Tools: Includes simple tools for manual data collection and labeling.
- Intent recognition and perception pipeline (camera + YOLO + intent):
```bash
source /catkin_ws/devel/setup.bash
roslaunch tabletop_workspace_opt intent_recognizer.launch
```

- Expected Output:
- RealSense streams are started and aligned; YOLO detections are shown/republished.
- RViz displays 3D bounding boxes and markers for tracked and detected objects.
- The intent_inference node publishes ~distribution and ~top_goal reflecting user intent.
- If show_gui:=true, OpenCV windows display annotated frames and GUI controls.
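To check the intent outputs from the command line, you can echo the topics. The resolved names below assume the node is named intent_inference; verify the actual names in your setup with rostopic list.

```bash
# Assumes the intent node is named "intent_inference"; verify with rostopic list.
rostopic echo /intent_inference/distribution
rostopic echo /intent_inference/top_goal
```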
 
If you do not have hardware available, record or play back a ROS bag with the following topics: /right_cam/color/image_raw, /right_cam/aligned_depth_to_color/image_raw, /right_cam/color/camera_info, and optional /robot/limb/right/endpoint_state.
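For example, a suitable bag could be recorded from a live system and replayed later; the bag file name below is just a placeholder.

```bash
# Record the camera topics (and, optionally, the end-effector state) from a live system
rosbag record -O perception_demo.bag \
  /right_cam/color/image_raw \
  /right_cam/aligned_depth_to_color/image_raw \
  /right_cam/color/camera_info \
  /robot/limb/right/endpoint_state

# Replay it later in place of the live hardware
rosbag play perception_demo.bag
```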
- Workspace optimization
```bash
python3 map_elites.py --config config/tea_task.yaml
```
The configuration file specifies the task graph, the objects in the scene, the color used to plot each object, and each object's bounding-box size. Use movable_object_mask to mark objects whose positions the algorithm should not modify; for those objects, you must also provide a position in object_positions so they can be included in collision checks (see the sketch after the expected output).
- Expected Output:
- Object positions [x, y, theta] printed to the console for all objects
- An image file (wo_layout_cma-me.png) showing the objects in a 2D top-down view
- An image file (wo_archive_heatmap_cma-me.png) showing the MAP-Elites archive
- A pickle file of the MAP-Elites archive
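As an illustration, a config might look roughly like the sketch below. Only movable_object_mask and object_positions are named in this README, so every other key name and the overall schema are assumptions; consult config/tea_task.yaml for the real format.

```yaml
# Hypothetical sketch of a task config; keys other than
# movable_object_mask and object_positions are assumed, not confirmed.
objects: [kettle, cup, tea_box]
object_sizes:              # bounding-box size per object (assumed key name)
  kettle: [0.20, 0.15]
  cup: [0.08, 0.08]
  tea_box: [0.10, 0.06]
object_colors:             # plot color per object (assumed key name)
  kettle: "tab:orange"
  cup: "tab:blue"
  tea_box: "tab:green"
task_graph:                # task ordering over objects (assumed key name)
  - [kettle, cup]
  - [tea_box, cup]
movable_object_mask: [false, true, true]   # false = position is fixed and not optimized
object_positions:          # required for fixed objects, used in collision checks
  kettle: [0.45, 0.10, 0.0]
```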
 
- Configuration parameters are exposed as ROS params in the launch files and nodes:
- Perception (yolo_3d_pose_node.py): model path (~model), thresholds, class filters, frame names, GUI toggle.
- Intent (intent_inference_node.py): ~tracker_type (hand or end_effector), ~base_frame, temporal window, beta, and speed threshold.
- IK/teleop via relaxed_ik_ros1 and joystick/keyboard teleop are included in the launches.
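As an example, private node parameters can be overridden when a node is run directly with rosrun; the values below are placeholders, and the launch files may already expose equivalent arguments.

```bash
# Placeholder values; the launch files may expose these as roslaunch arguments instead.
rosrun tabletop_workspace_opt intent_inference_node.py \
  _tracker_type:=end_effector \
  _base_frame:=base
```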
 
- To adapt for your robot or camera:
- Update frame names (~base_frame, ~color_optical_frame) and static transforms in the launch files.
- Replace YOLO weights (e.g., yolov8m.pt or yolo11m.pt) via the ~model param.
- Adjust detection classes and patch sizes via node params.
- For end-effector intent, ensure /robot/limb/right/endpoint_state exists or remap the topic (see the example below).
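A minimal sketch of remapping that topic when running the node directly; the target topic is a placeholder for whatever your robot actually publishes.

```bash
# Placeholder target topic; substitute your robot's end-effector state topic.
rosrun tabletop_workspace_opt intent_inference_node.py \
  /robot/limb/right/endpoint_state:=/my_robot/endpoint_state
```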
 
- The optimization component (src/envopt/map_elites.py) can be run standalone for workspace layout studies; install the Python dependencies from requirements.txt, then run the script directly with Python to generate the archive and visualizations (see the example below).
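For instance, assuming the virtual environment from the install step and the repository layout described above (the exact path is an assumption; adjust it to your checkout):

```bash
source .venv/bin/activate
# Path is an assumption based on the layout above; adjust to your checkout.
cd src/tabletop_workspace_opt/src/envopt
python3 map_elites.py --config config/tea_task.yaml
```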
If you use this code in academic work, please cite the accompanying paper ().
