Commit 1ac6246

Update rai_perception README.md and associated files
1 parent cfe37b1 commit 1ac6246

6 files changed: +760 −466

poetry.lock

Lines changed: 488 additions & 389 deletions
Some generated files are not rendered by default.

pyproject.toml

Lines changed: 1 addition & 4 deletions
```diff
@@ -50,10 +50,7 @@ rai_bench = {path = "src/rai_bench", develop = true}
 optional = true

 [tool.poetry.group.perception.dependencies]
-torch = "^2.3.1"
-torchvision = "^0.18.1"
-rf-groundingdino = "^0.2.0"
-sam2 = { git = "https://github.com/RobotecAI/Grounded-SAM-2", branch = "main" }
+rai_perception = {path = "src/rai_extensions/rai_perception", develop = true}

 [tool.poetry.group.nomad]
 optional = true
```
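This hunk swaps the inlined model dependencies for a path dependency on the new `rai_perception` package. A minimal sketch of how the optional group would still be installed from the repo root, assuming the group name `perception` is unchanged (as the hunk shows):

```bash
# From the RAI repo root: the optional perception group now pulls in
# rai_perception (and its torch/sam2 pins) via the path dependency.
poetry install --with perception
```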

src/rai_extensions/rai_perception/README.md

Lines changed: 120 additions & 42 deletions
````diff
@@ -2,56 +2,108 @@

 # RAI Perception

-This package provides a ROS 2 node that interfaces with the [Idea-Research GroundingDINO Model](https://github.com/IDEA-Research/GroundingDINO) for open-set object detection.
+This package provides ROS2 integration with the [Idea-Research GroundingDINO Model](https://github.com/IDEA-Research/GroundingDINO) and the [RobotecAI fork of Grounded-SAM-2](https://github.com/RobotecAI/Grounded-SAM-2) for object detection, segmentation, and gripping point calculation. The `GroundedSamAgent` and `GroundingDinoAgent` are ROS2 service nodes that can be readily added to ROS2 applications. The package also provides tools that can be used with [RAI LLM agents](../../../docs/tutorials/walkthrough.md) to construct conversational scenarios.

+In addition to these building blocks, this package includes utilities to facilitate development, such as a ROS2 client that demonstrates interactions with the agent nodes.

 ## Installation

-In your workspace you need to have an `src` folder containing this package `rai_perception` and the `rai_interfaces` package.
+Installation of `rai_perception` via pip is actively being worked on; in the meantime, to incorporate it into your application you will need to set up a ROS2 workspace.

-### Preparing the GroundingDINO
+### ROS2 Workspace Setup

-Add required ROS dependencies:
+Create a ROS2 workspace and copy this package into it:

-```
-rosdep install --from-paths src --ignore-src -r
+```bash
+mkdir -p ~/rai_perception_ws/src
+cd ~/rai_perception_ws/src
+
+# Check out only the rai_perception package
+# TODO:juliaj, update branch to main!
+git clone --depth 1 --branch jj/feat/rai-perception-pkg https://github.com/RobotecAI/rai.git temp
+cd temp
+git archive --format=tar --prefix=rai_perception/ HEAD:src/rai_extensions/rai_perception | tar -xf -
+mv rai_perception ../rai_perception
+cd ..
+rm -rf temp
 ```

-## Build and run
+### ROS2 Dependencies

-In the base directory of the `RAI` package install dependencies:
+Add the required ROS dependencies. From the workspace root, run:

-```
-poetry install --with perception
+```bash
+rosdep install --from-paths src --ignore-src -r
 ```

-Source the ros installation
+### Build and Run

-```
-source /opt/ros/${ROS_DISTRO}/setup.bash
-```
+Source ROS2 and build:

-Run the build process:
+```bash
+# Source ROS2 (humble or jazzy)
+source /opt/ros/${ROS_DISTRO}/setup.bash

-```
+# Build the workspace
+cd ~/rai_perception_ws
 colcon build --symlink-install
+
+# Source the built packages
+source install/setup.bash
 ```

-Source the environment
+### Python Dependencies

-```
-source setup_shell.sh
+`rai_perception` depends on `rai-core` and `sam2`. There are many ways to set up a virtual environment and install these dependencies; below we provide an example using Poetry.
+
+**Step 1:** Copy the following template to `pyproject.toml` in your workspace root, updating it according to your directory setup:
+
+```toml
+# rai_perception_project pyproject template
+[tool.poetry]
+name = "rai_perception_ws"
+version = "0.1.0"
+description = "ROS2 workspace for RAI perception"
+package-mode = false
+
+[tool.poetry.dependencies]
+python = "^3.10, <3.13"
+rai-core = ">=2.5.4"
+rai-perception = {path = "src/rai_perception", develop = true}
+
+[build-system]
+requires = ["poetry-core>=1.0.0"]
+build-backend = "poetry.core.masonry.api"
 ```

-Run the `GroundedSamAgent` and `GroundingDinoAgent` agents.
+**Step 2:** Install dependencies.

+First, create the virtual environment with Poetry:
+
+```bash
+cd ~/rai_perception_ws
+poetry lock
+poetry install
 ```
-python run_vision_agents.py
+
+Now we are ready to launch the perception agents:
+
+```bash
+# Activate the virtual environment
+source "$(poetry env info --path)"/bin/activate
+export PYTHONPATH
+PYTHONPATH="$(dirname "$(dirname "$(poetry run which python)")")/lib/python$(poetry run python --version | awk '{print $2}' | cut -d. -f1,2)/site-packages:$PYTHONPATH"
+
+# Run the agents
+python src/rai_perception/scripts/run_perception_agents.py
 ```

+> [!TIP]
+> To manage a ROS 2 + Poetry environment with less friction, keep build tools (colcon) at the system level and use Poetry only for the runtime dependencies of your packages.
+
 <!--- --8<-- [end:sec1] -->

-Agents create two ROS 2 Nodes: `grounding_dino` and `grounded_sam` using [ROS2Connector](../../../docs/API_documentation/connectors/ROS_2_Connectors.md).
+The `rai_perception` agents create two ROS 2 nodes, `grounding_dino` and `grounded_sam`, using [ROS2Connector](../../../docs/API_documentation/connectors/ROS_2_Connectors.md).
 These agents can be triggered by ROS2 services:

 - `grounding_dino_classify`: `rai_interfaces/srv/RAIGroundingDino`
````
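The service type above ships with `rai_interfaces`, so as a quick sanity check (a sketch assuming the workspace is built and sourced), the schema and the running agents can be inspected with standard ROS 2 tooling:

```bash
# Print the request/response fields of the detection service type
ros2 interface show rai_interfaces/srv/RAIGroundingDino

# With the agents running, both nodes should be listed
ros2 node list   # expect /grounding_dino and /grounded_sam

# and the classify service should be advertised
ros2 service list | grep grounding_dino_classify
```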
````diff
@@ -68,83 +120,109 @@ These agents can be triggered by ROS2 services:
 ## RAI Tools

 `rai_perception` package contains tools that can be used by [RAI LLM agents](../../../docs/tutorials/walkthrough.md)
-enhance their perception capabilities. For more information on RAI Tools see
+to enhance their perception capabilities. For more information on RAI Tools see
 [Tool use and development](../../../docs/tutorials/tools.md) tutorial.

-<!--- --8<-- [start:sec3] -->
+<!--- --8<-- [start:sec2] -->

 ### `GetDetectionTool`

-This tool calls the grounding dino service to use the model to see if the message from the provided camera topic contains objects from a comma separated prompt.
+This tool calls the GroundingDINO service to detect objects from a comma-separated prompt in the provided camera topic.

-<!--- --8<-- [end:sec3] -->
+<!--- --8<-- [end:sec2] -->

 > [!TIP]
 >
 > you can try example below with [rosbotxl demo](../../../docs/demos/rosbot_xl.md) binary.
-> The binary exposes `/camera/camera/color/image_raw` and `/camera/camera/depth/image_raw` topics.
+> The binary exposes `/camera/camera/color/image_raw` and `/camera/camera/depth/image_rect_raw` topics.

-<!--- --8<-- [start:sec4] -->
+<!--- --8<-- [start:sec3] -->

 **Example call**

 ```python
+import time
 from rai_perception.tools import GetDetectionTool
 from rai.communication.ros2 import ROS2Connector, ROS2Context

 with ROS2Context():
     connector=ROS2Connector(node_name="test_node")
+
+    # Wait for topic discovery to complete
+    print("Waiting for topic discovery...")
+    time.sleep(3)
+
     x = GetDetectionTool(connector=connector)._run(
         camera_topic="/camera/camera/color/image_raw",
-        object_names=["chair", "human", "plushie", "box", "ball"],
+        object_names=["bed", "bed pillow", "table lamp", "plant", "desk"],
     )
+    print(x)
 ```

 **Example output**

 ```
-I have detected the following items in the picture - chair, human
+I have detected the following items in the picture plant, table lamp, table lamp, bed, desk
 ```

 ### `GetDistanceToObjectsTool`

-This tool calls the grounding dino service to use the model to see if the message from the provided camera topic contains objects from a comma separated prompt. Then it utilises messages from depth camera to create an estimation of distance to a detected object.
+This tool calls the GroundingDINO service to detect objects from a comma-separated prompt in the provided camera topic. Then it utilizes messages from the depth camera to estimate the distance to detected objects.

 **Example call**

 ```python
-from rai_perception.tools import GetDetectionTool
+from rai_perception.tools import GetDistanceToObjectsTool
 from rai.communication.ros2 import ROS2Connector, ROS2Context
+import time

 with ROS2Context():
     connector=ROS2Connector(node_name="test_node")
-    connector.node.declare_parameter("conversion_ratio", 1.0) # scale parameter for the depth map
+    connector.node.declare_parameter("conversion_ratio", 1.0)  # scale parameter for the depth map
+
+    # Wait for topic discovery to complete
+    print("Waiting for topic discovery...")
+    time.sleep(3)
+
     x = GetDistanceToObjectsTool(connector=connector)._run(
         camera_topic="/camera/camera/color/image_raw",
         depth_topic="/camera/camera/depth/image_rect_raw",
-        object_names=["chair", "human", "plushie", "box", "ball"],
+        object_names=["desk"],
     )

+    print(x)
 ```

 **Example output**

 ```
-I have detected the following items in the picture human: 3.77m away
+I have detected the following items in the picture desk: 2.43m away
 ```

 ## Simple ROS2 Client Node Example

-An example client is provided with the package as `rai_perception/talker.py`
+The `rai_perception/talker.py` example demonstrates how to use the perception services for object detection and segmentation. It shows the complete pipeline: GroundingDINO for object detection followed by GroundedSAM for instance segmentation, with visualization output.
+
+This example is useful for:

-You can see it working by running:
+- Testing perception services integration
+- Understanding the ROS2 service call patterns
+- Seeing detection and segmentation results with bounding boxes and masks

+Run the example:
+
+```bash
+cd ~/rai_perception_ws
+python src/rai_perception/scripts/run_perception_agents.py
 ```
-python run_vision_agents.py
-cd rai # rai repo BASE directory
-ros2 run rai_perception talker --ros-args -p image_path:=src/rai_extensions/rai_perception/images/sample.jpg
+
+In a different window, run:
+
+```bash
+cd ~/rai_perception_ws
+ros2 run rai_perception talker --ros-args -p image_path:=src/rai_perception/images/sample.jpg
 ```

-If everything was set up properly you should see a couple of detections with classes `dinosaur`, `dragon`, and `lizard`.
+The example will detect objects (dragon, lizard, dinosaur) and save a visualization with bounding boxes and masks to `masks.png`.

-<!--- --8<-- [end:sec4] -->
+<!--- --8<-- [end:sec3] -->
````
src/rai_extensions/rai_perception/pyproject.toml (new file)

Lines changed: 18 additions & 0 deletions

```diff
@@ -0,0 +1,18 @@
+[tool.poetry]
+name = "rai_perception"
+version = "0.1.0"
+description = "Package enabling perception capabilities for RAI"
+authors = ["Kajetan Rachwał <[email protected]>"]
+license = "Apache License 2.0"
+readme = "README.md"
+
+[tool.poetry.dependencies]
+# TODO:(juliaj) update sam2 dependency after https://github.com/RobotecAI/Grounded-SAM-2/pull/3 is merged
+torch = "^2.3.1"
+torchvision = "^0.18.1"
+rf-groundingdino = "^0.2.0"
+sam2 = { git = "https://github.com/RobotecAI/Grounded-SAM-2", branch = "main" }
+
+[build-system]
+requires = ["poetry-core>=1.0.0"]
+build-backend = "poetry.core.masonry.api"
```
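Since this file defines a standalone Poetry project, its pinned dependency set can also be installed in isolation. A minimal sketch, assuming a full checkout of the repository; `--no-root` skips installing the package itself, which is built with colcon:

```bash
# From the RAI repo root: install only rai_perception's pinned dependencies
# (torch, torchvision, rf-groundingdino, sam2) into the active Poetry env.
cd src/rai_extensions/rai_perception
poetry install --no-root
```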
