[Feature Request] Agent-less Zenoh-based hardware interface for embedded microcontrollers (Zephyr RTOS) #3130

@ammaarrahmed

Is your feature request related to a problem? Please describe.
Currently, integrating resource-constrained microcontrollers with ros2_control relies heavily on micro-ROS and an XRCE-DDS agent running on the host. While effective, this architecture introduces several friction points for high-frequency (500 Hz and above) control loops:

  1. Agent Bottleneck: The XRCE-DDS agent acts as a single point of failure and a scaling bottleneck when managing multiple hardware nodes.
  2. Serialization Overhead: CDR serialization on the MCU adds non-trivial latency per message.
  3. Zephyr Support: While micro-ROS has strong FreeRTOS support, its native integration with Zephyr RTOS and its built-in networking stack is less mature.

Describe the solution you'd like
I am proposing an agent-less, data-centric hardware interface leveraging Zenoh (zenoh-pico on the MCU and zenoh-cpp on the ROS 2 host). This bypasses the need for an intermediate agent, allowing direct P2P communication between the embedded hardware and the ros2_control framework.

To ensure strict real-time compliance and maximize network throughput, the proposed architecture incorporates two critical design choices:

  1. Batched Key-Value Mapping: Instead of publishing one key per joint (which repeats the Zenoh and transport headers for every joint on every cycle), the MCU will pack a C struct of all joint states into a single Zenoh key (e.g., robot/<id>/state). This keeps the data coherent within each control cycle and reduces the wire payload to raw IEEE 754 bytes plus a single Zenoh header.
  2. Real-Time Safe Execution: The SystemInterface plugin's read() and write() methods must not block or allocate memory. The plugin will start a background thread to handle Zenoh network I/O. This thread will exchange data with the RT loop through a lock-free, atomic double-buffer, allowing the ControllerManager's RT loop to grab the latest state in $O(1)$ time without waiting on sockets.
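
As a rough illustration of the batched mapping in point 1, the sketch below packs all joint states into one fixed-size frame that could be published under robot/<id>/state. The names `JointStateFrame`, `kNumJoints`, `pack`, and `unpack` are hypothetical, as is the `seq` field. It assumes both ends are little-endian with IEEE 754 `float`, so no byte swapping is done.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <limits>
#include <type_traits>

constexpr std::size_t kNumJoints = 6;  // hypothetical joint count

// Hypothetical wire layout for one control cycle (assumption, not a spec).
struct JointStateFrame {
  std::uint32_t seq;           // control-cycle counter, lets the host detect drops
  float position[kNumJoints];  // rad
  float velocity[kNumJoints];  // rad/s
};

// Compile-time checks that the struct can be sent as raw bytes.
static_assert(std::numeric_limits<float>::is_iec559, "float must be IEEE 754");
static_assert(std::is_trivially_copyable_v<JointStateFrame>);
static_assert(sizeof(JointStateFrame) == 4 + 2 * 4 * kNumJoints,
              "layout must contain no padding");

using FrameBytes = std::array<std::uint8_t, sizeof(JointStateFrame)>;

// Copy the frame into a byte buffer (the payload handed to the publisher).
inline FrameBytes pack(const JointStateFrame& frame) {
  FrameBytes out{};
  std::memcpy(out.data(), &frame, sizeof(frame));
  return out;
}

// Rebuild a frame from a received payload; rejects payloads of the wrong size.
inline bool unpack(const std::uint8_t* data, std::size_t len, JointStateFrame& out) {
  if (len != sizeof(JointStateFrame)) {
    return false;
  }
  std::memcpy(&out, data, sizeof(out));
  return true;
}
```

With six joints this frame is 52 bytes, and every value in it belongs to the same control cycle. A real layout would need an agreed byte order and probably a version field.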

Describe alternatives you've considered

  • micro-ROS (XRCE-DDS): The current standard. I have used this extensively (e.g., ESP32 bridged to an STM32), but the agent overhead and memory footprint make it suboptimal for ultra-low latency requirements compared to Zenoh's ~5-byte wire overhead.
  • Raw UDP/TCP Sockets: Completely bypasses middleware. While fast, it loses the scalability, dynamic discovery, and pub/sub routing flexibility that the Zenoh/ROS ecosystem provides.

Additional context
This proposal is aligned with the OSRF GSoC project: https://github.com/osrf/osrf_wiki/wiki/GSoC-2026#zephyr-zenoh-integration-for-ros2_control

Proposed Architecture Flow

┌──────────────────────────────────────────────────────────────────────────┐
│                        ROS 2 HOST (Linux)                                │
│                                                                          │
│  ┌─────────────────────────────────────────────────────┐                 │
│  │              ros2_control Controller Manager         │                 │
│  │                                                     │                 │
│  │  ┌───────────────────────────────────────────────┐  │                 │
│  │  │   ZenohHardwareInterface (SystemInterface)    │  │                 │
│  │  │                                               │  │                 │
│  │  │  [RT Thread]             [Background Thread]  │  │                 │
│  │  │  read(): atomic load <─> Zenoh subscriber     │  │                 │
│  │  │  write(): atomic push<─> Zenoh publisher      │  │                 │
│  │  └────────────────────┬──────────────────────────┘  │                 │
│  └───────────────────────┼─────────────────────────────┘                 │
│                    Zenoh Session (zenoh-cpp)                             │
└──────────────────────────┼───────────────────────────────────────────────┘
                           │  TCP/UDP (P2P or via Router)
                           │  Key: robot/<id>/state (Batched struct)
┌──────────────────────────┼───────────────────────────────────────────────┐
│                   MICROCONTROLLER (Zephyr RTOS)                          │
│                          │                                               │
│                    Zenoh Session (zenoh-pico)                            │
│                          │                                               │
│  ┌───────────────────────┼───────────────────────────┐                   │
│  │   pub: packed struct of all encoder states        │                   │
│  │   sub: packed struct of all motor commands        │                   │
│  └──────┬────────────────────────────────┬───────────┘                   │
│  ┌──────▼──────────┐            ┌────────▼─────────┐                     │
│  │   Sensor HAL    │            │   Actuator HAL   │                     │
│  └─────────────────┘            └──────────────────┘                     │
└──────────────────────────────────────────────────────────────────────────┘
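
The hand-off between the background thread and the RT thread in the diagram could look roughly like the sketch below. One caveat: a plain two-slot index flip can let the writer overwrite the slot the reader is still copying, so this sketch swaps in a three-slot (triple-buffer) variant for a single producer and single consumer. The class name `LatestValue` and its methods are hypothetical, and nothing here is taken from ros2_control or Zenoh.

```cpp
#include <atomic>
#include <cstdint>
#include <type_traits>

// Single-producer / single-consumer "latest value" exchange.
// Three slots: one owned by the writer, one by the reader, one in the middle.
template <typename T>
class LatestValue {
  static_assert(std::is_trivially_copyable_v<T>, "T must be copyable without allocation");
  static_assert(std::atomic<std::uint8_t>::is_always_lock_free);

 public:
  // Called only from the background I/O thread.
  void publish(const T& value) {
    slots_[write_idx_] = value;
    // Hand the filled slot to the middle and take the old middle slot back.
    const std::uint8_t prev =
        middle_.exchange(static_cast<std::uint8_t>(write_idx_ | kDirty),
                         std::memory_order_acq_rel);
    write_idx_ = prev & kIndexMask;
  }

  // Called only from the RT thread. Never blocks or allocates.
  // Returns true and fills `out` if a value newer than the last read exists.
  bool try_read(T& out) {
    if ((middle_.load(std::memory_order_relaxed) & kDirty) == 0) {
      return false;  // nothing new since the last successful read
    }
    const std::uint8_t prev = middle_.exchange(read_idx_, std::memory_order_acq_rel);
    read_idx_ = prev & kIndexMask;
    out = slots_[read_idx_];
    return true;
  }

 private:
  static constexpr std::uint8_t kDirty = 0x4;      // set when the middle slot is unread
  static constexpr std::uint8_t kIndexMask = 0x3;  // low bits hold the slot index

  T slots_[3]{};
  std::atomic<std::uint8_t> middle_{1};
  std::uint8_t write_idx_{0};  // touched by the producer only
  std::uint8_t read_idx_{2};   // touched by the consumer only
};
```

In this arrangement read() would call `try_read` on a state buffer, and write() would call `publish` on a second instance that carries commands in the other direction. Each call costs one atomic exchange plus one fixed-size copy.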

Next Steps: Proof of Concept Prototype
To validate this approach before the GSoC coding period, I plan to build a proof-of-concept prototype with the following milestones:

  1. Local Simulation: A Linux C++ application simulating the ros2_control RT loop, communicating via zenoh-cpp with a Zephyr instance running in QEMU with zenoh-pico.
  2. Hardware Benchmark: Deploying the Zephyr node to physical hardware (ESP32/STM32) to benchmark end-to-end latency, jitter, and memory footprint against a baseline micro-ROS setup.
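
For the benchmark milestone, the latency samples could be reduced to a few summary numbers along these lines. This is only a sketch: `LatencyStats` and `summarize` are hypothetical names, jitter is taken here to mean the standard deviation of latency (other definitions exist), and the 99th percentile uses the nearest-rank method.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

struct LatencyStats {
  double mean_us = 0.0;
  double jitter_us = 0.0;  // population standard deviation (assumed definition)
  double p99_us = 0.0;     // nearest-rank 99th percentile
  double max_us = 0.0;
};

// Summarize round-trip latency samples given in microseconds.
// The vector is taken by value because it is sorted internally.
inline LatencyStats summarize(std::vector<double> samples_us) {
  LatencyStats stats;
  if (samples_us.empty()) {
    return stats;
  }
  const double n = static_cast<double>(samples_us.size());

  double sum = 0.0;
  for (double s : samples_us) sum += s;
  stats.mean_us = sum / n;

  double sq_dev = 0.0;
  for (double s : samples_us) sq_dev += (s - stats.mean_us) * (s - stats.mean_us);
  stats.jitter_us = std::sqrt(sq_dev / n);

  std::sort(samples_us.begin(), samples_us.end());
  stats.max_us = samples_us.back();
  const std::size_t rank = static_cast<std::size_t>(std::ceil(0.99 * n));  // 1-based
  stats.p99_us = samples_us[rank - 1];
  return stats;
}
```

Running the same function over samples from the Zenoh setup and the micro-ROS baseline would keep the comparison like for like.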

I would appreciate any feedback from the maintainers on this architectural direction, specifically on whether the Zenoh keys should be fixed by configuration or generated dynamically from the URDF. Happy to iterate on this!
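
As a starting point for that discussion, here is a minimal sketch of the simplest option: building keys of the form robot/<id>/state from a robot identifier. The function name `make_robot_key` is hypothetical. The character check reflects my understanding that Zenoh key-expression chunks must be non-empty and must not contain `/`, `*`, `$`, `?`, or `#`, which should be confirmed against the Zenoh documentation.

```cpp
#include <optional>
#include <string>
#include <string_view>

// True if `chunk` can be used as one literal segment of a key expression
// (assumed rule set, see the note above).
inline bool is_valid_chunk(std::string_view chunk) {
  return !chunk.empty() && chunk.find_first_of("/*$?#") == std::string_view::npos;
}

// Build "robot/<robot_id>/<leaf>", e.g. "robot/arm0/state".
// Returns std::nullopt if either part is not a valid literal chunk.
inline std::optional<std::string> make_robot_key(std::string_view robot_id,
                                                 std::string_view leaf) {
  if (!is_valid_chunk(robot_id) || !is_valid_chunk(leaf)) {
    return std::nullopt;
  }
  std::string key = "robot/";
  key.append(robot_id);
  key.push_back('/');
  key.append(leaf);
  return key;
}
```

The plugin could build these strings once during initialization, for example from a hardware parameter in the URDF's ros2_control tag, so that no string handling happens in the RT loop.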