Dagster Cloud Agent cgroup errors on AWS ECS Managed Instances #33360

@paxan

Summary

Dagster Cloud Agent 1.12.11 logs FileNotFoundError errors for cgroup v2 files when running on AWS ECS Managed Instances (Bottlerocket OS). This ECS launch type is relatively new; it has been available since September 30, 2025.

Environment

  • Dagster Cloud Agent: docker.io/dagster/dagster-cloud-agent:1.12.11
  • Deployment: AWS ECS Hybrid Agent
  • Launch Type: ECS Managed Instances (uses Bottlerocket OS)

Error Logs

2026-01-24 09:48:22 +0000 - dagster_cloud.agent - ERROR - Failed to retrieve CPU period from cgroup
Traceback (most recent call last):
  File "/dagster/dagster/_utils/container.py", line 299, in _retrieve_containerized_cpu_cfs_period_us_v2
    with open(cpu_max_path_cgroup_v2()) as f:
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
FileNotFoundError: [Errno 2] No such file or directory: '/sys/fs/cgroup/cpu.max'

2026-01-24 09:48:22 +0000 - dagster_cloud.agent - ERROR - Failed to retrieve memory usage from cgroup: [Errno 2] No such file or directory: '/sys/fs/cgroup/memory.current'

2026-01-24 09:48:22 +0000 - dagster_cloud.agent - ERROR - Failed to retrieve memory limit from cgroup. There may be no limit set on the container.
Traceback (most recent call last):
  File "/dagster/dagster/_utils/container.py", line 260, in _retrieve_containerized_memory_limit_v2
    with open(memory_limit_path_cgroup_v2()) as f:
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
FileNotFoundError: [Errno 2] No such file or directory: '/sys/fs/cgroup/memory.max'

Workarounds Attempted

1. Mounting /sys/fs/cgroup volume

Added a volume mount for /sys/fs/cgroup to the task definition. The errors persist because ECS containers don't have access to the cgroup files in their namespace.
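To confirm what the container can actually see, a small probe like the one below can be run inside the agent container. It is only a diagnostic sketch: the three paths are copied from the error logs above, and `probe_cgroup_files` is a hypothetical helper name, not part of Dagster.

```python
import os

# Paths copied from the error logs in this issue.
CGROUP_V2_FILES = [
    "/sys/fs/cgroup/cpu.max",
    "/sys/fs/cgroup/memory.current",
    "/sys/fs/cgroup/memory.max",
]


def probe_cgroup_files(paths=CGROUP_V2_FILES):
    """Return {path: status}, where status is 'missing', 'readable', or 'unreadable'.

    Hypothetical helper for diagnosis only; it is not a Dagster API.
    """
    result = {}
    for path in paths:
        if not os.path.exists(path):
            result[path] = "missing"
        elif os.access(path, os.R_OK):
            result[path] = "readable"
        else:
            result[path] = "unreadable"
    return result


if __name__ == "__main__":
    for path, status in probe_cgroup_files().items():
        print(f"{status:10s} {path}")
```

If all three paths report as missing even with the volume mount in place, that would be consistent with the behavior described above.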

2. Using path override env vars with /dev/null

Set DAGSTER_CPU_MAX_PATH, DAGSTER_MEMORY_LIMIT_PATH_V2, and DAGSTER_MEMORY_USAGE_PATH_V2 to /dev/null. This causes parse errors such as:

IndexError: list index out of range
ValueError: invalid literal for int() with base 10: ''
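Both errors are what one would expect from parsing an empty string, since reading /dev/null always returns zero bytes. The sketch below reproduces them under an assumption about how the values are parsed: `parse_cpu_period` and `parse_memory_value` are hypothetical stand-ins, not the actual code in dagster/_utils/container.py. The only outside fact relied on is that cgroup v2 `cpu.max` holds two whitespace-separated fields (quota, then period).

```python
def read_text(path):
    """Read a file's entire contents as text."""
    with open(path) as f:
        return f.read()


def parse_cpu_period(contents):
    # Assumed parsing: cpu.max holds "<quota> <period>", e.g. "max 100000",
    # so the period is the second whitespace-separated field.
    return int(contents.split()[1])


def parse_memory_value(contents):
    # Assumed parsing: memory.current holds a single integer byte count.
    return int(contents.strip())


def demo():
    """Show what each assumed parser does with the empty contents of /dev/null."""
    empty = read_text("/dev/null")  # reading /dev/null yields an empty string
    for parser in (parse_cpu_period, parse_memory_value):
        try:
            parser(empty)
        except (IndexError, ValueError) as exc:
            print(f"{parser.__name__}: {type(exc).__name__}: {exc}")


if __name__ == "__main__":
    demo()
```

Under these assumptions, the empty split list produces the IndexError and `int('')` produces the ValueError shown above.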

Expected Behavior

The agent should:

  1. Handle environments where cgroup files are not accessible (common in managed container platforms), or
  2. Possibly use fallback variables such as DAGSTER_CLOUD_AGENT_CPU_LIMIT when the path variables are set to /dev/null, without attempting to parse zero-length contents.
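The two options above could look roughly like the following. This is a minimal sketch of the requested behavior, not Dagster's implementation: `read_cgroup_int` and `resolve_metric` are hypothetical names, and DAGSTER_CLOUD_AGENT_CPU_LIMIT is used only because it is the example named in this issue; whether the agent reads it in this way is an assumption.

```python
import os
from typing import Mapping, Optional


def read_cgroup_int(path: str, field: int = 0) -> Optional[int]:
    """Return the integer in whitespace-separated `field` of `path`.

    Returns None when the file is missing, unreadable, empty (e.g. /dev/null),
    too short, or holds a non-numeric value such as "max".
    """
    try:
        with open(path) as f:
            parts = f.read().split()
    except OSError:
        return None
    if field >= len(parts):
        return None
    try:
        return int(parts[field])
    except ValueError:
        return None


def resolve_metric(
    path: str,
    field: int,
    fallback_env_var: str,
    environ: Mapping[str, str] = os.environ,
) -> Optional[float]:
    """Prefer the cgroup file; otherwise fall back to an environment variable."""
    value = read_cgroup_int(path, field)
    if value is not None:
        return float(value)
    raw = environ.get(fallback_env_var)
    if raw is None:
        return None
    try:
        return float(raw)
    except ValueError:
        return None


if __name__ == "__main__":
    # With the path pointed at /dev/null, only the fallback variable is consulted.
    print(resolve_metric("/dev/null", 1, "DAGSTER_CLOUD_AGENT_CPU_LIMIT"))
```

Returning None instead of raising lets the caller decide whether a missing metric should be logged once at a lower level or skipped entirely, rather than emitting an ERROR with a traceback on every check.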
