Ray actor loop with local envs #81

Open: ollmer wants to merge 138 commits into main from envs_speed_debug

@ollmer commented Oct 6, 2025

Ray-based implementation of ActorLoop that replaces multiprocessing and in-memory queues.

Task Execution

  • Uses ray.remote() instead of multiprocessing.Process
  • Initializes Ray with configurable worker count and dashboard
  • Tasks execute the rollout policy in separate processes, one process per CPU. Each Ray task handles async_batch_size problems concurrently in an async loop.
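
The "one task, many concurrent problems" pattern described above can be sketched with plain asyncio. This is a minimal illustration, not the PR's code: `rollout`, `run_batch`, and `task_entrypoint` are hypothetical names, the rollout body is a stand-in for a real LLM call, and the Ray wrapping is only indicated in a comment.

```python
import asyncio


async def rollout(problem: str) -> dict:
    # Hypothetical stand-in for the rollout policy; a real one would call an LLM.
    await asyncio.sleep(0.01)
    return {"problem": problem, "ok": True}


async def run_batch(problems: list[str]) -> list[dict]:
    # All problems in the batch run concurrently inside one event loop.
    # asyncio.gather returns results in the same order as its inputs.
    return await asyncio.gather(*(rollout(p) for p in problems))


def task_entrypoint(problems: list[str]) -> list[dict]:
    # Assumption: in a Ray-backed loop this function would be wrapped with
    # ray.remote and receive up to async_batch_size problems per call.
    return asyncio.run(run_batch(problems))
```

Because the rollouts mostly wait on network I/O, a single process can keep many of them in flight, which is presumably why one process per CPU is sufficient.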

Load Balancing

  • Tracks number of tasks assigned per LLM URL
  • Submits tasks to least busy LLM
  • Checks capacity constraints per LLM before submission
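
A least-busy selection with a per-LLM cap could look roughly like the sketch below. The function name `pick_llm`, the `active` dictionary, and the `max_per_llm` parameter are assumptions made for illustration; the PR's actual bookkeeping may differ.

```python
from typing import Optional


def pick_llm(active: dict[str, int], max_per_llm: int) -> Optional[str]:
    """Return the URL with the fewest assigned tasks, or None if all are full.

    `active` maps each LLM URL to the number of tasks currently assigned to it.
    """
    if not active:
        return None
    # Ties go to the URL that was inserted first, since dicts keep order.
    url, count = min(active.items(), key=lambda item: item[1])
    if count >= max_per_llm:
        # Even the least busy LLM is at capacity, so nothing can be submitted.
        return None
    return url


def assign(active: dict[str, int], max_per_llm: int) -> Optional[str]:
    # Pick an LLM and record the assignment in one step.
    url = pick_llm(active, max_per_llm)
    if url is not None:
        active[url] += 1
    return url
```

The caller would decrement the counter for a URL when a task assigned to it finishes.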

Queue Management

  • Replaces SharedMemoryQueue with in-memory lists, since Ray handles passing results between processes on its own
  • Uses ray.wait() to poll for finished tasks (up to 100 at a time)
  • Groups results by problem ID before returning
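
Grouping by problem ID can be sketched as follows. `GroupCollector`, the `problem_id` key, and the `attempts` parameter are hypothetical; the sketch assumes a group is returned once every attempt for a problem has finished.

```python
from collections import defaultdict


class GroupCollector:
    """Buffers finished rollouts until every attempt for a problem has arrived."""

    def __init__(self, attempts: int):
        self.attempts = attempts
        self._partial: dict = defaultdict(list)

    def add(self, finished: list[dict]) -> list[list[dict]]:
        # Assumption: in a Ray-backed loop, `finished` would be obtained with
        # something like
        #   ready, pending = ray.wait(pending, num_returns=min(100, len(pending)), timeout=0)
        #   finished = ray.get(ready)
        complete = []
        for result in finished:
            pid = result["problem_id"]
            self._partial[pid].append(result)
            if len(self._partial[pid]) == self.attempts:
                # All attempts are in: hand the whole group back and forget it.
                complete.append(self._partial.pop(pid))
        return complete
```

Incomplete groups stay buffered between polls, so each call returns only the groups that became complete in that call.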

Monitoring

  • Logs task latencies, Ray overhead, token throughput, number of failed rollouts
  • Reports per-LLM utilization
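
As a rough illustration of how such metrics might be derived, the sketch below computes mean task latency, token throughput, and per-LLM utilization. The function name, its parameters, and the definition of utilization as assigned tasks divided by a per-LLM cap are all assumptions, not taken from the PR.

```python
def summarize(
    latencies_s: list[float],
    total_tokens: int,
    elapsed_s: float,
    active: dict[str, int],
    max_per_llm: int,
) -> dict:
    # Mean wall-clock latency over finished tasks (0.0 if none finished yet).
    mean_latency = sum(latencies_s) / len(latencies_s) if latencies_s else 0.0
    # Tokens produced per second of wall-clock time.
    throughput = total_tokens / elapsed_s if elapsed_s > 0 else 0.0
    # Assumed definition: fraction of each LLM's task slots currently in use.
    utilization = {url: count / max_per_llm for url, count in active.items()}
    return {
        "mean_latency_s": mean_latency,
        "tokens_per_s": throughput,
        "utilization": utilization,
    }
```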

Method Overrides

  • start_backend(): Initialize Ray runtime
  • have_capacity(): Check task count + per-LLM limits
  • submit_problem(): Create Ray tasks for each attempt
  • get_new_results(): Poll Ray and return completed groups
  • stop_tasks(): Shut down Ray
  • Queue size methods adapted to track the in-memory lists

Configuration

Enabled by setting cfg.use_ray=true in the config; run_actor_loop() then selects the Ray implementation automatically.
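
The selection step might look like the sketch below. `RayActorLoop` and `select_actor_loop` are hypothetical names used for illustration, and the config is modeled as a simple namespace rather than the project's real config object.

```python
from types import SimpleNamespace


class ActorLoop:
    # Stand-in for the existing multiprocessing-based loop.
    backend = "multiprocessing"


class RayActorLoop(ActorLoop):
    # Hypothetical name for the Ray-based subclass.
    backend = "ray"


def select_actor_loop(cfg) -> type:
    # Fall back to the original loop when use_ray is absent or false.
    return RayActorLoop if getattr(cfg, "use_ray", False) else ActorLoop


loop_cls = select_actor_loop(SimpleNamespace(use_ray=True))
```

Defaulting to the original loop when the flag is missing keeps existing configs working unchanged.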

MCP Server config

  • Server startup command replaced with a shorter one that expects the mcp-run-python module to be installed already. Skipping installation at startup speeds up the actor loop significantly, because the server is started once per task.

@rafapi rafapi mentioned this pull request Nov 28, 2025