feat: add tool to check if GPUs are available for training #102
base: main
Conversation
mrge found 5 issues across 6 files. View them in mrge.io
Nice feature, @vaibs-d! I have a couple of questions about how the new tools plug into the agentic workflow; see the inline comments. Otherwise, excited to get this feature merged 🚀
-def get_executor_tool(distributed: bool = False) -> Callable:
-    """Get the appropriate executor tool based on the distributed flag."""
+def get_executor_tool() -> Callable:
+    """Get the executor tool for training code execution."""
We actually don't need this to be wrapped in a closure anymore, since we're not passing any parameters to it.
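To make the suggestion concrete, here is a rough sketch of the simplification, using an illustrative `run_training_code` stand-in rather than the actual executor in this repo (names are hypothetical):

```python
from typing import Callable

def run_training_code(code: str) -> str:
    """Hypothetical stand-in for the actual executor tool in this repo."""
    return f"executed {len(code)} characters of training code"

# With the `distributed` flag gone there is nothing left to close over,
# so the factory below no longer earns its keep...
def get_executor_tool() -> Callable:
    """Get the executor tool for training code execution."""
    return run_training_code

# ...and call sites could simply reference the tool directly instead:
executor_tool = run_training_code
```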
@tool
def get_gpu_info() -> dict:
    """
    Get available GPU information for code generation.
I have two questions about this tool:
- It doesn't seem to be added to the list of tools for any of the agents, unless I'm missing something?
- I'm not sure how the agent is expected to use this tool. Is the idea that the agent would use it to check whether the different ML frameworks have access to a GPU?
I think I see a few issues here:
- The docstring may be a little too terse for an agent to reliably understand what this is for (see the sketch below for what a more explicit version could look like).
- This should be added to an agent's tools, and the agent's prompts should probably also be updated to suggest using it.
- This checks for GPU availability on the local compute instance; if we're running on a Ray cluster, it would not detect GPUs in the Ray cluster.
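For illustration only, a sketch of what a more self-explanatory local GPU check could look like; the name, return shape, and use of PyTorch are assumptions rather than the code in this PR, and the project's `@tool` registration is omitted since its import isn't shown here:

```python
def get_gpu_info() -> dict:
    """Report GPUs visible to the LOCAL compute instance (not a Ray cluster).

    Call this before generating training code so the generated code targets
    the right device (e.g. "cuda" vs "cpu") and the correct number of GPUs.

    Returns a dict of the form:
        {"cuda_available": bool, "device_count": int, "devices": list[str]}
    """
    info = {"cuda_available": False, "device_count": 0, "devices": []}
    try:
        import torch  # assumed to be available in the training environment
    except ImportError:
        return info
    if torch.cuda.is_available():
        info["cuda_available"] = True
        info["device_count"] = torch.cuda.device_count()
        info["devices"] = [
            torch.cuda.get_device_name(i) for i in range(info["device_count"])
        ]
    return info
```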
@tool
def get_ray_info() -> dict:
    """
    Get Ray cluster information including GPU availability.
As with the tool above, I'm not entirely clear on how an agent should use this one. The docstring probably doesn't give an agent enough context about what this tool is really "for" and how it fits into the broader workflow.
At least with GPT-4o and Claude 3.7 I've seen that the tool docstrings need to be very unambiguous for the agent to reliably use the tool correctly.
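Again purely as an illustration, a sketch of a Ray-side check with a more explicit docstring; the name and return shape are assumptions, and the `@tool` registration is omitted:

```python
import ray

def get_ray_info() -> dict:
    """Report GPUs/CPUs available in the CONNECTED Ray cluster, not on the local machine.

    Use this when training will run on Ray, so the generated code can request
    the right resources (e.g. num_gpus in @ray.remote or a Ray Train ScalingConfig).

    Returns a dict of the form:
        {"ray_initialized": bool, "cluster_gpus": float, "cluster_cpus": float}
    """
    if not ray.is_initialized():
        return {"ray_initialized": False, "cluster_gpus": 0.0, "cluster_cpus": 0.0}
    resources = ray.cluster_resources()  # aggregate logical resources across all nodes
    return {
        "ray_initialized": True,
        "cluster_gpus": resources.get("GPU", 0.0),
        "cluster_cpus": resources.get("CPU", 0.0),
    }
```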
This PR adds tools for the MLE agent to assess whether GPUs are available, either via Ray or locally, for use in model training.