-
Notifications
You must be signed in to change notification settings - Fork 4.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Create a competitive agent with open LLMs #1085
Comments
The user should be able to choose single or multiple LLM to power all the agents. For example, mixtral could power the generalized agents while deepseekcoder power the code generating agents, and white-rabbit-neo could power the testing/cybersecurity agents. This way, only one LLM will be active at a time as per the active agent, and multiple niche specific open LLM could collaborate to outperform private LLMs like gpt-4 while running locally on consumer grade hardware. |
I think the models need to be "self-prompting" From the experience I have had with OpenDevin there are a lot of times it gets close to doing the thing that I want it to but it falls short of the goal and then just starts either repeating the same command or will just do something random. It would be interesting to use two distinct prompting strategies so that the model effectively has a conversation with itself. The first prompt would be something along the lines of looking at its previous actions and the goal and coming up with a plan for the next action it could take. Then the second prompt would be getting the agent to perform an action based on the thoughts provided by the response to the first prompt. I think this would offer the agent more flexibility and it would give it more ability to guide itself towards a better in context solution than any static prompt template can. the downside is that you need to have two model queries per action you take instead of one. Also, Microsoft just released wizardLM 2 and it is way better than anything I have tried local so far. |
gpt-pilot is quite good at this. Try it out to get an idea. I think there are planner and reviewer agents for each step. I kind of wish OpenDevin incorporated gpt-pilot for the engine. |
A nice way to improve open-source LLMs is by fine-tuning them with trajectories from stronger models like GPT-4. Bonus point if we can filter out the bad ones. One way to achieve this at scale, similar to wildchat, is to provide officially hosted I imagine this could be used to:
|
Thanks @Jiayi-Pan!! All of the bullet points mentioned are actually on our roadmap :)) |
Amazing and thanks for the pointer! I will have a look and see what I can contribute |
@Jiayi-Pan We are currently thinking about re-purposing the existing agent tuning dataset (e.g., code, agent tuning) for (1) so we can have a preliminary v0.1 OSS model :) |
Also does this feel like a technical foundation for building fine-tuning tool kits through generating quasi-synthetic data? |
This issue is stale because it has been open for 30 days with no activity. Remove stale label or comment or this will be closed in 7 days. |
We're still working on this! |
Hey @neubig , sorry for being late I was a bit busy these days and I was working on a small version but I had some resource limitations so I didn't progress. |
This issue is stale because it has been open for 30 days with no activity. Remove stale label or comment or this will be closed in 7 days. |
This issue is stale because it has been open for 30 days with no activity. Remove stale label or comment or this will be closed in 7 days. |
@Jiayi-Pan here is a bit of a leading question:
|
I think with deepseek-v3 and @Jiayi-Pan and @xingyaoww 's SWE-Gym project we now probably have open models that can achieve reasonable scores in OpenHands! We still need to create a better leaderboard, but we can handle this isn a new issue: #5869 Congratulations to us on closing one of the oldest issues in our backlog :) |
What problem or use case are you trying to solve?
Currently OpenDevin somewhat works with the strongest closed LLMs such as GPT-4 or Claude Opus, but we have not confirmed good results with open LLMs that can be run locally. We would like to create a formula to achieve competitive results with local LMs.
Do you have thoughts on the technical implementation?
This will require a strong (perhaps fine-tuned) coding agent LLM. It will probably have to be tuned based on strong code LMs such as CodeLlama, StarCoder, DeepseekCoder, or some other yet-to-be-released LLM.
The text was updated successfully, but these errors were encountered: