[Discussion]: Coding performance of local models #3407
Comments
In addition, is there any free alternative to paid APIs or local ollama models that could be used to get good OpenDevin performance (e.g., a free API)?
Linking as it seems relevant: #1085
We have this on the roadmap, and I understand we'll test some open LLMs as we benchmark the performance of the current version. We know that with Llama-70B-Instruct, OpenHands achieves 9.67% on SWE-bench, compared to 26.67% for Sonnet 3.5. We will benchmark at least Llama-405B, and hopefully more.
The prompts for all agents live in files named prompt or prompts, and you can always try tweaking them; for the main agent, see system_prompt.j2 and user_prompt.j2. Personally, I think a specialized agent is a possible solution. It's also possible that the newly introduced template system, which you can see in the prompt files above, will allow enough customization for better performance.
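As a minimal sketch of how such Jinja2 prompt templates can be tweaked for a small local model, assuming the jinja2 library is available. The template text and variable names below are illustrative assumptions, not OpenDevin's actual prompts:

```python
# Sketch of customizing a Jinja2 prompt template, in the spirit of files
# like system_prompt.j2. The prompt wording and the variables
# (action_format, task) are hypothetical, not OpenDevin's real ones.
from jinja2 import Template

# A shorter, stricter system prompt might help a small local model stay on task.
system_prompt = Template(
    "You are a coding assistant.\n"
    "Respond ONLY with a single action in this format:\n"
    "{{ action_format }}\n"
    "Task: {{ task }}"
)

rendered = system_prompt.render(
    action_format="<execute>command</execute>",
    task="Fix the failing unit test in tests/test_math.py",
)
print(rendered)
```

The idea is that edits to the .j2 files (shorter instructions, stricter output formats) propagate to every rendered prompt without touching agent code.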
I would like to close this issue in favor of two other issues which basically cover this. It's still a very important issue, we're just deduplicating!
What problem or use case are you trying to solve?
Experimenting with OpenDevin on a local workspace with ollama and 7/8B models (llama3, codellama, codegemma) on my 6 GB VRAM GPU, since I cannot run bigger models on such a GPU. We can do some things, as illustrated here by @SmartManoj, but it's far from as fluid as it looks with GPT-4, for instance (I haven't tried the latter myself).
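For reference, pointing OpenDevin at a local ollama server is typically a matter of configuration. The sketch below is a configuration assumption from memory, not a verified recipe: the environment-variable names follow the LiteLLM-style naming the project uses, but they may differ between OpenDevin versions, so check the project's docs before relying on them:

```shell
# Assumed configuration sketch; variable names may vary by OpenDevin version.
# Start the local model server and fetch a model first:
ollama serve &
ollama pull llama3

# Point OpenDevin at the local server via LiteLLM-style model naming:
export LLM_MODEL="ollama/llama3"
export LLM_BASE_URL="http://localhost:11434"
export LLM_API_KEY="ollama"   # placeholder; ollama does not check the key
```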
What local models seem to be the best performing currently? Has anyone tried OpenDevin with bigger models and observed satisfying performance for use as a true coding assistant?
I want to emphasize that I am not criticizing the OpenDevin framework; I am perfectly aware that open-source model capability most probably still lags far behind closed-source models. On the contrary, it's already impressive to see OpenDevin working with small local models, and I just want to gather information here on what combo of current framework and models gives the best performance.
Do you have thoughts on the technical implementation?
It's been suggested in #1336 that the prompts could be reviewed or improved for local models. Can this be considered? Is there prompting logic, or are there hard-coded prompts, in OpenDevin that could be reviewed or improved?
Describe alternatives you've considered
Wait for better small open-source LLMs... 😅
Additional context
Thanks to the OpenDevin team for the great work! 🙌