[Discussion]: Coding performance of local models #3407
Comments
In addition, is there any free alternative to paid APIs or local ollama models that could be used to get good OpenDevin performance (e.g., a free API)?
Linking as it seems relevant: #1085
We have this on the roadmap, and I understand we'll test some open LLMs as we benchmark the performance of the current version. We know that with Llama-70B-Instruct, OpenHands achieves 9.67% on SWE-bench, compared to 26.67% for Sonnet 3.5. We will benchmark at least Llama-405B, and hopefully more.
The prompts for all agents live in files named prompt or prompts, and you can always try tweaking them; for the main agent, see system_prompt.j2 and user_prompt.j2. Personally, I think a specialized agent is a possible solution. It's also possible that the newly introduced template system, which you can see in the prompt files above, will allow enough customization for better performance.
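As a minimal sketch of how such Jinja2 prompt templates can be tweaked for a small local model, assuming the jinja2 library is available. The template text and variable names below are illustrative assumptions, not OpenDevin's actual prompts:

```python
# Sketch of customizing a Jinja2 prompt template, in the spirit of files
# like system_prompt.j2. The prompt wording and the variables
# (action_format, task) are hypothetical, not OpenDevin's real ones.
from jinja2 import Template

# A shorter, stricter system prompt might help a small local model stay on task.
system_prompt = Template(
    "You are a coding assistant.\n"
    "Respond ONLY with a single action in this format:\n"
    "{{ action_format }}\n"
    "Task: {{ task }}"
)

rendered = system_prompt.render(
    action_format="<execute>command</execute>",
    task="Fix the failing unit test in tests/test_math.py",
)
print(rendered)
```

The idea is that edits to the .j2 files (shorter instructions, stricter output formats) propagate to every rendered prompt without touching agent code.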
I would like to close this issue in favor of two other issues which basically cover this. It's still a very important issue, we're just deduplicating!
What problem or use case are you trying to solve?
Experimenting with OpenDevin on a local workspace with ollama and 7/8B models (llama3, codellama, codegemma) on my 6 GB VRAM GPU, since I cannot run bigger models on such a GPU. We can do some things, as illustrated here by @SmartManoj, but it's far from as fluid as it looks with GPT-4, for instance (I haven't tried the latter myself).
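For reference, pointing OpenDevin at a local ollama server is typically a matter of configuration. The sketch below is a configuration assumption from memory, not a verified recipe: the environment-variable names follow the LiteLLM-style naming the project uses, but they may differ between OpenDevin versions, so check the project's docs before relying on them:

```shell
# Assumed configuration sketch; variable names may vary by OpenDevin version.
# Start the local model server and fetch a model first:
ollama serve &
ollama pull llama3

# Point OpenDevin at the local server via LiteLLM-style model naming:
export LLM_MODEL="ollama/llama3"
export LLM_BASE_URL="http://localhost:11434"
export LLM_API_KEY="ollama"   # placeholder; ollama does not check the key
```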
What local models seem to be the best performing currently? Has anyone tried OpenDevin with bigger models and observed satisfying performance for use as a true coding assistant?
I want to emphasize that I am not criticizing the OpenDevin framework; I am perfectly aware that open-source model capability most probably still lags far behind closed-source models. On the contrary, it's already impressive to see OpenDevin working with small local models, and I just want to gather information here on what combo of current framework and models gives the best performance.
Do you have thoughts on the technical implementation?
It's been suggested in #1336 that the prompts could be reviewed or improved for local models. Can this be considered? Is there prompting logic, or are there hard-coded prompts, in OpenDevin that could be reviewed or improved?
Describe alternatives you've considered
Wait for better small open-source LLMs... 😅
Additional context
Thanks to the OpenDevin team for the great work! 🙌