
[Discussion]: Coding performance of local models #3407

Closed
MichaelKarpe opened this issue Aug 15, 2024 · 5 comments
Labels
enhancement New feature or request

Comments

@MichaelKarpe

What problem or use case are you trying to solve?

Experimenting with OpenDevin on a local workspace using ollama and 7/8B models (llama3, codellama, codegemma) on my 6GB VRAM GPU, since I cannot try bigger models with such a GPU. Some things work, as illustrated here by @SmartManoj, but it's far from being as fluid as it appears to be with GPT-4, for instance (I haven't tried the latter myself).
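For reference, OpenDevin routes model calls through LiteLLM, so a quick way to sanity-check that a local model responds at all is to query it directly. This is only a minimal sketch, assuming an Ollama server on the default port 11434 with llama3 already pulled:

```python
# Minimal sketch: query a local Ollama model through LiteLLM (the library
# OpenDevin uses to talk to LLM backends). Assumes `ollama serve` is running
# on the default port and `ollama pull llama3` has been done.
from litellm import completion

response = completion(
    model="ollama/llama3",
    messages=[{"role": "user", "content": "Write a Python function that reverses a string."}],
    api_base="http://localhost:11434",
)
print(response.choices[0].message.content)
```

If a check like this works but the agent still struggles, the gap is likely in how well the small model follows the agent prompts rather than in the connection itself.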

Which local models seem to perform best currently? Has anyone tried OpenDevin with bigger models and observed performance satisfying enough to use it as a true coding assistant?

I want to emphasize that I am not criticizing the OpenDevin framework; I am perfectly aware that open-source models are most probably still too far behind closed-source ones. On the contrary, it is already impressive to see OpenDevin working with small local models, and I just want to gather information here on which combination would give the best performance with the current framework and models.

Do you have thoughts on the technical implementation?

It's been suggested in #1336 that the prompts could perhaps be reviewed or improved for local models. Can this be considered? Is there a prompting logic, or are there hard-coded prompts, that could be reviewed or improved in OpenDevin?

Describe alternatives you've considered

Wait for better small open-source LLMs... 😅

Additional context


Thanks to the OpenDevin team for the great work! 🙌

@MichaelKarpe added the enhancement label Aug 15, 2024
@MichaelKarpe
Author

In addition, is there any free alternative to paid or ollama models (e.g. a free API) that could be used to get good performance with OpenDevin?

@mamoodi
Collaborator

mamoodi commented Aug 15, 2024

Linking as it seems relevant: #1085

@github-actions
Contributor

This issue is stale because it has been open for 30 days with no activity. Remove the stale label or comment, or this will be closed in 7 days.

@github-actions bot added the Stale label Sep 17, 2024
@enyst
Collaborator

enyst commented Sep 17, 2024

We have this on the roadmap, and I understand we'll test some open LLMs as we benchmark the current version. We know that with Llama-70B-Instruct, OpenHands achieves 9.67% on SWE-bench, compared to 26.67% for Sonnet 3.5. We will benchmark at least Llama-405B, and hopefully more.

Is there a prompting logic or hard-coded prompts that can be reviewed or improved in OpenDevin?

The prompts for all agents are in files named prompt or prompts, and you can always try tweaking them. For example, the main agent uses system_prompt.j2 and user_prompt.j2.
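As a rough illustration only (the template names come from the paragraph above, but the directory path and the variables passed to render() are hypothetical placeholders, not OpenDevin's actual prompt context), experimenting with a .j2 prompt could look like:

```python
# Minimal sketch: load and render a Jinja2 prompt template such as
# system_prompt.j2 to inspect or tweak what the agent actually sends.
# The directory path and render variables below are placeholders.
from jinja2 import Environment, FileSystemLoader

env = Environment(loader=FileSystemLoader("path/to/agent/prompts"))
template = env.get_template("system_prompt.j2")

# Hypothetical context variables; the real agent supplies its own.
system_prompt = template.render(agent_skills_docs="<tool documentation here>")
print(system_prompt)
```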

Personally, I think a specialized agent is a possible solution. It's also possible that the newly introduced template system, which you can see in the prompt files above, will allow enough customization for better performance.

@github-actions bot removed the Stale label Sep 18, 2024
@neubig
Contributor

neubig commented Sep 20, 2024

I would like to close this issue in favor of two other issues which largely cover it. It's still a very important topic; we're just deduplicating!

@neubig closed this as not planned Sep 20, 2024