Add Offline DeepSeek Model #82
/bounty $800

💎 $800 bounty • IntelliNode
Steps to solve:
Thank you for contributing to intelligentnode/Intelli!
/attempt #82
@Barqawiz Have you considered leveraging Ollama for loading and running the DeepSeek models instead? By building a lightweight wrapper that integrates with Ollama, we could create an API for AI agents to interact with local models, supporting not only DeepSeek but also any other model compatible with Ollama. This would simplify development, reduce overhead, and ensure we're tapping into the optimizations Ollama offers for efficient model inference and memory management. I think a custom model loader would be quite difficult to maintain later on. This approach would also make it easier to scale and support a broader range of models in the future.
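For illustration only, a wrapper along those lines would essentially forward prompts to Ollama's local REST API (served on port 11434 by default). The sketch below shows that idea; the model tag and prompt are placeholder assumptions, not anything from the Intelli codebase.

```python
import json
import urllib.request


def ollama_generate(prompt: str, model: str = "deepseek-r1:7b",
                    host: str = "http://localhost:11434") -> str:
    """Send a single non-streaming generation request to a local Ollama server."""
    payload = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()
    req = urllib.request.Request(
        f"{host}/api/generate",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.loads(resp.read().decode())
    # Ollama returns the generated text under the "response" key.
    return body["response"]


if __name__ == "__main__":
    print(ollama_generate("Explain what a flow of agents is in one sentence."))
```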
Good question @oliverqx. Let me explain the rationale behind using an offline model and why I'm avoiding Ollama or similar high-level modules.

Intelli can build a graph of collaborating agents using the flow concept: Sequence Flow Documentation. I've managed to integrate multiple offline models into the flow using the KerasWrapper, which provides a convenient way to load several offline models, such as Llama, Mistral, and others: KerasWrapper class. However, Keras does not currently support DeepSeek, and adding that functionality will likely take some time from the Keras team. As a result, my current focus is on DeepSeek.

I avoid using Ollama because I want to minimize external dependencies. I looked into Ollama as a high-level library, and integrating it would introduce additional unnecessary modules; the same applies to HF Transformers. You can draw on how Ollama uses lower-level modules like Torch and optimization libraries, or use Safetensors from HuggingFace; these lower-level modules are accepted. I'm happy to credit their work if you borrow from their approaches, but I prefer not to have Ollama as a dependency for running the flow.

Feel free to use o1, o3, or DeepSeek to write any part of the code.
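As a rough sketch of the kind of lower-level dependency described above, checkpoint weights can be read with just torch and safetensors installed; the shard name below is a placeholder, not part of the actual implementation.

```python
from safetensors.torch import load_file

# Placeholder shard name; real DeepSeek checkpoints are split across many .safetensors files.
checkpoint_path = "model-00001-of-000163.safetensors"

# load_file returns a plain dict mapping tensor names to torch.Tensor objects,
# so no Transformers (or Ollama) dependency is needed to read the weights.
state_dict = load_file(checkpoint_path, device="cpu")

for name, tensor in list(state_dict.items())[:5]:
    print(name, tuple(tensor.shape), tensor.dtype)
```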
@intelligentnode Are there any specific DeepSeek variants you'd prefer?
You can use the official ones from R1 or any quantized variant:
DeepSeek-R1 Models
DeepSeek-R1-Distill Models
In general, running DeepSeek-R1 is going to be expensive, but you can test the code on the 7B or 8B models for the attempt to be accepted. Also, if you know of a hosted quantized version of R1, you can test on it.
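For testing on one of the smaller variants, the checkpoint files can be pulled locally with huggingface_hub (a lightweight package separate from Transformers; whether it counts as an acceptable dependency is an assumption here, as are the target directory and file filters).

```python
from huggingface_hub import snapshot_download

# Download the 7B distill checkpoint for local testing; the repo id is the official
# DeepSeek release, while the target directory and file filters are arbitrary choices.
local_dir = snapshot_download(
    repo_id="deepseek-ai/DeepSeek-R1-Distill-Qwen-7B",
    local_dir="./deepseek-r1-distill-7b",
    allow_patterns=["*.safetensors", "*.json"],  # weights plus config/tokenizer files
)
print("Model files downloaded to:", local_dir)
```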
@intelligentnode For tokenization, is it alright to use AutoTokenizer from transformers? I know the goal is to avoid Transformers as a dependency; I was wondering if that also applies to tokenization though.
If it requires installing Transformers to use it, then no.
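One possible alternative, assuming the standalone tokenizers package is acceptable and that the checkpoint ships a tokenizer.json (the R1 distill releases do), is to load the tokenizer directly; the path below is a placeholder.

```python
from tokenizers import Tokenizer

# tokenizer.json sits alongside the model weights; the path is a placeholder.
tokenizer = Tokenizer.from_file("./deepseek-r1-distill-7b/tokenizer.json")

encoding = tokenizer.encode("Hello from an offline DeepSeek loader")
print(encoding.ids)                    # token ids to feed the model
print(tokenizer.decode(encoding.ids))  # round-trip back to text
```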
/attempt #82
@Enity300 Would you like to work on this together? It's quite the task, I think.
@oliverqx It is good that you mentioned this. With your collaboration, I can assist you with the relevant chunks of code that require stitching and testing. Send me your email using the form below, and I will organize a call. Mention that you are from GitHub.
So far I've studied the DeepSeek model repo, and this week I've been studying llama.cpp; sometime this weekend I think I can open a PR. Until then, the work is very theoretical. Will definitely hit you up once I get a solid grasp on what optimization means. I'm not an AI dev, so this is a lot of new info, which is why it's taking so long. I'm confident I can come to a solution sometime mid-March.
/attempt 82
💡 @RaghavArora14 submitted a pull request that claims the bounty. You can visit your bounty board to reward. |
Implement an offline DeepSeek model loader for inference that:
Expected Deliverables:
Notes:
https://github.com/deepseek-ai/DeepSeek-V3/tree/main/inference
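As a rough illustration of the shape such a loader could take, the sketch below ties together the pieces discussed in this thread behind a small offline wrapper. The class and method names are hypothetical (not from the Intelli codebase), and the actual DeepSeek forward pass and sampling loop are deliberately left out.

```python
import glob

from safetensors.torch import load_file
from tokenizers import Tokenizer


class OfflineDeepSeekLoader:
    """Hypothetical sketch: reads sharded Safetensors weights and a tokenizer.json
    from a local directory, with no Transformers or Ollama dependency."""

    def __init__(self, model_dir: str, device: str = "cpu"):
        self.model_dir = model_dir
        self.device = device
        self.tokenizer = None
        self.state_dict = {}

    def load(self):
        # Load the standalone tokenizer shipped with the checkpoint.
        self.tokenizer = Tokenizer.from_file(f"{self.model_dir}/tokenizer.json")
        # Merge all weight shards into a single name -> tensor mapping.
        for shard in sorted(glob.glob(f"{self.model_dir}/*.safetensors")):
            self.state_dict.update(load_file(shard, device=self.device))
        return self

    def encode(self, prompt: str):
        # Token ids that a real forward pass and sampling loop would consume.
        return self.tokenizer.encode(prompt).ids


# Example usage with a placeholder directory:
# loader = OfflineDeepSeekLoader("./deepseek-r1-distill-7b").load()
# print(loader.encode("Hello, DeepSeek!"))
```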