[proposal]Two better resource scheduling and allocation plugins #2298
Comments
/assign @LY-today |
Nice proposal. Maybe we can discuss whether to implement this as a plugin or as a strategy? PTAL @ZiMengSheng @saintube |
If the community members approve, I can contribute an MR |
What do these plugins look like in terms of implementation details? |
plugin-one
plugin-two
|
OK
|
Am I right in understanding that the community does not intend to integrate these two plugins, and instead suggests adjusting the DeviceShare policy? |
@ZiMengSheng I took a look at RM:koordinator-sh/website#187. The DeviceShare plugin seems to have a high adoption cost for traditional nvidia.com/gpu and early vGPU setups, and it does not seem to be very mature yet. Adding these two plugins may be the fastest way to deliver benefits for AI scenarios at this stage. |
MR:#2302 |
What is your proposal:
The NodeResourcesFit plugin in native Kubernetes can apply only one scoring strategy, such as MostRequestedPriority or LeastRequestedPriority, to all resources. In industrial practice, this design does not fit some scenarios. For example, in AI scenarios, workloads that request GPUs should preferentially fill up an entire GPU machine first to prevent GPU fragmentation, while workloads that request only CPU and memory should preferentially be spread across non-GPU machines. Otherwise, CPU and memory on GPU machines are consumed excessively, and tasks that actually request GPUs end up Pending because non-GPU resources are insufficient. It is therefore hoped that two strategies can be added as extensions to address this business need.
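The two behaviors described above can be illustrated with a minimal Go sketch. It is not the proposed implementation: the type and function names (`Strategy`, `Node`, `ScoreNode`, `AvoidScarceScore`), the 0 to 100 score range, and the integer averaging are all assumptions made for illustration, and the code does not use the real Kubernetes scheduler framework API. It only shows one way a per-resource strategy (pack GPUs, spread CPU) and a "keep non-GPU pods off GPU nodes" score could be computed.

```go
package main

import "fmt"

// Strategy is a hypothetical per-resource scoring policy (not a Kubernetes type).
type Strategy int

const (
	LeastAllocated Strategy = iota // prefer nodes with more free capacity (spread)
	MostAllocated                  // prefer nodes with less free capacity (pack)
)

const maxScore int64 = 100 // assumed score range: 0..100

// Node holds allocatable and already-requested amounts per resource name.
type Node struct {
	Name        string
	Allocatable map[string]int64
	Requested   map[string]int64
}

// resourceScore scores a single resource on a node after placing the pod.
func resourceScore(s Strategy, requested, allocatable int64) int64 {
	if allocatable <= 0 || requested > allocatable {
		return 0
	}
	if s == MostAllocated {
		return requested * maxScore / allocatable
	}
	return (allocatable - requested) * maxScore / allocatable
}

// ScoreNode averages the per-resource scores over the resources the pod
// requests, applying each resource's own strategy. Resources without an
// entry in strategies fall back to LeastAllocated (the zero value).
func ScoreNode(node Node, podReq map[string]int64, strategies map[string]Strategy) int64 {
	var sum, n int64
	for res, req := range podReq {
		if req == 0 {
			continue
		}
		sum += resourceScore(strategies[res], node.Requested[res]+req, node.Allocatable[res])
		n++
	}
	if n == 0 {
		return 0
	}
	return sum / n
}

// AvoidScarceScore returns 0 when the node offers a scarce resource that the
// pod does not request, and maxScore otherwise, so that such pods are
// steered away from nodes holding scarce resources.
func AvoidScarceScore(node Node, podReq map[string]int64, scarce []string) int64 {
	for _, res := range scarce {
		if node.Allocatable[res] > 0 && podReq[res] == 0 {
			return 0
		}
	}
	return maxScore
}

func main() {
	const gpu = "nvidia.com/gpu"
	strategies := map[string]Strategy{gpu: MostAllocated, "cpu": LeastAllocated}

	busyGPU := Node{Name: "gpu-busy",
		Allocatable: map[string]int64{gpu: 8, "cpu": 64},
		Requested:   map[string]int64{gpu: 6, "cpu": 16}}
	idleGPU := Node{Name: "gpu-idle",
		Allocatable: map[string]int64{gpu: 8, "cpu": 64},
		Requested:   map[string]int64{}}
	cpuOnly := Node{Name: "cpu-only",
		Allocatable: map[string]int64{"cpu": 64},
		Requested:   map[string]int64{"cpu": 16}}

	gpuPod := map[string]int64{gpu: 1}
	cpuPod := map[string]int64{"cpu": 4}

	for _, n := range []Node{busyGPU, idleGPU, cpuOnly} {
		fmt.Printf("%s: gpuPod fit=%d, cpuPod fit=%d, cpuPod avoid=%d\n",
			n.Name,
			ScoreNode(n, gpuPod, strategies),
			ScoreNode(n, cpuPod, strategies),
			AvoidScarceScore(n, cpuPod, []string{gpu}))
	}
}
```

In this sketch a GPU pod scores higher on the partially filled GPU node than on the idle one, which packs GPUs, while a CPU-only pod receives the full avoidance score only on the node without GPUs. How the two scores would be weighted against each other is left open here.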
Why is this needed:
See the description above.
Is there a suggested solution, if so, please add it:
plugin-one
config:
config description:
node score:
plugin-two
config:
config description:
node score: