KEP-2401: Validate fine-tuning configurations in torch plugin #2508

Open
Tracked by #2401
Electronic-Waste opened this issue Mar 12, 2025 · 2 comments

@Electronic-Waste
Member

What you would like to be added?

To ensure that the configurations propagated by the SDK are valid, we plan to add validation requirements to the TrainJob webhook. We'll implement the following validations in the torch plugin's CustomValidationPlugin:

  • The ClusterTrainingRuntime referenced by runtime_ref exists in the control plane.

Ref: https://github.com/kubeflow/trainer/tree/master/docs/proposals/2401-llm-trainer-v2#validate-fine-tuning-configurations

Why is this needed?

Scheduled in #2410

Love this feature?

Give it a 👍 We prioritize the features with the most 👍

@Electronic-Waste
Member Author

/remove-label lifecycle/needs-triage

@Electronic-Waste
Member Author

/area llm
