
ttlSecondsAfterFinished for MPIJob, not only launcher #644

Open
hy00nc opened this issue May 27, 2024 · 6 comments


hy00nc commented May 27, 2024

Is there a plan to extend ttlSecondsAfterFinished to the MPIJob level, not just the launcher?

alculquicondor (Collaborator) commented

Do you mean that you want to keep the Pod objects until the TTL expires?

Or do you want to keep them running?

hy00nc (Author) commented May 27, 2024

@alculquicondor, thanks for the reply. I want the MPIJob resource itself to be deleted after the TTL, just like how ttlSecondsAfterFinished works for MPIJob v1. In the current implementation, it stays around until it is deleted explicitly, right?
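
For concreteness, the kind of knob being asked for might look roughly like the sketch below: a TTL on the MPIJob's run policy that the controller would honor by deleting the whole MPIJob object, not just the launcher Job. The type and field names here are illustrative, not necessarily the actual v2beta1 API.

```go
// Illustrative sketch only; not the actual mpi-operator v2beta1 types.
package sketch

// RunPolicySketch shows where an MPIJob-level TTL could live.
type RunPolicySketch struct {
	// TTLSecondsAfterFinished is how long a finished (Succeeded or Failed)
	// MPIJob is kept before the controller deletes the MPIJob object itself,
	// analogous to ttlSecondsAfterFinished on a batch/v1 Job.
	// nil means the MPIJob is kept until it is deleted explicitly.
	TTLSecondsAfterFinished *int32 `json:"ttlSecondsAfterFinished,omitempty"`
}
```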

alculquicondor (Collaborator) commented

oh, gotcha. I don't know if that's how other Kubeflow APIs work. If they do, we can bring MPIJob back to parity.

tenzen-y (Member) commented

> oh, gotcha. I don't know if that's how other Kubeflow APIs work. If they do, we can bring MPIJob back to parity.

Indeed, the other Jobs are removed after ttlSecondsAfterFinished, like this:

https://github.com/kubeflow/training-operator/blob/be5df91eb43e2fdfa1b0a7005f7aeb8cc3a52fb1/pkg/controller.v1/common/job.go#L428-L435
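
For readers who don't follow the link, the behavior is roughly along these lines. This is a simplified sketch, not the actual training-operator code; the type and function names are illustrative. Once a job has finished and its TTL has elapsed since the completion time, the controller deletes the job object itself; otherwise it requeues until the TTL expires.

```go
package main

import (
	"fmt"
	"time"
)

// finishedJob carries just the fields the TTL check needs.
type finishedJob struct {
	name                    string
	completionTime          time.Time
	ttlSecondsAfterFinished *int32 // nil: keep the job until deleted explicitly
}

// shouldCleanUp reports whether the finished job's TTL has elapsed and,
// if not, how long to requeue before checking again.
func shouldCleanUp(j finishedJob, now time.Time) (bool, time.Duration) {
	if j.ttlSecondsAfterFinished == nil {
		return false, 0
	}
	expiry := j.completionTime.Add(time.Duration(*j.ttlSecondsAfterFinished) * time.Second)
	if !now.Before(expiry) {
		return true, 0 // TTL elapsed: delete the job object itself
	}
	return false, expiry.Sub(now) // requeue and re-check once the TTL expires
}

func main() {
	ttl := int32(30)
	j := finishedJob{
		name:                    "example-job",
		completionTime:          time.Now().Add(-time.Minute),
		ttlSecondsAfterFinished: &ttl,
	}
	cleanUp, requeueAfter := shouldCleanUp(j, time.Now())
	fmt.Println("delete now:", cleanUp, "requeue after:", requeueAfter)
}
```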

hy00nc (Author) commented May 28, 2024

Would it make sense to extend activeDeadlineSeconds and backoffLimit as well? I guess these are also currently limited to the launcher, but the other Kubeflow jobs apply them at the job level.

alculquicondor (Collaborator) commented

Those should be fine on just the launcher Job, because the launcher Job is what controls the execution. If it finishes as Failed, the rest of the pods would terminate too, IIRC.
