Upgrades kubernetes-asyncio version #16841

Merged

Conversation

jeanluciano
Contributor

@jeanluciano jeanluciano commented Jan 23, 2025

Updates kubernetes-asyncio to 32.0.0 and removes all instances of _request_timeout in watch calls.

Impossible CPU request:

03:40:20 PM | Info | Worker 'KubernetesWorker 59f17c04-b908-4f99-a56e-80576fd5ef05' submitting flow run 'ebef93ca-69ae-4bbc-b14d-2877667cd219'
03:40:20 PM | Info | Running on worker id: 625acc63-2087-49aa-ab3c-302b9c167725. See worker logs here: https://app.prefect.cloud/account/9b649228-0419-40e1-9e0d-44954b5c0ab6/workspace/c501586b-95e5-4b11-b635-d6e0235009ff/work-pools/work-pool/k8-local/worker/625acc63-2087-49aa-ab3c-302b9c167725
03:40:20 PM | Info | Creating Kubernetes job...
03:40:20 PM | Info | Job 'splendid-pony-rxl2b': Starting watch for pod start...
03:40:20 PM | Info | Job 'splendid-pony-rxl2b': Pod 'splendid-pony-rxl2b-gv9f4' has started.
03:40:20 PM | Info | Job 'splendid-pony-rxl2b': Pod has status 'Pending'.
03:40:20 PM | Info | Completed submission of flow run 'ebef93ca-69ae-4bbc-b14d-2877667cd219'
03:50:40 PM | Error | Job 'splendid-pony-rxl2b': Pod never started.
03:50:40 PM | Info | Job event 'SuccessfulCreate' at 2025-01-27 21:40:20+00:00: Created pod: splendid-pony-rxl2b-gv9f4
03:50:40 PM | Info | Pod event 'FailedScheduling' at 2025-01-27 21:40:20.840684+00:00: 0/1 nodes are available: 1 Insufficient cpu. preemption: 0/1 nodes are available: 1 No preemption victims found for incoming pod.
03:50:40 PM | Info | Pod event 'FailedScheduling' at 2025-01-27 21:45:27.767451+00:00: 0/1 nodes are available: 1 Insufficient cpu. preemption: 0/1 nodes are available: 1 No preemption victims found for incoming pod.
03:50:41 PM | Info | Reported flow run 'ebef93ca-69ae-4bbc-b14d-2877667cd219' as crashed: Flow run infrastructure exited with non-zero status code -1.

Not logging for 10 minutes:

09:23:04 AM | Info | Worker 'KubernetesWorker b354d244-b11f-46b4-9711-5626c4e73d1d' submitting flow run '26716ab5-4d9d-4426-9710-38d8c35e1b46'
09:23:04 AM | Info | Running on worker id: ba16ef09-fcca-46fb-9db2-df4e350f0b13. See worker logs here: https://app.prefect.cloud/account/9b649228-0419-40e1-9e0d-44954b5c0ab6/workspace/c501586b-95e5-4b11-b635-d6e0235009ff/work-pools/work-pool/k8-local/worker/ba16ef09-fcca-46fb-9db2-df4e350f0b13
09:23:05 AM | Info | Creating Kubernetes job...
09:23:05 AM | Info | Job 'proficient-falcon-vz25g': Starting watch for pod start...
09:23:05 AM | Info | Job 'proficient-falcon-vz25g': Pod 'proficient-falcon-vz25g-pdg7t' has started.
09:23:05 AM | Info | Job 'proficient-falcon-vz25g': Pod has status 'Pending'.
09:23:05 AM | Info | Job 'proficient-falcon-vz25g': Pod 'proficient-falcon-vz25g-pdg7t' has started.
09:23:05 AM | Info | Job 'proficient-falcon-vz25g': Pod 'proficient-falcon-vz25g-pdg7t' has started.
09:23:05 AM | Info | Completed submission of flow run '26716ab5-4d9d-4426-9710-38d8c35e1b46'
09:23:06 AM | Info | Job 'proficient-falcon-vz25g': Pod 'proficient-falcon-vz25g-pdg7t' has started.
09:23:06 AM | Info | Job 'proficient-falcon-vz25g': Pod has status 'Running'.
09:23:07 AM | Info | Opening process...
09:23:09 AM | Info | Downloading flow code from storage at '.'
09:23:09 AM | Info | Beginning flow run 'proficient-falcon' for flow 'sleep-and-log'
09:23:09 AM | Info | Sleeping for 610 seconds
09:33:19 AM | Info | Done sleeping
09:33:20 AM | Info | Finished in state Completed()
Process for flow run 'proficient-falcon' exited cleanly.
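
The source of the 'sleep-and-log' flow is not included in this PR. A minimal sketch of what such a test flow might look like, based only on the two messages in the log above, is shown here. It uses the standard library so it runs on its own; the function name, logger name, and default duration are assumptions, and a real version would presumably be wrapped in Prefect's flow decorator and use Prefect's run logger instead.

```python
import logging
import time

# Hypothetical logger name; a Prefect flow would use its own run logger.
logger = logging.getLogger("sleep_and_log")


def sleep_and_log(seconds: float = 610) -> None:
    """Log once, stay silent for `seconds`, then log once more.

    The silent window is the point of the test: nothing is written to the
    log stream while the function sleeps.
    """
    logger.info("Sleeping for %s seconds", seconds)
    time.sleep(seconds)  # no log output during this window
    logger.info("Done sleeping")
```

The default of 610 seconds matches the gap between the 'Sleeping for 610 seconds' and 'Done sleeping' entries (09:23:09 AM to 09:33:19 AM), which is just over the 10 minute threshold being tested.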

closes #16210

Checklist

  • This pull request references any related issue by including "closes <link to issue>"
    • If no issue exists and your change is not a small fix, please create an issue first.
  • If this pull request adds new functionality, it includes unit tests that cover the changes
  • If this pull request removes docs files, it includes redirect settings in mint.json.
  • If this pull request adds functions or classes, it includes helpful docstrings.

@jeanluciano jeanluciano changed the title from "Kubernetes worker `" to "Kubernetes worker use none for _request_timeout" Jan 23, 2025
@jeanluciano jeanluciano marked this pull request as ready for review January 23, 2025 21:20
@zzstoatzz zzstoatzz marked this pull request as draft January 23, 2025 21:51
@zzstoatzz
Collaborator

moving to draft for now until we can articulate the need for this given #15744

cc @kevingrismore

…rnetes-worker' of https://github.com/PrefectHQ/prefect into jean/oss-5995-response-payload-is-not-completed-in-kubernetes-worker
@kevingrismore
Contributor

We should be OK removing all the ClientTimeout instances we're passing to _request_timeout now that the dependency is bumped. I would verify by doing the following, with the updated async k8s package and all ClientTimeouts removed:

  • Try to start a flow run with an impossibly large CPU request and a job start timeout of 10+ minutes. Ensure the timeout is enforced as intended rather than the connection dropping.
  • Run a flow that doesn't log for 10+ minutes and ensure the connection doesn't drop.
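
The first check depends on the worker's own job start timeout firing, rather than a client-side request timeout cutting the watch connection first. A rough sketch of that distinction, using only asyncio, might look like the following. The names fake_pod_events and wait_for_pod_start are hypothetical stand-ins for illustration; they are not Prefect or kubernetes-asyncio APIs, and this is not the worker's actual implementation.

```python
import asyncio
from typing import AsyncIterator, Optional


async def fake_pod_events(phases: list[str], interval: float) -> AsyncIterator[str]:
    """Hypothetical stand-in for a pod watch stream.

    Yields one phase per `interval` seconds, then stays open forever,
    the way a watch on an unschedulable pod produces no further events.
    """
    for phase in phases:
        await asyncio.sleep(interval)
        yield phase
    await asyncio.Event().wait()  # never set: the stream stays open


async def wait_for_pod_start(
    events: AsyncIterator[str], start_timeout: float
) -> Optional[str]:
    """Return the first non-Pending phase, or None if the deadline passes.

    The deadline wraps the whole wait, so it is enforced by the caller
    and not by a per-request timeout on the underlying connection.
    """

    async def first_started_phase() -> str:
        async for phase in events:
            if phase != "Pending":
                return phase
        raise RuntimeError("event stream ended unexpectedly")

    try:
        return await asyncio.wait_for(first_started_phase(), timeout=start_timeout)
    except asyncio.TimeoutError:
        return None  # corresponds to the "Pod never started" outcome
```

With a stream that only ever reports 'Pending', wait_for_pod_start returns None once start_timeout elapses; with a stream that reaches 'Running' in time, it returns 'Running'. In a real run the deadline would be the configured job start timeout of 10+ minutes rather than the sub-second values suitable for a quick local check.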

@jeanluciano jeanluciano marked this pull request as ready for review January 29, 2025 17:17
@github-actions github-actions bot added the bug (Something isn't working) and integrations (Related to integrations with other services) labels Jan 29, 2025
@kevingrismore
Contributor

@jeanluciano can you update the PR name and share some logs from runs demonstrating both test cases I mentioned work as expected?

@jeanluciano jeanluciano changed the title from "Kubernetes worker use none for _request_timeout" to "Upgrades kubernetes-asyncio version" Jan 29, 2025
Contributor

@kevingrismore kevingrismore left a comment

Nice!

Member

@cicdw cicdw left a comment

was just revisiting this issue this morning - thank you @jeanluciano for the PR and @kevingrismore for the solid review!

@cicdw cicdw merged commit 4989f15 into main Jan 30, 2025
14 checks passed
@cicdw cicdw deleted the jean/oss-5995-response-payload-is-not-completed-in-kubernetes-worker branch January 30, 2025 17:40
Labels
bug (Something isn't working), integrations (Related to integrations with other services)
Development

Successfully merging this pull request may close these issues.

Response payload is not completed in kubernetes worker logger
4 participants