-
Notifications
You must be signed in to change notification settings - Fork 3
Docker profiles, container omission, fixed job cancellation #43
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
| from fastapi.params import Depends | ||
| from fastapi import APIRouter, Body | ||
| from fastapi.responses import Response | ||
| # MODIFIED: Import JSONResponse |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please drop all LLM descriptive comments and guides, instead you can include include short inline documentation and comments where you deem necessary and longer descriptions in docs for methods and classes
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please reply to the comments @micoleaoo, thank you :)
tesp_api/service/event_actions.py
Outdated
| print("Volumes:") | ||
| print(volumes) | ||
| output_confs, volume_confs = map_volumes(str(job_id), volumes, outputs) | ||
| mapped_outputs, mapped_volumes = map_volumes(str(job_id), volumes, outputs) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unnecessary temporary variables, keep the original, or what was the reason for this change?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was an incorrect assumption that the list is already containing data. I'll keep the original.
tesp_api/service/event_actions.py
Outdated
| )).map(lambda updated_task: get_else_throw( | ||
| updated_task, TaskNotFoundError(task_id, Just(TesTaskState.QUEUED)) | ||
| )).then(lambda updated_task: setup_data( | ||
| )).then(lambda updated_task_val: setup_data( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
explain this change
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
stylistic mishap, I'll discard this change
| task = await task_repository.update_task_state( | ||
| task_id, | ||
| TesTaskState.RUNNING, | ||
| TesTaskState.EXECUTOR_ERROR |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The TesTaskState.EXECUTOR_ERROR change is not included in latest changes
|
|
||
| if command_status.get('returncode', -1) != 0: | ||
| print(f"Task {task_id} executor error (return code: {command_status.get('returncode', -1)}). Setting state to EXECUTOR_ERROR.") | ||
| await task_repository.update_task_state(task_id, TesTaskState.RUNNING, TesTaskState.EXECUTOR_ERROR) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or does it work now logically in the same way as before with the TesTaskState.EXECUTOR_ERROR ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, the logic should be preserved and made more robust:
-
After the Pulsar job finishes (or
job_status_completereturns), we checkcommand_status.get('returncode', -1) != 0. -
Before declaring it an
EXECUTOR_ERROR, there is a check if the task has been canceled by the API, because a cancelled job might also result in a non-zero exit code from Pulsar's perspective (e.g., if it was killed), so we prioritize theCANCELEDstate set by the user. -
If it's not
CANCELEDand the return code is non-zero, then we explicitly update the task state toTesTaskState.EXECUTOR_ERROR(fromTesTaskState.RUNNING). -
We also now explicitly call
pulsar_operations.erase_job(task_id)in this path to ensure the failed Pulsar job is cleaned up. -
Then, we return to prevent the code from falling through to the logic that sets the state to
COMPLETE.
…lambda parameter name back to updated_task
README.md
Outdated
|
|
||
| #### All services (default): | ||
| ``` | ||
| docker compose --profile all up -d |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fix the readme, no longer applies
… of DTS services through docker compose, potentially replacing them in the near future (now works with HTTP only). Added output files and container clean up after tests.
BorisYourich
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very well done, thank you so much !
Changes made:
input_confsoroutput_confsis not presentProblems & Fixes regarding 3rd point:
Galaxy
JSONDecodeErroron Cancel:/cancelendpoint sent an empty response body; Galaxy expected JSON ({})./cancelendpoint (task_endpoints.py) to returnJSONResponse(content={}), sending"{}".TESP API Kept Polling Cancelled Jobs:
handle_run_task(inevent_actions.py) continued polling Pulsar for jobs already cancelled by the API, leading toLookupErrors and incorrectly changing task state fromCANCELEDtoSYSTEM_ERROR.event_actions.py):LookupError),handle_run_tasknow checks the DB. If task isCANCELED, it exits gracefully (as the cancel API handled it).handle_run_taskto stop processing if a task is found to beCANCELEDearly in its execution or immediately after the Pulsar job finishes/fails.Outcome:
CANCELEDtasks from becomingSYSTEM_ERROR.