Docker profiles, container omission, fixed job cancellation #43

micoleaoo · 2025-05-26T12:01:58Z

Changes made:

- Added Docker profiles.
- Container omission if input_confs or output_confs is not present.
- Fixed job cancellation issues causing Galaxy errors, lingering jobs, and TESP API error loops.

Problems & Fixes regarding 3rd point:

Galaxy JSONDecodeError on Cancel:
- Problem: TESP API's /cancel endpoint sent an empty response body; Galaxy expected JSON ({}).
- Fix: Updated TESP API's /cancel endpoint (task_endpoints.py) to return JSONResponse(content={}), sending "{}".
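
The decode failure can be reproduced with the standard library alone. The sketch below is not Galaxy's client code: `parse_cancel_response` is a hypothetical stand-in for any client that JSON-decodes the body returned by /cancel.

```python
import json


def parse_cancel_response(body: str) -> dict:
    """Hypothetical stand-in for a client that JSON-decodes the /cancel body."""
    return json.loads(body)


# Empty body, as sent before the fix: decoding fails.
try:
    parse_cancel_response("")
    empty_body_ok = True
except json.JSONDecodeError:
    empty_body_ok = False

# Body "{}", as sent after the fix: decoding yields an empty dict.
fixed_result = parse_cancel_response("{}")
```

An empty string is not a valid JSON document, which is why returning an explicit `{}` resolves the error on the client side.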
TESP API Kept Polling Cancelled Jobs:
- Problem: handle_run_task (in event_actions.py) continued polling Pulsar for jobs already cancelled by the API, leading to LookupErrors and incorrectly changing task state from CANCELED to SYSTEM_ERROR.
- Fixes (event_actions.py):
  - Error Handling: If polling Pulsar fails (e.g., LookupError), handle_run_task now checks the DB. If task is CANCELED, it exits gracefully (as the cancel API handled it).
  - Early Exit: Added checks to handle_run_task to stop processing if a task is found to be CANCELED early in its execution or immediately after the Pulsar job finishes/fails.
  - Cleaned Executor Errors: Ensured Pulsar jobs are erased if a task fails due to an executor error (not just cancellation).
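
The error-handling fix can be sketched as follows. This is a simplified illustration, not the code in event_actions.py: the `State` enum, `poll_job`, `run_task`, and the in-memory `db` dict are all hypothetical stand-ins for TesTaskState, the Pulsar polling call, handle_run_task, and the task repository.

```python
import asyncio
from enum import Enum


class State(Enum):  # hypothetical subset of TesTaskState
    RUNNING = "RUNNING"
    CANCELED = "CANCELED"
    SYSTEM_ERROR = "SYSTEM_ERROR"


async def poll_job(task_id: str) -> dict:
    """Stand-in for polling Pulsar; here the job is always missing."""
    raise LookupError(f"job {task_id} not found")


async def run_task(task_id: str, db: dict) -> State:
    try:
        await poll_job(task_id)
    except LookupError:
        # Polling failed: consult the stored state before reporting an error.
        if db[task_id] is State.CANCELED:
            return db[task_id]  # the cancel endpoint already handled this task
        db[task_id] = State.SYSTEM_ERROR
    return db[task_id]


canceled_db = {"t1": State.CANCELED}
running_db = {"t2": State.RUNNING}
canceled_result = asyncio.run(run_task("t1", canceled_db))
failed_result = asyncio.run(run_task("t2", running_db))
```

In this sketch a cancelled task keeps its CANCELED state, while a task that was still RUNNING when polling failed is marked SYSTEM_ERROR.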

Outcome:

Galaxy cancellations now complete without a JSONDecodeError.
TESP API correctly stops processing and polling for cancelled jobs.
Task states are managed more accurately, especially preventing CANCELED tasks from becoming SYSTEM_ERROR.

martenson · 2025-05-26T13:16:07Z

tesp_api/api/endpoints/task_endpoints.py

 from fastapi.params import Depends
 from fastapi import APIRouter, Body
-from fastapi.responses import Response
+# MODIFIED: Import JSONResponse


Please drop all LLM descriptive comments and guides. Instead, you can include short inline documentation and comments where you deem necessary, and longer descriptions in docs for methods and classes.

BorisYourich

Please reply to the comments @micoleaoo, thank you :)

BorisYourich · 2025-05-28T10:44:29Z

tesp_api/service/event_actions.py

-        print("Volumes:")
-        print(volumes)
-        output_confs, volume_confs = map_volumes(str(job_id), volumes, outputs)
+        mapped_outputs, mapped_volumes = map_volumes(str(job_id), volumes, outputs)


Unnecessary temporary variables, keep the original, or what was the reason for this change?

There was an incorrect assumption that the list already contained data. I'll keep the original.

BorisYourich · 2025-06-09T09:21:03Z

tesp_api/service/event_actions.py

        )).map(lambda updated_task: get_else_throw(
            updated_task, TaskNotFoundError(task_id, Just(TesTaskState.QUEUED))
-        )).then(lambda updated_task: setup_data(
+        )).then(lambda updated_task_val: setup_data(


explain this change

stylistic mishap, I'll discard this change

BorisYourich · 2025-06-09T09:26:40Z

tesp_api/service/event_actions.py

-            task = await task_repository.update_task_state(
-                task_id,
-                TesTaskState.RUNNING,
-                TesTaskState.EXECUTOR_ERROR


The TesTaskState.EXECUTOR_ERROR change is not included in latest changes

BorisYourich · 2025-06-09T09:27:55Z

tesp_api/service/event_actions.py

+
+        if command_status.get('returncode', -1) != 0:
+            print(f"Task {task_id} executor error (return code: {command_status.get('returncode', -1)}). Setting state to EXECUTOR_ERROR.")
+            await task_repository.update_task_state(task_id, TesTaskState.RUNNING, TesTaskState.EXECUTOR_ERROR)


Or does it now work logically the same way as before with TesTaskState.EXECUTOR_ERROR?

Yes, the logic should be preserved and made more robust:

After the Pulsar job finishes (or job_status_complete returns), we check command_status.get('returncode', -1) != 0.

Before declaring an EXECUTOR_ERROR, we check whether the task has been canceled by the API. A canceled job might also result in a non-zero exit code from Pulsar's perspective (e.g., if it was killed), so the CANCELED state set by the user takes priority.

If it's not CANCELED and the return code is non-zero, then we explicitly update the task state to TesTaskState.EXECUTOR_ERROR (from TesTaskState.RUNNING).

We also now explicitly call pulsar_operations.erase_job(task_id) in this path to ensure the failed Pulsar job is cleaned up.

Then, we return to prevent the code from falling through to the logic that sets the state to COMPLETE.
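
The ordering of these checks can be sketched as below. This is an illustration under stated assumptions, not the actual handle_run_task code: `State`, `finish_task`, `resolve`, the in-memory `db` dict, and the `erased` list (which stands in for the pulsar_operations.erase_job call) are hypothetical. The sketch also assumes no extra cleanup is needed on the canceled path.

```python
import asyncio
from enum import Enum


class State(Enum):  # hypothetical subset of TesTaskState
    RUNNING = "RUNNING"
    CANCELED = "CANCELED"
    EXECUTOR_ERROR = "EXECUTOR_ERROR"
    COMPLETE = "COMPLETE"


async def finish_task(task_id, command_status, db, erased):
    """Resolve the final state once the Pulsar job has finished."""
    if db[task_id] is State.CANCELED:
        return db[task_id]  # user cancellation wins over any exit code
    if command_status.get("returncode", -1) != 0:
        db[task_id] = State.EXECUTOR_ERROR
        erased.append(task_id)  # stands in for pulsar_operations.erase_job(task_id)
        return db[task_id]  # do not fall through to COMPLETE
    db[task_id] = State.COMPLETE
    return db[task_id]


def resolve(initial, command_status):
    """Run finish_task against a fresh one-task store and report the outcome."""
    db, erased = {"t": initial}, []
    final = asyncio.run(finish_task("t", command_status, db, erased))
    return final, erased
```

For example, `resolve(State.CANCELED, {"returncode": 137})` leaves the task CANCELED, while `resolve(State.RUNNING, {"returncode": 1})` ends in EXECUTOR_ERROR and records the job as erased. A missing return code defaults to -1 and is therefore treated as a failure.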

tesp_api/api/endpoints/task_endpoints.py

BorisYourich · 2025-07-14T12:14:08Z

README.md

+
+#### All services (default):
+```
+docker compose --profile all up -d


Fix the readme, no longer applies

BorisYourich

Very well done, thank you so much !

BorisYourich and others added 8 commits May 5, 2025 10:49

Added condition in docker.py for stage in and out. Added additional l…

68e5b86

…ogic/rewrote code for those conditions in event_actions.py: initialization, conditional command building, command joining, empty command check, singularity placeholders, error handling/logging

commit for issue #36

ad62fea

Update README.md

7383e66

for issue #36

update for issue #36, docker profiles function as desired

9ac548f

Update README.md

4d43342

issue #36

fixed stuck loop caused by cancelation/deletion of a job with galaxy …

ce1fca3

…client, CESNET/usegalaxy#153

Update ci.yaml

967c77a

Update README.md

26c7a64

micoleaoo requested a review from BorisYourich May 26, 2025 12:01

micoleaoo self-assigned this May 26, 2025

martenson reviewed May 26, 2025

martenson mentioned this pull request May 27, 2025

Galaxy unable to delete jobs submitted through TES Pulsar library CESNET/usegalaxy#153

Closed

cleaned up comments, docstrings

08598eb

micoleaoo linked an issue May 27, 2025 that may be closed by this pull request

Docker compose profiles for pulsar omission #36

Closed

BorisYourich requested changes Jun 9, 2025

reverted changes: temporary variables back to direct assignement and …

3957c86

…lambda parameter name back to updated_task

micoleaoo requested a review from BorisYourich June 13, 2025 08:08

BorisYourich reviewed Jul 7, 2025

tesp_api/api/endpoints/task_endpoints.py (outdated)

micoleaoo added 4 commits July 7, 2025 10:36

comment delete

418cf28

added docker profile for pulsar service

1da5dab

added docker profile for pulsar service

ddcc337

resolved conflicts

7177e7e

micoleaoo force-pushed the dev branch from d3c7c09 to 7177e7e Compare July 14, 2025 10:18

micoleaoo added 3 commits July 14, 2025 12:35

Update ci.yaml

135c1a7

Update ci.yaml

224a484

updated dockerfile image

35babf6

BorisYourich reviewed Jul 14, 2025

micoleaoo added 3 commits July 14, 2025 14:51

Update README.md

2ad5161

Added directory IO test which functions with upload_server.py instead…

3abf4a6

… of DTS services through docker compose, potentially replacing them in the near future (now works with HTTP only). Added output files and container clean up after tests.

Update ci.yaml

4033daf

micoleaoo added 3 commits July 17, 2025 14:20

Update ci.yaml

958d585

Update ci.yaml

83203d2

Update ci.yaml

169f6ca

BorisYourich approved these changes Jul 21, 2025

BorisYourich merged commit 9292d9b into main Jul 27, 2025
2 checks passed
