Is your feature request related to a problem? Please describe.
/usr/libexec/condor/condor_job_router failed on vocms0254 (2025-02-11, 4:36 PM CERN time) and vocms0283 (2025-02-07, 5:28 PM CERN time).
The condor error stated:
02/11/25 16:36:44 my_popenv: Failed to exec /data/srv/WmAgentScripts/Unified/go_condor.py, errno=2 (No such file or directory)
I think it is a transient error, since I can ls the file on both affected machines [1], so I do not consider this issue to be a "bug".
Describe the solution you'd like
Since it has been confirmed that these scripts were deprecated long ago, and since they cause transient problems, I suggest cleaning them up properly. A condor restart is a heavy operation that I would prefer to avoid at all costs.
The cleanup should cover:
Remove the job router / overflow configuration.
The current Puppet code makes sure that the configuration is present if job routing is enabled, but does not remove it if job routing is disabled. To remove the configuration, we should not skip the file directive; instead, we should set the ensure argument to absent, as done for example here (see the official docs).
Since we do not use the WmAgentScripts in the schedd anymore, we should remove the directory /data/srv/WmAgentScripts by changing this ensure here to absent.
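The two cleanup items could look roughly like the Puppet sketch below. It is only an illustration under stated assumptions: the variable $enable_job_routing, the configuration file path, and the template name are hypothetical placeholders, and the real names are whatever the existing module already uses. The only things taken from this issue are the directory /data/srv/WmAgentScripts and the idea of switching ensure to absent.

```puppet
# Hypothetical names throughout, except /data/srv/WmAgentScripts.
# Pick the desired state instead of skipping the resource when routing is off,
# so that Puppet actively removes a previously deployed file.
$job_router_ensure = $enable_job_routing ? {
  true    => 'file',
  default => 'absent',
}

# Assumed location of the job router / overflow configuration.
file { '/etc/condor/config.d/99_job_router.conf':
  ensure  => $job_router_ensure,
  # Assumed template name; content is ignored when ensure is 'absent'.
  content => template('hypothetical_module/job_router.conf.erb'),
}

# The scripts directory is no longer used on the schedd.
file { '/data/srv/WmAgentScripts':
  ensure => 'absent',
  force  => true,  # Puppet only removes a directory when force is set
}
```

Keeping the file resource in the catalog with ensure set to absent is what makes the removal happen on nodes that already have the file; dropping the resource entirely would leave the old configuration in place.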
Describe alternatives you've considered
We can ignore these condor restarts.
Additional context
(none)
[1]
cmst1@vocms0283:data $ ls -l /data/srv/WmAgentScripts/Unified/go_condor.py
-rwxr-xr-x. 1 cmst1 zh 21453 May 31 2024 /data/srv/WmAgentScripts/Unified/go_condor.py
Thank you, Dario. We should also remove from puppet templates - both at CERN and FNAL - the cloning of https://github.com/CMSCompOps/WmAgentScripts, which I understand to be stale since the migration to CERN Gitlab last year.
@hassan11196 please let us know if there is anything that we are missing on this.
Impact of the new feature
All WMAgents