Properly deprecate WmAgentScripts from wmagents #12263

Open
2 tasks
mapellidario opened this issue Feb 13, 2025 · 1 comment


@mapellidario
Member

Impact of the new feature

All WMAgents

Is your feature request related to a problem? Please describe.

/usr/libexec/condor/condor_job_router failed on vocms0254 (2025-02-11, 4:36 PM CERN time) and vocms0283 (2025-02-07, 5:28 PM CERN time).

The condor error stated

02/11/25 16:36:44 my_popenv: Failed to exec /data/srv/WmAgentScripts/Unified/go_condor.py, errno=2 (No such file or directory)

I think it is a transient error, since I can ls the file on both affected machines [1], so I do not consider this issue to be a "bug".

Describe the solution you'd like

Since it has been confirmed that these scripts have been deprecated for ages, and since they cause transient problems, I suggest we clean them up properly. A condor restart is a heavy operation that I would prefer to avoid at all costs.

The cleanup should cover

  • remove job router / overflow configuration
    • current puppet makes sure that the configuration is present if jobrouting is enabled, but does not remove it if it is disabled. To remove the configuration, we should not skip the file directive; instead we should set the ensure argument to absent, as done for example here; see the official docs
  • since we no longer use the WmAgentScripts in the schedd, we should remove the directory /data/srv/WmAgentScripts by changing this ensure here to absent
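
The two cleanup items above could look roughly like the sketch below. This is only an illustration, not the actual manifest: the variable `$jobrouting_enabled` and the config path `/etc/condor/config.d/99_jobrouter.conf` are hypothetical placeholders, and only `/data/srv/WmAgentScripts` comes from this issue. The point is that the `file` resource stays managed in both cases and only its `ensure` value changes.

```puppet
# Hypothetical toggle; the real manifest may name its parameter differently.
$jobrouting_enabled = false

# Hypothetical path for the job router / overflow configuration;
# substitute the file the current manifest actually manages.
$jobrouter_config = '/etc/condor/config.d/99_jobrouter.conf'

# Keep managing the file in both cases, so that disabling job routing
# removes the file instead of leaving a stale copy behind.
$jobrouter_ensure = $jobrouting_enabled ? {
  true    => 'file',
  default => 'absent',
}

file { $jobrouter_config:
  ensure => $jobrouter_ensure,
  # content / source omitted in this sketch
}

# Puppet does not delete a directory with ensure => absent
# unless force => true is also set.
file { '/data/srv/WmAgentScripts':
  ensure => absent,
  force  => true,
}
```

Note that removing the configuration file does not by itself say whether condor needs a reconfig or a restart to stop calling go_condor.py; that would have to be checked separately.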

Describe alternatives you've considered

We can ignore these condor restarts.

Additional context

(none)


[1]

cmst1@vocms0283:data $ ls -l /data/srv/WmAgentScripts/Unified/go_condor.py
-rwxr-xr-x. 1 cmst1 zh 21453 May 31  2024 /data/srv/WmAgentScripts/Unified/go_condor.py
@amaltaro
Contributor

Thank you, Dario. We should also remove the cloning of https://github.com/CMSCompOps/WmAgentScripts from the puppet templates, both at CERN and FNAL. I understand that repository to be stale since the migration to CERN GitLab last year.

@hassan11196 please let us know if there is anything that we are missing on this.
