Normal Archived workflows with wma_prod rules #12267

Open
hassan11196 opened this issue Feb 19, 2025 · 2 comments

Comments

@hassan11196
Member

hassan11196 commented Feb 19, 2025

Impact of the bug
MSRuleCleaner

Describe the bug
In an effort to reduce disk space, we have started to delete the container-level wma_prod rules created by agents. In the process of finding these rules, I found around 12.3 PB worth of Rucio rules locking data on disk.
These numbers are only for container-level rules. It is a very high number, but please take it with a grain of salt: I am counting by rules rather than by actual replicas, and there is an overlap between the two copies (container-level rule and block-level rule).

[Images: 6 attachments]

Labels:

  • Deleteable -> container rule can be deleted as all block rules are okay.
  • Deleteable-ALL -> container rule can be deleted as all block rules are okay AND tape rule is okay.
  • STUCK -> container rule CANNOT be deleted as not all block rules are okay.
  • BUG -> container rule CANNOT be deleted as block rules do not equal the total number of blocks.
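
The labels above can be read as a small decision procedure. The following is a minimal sketch of that reading, not the script that produced the plots: the function name, its parameters, the use of the string "OK" for an okay rule, and the order in which the checks are applied are all assumptions made for illustration.

```python
from typing import Sequence


def classify_container_rule(
    block_rule_states: Sequence[str],
    total_blocks: int,
    tape_rule_ok: bool,
) -> str:
    """Return one of the four labels for a container-level rule.

    Hypothetical helper: block_rule_states holds one state string per
    block-level rule found, and "OK" is assumed to mean the rule is okay.
    """
    # BUG: the number of block-level rules differs from the number of blocks.
    if len(block_rule_states) != total_blocks:
        return "BUG"
    # STUCK: at least one block-level rule is not okay.
    if any(state != "OK" for state in block_rule_states):
        return "STUCK"
    # All block rules are okay; the tape rule decides between the two
    # deletable labels.
    if tape_rule_ok:
        return "Deleteable-ALL"
    return "Deleteable"
```

Checking the block count first is a design choice of this sketch: if block rules are missing, the states of the remaining ones say little about whether the container rule is safe to delete.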

Nevertheless, this number should ideally be near zero, as MSRuleCleaner should be cleaning up the rules before the workflows are archived.

How to reproduce it
Here is the list of all such datasets and the CSV used to produce this plot:
dataset_with_archived_wfs.csv

Expected behavior
No wma_prod rules for workflows that are archived.

Additional context and error message
I will try to find the block level rules as well.

FYI @anpicci @amaltaro

@amaltaro
Contributor

Hi @hassan11196, let me try to understand the problem that you are reporting here. Are you saying that you have found workflows sitting in archived status and yet with existing wma_prod rules locking their output datasets? If so, that suggests that:
a) either MSRuleCleaner is not cleaning up all of the wma_prod rules;
b) or that the agent is creating rules after a workflow gets archived.

If that is correct, I can confirm that Amanda identified this problem (case b above) last month and we are tracking it in this ticket: #12246

At least for the case that we were investigating, it was caused by the ultra slow JobAccountant polling cycle that we had back during Christmas, caused by a misconfiguration on the Oracle side.
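
One possible way to tell cases a and b apart for a given rule is to compare the rule's creation time with the time the workflow was archived. This is only a sketch under that assumption: the function and parameter names are hypothetical, and where the two timestamps would come from is not specified in this thread.

```python
from datetime import datetime


def likely_case(rule_created_at: datetime, workflow_archived_at: datetime) -> str:
    """Guess which case applies to a leftover wma_prod rule.

    Hypothetical helper: returns "b" when the rule was created after the
    workflow was archived (the agent created it late), otherwise "a"
    (the rule already existed at archival time and was not cleaned up).
    """
    if rule_created_at > workflow_archived_at:
        return "b"
    return "a"
```

A rule created before archival does not prove a cleanup failure on its own, so the result is best treated as a hint for where to look in the logs.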

@hassan11196
Member Author

Hi @amaltaro,
I suspected case a and was not aware that case b was a possibility, but case b does make more sense.

I can still go through the MSRuleCleaner logs to verify that it's not case a (if you have any tips, please do share), and then we can close this issue, since we already have the case b issue.

Thank you Alan.
