Rules created by wma_prod after workflow archival #12246
Comments
@anehnis thank you for creating this ticket. I've done some refactoring of the original description above. A summary of the output data flow is documented in this section: https://cms-wmcore.docs.cern.ch/training/data_flow/#output-data-flow
Regarding the 4 options described above, I am slightly inclined towards option 2, i.e. ensuring that DBS3Upload and RucioInjector do not inject data for workflows that are no longer "relevant". It is still not clear to me what would be the most efficient way to answer the question: "do I still need this output data?". The few possibilities I see are:
If we find such a case - a file and/or block that needs to be inserted into Rucio (and DBS) - we should skip the injection into the external service and mark it as completed on the component side. For DBS, this would be done through
For RucioInjector, it looks like we would have to call
One of the questions I have is: can we update this with an invalid/null rule id? Should we default it to a fake rule id?
NOTE: I feel that we have to refactor how the polling cycle of these components works as well. Either we:
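To make the "skip the injection and mark it as completed on the component side" idea from this comment a bit more concrete, here is a minimal, self-contained sketch. It is not WMCore code: `Block`, `partition_blocks`, `mark_skipped` and the `NO_INJECTION_STATES` set are hypothetical names, and the choice of which workflow states count as "no longer relevant" is an assumption that would still need to be agreed on.

```python
from dataclasses import dataclass

# ASSUMPTION: workflow states after which output data is treated as
# "no longer relevant". The real list would have to be decided in this ticket.
NO_INJECTION_STATES = frozenset(
    {"aborted-archived", "rejected-archived", "normal-archived"}
)


@dataclass
class Block:
    """Hypothetical stand-in for a block pending injection in a component."""
    name: str
    workflow: str
    injected: bool = False  # "completed" flag on the component side
    skipped: bool = False   # True if no call was made to the external service


def partition_blocks(blocks, workflow_status):
    """Split pending blocks into (to_inject, to_skip).

    workflow_status maps workflow name -> current request status.
    Blocks whose workflow status is unknown are kept for injection,
    which is the conservative choice.
    """
    to_inject, to_skip = [], []
    for block in blocks:
        status = workflow_status.get(block.workflow)
        if status in NO_INJECTION_STATES:
            to_skip.append(block)
        else:
            to_inject.append(block)
    return to_inject, to_skip


def mark_skipped(blocks):
    """Mark blocks as completed locally without contacting DBS or Rucio."""
    for block in blocks:
        block.injected = True
        block.skipped = True
```

In a polling cycle, a component would call `partition_blocks` on its pending blocks, inject only the first list, and call `mark_skipped` on the second so that those blocks are not retried in the next cycle.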
@anehnis as we discussed today, there is a second option to deal with this, which would also resolve a very long-standing issue described in this ticket: #8148
In short, we would couple workflow completion with data injection into DBS and Rucio. In other words, whenever the agent identifies a workflow that is ready to be completed, it would ensure that both the DBS3Upload and RucioInjector components expedite data injection for that workflow. As we discussed, components don't talk to each other directly, only through database information. So this development would involve identifying workflows that are ready to complete and prioritizing their output data in those components. Let me record here what we just discussed.
Challenges
Questions (and some draft answers)
Developments
Please add anything that I might have missed from our discussion.
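As a rough illustration of the second option discussed above (prioritizing output data of workflows that are ready to complete), a component could reorder its pending work before each injection cycle. This is only a sketch under stated assumptions: `prioritize_pending` is a hypothetical helper, and the set of ready-to-complete workflows is assumed to be obtainable from the agent database.

```python
def prioritize_pending(pending, ready_workflows):
    """Return pending items with ready-to-complete workflows first.

    pending: list of dicts with at least a "workflow" key (hypothetical shape).
    ready_workflows: set of workflow names identified as ready to complete.

    sorted() is stable, so the original order is preserved within each group.
    The key is False (sorts first) for items of ready workflows.
    """
    return sorted(pending, key=lambda item: item["workflow"] not in ready_workflows)


pending = [
    {"block": "/a#1", "workflow": "wf_running"},
    {"block": "/b#1", "workflow": "wf_ready"},
    {"block": "/c#1", "workflow": "wf_running"},
    {"block": "/d#1", "workflow": "wf_ready"},
]
ordered = prioritize_pending(pending, {"wf_ready"})
```

With this ordering, blocks of `wf_ready` would be injected first, and the remaining blocks keep their original relative order.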
Impact of the bug
Rules are created for the wma_prod account that will never be cleaned up by MSRuleCleaner.
Describe the bug
An instance was seen with a workflow that was aborted on Dec 24. The agent (vocms0254) had a backlog of ~10 days. It parsed the merge jobs on Dec 31, which then caused DBS3Upload and RucioInjector to act. Rules were created for wma_prod, and they won't be cleaned up because MSRuleCleaner has already archived the workflow.
Describe the solution you'd like
After a given stage of the workflow, the agent should no longer inject data into DBS and Rucio, as it can cause other types of problems, for instance:
Describe alternatives you've considered
There are multiple ways to address this issue; here are some:
wma_prod rules that should be deleted
Additional context
Included are some logs that @amaltaro was able to find for this example: time_travel_rules_debug.txt