Data availability per tag #155
Conversation
Hey @GiovanniVolta !
Thanks for this PR. I think a tool like this could be useful, but I still have some doubts about parts of the implementation.
- Isn't this tool very similar to xefind? What is the use case of this vs xefind? I know that xefind checks data availability in a different way: it relies on the runDB, which needs to be updated by the people doing reprocessing (usually this is automatic, so it works). I would rather stress keeping the DB up to date than propose another tool. Also, the nice thing about xefind relying on the DB is that nothing can go wrong with environment settings, locally installed packages, etc. You just know the data should be there, and if you don't see it, you start asking what you are doing wrong. With this tool I am worried about people doing something wrong (very easy: just imagine you have straxen locally installed and this tool fails) and then immediately believing the data is not even supposed to be there.
- If we still want to propose this tool, I think some changes would be necessary. Mainly, I think we should run the Python script that calls `is_stored` inside a job. Why? Because when you submit a job, you use all the tooling we already have set up that provides the right context, the right software via singularity, the right cutax, and the right paths to the storages. I am referring especially to the handling of cutax and the paths: none of that is needed if you submit a job to singularity. Having it here as well can be dangerous; it is one more thing to keep up to date, and again it will easily give the impression that data is missing when it should be there.
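For illustration, the per-run, per-target availability table this PR is building could be sketched as below. Note that `is_stored` here is a stub standing in for strax's `Context.is_stored(run_id, target)`; in the real tool it would come from a properly configured straxen context, which is exactly why running it inside a job with the correct singularity environment matters:

```python
def is_stored(run_id: str, target: str) -> bool:
    """Stub for strax's Context.is_stored: pretend only 'peaklets'
    are stored, and only for even-numbered runs."""
    return target == "peaklets" and int(run_id) % 2 == 0


def availability_matrix(run_ids, targets, checker=is_stored):
    """Build a {run_id: {target: bool}} table of data availability,
    one checker call per (run, target) pair."""
    return {run: {t: checker(run, t) for t in targets} for run in run_ids}


matrix = availability_matrix(["047000", "047001"], ["peaklets", "event_info"])
```

With the stub above, `matrix["047000"]["peaklets"]` is `True` while everything for run `047001` is `False`; swapping the stub for a real context's `is_stored` gives the actual availability page data.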
Let me know what you think about these points! It might be that I am wrong, and if you still want to push this, that's okay with me.
Thank you Carlo, your points are very valid, and indeed this script would be better kept somewhere other than utilix. Just for the record, I made this script to keep the data availability page up to date. I could not do it with |