File format definition overhaul #357
It's nothing official, but I played with using the term |
Thanks for tackling this. 👍
I think we'll ultimately want a record of what changed between versions, but most of the contents of the page can be focused on the latest version. Hopefully it won't change very often. The changes section should also make it clear that this is only describing the file structure, not the contents. E.g. if the axis order for a detector changes, that's a problem and people are going to be annoyed about it, but it's beyond the scope of this document.
I think they're different enough in the files that we have to - e.g. control data always has value & timestamp datasets.
We'll have to do that case-by-case, really. For the specific case of an empty RUN group, I don't think we need to document it. It's unlikely anything relies on this group existing, and we don't want to be bound to always create it.
I think the rest all have some sort of general meaning (creationDate, dataFormatVersion, proposalNumber, runNumber, sequenceNumber, updateDate). But proposal & run numbers assume data is always collected as part of a run - maybe we should write down what's expected if you e.g. write simulated data, or dump data from Influx for a custom time period, or write out combined data from multiple runs.
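To make the "what's expected" question a bit more concrete, here is a minimal sketch of how such a rule could be written down. The dataset names are the ones listed above; the split into "always expected" and "only for data recorded as part of a run" is purely an assumption for illustration, and `missing_metadata` is a hypothetical helper, not something that exists anywhere.

```python
# Hypothetical sketch: the dataset names come from the comment above, but the
# split into "always" and "run-only" is an assumption, not an agreed rule.
ALWAYS_EXPECTED = {"creationDate", "dataFormatVersion", "sequenceNumber", "updateDate"}
RUN_ONLY = {"proposalNumber", "runNumber"}


def missing_metadata(present, from_run=True):
    """Return the sorted names of expected METADATA datasets that are absent.

    `present` is any iterable of dataset names found under METADATA.
    `from_run` is False for e.g. simulated or combined data (assumed case).
    """
    expected = ALWAYS_EXPECTED | (RUN_ONLY if from_run else set())
    return sorted(expected - set(present))
```

With this, a file of simulated data would pass `from_run=False` and not be flagged for lacking proposal and run numbers.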
It looks like the first round of experiments in 2017 was saved in that format, but then by 2018 it had been replaced with the first/count format. This was before we'd got version numbers for the format, so it just happened. It's probably a historical curiosity at this point.
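If it does end up in an appendix, the relation between the two index layouts could be sketched roughly like this. This assumes `last` is an inclusive index and `status` is 0 for trains without data; neither is confirmed in this thread, and `counts_from_first_last` is a hypothetical name.

```python
def counts_from_first_last(first, last, status):
    """Convert an older first/last/status index into per-train counts.

    Assumption: `last` is inclusive, and `status` is 1 for trains with data
    and 0 for trains without, so empty trains get a count of 0.
    """
    return [(l - f + 1) * s for f, l, s in zip(first, last, status)]
```

The `first` dataset would carry over unchanged, so only `count` needs to be derived.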
As part of the PRWG discussions and unrelated discussions in the CAL team, we realized that our "documentation" of the EuXFEL file structure is both the only one in existence and out of date. In addition, as part of the former, it would be useful to describe the file format in a more generic way, not bound only to files written by the DAQ from Karabo.
This is a first draft of this. I expect a lot of discussion and further work on it. The intention was to describe the structure as abstractly as possible with respect to how and what data actually ends up there, while still commenting on what the "typical use case" of recorded DAQ runs looks like.
I left some of my own open questions in the document, but a brief summary:
- … `1.2`) and only track changes?
- `RUN` top-level group even if there's no `CONTROL` group
- `METADATA` datasets? What about custom datasets, like for `pycalibration`?
- `first/last/status` datasets in `INDEX/<source>`? Never saw those myself, move to appendix or ignore entirely?

You can find a built version of this branch here.