GLOWS l1a validation corrections #2584
base: dev
Conversation
tech3371 left a comment:
Just minor questions, but the code changes look good.
```python
# block header and flags are handled differently, so not tested here
# "number_of_spins_per_block" is a special case and handled specifically
# (validation data is incorrect)
```
Is there a plan to fix that mismatch in the validation data?
I can look into it. I don't think it's worth putting a ton of time into this since the mismatch is pretty minor, but I can try re-running the validation data through the new version of the validation code and see whether that produces the new output.
```python
# Filter out DE records with no direct_events (incomplete packet sequences)
de_l1a_list = [de for de in de_l1a_list if de.direct_events is not None]
```
@greglucas, you just did something similar for CoDICE. I am wondering if we should refactor the incomplete-packet filtering into a general utils function.
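A shared helper along these lines could live in a common utils module. This is only a sketch of the suggested refactor; the function name and the predicate-based interface are hypothetical, not anything that exists in the repo:

```python
import logging
from typing import Callable, TypeVar

logger = logging.getLogger(__name__)

T = TypeVar("T")


def filter_incomplete_records(
    records: list[T], is_complete: Callable[[T], bool]
) -> list[T]:
    """Drop records that fail an instrument-supplied completeness check.

    Hypothetical shared helper: each instrument passes its own predicate,
    since what counts as a "complete" packet sequence differs per instrument.
    """
    kept = [record for record in records if is_complete(record)]
    dropped = len(records) - len(kept)
    if dropped:
        logger.warning("Dropped %d incomplete records", dropped)
    return kept
```

The GLOWS case above would then become `filter_incomplete_records(de_l1a_list, lambda de: de.direct_events is not None)`, keeping the warning behavior in one place while letting each instrument define completeness.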
I did add a warning for missing sequence counts for a given APID, but that only applies when using the packet_file_to_datasets function, which I don't think GLOWS is. And I think every instrument will want to handle these differently: some may want to keep it as a warning, and others may have different conditions for what counts as a valid sequence of packets.
What happens here if you have some direct_events but not the whole sequence, so the data is only partially missing rather than entirely absent? Maybe that is handled when the values are created in the first place.
Yes, that case is handled later. The missing sequences are in an attribute for tracking. I did hear from Marek about some related changes surrounding duplicate DE packets, so this piece may get updated again soon.
```python
if self.number_of_bins_per_histogram % 2 == 1:
    self.histogram = self.histogram[:-1]

if self.number_of_bins_per_histogram != len(self.histogram):
    logger.warning(
        f"Number of bins {self.number_of_bins_per_histogram} does not match "
        f"processed number of bins {len(self.histogram)}!"
    )
```
So this is trimming the self.histogram array and then comparing. Both are plain if statements rather than if/elif; is that expected?
Yes, I only want to trim one byte off the end, because that's the only valid case. Then I check whether the lengths mismatch; if they do, that means I have either much less data than expected or more, and both are error cases.
Do you want to raise or logger.error instead of a warning, then?
Does this mean the flight software is always transmitting an even number of bytes, and you are taking care of that above? That seems somewhat odd to me, and I'm not sure why they would send an extra byte down.
Yes, the flight software always transmits an even number of bytes, so if there is an odd number of bins we end up with an extra zero at the end. This came up in MAG too, so I think that is what is occurring.
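A quick sketch of the padding behavior being described, assuming one byte per bin: with an odd bin count, the even-byte downlink leaves a single padding zero on the end, which the trim removes.

```python
def trim_histogram(raw_bytes: bytes, number_of_bins: int) -> bytes:
    """Drop the single padding byte added when the bin count is odd.

    Illustrative only: the flight software always sends an even number of
    bytes, so an odd number of (1-byte) bins arrives with one extra zero
    byte at the end.
    """
    if number_of_bins % 2 == 1:
        raw_bytes = raw_bytes[:-1]
    return raw_bytes


# 3 bins downlinked as 4 bytes: the trailing 0x00 is padding
assert trim_histogram(b"\x0a\x0b\x0c\x00", 3) == b"\x0a\x0b\x0c"
# An even bin count needs no trim
assert trim_histogram(b"\x0a\x0b", 2) == b"\x0a\x0b"
```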
I can change it to raise an error
Up to you what you want to do; there are pros and cons. Raising does fail loudly, but that also means other people get upset with us for not producing the good data that was there 😂
Yeah, I think I would rather this just be evident if I am looking. If the data is as expected, does it really matter...? I don't really use the number of bins in the later code; it's all based off the histogram lengths anyway.
greglucas left a comment:
I think this is right. But one thing to bring up: on CoDICE, for example, we look at the number of events and then trim the byte array down based on that. I think that is similar to what is going on here, but not exactly the same. Would you be able to do something like `actual_hist = BYTEHIST[:NBINS]`, since the number-of-bins value is in the XTCE definition? Another option would be to update the XTCE to use that parameter as a fixed value for the number of bins, rather than making it a variable-length field.
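The suggestion above might look like the sketch below. `BYTEHIST` and `NBINS` stand in for whatever the XTCE fields are actually named, and the length check is an added safeguard, not something from the PR:

```python
def histogram_from_packet(byte_hist: bytes, n_bins: int) -> bytes:
    """Trim a variable-length byte field to the packet's reported bin count.

    Sketch of the CoDICE-style approach mentioned above: rather than
    special-casing the single padding byte, slice the raw field to the
    length the packet itself reports.
    """
    if len(byte_hist) < n_bins:
        raise ValueError(f"Only {len(byte_hist)} bytes for {n_bins} bins")
    return byte_hist[:n_bins]
```

This way any trailing padding, whether one byte or more, is dropped in one place, at the cost of trusting the reported bin count over the actual data length.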
@greglucas There is a check which compares the number of bins to the histogram length; is that sufficient for what you're saying? I'm a little worried about losing data by blindly chopping.
Yes, I think we are basically on the same page here. I am not sure exactly what GLOWS is doing, so I just threw out a few different ideas that are essentially the same. I don't know why CoDICE does that byte padding and then does this chopping as well. I am a bit unclear why you chop the one byte off, though, and why that wouldn't be captured in the NBINS parameter. 👍 You still have my approval as-is right now.
Change Summary
Some GLOWS validation corrections for L1A, using the 4-repoint validation data set.
Overview
The bulk of the changes are to accommodate a bad assumption in the original code about the number of bins in a histogram: each timestamp can have a varying number of histogram bins.
To preserve the larger dataset across all files, this sets the histogram length to 3600 and fills in only the portion of the data present for shorter bin counts. I think this aligns with ISTP's requirements for full dataset standardization.
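The fixed-length approach can be sketched with NumPy. The 3600-bin length comes from the description above; the fill value and function name are hypothetical choices for illustration:

```python
import numpy as np

MAX_BINS = 3600  # fixed histogram length used across all files (per the PR)
FILL_VALUE = -1  # hypothetical fill value for unused bins


def pad_histogram(histogram: np.ndarray) -> np.ndarray:
    """Place a variable-length histogram into a fixed 3600-bin array.

    Bins beyond the packet's actual bin count are set to a fill value, so
    every timestamp has the same shape in the output dataset.
    """
    padded = np.full(MAX_BINS, FILL_VALUE, dtype=np.int64)
    padded[: len(histogram)] = histogram
    return padded
```

Stacking these per-timestamp arrays then yields a rectangular `(n_times, 3600)` variable, which is what a CDF-style dataset expects.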
This did require migrating the packet decommutation settings to a variable-length field rather than always assuming a fixed number of bytes.
Testing
The spins in the existing validation data in the code have an off-by-one error, so that test needed an update.