
Conversation

@maxinelasp
Contributor

Change Summary

Some GLOWS validation corrections for L1A, using the 4-repoint validation data set.

Overview

The bulk of the changes accommodate a bad assumption in the original code about the number of bins in a histogram: each timestamp can have a varying number of histogram bins.

To preserve the larger dataset across all files, this sets the histogram dimension to a fixed length of 3600 and fills in only the leading portion for records with fewer bins. I think this aligns with ISTP's requirements about full dataset standardization.
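
As a rough sketch of the padding approach (the helper name, fill value, and array handling are illustrative assumptions, not the actual implementation):

import numpy as np

MAX_BINS = 3600  # fixed histogram length used across all files
FILL_VALUE = -1  # hypothetical fill value for unused bins

def pad_histogram(histogram: np.ndarray) -> np.ndarray:
    """Pad a variable-length histogram out to MAX_BINS bins."""
    padded = np.full(MAX_BINS, FILL_VALUE, dtype=np.int64)
    padded[: len(histogram)] = histogram
    return padded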

This did require migrating the packet decommutation settings to a variable-length dataset rather than always assuming a fixed number of bytes.

Testing

The spins in the existing validation data in the code have an off-by-one error, so that test needed an update.

@maxinelasp maxinelasp requested review from a team and greglucas January 15, 2026 21:26
@maxinelasp maxinelasp self-assigned this Jan 15, 2026
@maxinelasp maxinelasp requested review from sdhoyt and tech3371 and removed request for a team January 15, 2026 21:26
Contributor

@tech3371 tech3371 left a comment


Just minor questions, but the code changes look good.


# block header and flags are handled differently, so not tested here
# "number_of_spins_per_block" is a special case and handled specifically
# (validation data is incorrect)

Contributor

Is there plan to fix that mismatch from validation data?

Contributor Author

I can look into it. I don't think it's worth putting tons of time in because this is a pretty minor mismatch, but I can try and re-run the validation data through the new version of the validation code and see if that produces the new output.

Comment on lines +131 to +132
# Filter out DE records with no direct_events (incomplete packet sequences)
de_l1a_list = [de for de in de_l1a_list if de.direct_events is not None]

Contributor

@greglucas, you just did that for CoDICE. I am wondering if we should refactor the incomplete-packets filter into a general utils function.
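
A minimal sketch of what such a shared helper could look like (the function name and keyword argument are hypothetical):

def filter_incomplete_records(records: list, attr: str = "direct_events") -> list:
    """Drop records whose data attribute was never populated,
    e.g. because the packet sequence was incomplete."""
    return [rec for rec in records if getattr(rec, attr, None) is not None]

# usage: de_l1a_list = filter_incomplete_records(de_l1a_list)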

Collaborator

I did add a warning for missing sequence counts for a given APID, but that only applies when using the packet_file_to_datasets function, which I don't think GLOWS is. That said, I think every instrument will want to handle these differently, because some may want to keep it as a warning, while others may have different conditions for what counts as a valid sequence of packets.

What happens here if you have some direct_events but not the whole sequence, so the data is only partially missing and part of it is actually there? Maybe this is handled when creating the values in the first place.

Contributor Author

Yes, that case is handled later. The missing sequences are in an attribute for tracking. I did hear from Marek about some related changes surrounding duplicate DE packets, so this piece may get updated again soon.

Comment on lines +286 to +293
if self.number_of_bins_per_histogram % 2 == 1:
self.histogram = self.histogram[:-1]

if self.number_of_bins_per_histogram != len(self.histogram):
logger.warning(
f"Number of bins {self.number_of_bins_per_histogram} does not match "
f"processed number of bins {len(self.histogram)}!"
)

Contributor

So this is trimming the self.histogram array and then comparing. It looks like both are if statements rather than if/elif. Is that expected?

Contributor Author

Yes, I only want to trim one byte off the end, because that's the only valid case. Then I check whether the length mismatches; if it does, that means I either have much less data than expected, or more, and both of those are error cases.

Collaborator

Do you want to raise or logger.error it instead of a warning then?

Does this mean that the flight software is always transmitting an even number of bytes, and you are taking care of that above? That seems somewhat odd to me, and I'm not sure why they would send an extra byte down.

Contributor Author

Yes, the flight software always transmits an even number of bytes, so if there is an odd number of bins, we end up with an extra zero at the end. Something similar came up in MAG too, so I think that's what is occurring here.
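
For illustration only (the values below are made up, not taken from real packets), this is the case the trimming above handles:

# With an odd bin count, the even-byte downlink leaves one padding byte
# at the end of the parsed histogram.
number_of_bins_per_histogram = 5
histogram = [12, 7, 9, 3, 11, 0]  # 6 bytes downlinked; the trailing 0 is padding
histogram = histogram[:-1]        # -> [12, 7, 9, 3, 11]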

Contributor Author

I can change it to raise an error.

Collaborator

Up to you what you want to do. There are pros and cons. It does fail loudly, but that also means it fails loudly and other people get upset with us for not producing the good data that was there 😂

Contributor Author

Yeah, I think I would rather this just be evident if I am looking. If the data is as expected, then does it really matter...? I don't really use number of bins in the later code; it's all based off the histogram lengths anyway.

Collaborator

@greglucas greglucas left a comment

I think this is right. But one thing to bring up: on CoDICE, for example, we look at the number of events and then trim the byte array down based on that. I think that is similar to what is going on here, but not exactly the same. Would you be able to do something like:

actual_hist = BYTEHIST[:NBINS]

because the number of bins is available as a value in the XTCE definition. Another option would be to update the XTCE to use that parameter as a fixed value for the number of bins, rather than making it a variable-length field.
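
In other words, something along these lines (using the BYTEHIST/NBINS placeholder names above rather than the real GLOWS field names):

number_of_bins = packet["NBINS"]         # bin count reported in the packet
raw_hist = packet["BYTEHIST"]            # variable-length histogram byte field
actual_hist = raw_hist[:number_of_bins]  # drop any padding past the reported count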

@maxinelasp
Contributor Author

@greglucas There is a check which compares the number of bins to the histogram length, is that sufficient for what you're saying? I'm a little worried about losing data by blindly chopping.

@greglucas
Collaborator

> @greglucas There is a check which compares the number of bins to the histogram length, is that sufficient for what you're saying? I'm a little worried about losing data by blindly chopping.

Yes, I think we are basically on the same page here. I am not sure exactly what GLOWS is doing, so I just threw out a few different ideas that are essentially the same. I don't know why CoDICE does that byte padding and then does this chopping as well. I am a bit unclear on why you chop the one byte off, though, and why that wouldn't be captured in the NBINS parameter.

👍 still have my approval as-is right now.
