Merging Runs in Time Domain #152

Open
kkappler opened this issue Feb 13, 2022 · 1 comment
Comments

@kkappler
Collaborator

Issue #80 proposes merging runs in the frequency domain. That solution is simple and fairly general, but it does not let us take advantage of the longer-period data that could become available if some small gaps were filled. This relates to issue #66.

In general that would require time series processing and the merging of runs in Time Domain. This would probably require a new class MergedRunTS or something like that.

Time Domain Run Merging can be done according to one of two schemes:

  1. NaN-fill (or effective NaN-fill)
  2. Interpolate / replace with numeric data

In all cases, merging runs implies the existence of a gap. The gap will hold either numbers or NaNs in the time series array, or it could be designated by an undefined chunk, i.e. a discontinuity in the array.

NaN-fill is a nice, simple solution that generically allows a later numeric overwrite without any structural modification.
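As a minimal sketch of the NaN-fill scheme, assuming two runs with the same sample rate and start times given in seconds: the function name `merge_runs_nan_fill` and its arguments are hypothetical and are not part of aurora or MTH5.

```python
import numpy as np


def merge_runs_nan_fill(run_a, start_a, run_b, start_b, sample_rate):
    """Merge two runs onto one uniform time axis, leaving NaN in the gap.

    start_a and start_b are start times in seconds (hypothetical interface);
    run_b must start after run_a ends.
    """
    run_a = np.asarray(run_a, dtype=float)
    run_b = np.asarray(run_b, dtype=float)
    # Offset of run_b's first sample from run_a's first sample, in samples.
    offset = int(round((start_b - start_a) * sample_rate))
    if offset < run_a.size:
        raise ValueError("runs overlap; this sketch only handles gaps")
    # Allocate the merged array as all-NaN, then overwrite where data exist.
    merged = np.full(offset + run_b.size, np.nan)
    merged[:run_a.size] = run_a
    merged[offset:] = run_b
    return merged


# Run A covers t = 0..2 s, run B starts at t = 5 s, both sampled at 1 Hz.
merged = merge_runs_nan_fill([1.0, 2.0, 3.0], 0.0, [7.0, 8.0], 5.0, sample_rate=1.0)
```

Here `merged` has seven samples, with NaN at indices 3 and 4. Because the gap is already part of the array, a later numeric fill is just an assignment into those indices.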

Two things can go wrong with NaN-fill:
A. The gap could be very large. We may then generate an absurdly long time series and possibly cause RAM problems. That could be solved by reading from the MTH5 on an as-needed basis, effectively chunking from one filesystem to another: open a "receiver of FCs" h5 and then read-->process-->write until the job is done.
B. NaN in the time series can cause problems with anti-alias filters (during decimation) or elsewhere in the time series processing. Standard workarounds involve replacing the gap (with zeros or an estimate), processing, and then assigning NaN in STFT-land wherever there were gaps in TS-land, since FC processing is robust to NaN.
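The workaround in B could look roughly like the sketch below: zero-fill the gap, take a windowed FFT, then set every window that touched the gap to NaN. This is an illustration only. The function `stft_with_gap_mask`, the Hann taper, and the window parameters are assumptions, and it omits decimation and anti-alias filtering entirely.

```python
import numpy as np


def stft_with_gap_mask(ts, window_len, step):
    """Windowed FFT of a series that contains NaN gaps.

    Gaps are zero-filled before the FFT; every window that overlapped a gap
    is then set to NaN so that downstream processing can ignore it.
    """
    ts = np.asarray(ts, dtype=float)
    gap = np.isnan(ts)
    filled = np.where(gap, 0.0, ts)  # replace the gap with zeros
    starts = np.arange(0, ts.size - window_len + 1, step)
    taper = np.hanning(window_len)
    n_freq = window_len // 2 + 1
    fcs = np.empty((starts.size, n_freq), dtype=complex)
    for i, s in enumerate(starts):
        fcs[i] = np.fft.rfft(taper * filled[s:s + window_len])
        if gap[s:s + window_len].any():
            fcs[i] = np.nan  # flag the whole window in STFT-land
    return fcs


ts = np.sin(0.3 * np.arange(64))
ts[20:24] = np.nan  # a four-sample gap
fcs = stft_with_gap_mask(ts, window_len=16, step=8)
bad_windows = np.isnan(fcs).any(axis=1)
```

With these parameters there are seven windows, and only the two that overlap samples 20 to 23 are flagged in `bad_windows`.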

@kujaku11
Collaborator

@kkappler I've tried both ways before and had better estimates when the gap was filled with the median value of both sides of the gap. The FCs from the gap are usually tossed in the robust processing, and it just seems like easier bookkeeping.
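One possible reading of the median fill, sketched under assumptions: each NaN run is replaced by the median of up to `n_side` valid samples taken from each side of it. The function name, the `n_side` parameter, and this interpretation of "both sides" are hypothetical.

```python
import numpy as np


def fill_gaps_with_median(ts, n_side):
    """Replace each NaN run with the median of up to n_side valid samples per side."""
    ts = np.asarray(ts, dtype=float)
    filled = ts.copy()
    gap = np.isnan(ts)
    # Locate the start (inclusive) and stop (exclusive) of each NaN run.
    edges = np.diff(np.concatenate(([0], gap.astype(int), [0])))
    starts = np.flatnonzero(edges == 1)
    stops = np.flatnonzero(edges == -1)
    for start, stop in zip(starts, stops):
        before = ts[max(0, start - n_side):start]
        after = ts[stop:stop + n_side]
        neighbours = np.concatenate((before, after))
        neighbours = neighbours[~np.isnan(neighbours)]
        if neighbours.size:  # leave the gap as NaN if nothing valid is nearby
            filled[start:stop] = np.median(neighbours)
    return filled


ts = np.array([1.0, 2.0, 3.0, np.nan, np.nan, 5.0, 6.0, 7.0])
filled = fill_gaps_with_median(ts, n_side=2)
```

In this example the neighbours are 2, 3, 5 and 6, so both gap samples are set to 4.0.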

But the program I was using didn't have NaN support, so maybe this could be as simple as using a masked array? That could also be useful down the road, when the user is able to mask bad data from the time series viewer.
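A small sketch of the masked-array idea using NumPy's `numpy.ma` module. The `user_flagged` array is a hypothetical stand-in for whatever a time series viewer would produce.

```python
import numpy as np

ts = np.array([1.0, 2.0, np.nan, np.nan, 5.0])

# Mask the NaN gap; statistics then skip the masked samples.
masked = np.ma.masked_invalid(ts)
n_valid_before = masked.count()

# Hypothetical user selection of bad data, added to the same mask.
user_flagged = np.array([True, False, False, False, False])
masked[user_flagged] = np.ma.masked

mean_of_valid = masked.mean()  # mean over the remaining unmasked samples
```

The gap and the user-flagged sample end up in one mask, so the same mechanism could serve both gap handling and manual data rejection.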

Xarray natively has a "fill" method and gives you the choice of nan or some other value.

Suggest having a variable for the maximum gap length to support, like 20 seconds or something related to the sample rate or number of samples to minimize absurd padding.
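A sketch of such a limit, assuming the gap is measured between the time of the last sample of one run and the time of the first sample of the next. The function name is hypothetical, and the 20-second default is only the figure floated above.

```python
def gap_is_mergeable(end_a, start_b, sample_rate, max_gap_seconds=20.0):
    """Decide whether the gap between two runs is short enough to pad.

    end_a is the time (s) of run A's last sample and start_b the time (s) of
    run B's first sample. Returns (ok, n_missing), where n_missing is the
    number of samples that would have to be filled.
    """
    n_missing = int(round((start_b - end_a) * sample_rate)) - 1
    max_missing = int(max_gap_seconds * sample_rate)
    ok = 0 <= n_missing <= max_missing
    return ok, n_missing


short_gap = gap_is_mergeable(end_a=10.0, start_b=15.0, sample_rate=8.0)
long_gap = gap_is_mergeable(end_a=10.0, start_b=100.0, sample_rate=8.0)
```

At 8 Hz the first gap needs 39 filled samples and passes, while the second needs 719 and is rejected. Expressing the limit in seconds and converting to samples keeps one setting usable across sample rates.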
