Time domain de-spiking #377

Open
kkappler opened this issue Mar 1, 2025 · 0 comments
Labels
enhancement New feature or request

Comments

@kkappler
Collaborator

kkappler commented Mar 1, 2025

How would time domain despiking code plug into aurora / mth5?

The workflow breaks into two parts: spike identification and spike replacement. Taking the 2011 Time Domain Despiking paper as an example, the identification method was based on time domain "features" of the data, essentially the ratio of the windowed data variance to the same quantity at a quiet remote station. The configuration would therefore involve specifying, for each channel to evaluate, a remote reference channel that is normally quiet.

Given a (survey, station, channel) tuple to clean, this would map to a (reference_survey, reference_station, reference_channel) tuple. We could create a time domain feature container inside the mth5 (there is currently a branch for features under active development, enabling the user to store features in the data). The features would be "windowed data variances" per channel, and "log variance ratios" per channel-pair (much as cross-powers are only defined in a channel-pair context). The "spikes" could then be a boolean feature, essentially a threshold applied to the "log variance ratios". That would provide, per channel, a list of time intervals that should be replaced.
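The feature chain described above (windowed variances, log variance ratio, thresholded boolean spike flag, list of intervals) can be sketched with plain NumPy. This is only an illustration under assumptions: the function names, the non-overlapping windows, the base-10 log, and the "ratio above threshold" rule are all hypothetical choices, not an existing aurora / mth5 API or the exact method from the paper.

```python
import numpy as np


def windowed_variance(x, window_len):
    """Variance of consecutive, non-overlapping windows of a 1-D series."""
    n_windows = len(x) // window_len
    trimmed = np.asarray(x[: n_windows * window_len], dtype=float)
    return trimmed.reshape(n_windows, window_len).var(axis=1)


def log_variance_ratio(x, x_ref, window_len):
    """Per-window log10 of var(channel) / var(reference channel)."""
    var_x = windowed_variance(x, window_len)
    var_ref = windowed_variance(x_ref, window_len)
    return np.log10(var_x / var_ref)


def spike_intervals(lvr, threshold, window_len):
    """Turn the boolean feature (lvr > threshold) into sample intervals.

    Returns a list of (start, stop) sample indices, stop exclusive,
    one pair per contiguous block of flagged windows.
    """
    flagged = np.asarray(lvr) > threshold
    # Pad with False so every flagged block has a rising and a falling edge.
    padded = np.concatenate(([False], flagged, [False]))
    edges = np.flatnonzero(np.diff(padded.astype(int)))
    starts, stops = edges[::2], edges[1::2]
    return [(int(s) * window_len, int(e) * window_len)
            for s, e in zip(starts, stops)]
```

In this sketch the windowed variances and log variance ratios are the arrays that would be stored as features, and the output of `spike_intervals` is the per-channel list of intervals to replace.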

The slightly more complicated part of the process would be to compute a time domain impulse response operator (IRO) between the channel to repair and the reference station channels. This involves:

  • Identification of some reasonably quiet data at both sites. (I am not sure how long a time series is needed for this; I think just a few minutes would probably be OK at normal ULF frequencies. For AMT this would get a bit more complicated, because you would want to estimate the IRO when signal-to-noise ratios are high (coherent) between the stations. Leaving that aside for now.)
  • A software method to compute the IRO between a channel and some number of input channels (multiple input single output system)
  • The output of the software method should be stored somewhere in the MTH5 metadata; I'm not sure where yet. It could be FFT-ed and stored as a sort of auxiliary transfer function, I suppose. Needs some thought.
  • Then the cleaned data could be initialized as a copy of the original time series. (This would need to be done per MTH5 data acquisition "run").
  • At this point, one would loop over the spikes flagged by the "log variance ratio" features, loading some (user-specified) width of data around each spike, and replace the spike data with the match-filtered synthetic data.
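As a rough sketch of the multiple-input, single-output IRO step in the list above, the operator can be fit by ordinary least squares on lagged copies of the reference channels, using a quiet stretch of data. Everything named here is an assumption for illustration: the function names are hypothetical, the filter is taken to be a short causal FIR with `n_taps` coefficients per input channel, and plain `numpy.linalg.lstsq` stands in for whatever (possibly robust) estimator would actually be used.

```python
import numpy as np


def build_design_matrix(inputs, n_taps):
    """Stack lagged copies of each input channel (lags 0 .. n_taps-1).

    inputs: array of shape (n_channels, n_samples).
    Returns an array of shape (n_samples - n_taps + 1, n_channels * n_taps);
    row t corresponds to output sample t + n_taps - 1.
    """
    inputs = np.atleast_2d(np.asarray(inputs, dtype=float))
    n_channels, n_samples = inputs.shape
    n_rows = n_samples - n_taps + 1
    columns = []
    for ch in range(n_channels):
        for lag in range(n_taps):
            start = n_taps - 1 - lag
            columns.append(inputs[ch, start:start + n_rows])
    return np.column_stack(columns)


def estimate_iro(inputs, output, n_taps):
    """Least-squares FIR coefficients, shape (n_channels, n_taps)."""
    design = build_design_matrix(inputs, n_taps)
    target = np.asarray(output, dtype=float)[n_taps - 1:]
    coeffs, *_ = np.linalg.lstsq(design, target, rcond=None)
    return coeffs.reshape(-1, n_taps)


def predict(inputs, iro):
    """Synthetic output from the reference channels.

    The first n_taps - 1 samples are not predicted, so the result is
    aligned with output[n_taps - 1:].
    """
    n_taps = iro.shape[1]
    return build_design_matrix(inputs, n_taps) @ iro.ravel()
```

Here `estimate_iro` would be run once on the quiet data, and `predict` would then supply the synthetic data used to fill each spike interval. A two-sided (non-causal) operator would need a different lag range than the one assumed in this sketch.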

Some tooling would definitely be needed to make this efficient, since loading the whole time series and rewriting it to disk for every spike would be slow. Loading a whole run, replacing all of its spikes, and then writing the whole run back would probably be fastest.
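A minimal sketch of the "load once, replace all spikes, write once" idea, assuming the run's channel data and the synthetic data are already in memory as aligned NumPy arrays. The function name and the `pad` parameter (standing in for the user-specified width around each spike) are hypothetical, and the mth5 read and write calls are deliberately left out.

```python
import numpy as np


def replace_spikes(run_data, synthetic, intervals, pad=0):
    """Return a cleaned copy of run_data.

    run_data, synthetic: 1-D arrays of equal length, sample-aligned.
    intervals: iterable of (start, stop) sample indices, stop exclusive.
    pad: extra samples replaced on each side of every interval.
    """
    cleaned = np.array(run_data, dtype=float, copy=True)
    n_samples = len(cleaned)
    for start, stop in intervals:
        lo = max(start - pad, 0)
        hi = min(stop + pad, n_samples)
        cleaned[lo:hi] = synthetic[lo:hi]
    return cleaned
```

The original array is left untouched, so the cleaned copy can be written back to disk in a single operation per run.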

@kkappler kkappler added the enhancement New feature or request label Mar 1, 2025