Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Autoregressive prewhitening (per channel) #60

Open
kkappler opened this issue Aug 7, 2021 · 4 comments
Open

Autoregressive prewhitening (per channel) #60

kkappler opened this issue Aug 7, 2021 · 4 comments
Assignees
Labels
enhancement New feature or request

Comments

@kkappler
Copy link
Collaborator

kkappler commented Aug 7, 2021

First difference prewhitening mitigated most of the spectral leakage artefacts in the synthetic test dataset. Gary suggested that we add autoregressive prewhitening as well, say to order 3 or 4. This would be implemented per channel, and thus would need to recolored.

This is a different issue than Gary's segment-by-segment prewhitening from the original FORTRAN codes, which is a seperate issue but can be implemented (see issue #3 for some details)

@kkappler
Copy link
Collaborator Author

kkappler commented Dec 9, 2021

To Do this practically we need:

  1. A way to fit an ARMA(p,q) model to an input time-series that returns as a function of input arguments p, q, two sequences of floats, (ar_1, ar_2, ... ar_p), & (ma_1, ma_2, ... ma_q),
    While these are expressed as coefficients in a time domain model, they may be equivalent to a polynomial fit of the same order to the frequency domain spectra ... in which case more terms would be needed when spectral lines were present ... in general, this method suffers from a lack of justification for the highest order term
    https://en.wikipedia.org/wiki/Autoregressive%E2%80%93moving-average_model
    https://www.informs.org/content/view/full/273349
    https://www.statsmodels.org/v0.11.1/_modules/statsmodels/tsa/arima_model.html#ARIMAResults

@kkappler
Copy link
Collaborator Author

kkappler commented Aug 7, 2022

This approach, and prewhitening in general can be applied either to the whole time series before windowing/detrending, or after windowing it can be applied in a "window-by-window" way. Obviously for first (or second) difference there should be little if any difference, but when the prewhitening involves fitting a statistical model, this choice of when along the pipeline to apply this transformation matters, i.e. we expect slightly different ARMA coefficients for each window, and moreover, these window-by-window fits will differ from the ARMA coefficients that wouls be obtained if we fit a model to the entire time series.

In general the pipeline should be modified to give the user the option (via config) of when to apply prewhitening

  • give the user the option (via config) of when to apply prewhitening

@kkappler
Copy link
Collaborator Author

I took a crack at this on the fix_issue_60 branch, but it was not trivial. ToDo:

  • Setup a discussion with Gary about how to best implement this

@kkappler
Copy link
Collaborator Author

kkappler commented Nov 5, 2022

Created a jupyter notebook called arma_primer.ipynb which shows how to use statsmodels package to fit an arma model and handle a few gory details. At this point I am satified that given an arbitrary time series we can prewhiten it with ARMA, then FFT and then compensate for the ARMA coefficients on the frequency domain side.

The following need to be addressed:

Note: this ticket is about adding prewhitening per channel, but there has been no statement yet about whether the operation happens within a channel "per-window" or "per time series". Discussions with Gary hinted that "we may as well" just fit to the entire time series, but those discussions predated observations that the time series can get very (very) long ... i.e. larger than RAM in some cases. Because of this, I think we need to specify a prewhitening window length. The default window lengths can be either "whole time series", or "fft window" ... anything else will wind up being complicated, for what are likely diminishing returns.

Ultimately, I had hoped to show that ARMA PW on the "whole run" would yield PW quality around the same as if we had applied it "per FFT window", because in that case, I had hoped to deprecate the intrinsic windowing code and just use the built-in spectrogram method of scipy.signal. However ... showing this, in any sort of definitive way seems like a study on it's own, and is in some sense beyond the scope of the current aurora objectives.

So I think we want to

  • Add support in the processing metadata for this operation
    • prewhitening type = "first difference", "ARMA, P, Q, fft_window", "ARMA, P, Q, total_run".
      At risk of both a performance hit, missing a novel approach, and simplicity, I will implement the 'fft_window' method first, and possibly create a follow-up ticket for 'total run'
  • Put a tap in the code to transform a data window into data' = {ARMA coefficients, ARMA residuals}
  • [ ]write a method that acts the same as run_ts_to_stft, taking as input the run_ts, and treating it as data' (or parsing it into many data' s) and returns the FFT array, with appropriate recoloring ... *this one requires careful thought,
  • polish the algorithm that actually computes and then applied the ARMA transformations
  • stitch it all together and test on the synthetic
  • Don't forget to look at ARIMA, which is to say, we need to decide
    • whether or not to first difference the data up front, and how many times, and
    • whether the differencing happens before or after calling statsmodels
      ....in fact, might it be true that a stasmodels ARIMA(0,1,0) on total run is equivalent to first difference filtering???

Finally, consider that once the time series is decomposed into white noise and an ARMA filter, the FFT of the ARMA filter is still being transformed in the ipynb with scipy.signal.freqz, which calls fft under the hood ... aren't we subject to leakage from that transformation? If so, how do we know it is less severe than fft without the ARMA decomposition? Empirically it looks like there is less leakage (high frequencies of the spectra have lower amplitudes) but it isn't obvious (to me) how exactly the leakage is being mitigated, nor by how much.

@kkappler kkappler self-assigned this Nov 10, 2022
kkappler added a commit that referenced this issue Nov 14, 2022
-it works for low order MA, but not for ARMA(2,3) for example
Issue(s): #60
kkappler added a commit that referenced this issue Nov 14, 2022
THese are incomplete and I had intended to merge them all toghether, but
after noticing the bug in arma_primar, I want to get an archived copy of
primer4.

Primer4 doesn't suffer infinitely lfilter response because the time
series is only 1024 points, not 2048

[Issue(s): #60]
@kkappler kkappler removed the Phase 2 label Mar 25, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

1 participant