Eigenvalues of learnt system do not match those of the generating process #14
Is there a reason you think they should match? Maybe in the infinite data limit, if you estimate the parameters with maximum likelihood, though I'm not 100% sure about that.
If I'm not wrong, with maximum likelihood on a simple system like this the model should converge quite fast, because the dimension of the LTI is predefined (2). Now, with original eigenvalues = [0.5, 0.5], over 2 optimization runs I obtain: It looks like there is a prior on the models that pushes the estimated eigenvalues to be different from each other and related. My intention is to learn the model of a process and use the learnt model to do online predictions after observing a sequence of input-output steps.
If you want to use maximum likelihood with no switching, you should probably use pylds and the EM algorithm. Whether a dynamics estimate is consistent would also depend on other factors, like whether the system that generated the data is observable. Priors wouldn't matter with EM, but just look at the implementation of DefaultLDS; the …
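For reference, a minimal sketch of what that maximum-likelihood route might look like with pylds (the dimensions, data file, and iteration count are placeholders, and it assumes DefaultLDS exposes the learned dynamics matrix as `model.A`; check the pylds examples for the exact API):

```python
import numpy as np
from pylds.models import DefaultLDS

D_obs, D_latent = 2, 2                 # placeholder dimensions for the 2-D LTI discussed above
data = np.loadtxt("lti_data.txt")      # hypothetical (T, D_obs) array of observations

model = DefaultLDS(D_obs, D_latent)
model.add_data(data)

# EM gives a maximum-likelihood point estimate; no priors enter here
for _ in range(100):
    model.EM_step()

# compare against the eigenvalues of the generating system, e.g. [0.5, 0.5]
print(np.sort(np.linalg.eigvals(model.A)))
```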
Thanks a lot, sorry for taking your time. I actually need the Switching LDS but I wanted to check how the system would behave in the simplest case. The system I use for generating data in the simple_demo modification should be observable:
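The code block with the actual system appears to have been lost from the thread; as a stand-in, here is a generic 2-D LTI system with the repeated eigenvalue 0.5 mentioned above, together with the standard observability-rank check (this is a sketch, not the original simple_demo modification):

```python
import numpy as np

# Example 2-D LTI system with eigenvalues [0.5, 0.5] (a Jordan block)
A = np.array([[0.5, 1.0],
              [0.0, 0.5]])       # dynamics matrix
C = np.array([[1.0, 0.0]])       # observation matrix

# Observability matrix O = [C; C A]; full column rank means (A, C) is observable
O = np.vstack([C, C @ A])
print(np.linalg.matrix_rank(O) == A.shape[0])   # True => observable
```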
PS: what are mu_inits, sigma_inits and init_dynamics_distns?
No problem, this would be good to figure out :) That example should indeed be observable.

Another thing to keep in mind is that EM (and other alternating algorithms like Gibbs) can get stuck in local optima, so it depends on where you initialize. One way to initialize is with something like 4SID, which is asymptotically consistent (see also Byron Boots's thesis). I started an implementation to use with pylds / pyslds but never finished it. You could just try initializing near the answer. The priors might also be unintentionally strong, but I would have to dig into the code to understand them. T=8000 seems like it would be plenty of data to overwhelm a reasonable prior.

The initial dynamics distributions (one for each state, each parameterized by a mean mu_init and covariance sigma_init) are the distributions over the initial continuous latent state.

The representation of uncertainty depends on what algorithm you're using. If you're using EM, there is uncertainty explicitly represented in the state but not in the parameter estimates. If you're using variational mean field, the uncertainty on the parameters is estimated via a variational factor on the parameters in the same form as the prior (e.g. a matrix normal inverse Wishart for each dynamics matrix). If you're using block Gibbs sampling, then the uncertainty is encoded in the iterates of the Markov chain, so that you can form MCMC estimates using those samples (e.g. to estimate the posterior variance of some parameter, you would compute the sample variance of the Gibbs iterates of that parameter). Each call to …

Hope that helps. I don't have time to dig into the details of your example right now, but let's keep this issue open until you solve it.
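To make the Gibbs point concrete, here is a rough sketch of forming MCMC estimates from the Gibbs iterates (a sketch under assumed pyslds conventions, not code from this thread; the constructor signature, `resample_model()`, and `dynamics_distns[k].A` should be checked against the actual code, and the dimensions and data file are placeholders):

```python
import numpy as np
from pyslds.models import DefaultSLDS   # assumed import path, as used in the issue

K, D_obs, D_latent = 1, 2, 2             # placeholder dimensions
data = np.loadtxt("lti_data.txt")        # hypothetical (T, D_obs) observations

model = DefaultSLDS(K, D_obs, D_latent)
model.add_data(data)

n_iters, burn_in = 200, 100
eig_samples = []
for it in range(n_iters):
    model.resample_model()                              # one block-Gibbs sweep
    if it >= burn_in:
        A0 = model.dynamics_distns[0].A[:, :D_latent]   # dynamics matrix of state 0
        eig_samples.append(np.sort(np.linalg.eigvals(A0)))

eig_samples = np.array(eig_samples)
print("posterior mean of eigenvalues:", eig_samples.mean(axis=0))
print("posterior variance:", eig_samples.var(axis=0))
```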
Thank you so much. Just for completeness, these are the details I used for the training:
As you suggested, the problem is probably the Gibbs sampling approximation. But it is not trivial to extract mixture parameters from samples due to label switching. I tried to run the example meanfield.py but I get this error:
I also fail to install git clone git://github.com/mattjj/pyhsmm-autoregressive.git; the error is:
Hi Matthew, using code similar to that of simple_demo.py I compared Gibbs (MC) and mean field (MF), computing the permutation of the hidden states and of their eigenvalues that had minimum distance (eig distance) from those of the generating process. Varying K (the number of discrete states) I obtained:
Here MF gets consistently better results than MC
Here MC occasionally shows much better results than MF
In this case MF is behaving worse than MC. I would still prefer MF, as it should not be affected by label switching problems as MC probably is (but I'm not sure if that is the case). One approach to extract the models from MC even in the face of label switching (which is also relevant for this issue) would be to align the eigenvalues as I did, finding the permutation with minimum distance on the eigenvalues, and then use the corresponding transformation to combine models extracted from different samples (a sketch of the alignment step is below). I also still don't understand the poor performance of MC when K is 1.
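Here is a small sketch of that alignment idea (not the code used for the comparison above): treat it as an assignment problem between the eigenvalue sets of the true and learned per-state dynamics matrices and solve it with the Hungarian algorithm:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def align_states(true_As, learned_As):
    """Return the permutation of learned states that best matches the true
    states, measured by distance between sorted eigenvalue sets."""
    K = len(true_As)
    true_eigs = [np.sort(np.linalg.eigvals(A)) for A in true_As]
    learned_eigs = [np.sort(np.linalg.eigvals(A)) for A in learned_As]

    # cost[i, j] = eigenvalue distance between true state i and learned state j
    cost = np.zeros((K, K))
    for i in range(K):
        for j in range(K):
            cost[i, j] = np.sum(np.abs(true_eigs[i] - learned_eigs[j]))

    row, col = linear_sum_assignment(cost)   # minimum-distance permutation
    return col, cost[row, col].sum()         # permutation and total eig distance

# hypothetical usage with lists of K dynamics matrices:
# perm, dist = align_states([A1_true, A2_true], [A1_hat, A2_hat])
```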
I modified simple_demo to check how accurate the model reconstruction was.
I found that even with a simple configuration the eigenvalues do not match.
The number of hidden states, K, was 1, and both the test and true models were of class DefaultSLDS.