I tested the NADE model and trained it for 3500 epochs, but it did not learn the MNIST dataset properly. Below is my output.

I would like to understand the possible reasons for this. Could it be that 3500 epochs is still too few, or is there a potential issue with the model itself?
Any insights would be greatly appreciated.
Thank you!