Description
Hi all,
I want to discuss an issue regarding training a DNN/CNN-CTC model for speech recognition on the Wall Street Journal corpus. I model the output units as characters.
I observed that the CTC objective function increased during training and eventually converged.

But I also observed a clear tendency in the final network outputs: p(blank symbol) >> p(non-blank symbol) for every speech frame, as in the following figure.
In Alex Graves' paper, a trained RNN shows high p(non-blank) at certain time points, as in the following figure.
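To make the comparison concrete, this is roughly how I quantify the blank dominance from the per-frame softmax posteriors. The shapes, the blank index, and the synthetic logits here are all hypothetical placeholders for the actual network outputs:

```python
import numpy as np

# Hypothetical per-frame softmax posteriors, shape [T, V];
# blank symbol assumed at index 0.
rng = np.random.default_rng(0)
T, V = 100, 30                 # frames, vocabulary size (29 characters + blank)
logits = rng.normal(size=(T, V))
logits[:, 0] += 3.0            # simulate a blank-dominated model
posteriors = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)

# Fraction of frames where blank is the argmax, and its mean probability.
blank_argmax_frac = np.mean(posteriors.argmax(axis=1) == 0)
mean_blank_prob = posteriors[:, 0].mean()
print(f"frames where blank wins the argmax: {blank_argmax_frac:.2f}")
print(f"mean p(blank) over frames:          {mean_blank_prob:.2f}")
```

In my case, the blank wins the argmax on essentially every frame, whereas Graves' plots show sharp non-blank spikes between long blank stretches.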

Have you seen the same behavior when training NN-CTC models on sequence labeling problems? I suspect the reason is that I use an MLP/CNN instead of an RNN, but I can't clearly explain why that would cause this.
Any ideas about this result?
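For context, a blank-heavy output is not by itself fatal, because CTC best-path decoding collapses repeated labels and then drops blanks, so a few non-blank spikes between long blank runs still yield a transcription. A minimal sketch of that standard collapse rule (the example path is made up):

```python
def ctc_greedy_decode(frame_labels, blank=0):
    """Best-path CTC decoding: collapse repeats, then remove blanks."""
    out = []
    prev = None
    for lab in frame_labels:
        if lab != prev and lab != blank:
            out.append(lab)
        prev = lab
    return out

# Even a heavily blank-dominated frame sequence decodes to a short label string:
path = [0, 0, 3, 0, 0, 0, 3, 3, 0, 5, 0]
print(ctc_greedy_decode(path))  # → [3, 3, 5]
```

My problem is that I see no such non-blank spikes at all, so greedy decoding returns an almost empty output.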
Thank you for reading my question.
