Description
Hi all,
I want to discuss an issue regarding training a DNN/CNN-CTC model for speech recognition on the Wall Street Journal corpus. I model the output units as characters.
I observed that the CTC objective function increased during training and eventually converged.

But I also observed a clear tendency in the final network outputs: p(blank symbol) >> p(non-blank symbol) for every speech frame, as in the following figure.
In Alex Graves' paper, a trained RNN shows high p(non-blank) at certain time points, as in the following figure.
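To make the comparison concrete, this is roughly how I quantify the blank dominance from the per-frame softmax posteriors. The shapes, the blank index, and the synthetic logits here are all hypothetical placeholders for the actual network outputs:

```python
import numpy as np

# Hypothetical per-frame softmax posteriors, shape [T, V];
# blank symbol assumed at index 0.
rng = np.random.default_rng(0)
T, V = 100, 30                 # frames, vocabulary size (29 characters + blank)
logits = rng.normal(size=(T, V))
logits[:, 0] += 3.0            # simulate a blank-dominated model
posteriors = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)

# Fraction of frames where blank is the argmax, and its mean probability.
blank_argmax_frac = np.mean(posteriors.argmax(axis=1) == 0)
mean_blank_prob = posteriors[:, 0].mean()
print(f"frames where blank wins the argmax: {blank_argmax_frac:.2f}")
print(f"mean p(blank) over frames:          {mean_blank_prob:.2f}")
```

In my case, the blank wins the argmax on essentially every frame, whereas Graves' plots show sharp non-blank spikes between long blank stretches.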

Have you seen the same behavior when training NN-CTC models on sequence labeling problems? I suspect the reason is that I use an MLP/CNN instead of an RNN, but I can't clearly explain why that would cause this.
Any ideas about this result?
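For context, a blank-heavy output is not by itself fatal, because CTC best-path decoding collapses repeated labels and then drops blanks, so a few non-blank spikes between long blank runs still yield a transcription. A minimal sketch of that standard collapse rule (the example path is made up):

```python
def ctc_greedy_decode(frame_labels, blank=0):
    """Best-path CTC decoding: collapse repeats, then remove blanks."""
    out = []
    prev = None
    for lab in frame_labels:
        if lab != prev and lab != blank:
            out.append(lab)
        prev = lab
    return out

# Even a heavily blank-dominated frame sequence decodes to a short label string:
path = [0, 0, 3, 0, 0, 0, 3, 3, 0, 5, 0]
print(ctc_greedy_decode(path))  # → [3, 3, 5]
```

My problem is that I see no such non-blank spikes at all, so greedy decoding returns an almost empty output.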
Thank you for reading my question.
