changed the feature stream pipelines #12

Open
wants to merge 1 commit into
base: master

mhy-kevin-dev

Hi, Kumar,

I noticed that the feature stream in Kaldi appears to run "add-deltas ..." first and then "splice-feats ...", so I made a small fix to steps_kt/dataGenerator.py and steps_kt/decode.sh, changing the feature pipeline from "splice-feats ... | add-deltas ..." to "add-deltas ... | splice-feats ...".

I have tested this change on several corpora, and performance improved in most cases.
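
For illustration, here is a minimal pure-Python sketch of the two orderings. It is not the code in steps_kt/dataGenerator.py: `add_deltas` and `splice` are hypothetical toy stand-ins for the Kaldi binaries, the delta step is first-order only with a regression window of 2, and both steps are assumed to replicate the first and last frames at utterance edges.

```python
def add_deltas(frames, window=2):
    """Append first-order regression deltas to each frame (toy version)."""
    num_frames = len(frames)
    norm = sum(n * n for n in range(-window, window + 1))
    out = []
    for t in range(num_frames):
        delta = []
        for d in range(len(frames[t])):
            acc = 0.0
            for n in range(-window, window + 1):
                idx = min(max(t + n, 0), num_frames - 1)  # replicate edge frames
                acc += n * frames[idx][d]
            delta.append(acc / norm)
        out.append(list(frames[t]) + delta)
    return out


def splice(frames, context=1):
    """Concatenate each frame with `context` neighbours on both sides (toy version)."""
    num_frames = len(frames)
    out = []
    for t in range(num_frames):
        row = []
        for n in range(-context, context + 1):
            idx = min(max(t + n, 0), num_frames - 1)  # replicate edge frames
            row.extend(frames[idx])
        out.append(row)
    return out


# Toy 1-dimensional "utterance" of 6 frames (assumed data, for illustration only).
frames = [[float(t * t)] for t in range(6)]

old_order = add_deltas(splice(frames, context=1))  # splice-feats | add-deltas
new_order = splice(add_deltas(frames), context=1)  # add-deltas | splice-feats
```

In this toy version both orderings give frames of the same dimension, but the columns are laid out differently, so a network trained on one ordering cannot be fed features from the other.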

Thanks,
kevin yang

changed the feature pipelines from "splice-feats ... | add-deltas ... "  to "add-deltas ... | splice-feats ... "
@dspavankumar
Owner

Hello Kevin,

Thank you for the suggestion. Can you comment on how big the WER difference is, and how many hours of data the training setup used?

When I wrote the code, I followed CMU's Kaldi+PDNN setup as the reference, which splices first and then adds deltas:
https://github.com/yajiemiao/kaldipdnn/blob/master/steps_pdnn/build_nnet_pfile.sh

Thanks,
Pavan.

@mhy-kevin-dev
Author

Hi, Kumar,

I tested this on TIMIT and MATBN (about 40 hours).
The alignments were obtained by Viterbi alignment of the training data, following the Kaldi recipe.

  • On TIMIT, PER improved by ~0.3% with MFCC (monophone alignments), and by a larger ~1% with fMLLR (LDA+MLLT+SAT alignments).
  • On MATBN, WER improved by ~0.2% with FBANK+Pitch (LDA+MLLT+SAT alignments) and by ~0.4% with MFCC (LDA+MLLT+SAT alignments).

However, I have also noticed that the number of pdfs (or alignments) on TIMIT and WSJ differs between your setup and mine.
(e.g. my TIMIT monophone GMM has 144 pdfs, and my LDA+MLLT system has 2012 pdfs, both generated with the Kaldi recipe)
I wonder whether this difference is what causes the improvements I see with this feature pipeline.
Maybe you can try this feature stream in your experimental setup :)

On the other hand, I followed the Kaldi nnet1 recipe:
https://github.com/kaldi-asr/kaldi/blob/master/egs/wsj/s5/steps/nnet/train.sh
(line 227: Add deltas; line 250: Set $feat_dim; line 255: Make default proto with splice)

This script appends "add-deltas" to $feats_tr (or $feats_cv), and sets $feat_dim to the resulting feature dimension at line 250.
The splicing is then defined at line 255 in feature_transform_proto, which contains the splice InputDim and OutputDim:
<Splice> <InputDim> $feat_dim <OutputDim> $(((2*splice+1)*feat_dim)) ...
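
As a worked example of that dimension arithmetic, here is a small sketch. The function name `spliced_dim` and the example values (13-dimensional MFCCs, delta order 2, splice=5) are assumptions chosen for illustration; they are not read from the script.

```python
def spliced_dim(base_dim, delta_order=2, splice=5):
    """Return (feat_dim, splice_output_dim) for an add-deltas -> splice pipeline.

    base_dim, delta_order and splice are illustrative assumptions,
    not values taken from train.sh.
    """
    # Adding deltas up to `delta_order` multiplies the dimension by (delta_order + 1).
    feat_dim = base_dim * (delta_order + 1)
    # Same formula as the <Splice> proto line: (2*splice+1)*feat_dim.
    output_dim = (2 * splice + 1) * feat_dim
    return feat_dim, output_dim


feat_dim, output_dim = spliced_dim(13)
print(feat_dim, output_dim)  # 39 429
```

With these assumed values, InputDim would be 39 and OutputDim would be 11 * 39 = 429.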
