fix the CTC zipformer2 training #1713

Open: wants to merge 1 commit into base: master
fix the CTC zipformer2 training
- some cuts had too many supervision tokens for their frame count
- change the filtering rule to `if (T - 2) < len(tokens): return False`
- this prevents inf. from appearing in the CTC loss value
KarelVesely84 committed Aug 12, 2024
commit d400bc5edf3a3510d29497b9a7b6b1d1d8eb730d
egs/librispeech/ASR/zipformer/train.py (6 changes: 4 additions & 2 deletions)
@@ -1300,9 +1300,11 @@ def remove_short_and_long_utt(c: Cut):
         T = ((c.num_frames - 7) // 2 + 1) // 2
         tokens = sp.encode(c.supervisions[0].text, out_type=str)

-        if T < len(tokens):
+        # For CTC `(T - 2) < len(tokens)` is needed. otherwise inf. in loss appears.
+        # For Transducer `T < len(tokens)` was okay.
+        if (T - 2) < len(tokens):
             logging.warning(
-                f"Exclude cut with ID {c.id} from training. "
+                f"Exclude cut with ID {c.id} from training (too many supervision tokens). "
                 f"Number of frames (before subsampling): {c.num_frames}. "
                 f"Number of frames (after subsampling): {T}. "
                 f"Text: {c.supervisions[0].text}. "
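The filtering logic above can be sketched standalone. This is a minimal illustration, not the PR's code: `num_frames_after_subsampling` and `keep_cut` are hypothetical helper names, and the subsampling formula is copied from the hunk (`T = ((num_frames - 7) // 2 + 1) // 2`). CTC requires at least as many encoder frames as target tokens; the PR additionally demands a 2-frame margin, since cuts that only barely satisfied `T >= len(tokens)` were observed to yield an infinite CTC loss.

```python
# Sketch of the cut-filtering rule changed in this PR (hypothetical
# helper names; the subsampling formula is taken from train.py).

def num_frames_after_subsampling(num_frames: int) -> int:
    # The zipformer front-end reduces the frame count roughly 4x.
    return ((num_frames - 7) // 2 + 1) // 2

def keep_cut(num_frames: int, num_tokens: int) -> bool:
    T = num_frames_after_subsampling(num_frames)
    # Old rule (okay for Transducer): drop only when T < num_tokens.
    # New rule (needed for CTC): require a 2-frame margin, otherwise
    # borderline cuts can produce an infinite CTC loss.
    return not ((T - 2) < num_tokens)
```

For example, a cut with 100 input frames gives T = 23 after subsampling: under the old rule a 23-token supervision would have been kept, while the new rule drops it and keeps only cuts with 21 tokens or fewer.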