
Add exact structured Viterbi decoder alongside the legacy path #46

Open
ssmall256 wants to merge 1 commit into maxrmorrison:master from ssmall256:pr/structured-viterbi-conservative

@ssmall256


Summary

This PR adds an exact structured Viterbi decoder to torchcrepe while
preserving the current decoder behavior under the existing viterbi alias.

The main goal is to make the faster decoder available for direct comparison
without forcing an immediate default change.

What This Adds

  • torchcrepe.decode.viterbi_legacy
    • preserves the current softmax -> numpy -> librosa.sequence.viterbi path
  • torchcrepe.decode.viterbi_banded_fast
    • exact structure-aware decoder for the existing local transition graph
  • torchcrepe.decode.viterbi
    • still points to the legacy path in this PR
  • CLI decoder selection now accepts the explicit decoder names
  • focused parity tests
  • scripts/benchmark_decoders.py for decoder-core and real-audio comparisons
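The following is a minimal sketch of how the three decoder names could map onto callables, with the public `viterbi` alias still bound to the legacy path and a CLI flag restricted to the known names. The `--decoder` flag, the `DECODERS` registry, and `parse_decoder` are hypothetical illustrations, not the code in torchcrepe/__main__.py, and both decoder bodies are placeholders.

```python
import argparse


def viterbi_legacy(logits):
    """Placeholder for the dense softmax -> numpy -> librosa.sequence.viterbi path."""
    raise NotImplementedError


def viterbi_banded_fast(logits):
    """Placeholder for the exact structure-aware decoder."""
    raise NotImplementedError


# The public alias keeps pointing at the legacy decoder in this PR.
viterbi = viterbi_legacy

# Hypothetical name -> callable registry a CLI could validate against.
DECODERS = {
    "viterbi": viterbi,
    "viterbi_legacy": viterbi_legacy,
    "viterbi_banded_fast": viterbi_banded_fast,
}


def parse_decoder(argv):
    """Resolve a hypothetical --decoder flag to a decoder callable."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--decoder", choices=sorted(DECODERS), default="viterbi")
    return DECODERS[parser.parse_args(argv).decoder]
```

With no flag the sketch falls back to the alias, so existing invocations keep the legacy behavior, and argparse rejects any name outside the registry.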

Why

torchcrepe already uses a strongly local transition topology in its Viterbi
decoder, but the current implementation still routes through a generic dense
librosa.sequence.viterbi call.
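To make "strongly local" concrete, the sketch below counts how many predecessors each state can actually be reached from. It assumes 360 pitch bins and a legacy transition weight of max(12 - |i - j|, 0) before row normalization; this is our reading of torchcrepe's legacy viterbi and should be checked against torchcrepe/decode.py. Normalization does not change which entries are nonzero, so it is skipped here.

```python
N_BINS = 360  # assumption: torchcrepe's number of pitch bins
WIDTH = 12    # assumption: legacy weight is max(WIDTH - |i - j|, 0)


def transition_weights(n_bins=N_BINS, width=WIDTH):
    """Unnormalized triangular transition weights, row i = predecessor state."""
    return [[max(width - abs(i - j), 0) for j in range(n_bins)] for i in range(n_bins)]


def predecessor_counts(weights):
    """For each state j, count predecessors i with a nonzero transition into j."""
    n = len(weights)
    return [sum(1 for i in range(n) if weights[i][j] > 0) for j in range(n)]


weights = transition_weights()
counts = predecessor_counts(weights)
dense_work = N_BINS * N_BINS   # predecessor pairs a dense scan evaluates per frame
banded_work = sum(counts)      # predecessor pairs that can actually contribute
print(max(counts), dense_work, banded_work)  # → 23 129600 8148
```

Under those assumptions a dense scan evaluates 129,600 predecessor pairs per frame while only 8,148 can contribute, roughly 16x fewer. This is an operation count, not a measured speedup.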

This PR keeps the dynamic program exact while replacing the dense predecessor
scan with a structure-aware recurrence over only reachable predecessor states.
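The recurrence described above can be sketched in pure Python. `viterbi_dense` scans every predecessor, as a generic dense decoder does, while `viterbi_banded` restricts the inner loop to predecessors within `radius` bins. The restriction is exact whenever every out-of-band transition has zero probability. The triangular transition (width 12, so radius 11) is an assumption mirroring the legacy matrix, and these functions are illustrations, not the PR's `viterbi_banded_fast`.

```python
import math
import random


def triangular_log_transition(n_states, width=12):
    """Assumed legacy-style weights max(width - |i - j|, 0), row-normalized, in log space."""
    rows = []
    for i in range(n_states):
        weights = [max(width - abs(i - j), 0) for j in range(n_states)]
        total = sum(weights)
        rows.append([math.log(w / total) if w > 0 else -math.inf for w in weights])
    return rows


def _backtrack(score, back):
    """Follow back-pointers from the best final state."""
    state = max(range(len(score)), key=score.__getitem__)
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    return path[::-1]


def _decode(log_emit, log_trans, predecessors):
    """Shared max-product recurrence; `predecessors(j)` yields candidate states i."""
    score, back = list(log_emit[0]), []
    for frame in log_emit[1:]:
        new_score, ptr = [], []
        for j in range(len(log_trans)):
            best_i, best = None, -math.inf
            for i in predecessors(j):
                cand = score[i] + log_trans[i][j]
                if best_i is None or cand > best:  # ties keep the lowest index
                    best_i, best = i, cand
            new_score.append(best + frame[j])
            ptr.append(best_i)
        score = new_score
        back.append(ptr)
    return _backtrack(score, back)


def viterbi_dense(log_emit, log_trans):
    """Reference: every state is a candidate predecessor, O(T * N^2)."""
    n = len(log_trans)
    return _decode(log_emit, log_trans, lambda j: range(n))


def viterbi_banded(log_emit, log_trans, radius):
    """Exact when transitions with |i - j| > radius are zero, O(T * N * radius)."""
    n = len(log_trans)
    return _decode(log_emit, log_trans,
                   lambda j: range(max(0, j - radius), min(n, j + radius + 1)))


rng = random.Random(0)
log_trans = triangular_log_transition(48, width=12)
log_emit = [[math.log(rng.random() + 1e-9) for _ in range(48)] for _ in range(30)]
assert viterbi_dense(log_emit, log_trans) == viterbi_banded(log_emit, log_trans, radius=11)
```

Both decoders share one recurrence, break ties toward the lowest predecessor index, and accumulate scores in the same order. Out-of-band candidates in the dense scan score `-inf` and can never win, so the two paths agree.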

This is not an approximation and does not change the decoded path on the
exercised parity workloads.

Benchmark Notes

Local benchmark results on cached decoder inputs showed a large decoder-core
win for the explicit fast path:

  • synthetic decoder core:
    • 512 frames: about 10.5x faster
    • 2048 frames: about 10.6x faster
  • cached real-audio decoder on 2914 frames:
    • about 461.9 ms legacy -> 42.4 ms fast

End-to-end gains depended on how large a share of total runtime decoding
occupied:

  • tiny model: 522.8 ms -> 95.0 ms
  • full model: 2221.7 ms -> 1942.9 ms
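The ratios behind these numbers, and why the end-to-end gain shrinks as decoding becomes a smaller share of runtime, follow from Amdahl's law. The sketch below only restates the timings reported above. The `decode_fraction` values in the final loop are illustrative inputs, not measurements from this PR.

```python
# Timings reported above, in milliseconds: (legacy, fast).
TIMINGS_MS = {
    "cached real-audio decoder, 2914 frames": (461.9, 42.4),
    "end-to-end, tiny": (522.8, 95.0),
    "end-to-end, full": (2221.7, 1942.9),
}


def speedup(before_ms, after_ms):
    """Ratio of old runtime to new runtime."""
    return before_ms / after_ms


def amdahl(decode_fraction, decoder_speedup):
    """Overall speedup when only `decode_fraction` of runtime gets `decoder_speedup`."""
    return 1.0 / ((1.0 - decode_fraction) + decode_fraction / decoder_speedup)


# Prints about 10.89x, 5.50x and 1.14x for the three rows.
for name, (before, after) in TIMINGS_MS.items():
    print(f"{name}: {speedup(before, after):.2f}x")

# Illustrative only: a ~10.9x decoder speedup yields very different
# end-to-end gains depending on decoding's share of total runtime.
for fraction in (0.1, 0.5, 0.9):
    print(f"decode share {fraction:.0%}: {amdahl(fraction, 10.9):.2f}x overall")
```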

I am intentionally not using those measurements here to argue for an immediate
default switch. The point of this PR is to land the explicit fast path and the
comparison surface first.

Parity

Focused local parity checks were exact for the exercised workloads:

  • decoded bin path
  • decoded pitch output

Parity tests are included in this PR.
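Beyond comparing the legacy and fast decoders against each other, exactness can also be checked against exhaustive enumeration on inputs small enough to score every path. The sketch below is a hypothetical harness, not one of the tests in this PR. It compares a compact banded Viterbi against a brute-force search over all N**T state sequences under the same assumed triangular transition.

```python
import itertools
import math
import random


def log_triangular(n, width):
    """Assumed legacy-style weights max(width - |i - j|, 0), row-normalized, in log space."""
    out = []
    for i in range(n):
        w = [max(width - abs(i - j), 0) for j in range(n)]
        z = sum(w)
        out.append([math.log(x / z) if x else -math.inf for x in w])
    return out


def banded_viterbi(log_emit, log_trans, radius):
    """Viterbi that only scans predecessors within `radius` bins of each state."""
    n = len(log_trans)
    score, back = list(log_emit[0]), []
    for frame in log_emit[1:]:
        ptr, new = [], []
        for j in range(n):
            cands = range(max(0, j - radius), min(n, j + radius + 1))
            i = max(cands, key=lambda k: score[k] + log_trans[k][j])
            ptr.append(i)
            new.append(score[i] + log_trans[i][j] + frame[j])
        score, back = new, back + [ptr]
    state = max(range(n), key=score.__getitem__)
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    return path[::-1]


def brute_force(log_emit, log_trans):
    """Score every state sequence, summing in the same order as the recurrence."""
    best, best_path = -math.inf, None
    for path in itertools.product(range(len(log_trans)), repeat=len(log_emit)):
        s = log_emit[0][path[0]]
        for k in range(1, len(path)):
            s = s + log_trans[path[k - 1]][path[k]] + log_emit[k][path[k]]
        if s > best:
            best, best_path = s, list(path)
    return best_path


def check_parity(trials=20, n=6, t=5, width=3, seed=0):
    """Assert banded decoding matches exhaustive search on random emissions."""
    rng = random.Random(seed)
    log_trans = log_triangular(n, width)
    for _ in range(trials):
        log_emit = [[math.log(rng.random() + 1e-9) for _ in range(n)] for _ in range(t)]
        assert banded_viterbi(log_emit, log_trans, width - 1) == brute_force(log_emit, log_trans)
    return trials
```

Enumeration grows as N**T, so this style of check only complements, and cannot replace, legacy-versus-fast parity on realistic 360-bin inputs.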

Validation

Commands used locally:

python -m pytest tests/test_decode.py tests/test_cli.py -q
python -m py_compile \
  torchcrepe/decode.py \
  torchcrepe/__main__.py \
  scripts/benchmark_decoders.py \
  tests/test_decode.py \
  tests/test_cli.py

Reviewer Notes

  • viterbi remains unchanged in this PR.
  • The new path is exposed only under the explicit name viterbi_banded_fast.
  • If maintainers prefer, a follow-up discussion can decide whether the public
    viterbi alias should eventually move after broader review and benchmarking.

Context

This patch comes from a broader cross-repo study of exact structured Viterbi in
pitch trackers. That study produced repeated positive transfer in penn-mlx,
vendored upstream penn on both CPU and MPS, mlxcrepe, libf0, and
librosa.pyin. torchcrepe was one of the strongest positive transfer cases,
which is why I am proposing the additive decoder surface here first.
