Add exact structured Viterbi decoder alongside the legacy path #46
Open
ssmall256 wants to merge 1 commit into
Summary
This PR adds an exact structured Viterbi decoder to `torchcrepe` while preserving the current decoder behavior under the existing `viterbi` alias.

The main goal is to make the faster decoder available for direct comparison without forcing an immediate default change.
What This Adds
- `torchcrepe.decode.viterbi_legacy` (`softmax -> numpy -> librosa.sequence.viterbi` path)
- `torchcrepe.decode.viterbi_banded_fast`
- `torchcrepe.decode.viterbi`
- `scripts/benchmark_decoders.py` for decoder-core and real-audio comparisons

Why
`torchcrepe` already uses a strongly local transition topology in its Viterbi decoder, but the current implementation still routes through a generic dense `librosa.sequence.viterbi` call.

This PR keeps the dynamic program exact while replacing the dense predecessor scan with a structure-aware recurrence over only reachable predecessor states. This is not an approximation and does not change the decoded path on the exercised parity workloads.
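To illustrate why restricting the predecessor scan stays exact, here is a minimal NumPy sketch. It is not the code in this PR: the function names, the toy sizes, and the triangular band of radius 3 are assumptions chosen for illustration, and a uniform initial distribution is assumed so the prior drops out as a constant. Transitions outside the band have log-probability `-inf`, so they can never win the `argmax`, and skipping them leaves every score and backpointer unchanged.

```python
import numpy as np


def _backtrace(score, back):
    # Follow stored argmax pointers from the best final state back to frame 0.
    n_frames = back.shape[0]
    path = np.empty(n_frames, dtype=np.int64)
    path[-1] = int(np.argmax(score))
    for t in range(n_frames - 1, 0, -1):
        path[t - 1] = back[t, path[t]]
    return path


def viterbi_dense(log_obs, log_trans):
    # Reference recurrence: every state j scans all S predecessors.
    n_frames, n_states = log_obs.shape
    score = log_obs[0].copy()
    back = np.zeros((n_frames, n_states), dtype=np.int64)
    cols = np.arange(n_states)
    for t in range(1, n_frames):
        cand = score[:, None] + log_trans  # cand[i, j]: arrive in j from i
        back[t] = np.argmax(cand, axis=0)
        score = cand[back[t], cols] + log_obs[t]
    return _backtrace(score, back)


def viterbi_banded(log_obs, log_trans, radius):
    # Same recurrence, but state j only scans predecessors with |i - j| <= radius.
    n_frames, n_states = log_obs.shape
    score = log_obs[0].copy()
    back = np.zeros((n_frames, n_states), dtype=np.int64)
    for t in range(1, n_frames):
        new_score = np.empty(n_states)
        for j in range(n_states):
            lo, hi = max(0, j - radius), min(n_states, j + radius + 1)
            cand = score[lo:hi] + log_trans[lo:hi, j]
            k = int(np.argmax(cand))
            back[t, j] = lo + k
            new_score[j] = cand[k] + log_obs[t, j]
        score = new_score
    return _backtrace(score, back)


# Toy problem (hypothetical sizes): 200 frames, 40 states, band radius 3.
rng = np.random.default_rng(0)
T, S, radius = 200, 40, 3
offsets = np.abs(np.arange(S)[:, None] - np.arange(S)[None, :])
trans = np.maximum(radius + 1 - offsets, 0).astype(float)
trans /= trans.sum(axis=1, keepdims=True)
with np.errstate(divide="ignore"):
    log_trans = np.log(trans)  # -inf outside the band

logits = rng.normal(size=(T, S))
log_obs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))

dense_path = viterbi_dense(log_obs, log_trans)
banded_path = viterbi_banded(log_obs, log_trans, radius)
print("paths identical:", np.array_equal(dense_path, banded_path))
```

The per-state Python loop is written for clarity rather than speed. A practical fast path would vectorize the band, but the cost argument is the same: each frame touches roughly `S * (2 * radius + 1)` transitions instead of `S * S`.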
Benchmark Notes
Local benchmark results on cached decoder inputs showed a large decoder-core
win for the explicit fast path:
- `512` frames: about `10.5x` faster
- `2048` frames: about `10.6x` faster
- `2914` frames: `461.9 ms` legacy -> `42.4 ms` fast

End-to-end gain depended on how much of runtime decoding occupied:

- `tiny`: `522.8 ms` -> `95.0 ms`
- `full`: `2221.7 ms` -> `1942.9 ms`

I am intentionally not using those measurements here to argue for an immediate default switch. The point of this PR is to land the explicit fast path and the comparison surface first.
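For readers who want the ratios implied by the timings above, the short snippet below only does arithmetic on the numbers already quoted in this description. It adds no new measurements, and the dictionary keys are just labels chosen here.

```python
# Timings in milliseconds, copied from the benchmark notes above.
timings_ms = {
    "decoder core, 2914 frames": (461.9, 42.4),
    "end-to-end, tiny": (522.8, 95.0),
    "end-to-end, full": (2221.7, 1942.9),
}

speedups = {}
for label, (before, after) in timings_ms.items():
    speedups[label] = before / after
    saved = before - after
    print(f"{label}: {speedups[label]:.2f}x faster, {saved:.1f} ms saved")
```

The end-to-end ratio is much smaller for `full` than for `tiny`, which is consistent with decoding occupying a smaller share of total runtime there.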
Parity
Focused local parity checks were exact for the exercised workloads:
Parity tests are included in this PR.
Validation
Commands used locally:
Reviewer Notes
- `viterbi` remains unchanged in this PR.
- `viterbi_banded_fast`.
- `viterbi` alias should eventually move after broader review and benchmarking.

Context
This patch comes from a broader cross-repo study of exact structured Viterbi in pitch trackers. That study produced repeated positive transfer in `penn-mlx`, vendored upstream `penn` on both CPU and MPS, `mlxcrepe`, `libf0`, and `librosa.pyin`.

`torchcrepe` was one of the strongest positive transfer cases, which is why I am proposing the additive decoder surface here first.