feat: add number token loss implementation #38960
Closed
+965
−0
feat: Add Number Token Loss (NTL) implementation
Issue: #38950
Add Number Token Loss (NTL) to improve language model performance on numerical tasks.
NTL addresses the fundamental limitation of cross-entropy loss on numerical tokens
by incorporating ordinal information into the training objective.
Key features and implementation details:
The loss is designed to augment cross-entropy for tasks involving numerical
reasoning, mathematical operations, and any scenario where token ordering
matters for numerical values.
Resolves the issue where predicting "6" vs "9" for target "5" yields
the same cross-entropy loss, despite "6" being numerically closer.
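The idea can be sketched as follows. This is a minimal illustration, not the code in this PR: it assumes a toy 10-token vocabulary where token id `i` encodes digit `i`, and uses the squared error between the expected digit value under the softmax and the target digit as the number-aware loss term.

```python
import torch
import torch.nn.functional as F

def number_token_loss(logits: torch.Tensor, target_digits: torch.Tensor) -> torch.Tensor:
    """MSE between the expected digit value under the softmax and the target digit.

    Hypothetical sketch: assumes logits over a 10-token vocabulary where
    token id i encodes the digit i, so token ids carry numeric meaning.
    """
    digit_values = torch.arange(10, dtype=logits.dtype)   # numeric value of each token
    probs = F.softmax(logits, dim=-1)                     # (batch, 10)
    expected = probs @ digit_values                       # expected numeric value per position
    return F.mse_loss(expected, target_digits.to(logits.dtype))

# Target digit is 5. Build two near-one-hot predictions: one on "6", one on "9".
target = torch.tensor([5])
logits_six = torch.full((1, 10), -4.0)
logits_six[0, 6] = 4.0
logits_nine = torch.full((1, 10), -4.0)
logits_nine[0, 9] = 4.0

# Cross-entropy cannot tell them apart: the logit at the target token is
# identical in both cases, so both predictions get the same CE loss.
ce_six = F.cross_entropy(logits_six, target)
ce_nine = F.cross_entropy(logits_nine, target)
assert torch.allclose(ce_six, ce_nine)

# The number-aware term does distinguish them: "6" is numerically closer to "5".
assert number_token_loss(logits_six, target) < number_token_loss(logits_nine, target)
```

In training, a term like this would be added to the standard cross-entropy with a weighting coefficient, so ordinal information shapes the gradient without replacing the token-level objective.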