…port from public APIs
- metrics.py: add RougeScore TypedDict (precision/recall/fmeasure); rouge_score() now returns dict[str, RougeScore] instead of dict[str, tuple[float, float, float]] (breaking change)
- word_tokenization.py: add CharLevelStats, WordLevelStats, GlobalStats, TokenizationStats TypedDicts; update compute_stats() return type; _flatten_result() uses @overload to accept both TokenizationStats and generic Mapping without a cast
- benchmarks/__init__.py: export RougeScore, CharLevelStats, WordLevelStats, GlobalStats, TokenizationStats
- _fastcoref.py: rename CorefResultDict -> CorefResult; alias the fastcoref.modeling.CorefResult import as FastCorefResult to avoid clash
- coref/__init__.py: export CorefResult
- coref/core.py: update return type and fallback return to use CorefResult
- tag/named_entity.py: improve EntitySpan docstring
- tests/extra/testx_benchmarks.py: update rouge tests for named-field access; add test_rouge_score_return_type and test_compute_stats_return_type
- CHANGELOG.md: add migration notes for all TypedDicts
Co-authored-by: bact <128572+bact@users.noreply.github.com>
Agent-Logs-Url: https://github.com/PyThaiNLP/pythainlp/sessions/de0a9df1-25ba-4e89-a047-7ee0664753c1
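The _flatten_result() item above relies on typing.overload. The sketch below illustrates that pattern only: DemoStats, flatten_result, and the "group:key" naming are hypothetical and are not taken from the PyThaiNLP source. It assumes the motivation is that type checkers treat a TypedDict as assignable to Mapping[str, object] but not to a narrower type such as Mapping[str, Mapping[str, Any]], so a dedicated overload lets one helper accept both inputs without a cast.

```python
from collections.abc import Mapping
from typing import Any, TypedDict, overload


class DemoStats(TypedDict):
    """Hypothetical nested stats record (not the PyThaiNLP definition)."""

    char_level: dict[str, int]
    word_level: dict[str, int]


@overload
def flatten_result(result: DemoStats) -> dict[str, Any]: ...
@overload
def flatten_result(result: Mapping[str, Mapping[str, Any]]) -> dict[str, Any]: ...


def flatten_result(result: Any) -> dict[str, Any]:
    """Flatten {"group": {"key": value}} into {"group:key": value}."""
    flat: dict[str, Any] = {}
    for group, values in result.items():
        for key, value in values.items():
            flat[f"{group}:{key}"] = value
    return flat


# Both a TypedDict value and a plain nested dict go through the same code path.
typed: DemoStats = {"char_level": {"tp": 3}, "word_level": {"correct": 2}}
flat_typed = flatten_result(typed)
flat_plain = flatten_result({"extra": {"count": 1}})
```

At runtime only the final, undecorated definition exists; the two @overload stubs matter to the type checker alone.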
Copilot AI changed the title from "[WIP] Add TypedDicts for complex dictionaries with docstrings" to "Add TypedDicts for complex return dicts: RougeScore, TokenizationStats family, CorefResult" on Mar 24, 2026
- CharLevelStats → CharLevelStat (singular, matches BleuScore/RougeScore)
- WordLevelStats → WordLevelStat
- GlobalStats → GlobalStat
- TokenizationStats → TokenizationStat; converted from functional TypedDict
form to class form now that global_ is a valid identifier
- "global" key → "global_" in compute_stats() return value
- Update __init__.py __all__ and imports
- Update tests: import names, assertIn("global_"), typed annotations
- Update CHANGELOG migration notes
Co-authored-by: bact <128572+bact@users.noreply.github.com>
Agent-Logs-Url: https://github.com/PyThaiNLP/pythainlp/sessions/d0fa4ca1-06a6-48cc-b353-4df24554e700
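The switch from the functional form to the class form described in this commit can be sketched as follows. OldStats, NewStats, migrate, and the dict[str, int] value types are hypothetical placeholders rather than the real PyThaiNLP definitions; only the key rename from "global" to "global_" comes from the commit message.

```python
from typing import TypedDict

# Functional form: needed when a key such as "global" is a reserved keyword,
# because a keyword cannot be written as a field name in a class body.
OldStats = TypedDict(
    "OldStats",
    {"global": dict[str, int], "word_level": dict[str, int]},
)


# Class form: possible once the key is a valid identifier ("global_").
class NewStats(TypedDict):
    global_: dict[str, int]
    word_level: dict[str, int]


def migrate(old: OldStats) -> NewStats:
    """Hypothetical helper: copy an old-style record into the new key layout."""
    return {"global_": old["global"], "word_level": old["word_level"]}


old_record: OldStats = {"global": {"n": 1}, "word_level": {"correct": 2}}
new_record = migrate(old_record)
```

Callers that read the old key need the matching change, for example stats["global"] becomes stats["global_"].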
Copilot AI force-pushed the copilot/add-typed-dicts-for-complex-dictionaries branch from 61f0c8f to a1863d4 on March 24, 2026 12:57
- WordLevelStat field: correctly_tokenised_words → correctly_tokenized_words
- GlobalStat field: tokenisation_indicators → tokenization_indicators
- Private helper: _find_words_correctly_tokenised → _find_words_correctly_tokenized
- All call sites in compute_stats(), cli/benchmark.py, and tests updated
- CHANGELOG migration example updated
Co-authored-by: bact <128572+bact@users.noreply.github.com>
Agent-Logs-Url: https://github.com/PyThaiNLP/pythainlp/sessions/e8eff276-0f77-4d6f-9d4e-d9787e6f414b
Copilot AI changed the title from "Add TypedDicts for complex return dicts: RougeScore, TokenizationStats family, CorefResult" to "Add TypedDicts for complex return dicts: RougeScore, TokenizationStat family, CorefResult" on Mar 24, 2026
Updated CHANGELOG to reflect breaking changes and new TypedDicts in pythainlp.benchmarks and pythainlp.coref. Added migration notes for users to transition to the new TypedDict structures.
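A migration of the kind these notes cover, for the rouge_score() return type, might look like the sketch below. The RougeScore fields (precision, recall, fmeasure) come from the commit messages in this pull request; fake_rouge_score, the "rouge1" key, and the numbers are hypothetical stand-ins so that the example runs without PyThaiNLP installed.

```python
from typing import TypedDict


class RougeScore(TypedDict):
    """Field names as listed in this pull request; float types match the old tuple."""

    precision: float
    recall: float
    fmeasure: float


def fake_rouge_score() -> dict[str, RougeScore]:
    """Hypothetical stand-in with hand-written values, not a real ROUGE computation."""
    return {"rouge1": {"precision": 0.5, "recall": 0.25, "fmeasure": 0.4}}


scores = fake_rouge_score()

# New style: read each value by field name.
precision = scores["rouge1"]["precision"]
fmeasure = scores["rouge1"]["fmeasure"]

# Old style: tuple unpacking still runs on a three-key dict, but it binds
# the key names (strings) instead of the numeric values.
first, second, third = scores["rouge1"]
```

Because unpacking a dict iterates over its keys, code written for the old tuple return value can fail silently instead of raising, so call sites that unpack the result are worth searching for explicitly.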



What do these changes do
Adds TypedDicts to several public APIs that previously returned opaque dict[str, Any] or dict[str, tuple] types, making keys and value types explicit, IDE-completable, and statically checkable.

New TypedDicts in pythainlp.benchmarks:
- RougeScore: per-ROUGE-type result with precision, recall, and fmeasure fields. Breaking change: the rouge_score() return type changes from dict[str, tuple[float, float, float]] to dict[str, RougeScore].
- CharLevelStat, WordLevelStat, GlobalStat, TokenizationStat: typed wrapper for the nested dict from compute_stats(). The global-level key is "global_" (the trailing underscore avoids the Python reserved word). Class-based TypedDict syntax throughout.

New TypedDicts in pythainlp.coref:
- CorefResult (renamed from the internal CorefResultDict): now exported; the coreference_resolution() return type is tightened from list[dict[str, Any]] to list[CorefResult].

Naming conventions:
- Singular names (CharLevelStat, not CharLevelStats): these are single result records, consistent with the existing BleuScore/RougeScore and NLTK's ChunkScore convention.
- American spelling: correctly_tokenized_words, tokenization_indicators, _find_words_correctly_tokenized.

What was wrong
Several public functions returned opaque dict types whose structure was only discoverable by reading the source code. Additionally, the TokenizationStats TypedDict required the functional TypedDict(...) form because "global" is a Python reserved keyword, making it awkward to use directly. Field names used British spelling (tokenised, tokenisation), inconsistent with the rest of the codebase.

How this fixes it
- TypedDicts (CharLevelStat, WordLevelStat, GlobalStat, TokenizationStat) with the global-level key renamed to "global_", allowing the standard class-based TypedDict syntax.
- American spelling: correctly_tokenized_words, tokenization_indicators, _find_words_correctly_tokenized.
- Updates in word_tokenization.py, cli/benchmark.py, tests, and CHANGELOG migration notes.

Your checklist for this pull request