Add TypedDicts for complex return dicts: RougeScore, TokenizationStat family, CorefResult by Copilot · Pull Request #1368 · PyThaiNLP/pythainlp

Copilot · 2026-03-24T12:30:03Z

What do these changes do

Adds TypedDicts to several public APIs that previously returned opaque dict[str, Any] or dict[str, tuple] types, making keys and value types explicit, IDE-completable, and statically checkable.

New TypedDicts in pythainlp.benchmarks:

RougeScore — per-ROUGE-type result with precision, recall, fmeasure fields. Breaking change: rouge_score() return type changes from dict[str, tuple[float, float, float]] to dict[str, RougeScore].
CharLevelStat, WordLevelStat, GlobalStat, TokenizationStat — typed wrapper for the nested dict from compute_stats(). The global-level key is "global_" (trailing underscore avoids the Python reserved word). Class-based TypedDict syntax throughout.
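
For callers affected by the rouge_score() breaking change, the migration can be sketched as below. This is a minimal illustration, not the library code: RougeScore here is a local stand-in with the three fields named above, and the "rouge1" key, the tuple order (precision, recall, fmeasure), and the numbers are assumptions made for the example.

```python
from typing import TypedDict


class RougeScore(TypedDict):
    """Local stand-in mirroring the fields described in this PR."""

    precision: float
    recall: float
    fmeasure: float


# Old shape: dict[str, tuple[float, float, float]]; meaning depends on position.
# The (precision, recall, fmeasure) order is assumed for this sketch.
old_result: dict[str, tuple[float, float, float]] = {"rouge1": (0.75, 0.6, 0.667)}
old_precision = old_result["rouge1"][0]

# New shape: dict[str, RougeScore]; each value is read by field name.
new_result: dict[str, RougeScore] = {
    "rouge1": RougeScore(precision=0.75, recall=0.6, fmeasure=0.667)
}
new_precision = new_result["rouge1"]["precision"]
```

Code that unpacked the tuple (for example `p, r, f = scores["rouge1"]`) would need to switch to keyed access, since unpacking a dict yields its keys rather than its values.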

New TypedDicts in pythainlp.coref:

CorefResult (renamed from internal CorefResultDict) — now exported; coreference_resolution() return type tightened from list[dict[str, Any]] to list[CorefResult].
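
A rough sketch of what the tighter list[CorefResult] annotation buys a caller. The PR text does not list the fields of CorefResult, so the "text" and "clusters" fields below are hypothetical, and the class is a local stand-in rather than the one exported from pythainlp.coref.

```python
from typing import TypedDict


class CorefResult(TypedDict):
    """Local stand-in; both field names are hypothetical."""

    text: str
    clusters: list[list[tuple[int, int]]]  # assumed: character spans per cluster


def first_cluster_spans(results: list[CorefResult]) -> list[tuple[int, int]]:
    """Return the spans of the first cluster of the first result, if any."""
    if not results or not results[0]["clusters"]:
        return []
    return results[0]["clusters"][0]


sample: list[CorefResult] = [
    {"text": "Alice said she was late.", "clusters": [[(0, 5), (11, 14)]]}
]
spans = first_cluster_spans(sample)
```

With list[dict[str, Any]] a type checker accepts any key; with a TypedDict, a misspelled key such as `results[0]["cluster"]` is reported statically.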

Naming conventions:

Singular form throughout (CharLevelStat, not CharLevelStats) — these are single result records, consistent with existing BleuScore/RougeScore and NLTK's ChunkScore convention.
US spelling used consistently: correctly_tokenized_words, tokenization_indicators, _find_words_correctly_tokenized.

What was wrong

Several public functions returned opaque dict types whose structure was only discoverable by reading source code. Additionally, the TokenizationStats TypedDict required the functional TypedDict(...) form because "global" is a Python reserved keyword, making it awkward to use directly. Field names used British spelling (tokenised, tokenisation) inconsistently with the rest of the codebase.
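
The keyword problem can be reproduced in a few lines. This is a sketch under assumptions: the name TokenizationStatsOld, the "char_level" and "word_level" keys, and the plain dict value types are placeholders, not the definitions from the codebase.

```python
from typing import TypedDict


def class_form_is_valid(field_name: str) -> bool:
    """Check whether a field name can be declared in class-based syntax."""
    source = f"class T:\n    {field_name}: int\n"
    try:
        compile(source, "<sketch>", "exec")
    except SyntaxError:
        return False
    return True


# The functional form passes keys as strings, so a keyword is accepted.
TokenizationStatsOld = TypedDict(
    "TokenizationStatsOld",
    {
        "char_level": dict,  # placeholder key and type
        "word_level": dict,  # placeholder key and type
        "global": dict,
    },
)

stats: TokenizationStatsOld = {"char_level": {}, "word_level": {}, "global": {}}
keyword_ok = class_form_is_valid("global")
underscore_ok = class_form_is_valid("global_")
```

Here `keyword_ok` is False because `global: int` is parsed as a malformed `global` statement, while `underscore_ok` is True.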

How this fixes it

Introduces four TypedDicts (CharLevelStat, WordLevelStat, GlobalStat, TokenizationStat) with the global-level key renamed to "global_", allowing the standard class-based TypedDict syntax.
Renames all fields and the internal helper function to use US spelling: correctly_tokenized_words, tokenization_indicators, _find_words_correctly_tokenized.
All usages updated across word_tokenization.py, cli/benchmark.py, tests, and CHANGELOG migration notes.
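
The resulting class-based layout might look roughly like this. Only the class names, the "global_" key, and the two renamed fields come from this PR; the "word_level" key and all field types are assumptions, and CharLevelStat is omitted because its fields are not described here.

```python
from typing import TypedDict


class WordLevelStat(TypedDict):
    correctly_tokenized_words: int  # type assumed for this sketch


class GlobalStat(TypedDict):
    tokenization_indicators: str  # type assumed for this sketch


class TokenizationStat(TypedDict):
    word_level: WordLevelStat  # key name assumed
    global_: GlobalStat  # trailing underscore keeps this a valid identifier


stat: TokenizationStat = {
    "word_level": {"correctly_tokenized_words": 3},
    "global_": {"tokenization_indicators": "110"},
}
indicators = stat["global_"]["tokenization_indicators"]
```

Because every key is now a plain identifier, each nested record can carry its own docstring and per-field annotations in the usual class style.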

Your checklist for this pull request

Passed code styles and structures
Passed code linting checks and unit tests

…port from public APIs

- metrics.py: add RougeScore TypedDict (precision/recall/fmeasure); rouge_score() now returns dict[str, RougeScore] instead of dict[str, tuple[float, float, float]] (breaking change)
- word_tokenization.py: add CharLevelStats, WordLevelStats, GlobalStats, TokenizationStats TypedDicts; update compute_stats() return type; _flatten_result() uses @overload to accept both TokenizationStats and generic Mapping without a cast
- benchmarks/__init__.py: export RougeScore, CharLevelStats, WordLevelStats, GlobalStats, TokenizationStats
- _fastcoref.py: rename CorefResultDict -> CorefResult; alias the fastcoref.modeling.CorefResult import as FastCorefResult to avoid clash
- coref/__init__.py: export CorefResult
- coref/core.py: update return type and fallback return to use CorefResult
- tag/named_entity.py: improve EntitySpan docstring
- tests/extra/testx_benchmarks.py: update rouge tests for named-field access; add test_rouge_score_return_type and test_compute_stats_return_type
- CHANGELOG.md: add migration notes for all TypedDicts

Co-authored-by: bact <128572+bact@users.noreply.github.com>
Agent-Logs-Url: https://github.com/PyThaiNLP/pythainlp/sessions/de0a9df1-25ba-4e89-a047-7ee0664753c1
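
The @overload approach mentioned in this commit can be sketched as follows. This is an illustration under assumptions, not the actual _flatten_result(): the function name flatten_result, the "group:key" output format, and the trimmed-down TypedDicts are all hypothetical.

```python
from collections.abc import Mapping
from typing import Any, TypedDict, overload


class GlobalStat(TypedDict):
    tokenization_indicators: str  # type assumed


class TokenizationStat(TypedDict):
    global_: GlobalStat


@overload
def flatten_result(result: TokenizationStat) -> dict[str, Any]: ...
@overload
def flatten_result(result: Mapping[str, Mapping[str, Any]]) -> dict[str, Any]: ...


def flatten_result(result: Any) -> dict[str, Any]:
    """Flatten {"group": {"key": value}} into {"group:key": value}."""
    flat: dict[str, Any] = {}
    for group, fields in result.items():
        for key, value in fields.items():
            flat[f"{group}:{key}"] = value
    return flat


typed: TokenizationStat = {"global_": {"tokenization_indicators": "110"}}
flat_typed = flatten_result(typed)
flat_plain = flatten_result({"a": {"x": 1, "y": 2}})
```

The first overload is there because type checkers treat a TypedDict as a Mapping[str, object], which does not match Mapping[str, Mapping[str, Any]]; declaring it explicitly lets callers pass either shape without a cast.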

- CharLevelStats → CharLevelStat (singular, matches BleuScore/RougeScore)
- WordLevelStats → WordLevelStat
- GlobalStats → GlobalStat
- TokenizationStats → TokenizationStat; converted from functional TypedDict form to class form now that global_ is a valid identifier
- "global" key → "global_" in compute_stats() return value
- Update __init__.py __all__ and imports
- Update tests: import names, assertIn("global_"), typed annotations
- Update CHANGELOG migration notes

Co-authored-by: bact <128572+bact@users.noreply.github.com>
Agent-Logs-Url: https://github.com/PyThaiNLP/pythainlp/sessions/d0fa4ca1-06a6-48cc-b353-4df24554e700

bact

@copilot Use US-spelling for names

pythainlp/benchmarks/word_tokenization.py

- WordLevelStat field: correctly_tokenised_words → correctly_tokenized_words
- GlobalStat field: tokenisation_indicators → tokenization_indicators
- Private helper: _find_words_correctly_tokenised → _find_words_correctly_tokenized
- All call sites in compute_stats(), cli/benchmark.py, and tests updated
- CHANGELOG migration example updated

Co-authored-by: bact <128572+bact@users.noreply.github.com>
Agent-Logs-Url: https://github.com/PyThaiNLP/pythainlp/sessions/e8eff276-0f77-4d6f-9d4e-d9787e6f414b

Copilot · 2026-03-24T13:11:07Z

@copilot Use US-spelling for names

Done in 8b8e174. Changed tokenised→tokenized, tokenisation→tokenization, and _find_words_correctly_tokenised→_find_words_correctly_tokenized across word_tokenization.py, cli/benchmark.py, tests, and the CHANGELOG example.
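
For anyone holding results saved with the old key names, a small compatibility shim along these lines could bridge the rename. It is a hypothetical helper, not part of pythainlp; the rename map uses only the key changes listed in this PR, and the "word_level" key and the values in the sample are made up.

```python
from typing import Any

# Old name -> new name, taken from the renames listed in this PR.
RENAMED_KEYS = {
    "global": "global_",
    "correctly_tokenised_words": "correctly_tokenized_words",
    "tokenisation_indicators": "tokenization_indicators",
}


def upgrade_keys(old: dict[str, Any]) -> dict[str, Any]:
    """Recursively rename old keys; other keys and all values pass through."""
    upgraded: dict[str, Any] = {}
    for key, value in old.items():
        if isinstance(value, dict):
            value = upgrade_keys(value)
        upgraded[RENAMED_KEYS.get(key, key)] = value
    return upgraded


old_stats = {
    "word_level": {"correctly_tokenised_words": 3},  # "word_level" is assumed
    "global": {"tokenisation_indicators": "110"},
}
new_stats = upgrade_keys(old_stats)
```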

sonarqubecloud · 2026-03-25T16:03:43Z

Quality Gate passed

Issues
1 New issue
0 Accepted issues

Measures
0 Security Hotspots
0.0% Coverage on New Code
0.0% Duplication on New Code

See analysis details on SonarQube Cloud

Initial plan

8999b8c

Copilot AI assigned Copilot and bact Mar 24, 2026

Copilot started work on behalf of bact March 24, 2026 12:30

Copilot AI changed the title ~~[WIP] Add TypedDicts for complex dictionaries with docstrings~~ Add TypedDicts for complex return dicts: RougeScore, TokenizationStats family, CorefResult Mar 24, 2026

Copilot AI requested a review from bact March 24, 2026 12:51

Copilot finished work on behalf of bact March 24, 2026 12:51

Copilot started work on behalf of bact March 24, 2026 12:53

Copilot AI force-pushed the copilot/add-typed-dicts-for-complex-dictionaries branch from 61f0c8f to a1863d4 March 24, 2026 12:57

Copilot finished work on behalf of bact March 24, 2026 12:58

bact reviewed Mar 24, 2026

pythainlp/benchmarks/word_tokenization.py (outdated, resolved)

pythainlp/benchmarks/word_tokenization.py (outdated, resolved)

Copilot started work on behalf of bact March 24, 2026 13:08

Copilot AI changed the title ~~Add TypedDicts for complex return dicts: RougeScore, TokenizationStats family, CorefResult~~ Add TypedDicts for complex return dicts: RougeScore, TokenizationStat family, CorefResult Mar 24, 2026

Copilot AI requested a review from bact March 24, 2026 13:12

Copilot finished work on behalf of bact March 24, 2026 13:13

bact marked this pull request as ready for review March 24, 2026 13:15

bact added the benchmark (benchmarking tools in the library) and refactoring (a technical improvement which does not add any new features or change existing features) labels Mar 24, 2026

bact added this to PyThaiNLP Mar 24, 2026

bact added this to the 5.3.3 milestone Mar 24, 2026

Revise CHANGELOG for new TypedDicts and breaking changes

c80c228

Updated CHANGELOG to reflect breaking changes and new TypedDicts in pythainlp.benchmarks and pythainlp.coref. Added migration notes for users to transition to the new TypedDict structures.

bact merged commit a4296cb into dev Mar 25, 2026
24 checks passed

github-project-automation bot moved this to Done in PyThaiNLP Mar 25, 2026

bact deleted the copilot/add-typed-dicts-for-complex-dictionaries branch March 25, 2026 16:03

bact merged 5 commits into dev from copilot/add-typed-dicts-for-complex-dictionaries
