chore: add codespell support (config, workflow to detect/not fix) and make it fix some typos #87

Open
yarikoptic wants to merge 8 commits into ruc-datalab:main from yarikoptic:enh-codespell

@yarikoptic

More about codespell: https://github.com/codespell-project/codespell

I have personally introduced it to dozens, if not hundreds, of projects already, and so far have received only positive feedback.

The CI workflow has 'permissions' set only to 'read', so it should also be safe.

Changes

  • Configuration in .codespellrc
  • GitHub Actions workflow for CI with problem matcher annotations
  • Fixed 34 files with legitimate typos

Historical context

This project has had 2 prior commits fixing typos manually:

  • c9da457 fix:correct typo in quantize.py
  • e2c4e5c fix: prompt typo for RL stage

Typos fixed in this PR

Fixed typos across the codebase including:

  • Documentation typos (shoud → should, primarly → primarily, etc.)
  • Code typos (assitant → assistant, requirments → requirements, etc.)
  • Variable/function naming (intialize → initialize, advatanges → advantages, etc.)
  • Documentation improvements (envrionment → environment, comparision → comparison, etc.)

Configuration highlights

The configuration excludes:

  • Domain-specific terms (ans, rouge, aci, nd, medias, te, ot, fro, alse, eles)
  • camelCase/PascalCase identifiers (common in code)
  • URLs (to prevent breaking links)
  • Test data files and configuration files with their own codespell configs
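For readers unfamiliar with the file format, here is a rough sketch of what such a `.codespellrc` could contain, rendered with Python's `configparser`. The skip patterns and ignored words are taken from this PR's description and commit messages; the regex and the exact spelling of each pattern are assumptions for illustration and may differ from the file in the diff.

```python
import configparser
import io

# Skip patterns and ignored words as named in this PR's description and commits.
# How each is spelled in the real .codespellrc is an assumption.
SKIP_PATTERNS = ["playground", ".cache", "*.jsonl", "math.json", "setup.cfg"]
IGNORE_WORDS = ["ans", "rouge", "aci", "nd", "medias", "te", "ot", "fro", "alse", "eles"]

# Hypothetical regex: camelCase identifiers, PascalCase identifiers, or http(s) URLs.
IGNORE_REGEX = r"\b[a-z]+[A-Z]\w*\b|\b[A-Z][a-z]+[A-Z]\w*\b|https?://\S+"


def render_codespellrc() -> str:
    """Return the text of a [codespell] config section built from the values above."""
    # interpolation=None keeps any '%' characters in the values literal.
    parser = configparser.ConfigParser(interpolation=None)
    parser["codespell"] = {
        "skip": ",".join(SKIP_PATTERNS),
        "ignore-regex": IGNORE_REGEX,
        "ignore-words-list": ",".join(IGNORE_WORDS),
    }
    buffer = io.StringIO()
    parser.write(buffer)
    return buffer.getvalue()


if __name__ == "__main__":
    print(render_codespellrc())
```

`skip`, `ignore-regex`, and `ignore-words-list` are the config-file counterparts of codespell's command-line options of the same names.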

yarikoptic and others added 8 commits February 4, 2026 13:54
- Add playground and .cache to skip patterns
- Add *.jsonl to skip (data files)
- Add camelCase/PascalCase regex to ignore code identifiers
- Add domain-specific terms to ignore-words-list: ans, rouge, aci, nd, medias, te

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Fixed typos in error messages and scripts:
- outsid -> outside (docs error message)
- oint -> checkpoint (docs error message, was 'checkpoint' functions)
- interation -> iteration (scripts/multi_coldstart.sh, data directory paths)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
- Add eles to ignore-words-list (variable name for elements)
- Add URL regex pattern to prevent fixing typos in URLs
- Add skip patterns for math.json and setup.cfg files

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Fixed in comment about duplicate calling behavior

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Fixed typos across the codebase (34 files):
- shoud -> should (README.md)
- primarly -> primarily (skyagent/README.md)
- assitant -> assistant (skyagent/agents/base.py)
- requirments -> requirements (skyagent/tasks/general_react/utils.py)
- Hightlight -> Highlight (skyagent/tasks/swebench/utils.py)
- tiemout -> timeout (skyagent/tasks/verifiers/naive_dapo.py)
- feilds -> fields (skyagent/tools/prompt.py)
- occured -> occurred (skyrl-gym/envs/lcb/livecodebench.py)
- accesible -> accessible (skyrl-gym/envs/lcb/livecodebench.py)
- comparision -> comparison (skyrl-gym/envs/lcb/livecodebench.py)
- unkown -> unknown (skyrl-train/import_utils.py)
- explictly -> explicitly (skyrl-train/pyproject.toml, workers/worker.py)
- overriden -> overridden (skyrl-train/docs/configuration/config.rst)
- Continous -> Continuous (skyrl-train/docs/configuration/config.rst)
- atleast -> at least (skyrl-train/docs, entrypoints, inference_engines)
- ore -> or (skyrl-train/docs/tutorials/tools_guide.rst)
- envrionment -> environment (skyrl-train/docs/tutorials/tools_guide.rst)
- intialize -> initialize (skyrl-train/skyrl_train/models.py)
- advatanges -> advantages (skyrl-train/dataset/replay_buffer.py)
- sychronizing -> synchronizing (skyrl-train/distributed/deepspeed_strategy.py)
- vise -> vice (skyrl-train/distributed/ulysses/monkey_patch.py)
- initalized -> initialized (skyrl-train/utils/ppo_utils.py)
- efficent -> efficient (skyrl-train/utils/torch_utils.py)
- divisble -> divisible (skyrl-train/utils/utils.py)
- Gneration -> Generation (skyrl-train/utils/utils.py)
- initalizing -> initializing (skyrl-train/tests/gpu/test_policy_local_engines_e2e.py)
- onl -> only (ms-swift/docs/source/Instruction/常见问题整理.md)
- initilize -> initialize (ms-swift/docs, examples, swift/plugin)
- dimmension -> dimension (ms-swift/examples, swift/plugin)
- everytime -> every time (ms-swift/examples/train/rft/rft.py)
- bewteen -> between (ms-swift/swift/llm/infer/rollout.py)
- loosing -> losing (ms-swift/swift/llm/train/tuner.py)
- Reseach -> Research (example/analysis_on_student_loan/README.md)
- reseach -> research (scripts/multi_rl.sh)

URLs and variable names (eles) were preserved as configured

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
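
To illustrate why URLs, camelCase identifiers, and `eles` survive a run like the one above, here is a minimal sketch of the masking idea: codespell documents `ignore-regex` as treating matched text as whitespace before words are checked. The regexes, the `find_typos` helper, and the three-entry dictionary are hypothetical stand-ins, not the PR's actual patterns or codespell's implementation.

```python
import re

# Hypothetical patterns; the actual regexes in the PR are not shown in this conversation.
URL = r"https?://\S+"
CAMEL_OR_PASCAL = r"\b[a-z]+[A-Z]\w*\b|\b[A-Z][a-z]+[A-Z]\w*\b"
IGNORE_REGEX = re.compile(f"{URL}|{CAMEL_OR_PASCAL}")

# A few of the ignored words listed in this PR.
IGNORE_WORDS = {"eles", "ans", "nd", "te"}

# Tiny stand-in for codespell's dictionary, using corrections from this PR.
TYPOS = {"shoud": "should", "intialize": "initialize", "occured": "occurred"}

WORD = re.compile(r"[A-Za-z]+")


def find_typos(line: str) -> list[tuple[str, str]]:
    """Return (typo, suggestion) pairs found in line after masking ignored spans."""
    # Replace every ignored span with spaces so its contents are never checked.
    masked = IGNORE_REGEX.sub(lambda m: " " * len(m.group()), line)
    hits = []
    for match in WORD.finditer(masked):
        word = match.group()
        key = word.lower()
        if key in IGNORE_WORDS:
            continue  # explicitly allowed word, e.g. a variable name
        if key in TYPOS:
            hits.append((word, TYPOS[key]))
    return hits


if __name__ == "__main__":
    print(find_typos("You shoud intialize eles first"))
    print(find_typos("see https://example.com/occured for details"))
    print(find_typos("call intializeModel() here"))
```

In this sketch the first line reports `shoud` and `intialize` but not `eles`, while the URL and the camelCase identifier are masked out entirely, so renaming identifiers or breaking links is avoided at the cost of missing typos inside them.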
This will annotate typos in PR diff views for easier review

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>