…ation (CWE-22) Co-authored-by: wannaphong <8536487+wannaphong@users.noreply.github.com> Agent-Logs-Url: https://github.com/PyThaiNLP/pythainlp/sessions/a3eefdf3-b5f2-4c0c-a887-e8ee039b38bb
Copilot AI changed the title from "[WIP] Fix path manipulation issue in PyThaiNLP 5.3.2" to "fix: replace os.path.join with safe_path_join to prevent path manipulation (CWE-22)" on Mar 25, 2026
…ng safe_path_join *parts Co-authored-by: bact <128572+bact@users.noreply.github.com> Agent-Logs-Url: https://github.com/PyThaiNLP/pythainlp/sessions/3c4510c4-5975-4c56-93be-a20c10f6b56e
Contributor
Author
Done in 7214d5c. The last two |
bact approved these changes on Mar 25, 2026
Updated changelog to summarize security improvements and issue #1369.
bact approved these changes on Mar 25, 2026
wannaphong approved these changes on Mar 26, 2026



49 path manipulation vulnerabilities (CWE-22) remain in v5.3.2 despite the prior introduction of `safe_path_join`, because many call sites still use `os.path.join` directly.

What do these changes do

Replace every remaining `os.path.join` call that constructs a file-system path with `safe_path_join`, which canonicalizes via `os.path.abspath` and validates that the result is within the expected base directory before returning it.

What was wrong

`safe_path_join` existed in `pythainlp.tools.path` but was not used consistently. Affected sites spanned module-level path constants, archive extraction logic, and model-loading helpers. Any of these could accept tainted input (archive member names, user-supplied model directories) and silently produce a path outside the intended base.

How this fixes it

All `os.path.join` calls that produce file paths are replaced with `safe_path_join`:

- `pythainlp/corpus/__init__.py`: `_CORPUS_PATH` constant
- `pythainlp/tag/unigram.py`
- `pythainlp/tag/perceptron.py`
- `pythainlp/parse/transformers_ud.py`
- `pythainlp/translate/en_th.py`: `_get_translate_path` return value
- `pythainlp/spell/words_spelling_correction.py`: `embeddings.npy` and `vocabulary.txt` paths
- `pythainlp/tokenize/crfcut.py`
- `pythainlp/corpus/core.py`: `get_hf_hub`, tar/zip member validation, symlink target validation

For archive extraction (Python 3.9–3.11 fallback path), the previous two-step `os.path.join` + `_is_within_directory` check is replaced with a single `safe_path_join` call that raises on traversal. Relative symlink targets are resolved by passing the archive-root-relative member dirname and link target as separate `*parts` directly to `safe_path_join`, which handles the join, canonicalization, and containment check in one step and eliminates all intermediate `os.path.join` calls.

Unused `import os` statements are removed from files where `os` was only needed for `os.path.join`.