fix: Windows-compatible path handling in JS/Go parsers#46
Conversation
Three Windows-only failures preventing OpenAnt from running correctly on Windows. Ports the path/encoding fixes from #3 with a more nuanced Unicode strategy. 1. ts-morph backslash interpretation. path.relative() and path.resolve() produce backslash-separated paths on Windows; ts-morph treats those backslashes as escape characters when matching paths it has already added, so the JS analyzer silently emitted 0 functions. Normalise to forward slashes via a toPosixPath() helper before handing paths to ts-morph and before storing them in functionId values. 2. CRLF-tainted file lists. The --files-from consumer split on \n only, leaving a trailing \r on every path when the list was written on Windows. Switch to a /\r?\n/ split with per-line trim(). 3. cp1252 console crashes. The "Pipeline Test" status output used the Unicode glyphs U+2713 / U+2717 / U+2192, which crash on cp1252-encoded stdouts (the Windows default). Detect at import time whether stdout can encode those symbols and fall back to ASCII (OK / FAIL / ->) only when it cannot, so UTF-8 terminals keep the prettier output. Also remove the now-stale Windows xfail/skip markers from tests/ test_js_parser.py and tests/test_go_cli.py — the underlying bugs are fixed, so those guards no longer apply. Adds tests/test_windows_path_handling.py covering all three fixes: - ts-morph accepts a backslash-only file list and produces non-empty POSIX-form functionId values - the analyzer accepts a CRLF-terminated file list with a trailing blank line - both pipeline modules pick the ASCII fallback under a cp1252 stdout and the Unicode form under a UTF-8 stdout Refs knostic#16 (item 8). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Previous review (click to expand)🤖 Automated Claude Code ReviewFindingsF1 -- Critical bug: 14 print calls in both test_pipeline.py files emit literal {SYM_FAIL} / {SYM_OK} / {SYM_ARROW} strings instead of the variable values This is the most significant defect in the PR. A number of Affected sites in
The same 7 patterns appear in All 14 must be f-strings: The These regressions are not caught by the new tests because the tests only inspect F2 -- The PR adds A secondary issue: lines ~740 also call Fix: add F3 --
Additionally, Recommended fix: normalise at construction time: constructor(repoPath) {
this.repoPath = toPosixPath(path.resolve(repoPath));
...
}F4 -- The test constructs a backslash path by replacing every abs_path = str(src / "module.js") # /tmp/pytest-xxx/repo/src/module.js
backslash_path = abs_path.replace("/", "\\") # \tmp\pytest-xxx\repo\src\module.jsOn Linux/macOS, Node's If the Node/npm skip guard fires (no The PR description explicitly claims these tests are "runnable on any platform." That claim is incorrect for this test as written. Options:
F5 -- After The same issue affects Fix: add F6 -- Unicode detection fails silently when stdout is piped or redirected on UTF-8 systems When Python stdout is redirected (piped to a file, log aggregator, or subprocess), Additionally, the if not encoding:
return FalseIf degrading to ASCII for piped/redirected output is the intended behaviour, the docstring should say so explicitly. F7 -- The UTF-8 branch test (lines 862-884) loads only F8 -- The removed code included: if "UnicodeEncodeError" in result.stderr:
if sys.platform == "win32":
pytest.skip(...)
else:
pytest.fail("UnicodeEncodeError from JS parser on non-Windows (unexpected regression)")After the PR, if a The Windows-specific if result.returncode != 0 and "UnicodeEncodeError" in result.stderr:
pytest.fail("UnicodeEncodeError from JS parser (unexpected regression)")F9 -- When a Windows UNC path Positive notes
|
…h gaps, unicode clarity F1: fix 7 broken non-f-string print calls in go/test_pipeline.py (SYM_FAIL, SYM_OK, SYM_ARROW literals emitted verbatim) and 7 matching regressions in javascript/test_pipeline.py. status/marker variables now use the SYM_* variables directly instead of wrapping them in string literals. F2: apply toPosixPath to extractSingleFunction's filePath and to the nested requiredProject.addSourceFileAtPath resolvedPath so single-function CLI mode normalises Windows paths the same way batch mode does. F3: normalise this.repoPath in the TypeScriptAnalyzer constructor via toPosixPath(path.resolve(repoPath)) so all later path.relative / path.join calls operate on a consistent forward-slash base. F6: replace the `or ""` encoding fallback in _stdout_supports_unicode() with an explicit `if not encoding: return False` guard in both pipeline files, and add a docstring noting the piped/redirected-stdout behaviour. F9: add UNC path behaviour note to toPosixPath JSDoc. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
F4: gate test_typescript_analyzer_accepts_backslash_paths to Windows only. Replacing all forward slashes in a POSIX absolute path produces a non-absolute path on Linux/macOS, making the test fail on those platforms. F5: pop freshly-loaded cp1252 and utf-8 pipeline modules from sys.modules in finally blocks (pipeline_module fixture and unicode test) to prevent stale module-level state leaking into later tests. F7: parametrize test_pipeline_uses_unicode_when_stdout_supports_it to cover both the JS and Go pipeline modules, closing the coverage gap for the Go Unicode code path. F8: restore UnicodeEncodeError regression guard in test_go_cli.py test_parse_js_repo. The Windows-specific pytest.skip was correct to remove (bug is fixed), but the non-Windows pytest.fail provides a diagnostic on future regressions. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
🤖 Automatically written by ClaudeThanks for the thorough review. All 9 findings addressed across two commits:
All 16 tests pass (12 skipped — Go binary not built, as expected in this environment). |
Previous review (click to expand)🤖 Automated Claude Code ReviewFindingsThis is a follow-up review. F1 through F9 from the previous review are addressed; new findings continue from F10. Status of previous findings:
F10 —
The comment "Remove the freshly loaded module so its stale cp1252-patched module-level state doesn't leak into later tests" is therefore misleading; the cleanup it describes does not happen. To make the fix effective, the module must first be registered: add Note: the practical consequence is small because the module-level F11 — Double The test now has two independent @pytest.mark.skipif(
sys.platform != "win32",
reason="...",
)
@pytest.mark.skipif(
not shutil.which("node") or not JS_NODE_MODULES.exists(),
reason="Node.js or JS parser npm dependencies not available",
)
def test_typescript_analyzer_accepts_backslash_paths(tmp_path):When running on Linux/macOS, the first decorator skips the test with the platform reason. That is the intended behaviour. However, pytest evaluates decorators from bottom to top (inner to outer at runtime), and multiple The concern is discoverability in CI output: on a non-Windows CI agent that also lacks Node, the skip message will say "backslash path simulation only meaningful on Windows" rather than "Node.js or JS parser npm dependencies not available", hiding the missing-dependency condition. More importantly, on Windows without Node installed, the platform guard is This is a minor readability/diagnostic issue rather than a correctness bug, but it is actionable: combining the two conditions into one @pytest.mark.skipif(
sys.platform != "win32"
or not shutil.which("node")
or not JS_NODE_MODULES.exists(),
reason="Windows only (backslash path test) and requires Node.js + JS parser npm deps",
)F12 — Lines 523-527: if (!fs.existsSync(filePath)) {
console.error(`File not found: ${filePath}`);
process.exit(1);
}
const normalisedFilePath = toPosixPath(path.resolve(filePath));
However, there is a subtle inconsistency: the error message reports const normalisedFilePath = toPosixPath(path.resolve(filePath));
if (!fs.existsSync(normalisedFilePath)) {
console.error(`File not found: ${normalisedFilePath}`);
process.exit(1);
}This also catches a (unlikely) edge case: if F13 — The parametrised Since A clean fix: check before inserting ( F14 — Lines 250-253: assert mod.SYM_OK == "✓"
assert mod.SYM_FAIL == "✗"
assert mod.SYM_ARROW == "→"The test file does not declare a An alternative assertion form — Positive notes
|
…istency, sys.path dedup, assertion safety F10: register module in sys.modules inside _load_pipeline_module so that pre/post sys.modules.pop() calls in the fixture and unicode test are meaningful and prevent stale module-level state from leaking. F11: combine double @pytest.mark.skipif decorators on test_typescript_analyzer_accepts_backslash_paths into a single condition so the skip reason is always accurate and neither guard can mask the other. F12: move toPosixPath/path.resolve above the fs.existsSync check in extractSingleFunction so the existence check and error message both use the normalised path, consistent with the rest of the function. F13: guard sys.path.insert with already_on_path check in both the pipeline_module fixture and test_pipeline_uses_unicode_when_stdout_supports_it to prevent duplicate path entries across parametrized test runs; only remove from sys.path if we added it. F14: replace literal Unicode glyph strings in assertions (✓ ✗ →) with \u-escape sequences so pytest assertion error messages remain safe on cp1252 consoles. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
🤖 Automatically written by ClaudeAll 5 new findings from iteration 2 addressed in commit 22e6b56:
|
🤖 Automated Claude Code ReviewFindingsThis is a follow-up review. F1 through F14 from the previous two reviews are addressed; new findings continue from F15. Status of previous findings (F10–F14):
F15 — The F13 fix guards against the test harness inserting a duplicate # Line 71 of both parsers/javascript/test_pipeline.py and parsers/go/test_pipeline.py:
sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.dirname(os.path.abspath(__file__)))))
Trace for the first parametrized test call (e.g., cp1252 + javascript):
On the second call (cp1252 + go):
Each subsequent test call adds one more copy. After 4 parametrized tests, A correct fix requires either (a) snapshotting path_snapshot = list(sys.path)
try:
mod = _load_pipeline_module(mod_name, path)
...
finally:
sys.path[:] = path_snapshot
sys.modules.pop(mod_name, None)F16 — The F10 fix registers the module in sys.modules[name] = module
try:
spec.loader.exec_module(module)
except BaseException:
try:
del sys.modules[name]
except KeyError:
pass
raiseThe current sys.modules[name] = mod
spec.loader.exec_module(mod) # if this raises, sys.modules[name] is a partial module
return modIf The callers' The fix is a three-line change inside sys.modules[name] = mod
try:
spec.loader.exec_module(mod)
except BaseException:
sys.modules.pop(name, None)
raise
return modF17 — Trailing-whitespace misalignment in The three new assertion lines have inconsistent alignment before the inline comments: assert mod.SYM_OK == "✓" # CHECK MARK (U+2713) ← 4 spaces
assert mod.SYM_FAIL == "✗" # BALLOT X (U+2717) ← 2 spaces
assert mod.SYM_ARROW == "→" # RIGHTWARDS ARROW (U+2192) ← 2 spaces
Positive notes
|
…rollback, comment alignment F15: snapshot and restore sys.path around exec_module in _load_pipeline_module so that module-level sys.path.insert calls in the pipeline files do not accumulate across parametrized test runs. F16: roll back sys.modules[name] on exec_module failure (CPython importlib convention) so a failed load does not leave a partially-initialized module poisoning the cache. F17: fix inconsistent inline comment spacing on the three \u-escape assertion lines — normalise to 2 spaces before # for all three. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
🤖 Automatically written by ClaudeAll 3 final findings from iteration 3 addressed in commit fb31f2a:
|
Manual verification (Windows-specific)
|
Local test resultsRan the JS pipeline on Windows against Commands run: Outcome:
Sample unit ids (note POSIX separators): |
ar7casper
left a comment
There was a problem hiding this comment.
Nice fix and very disciplined scoping (excluding the venv-binary path PR is the right call).
Two non-blocking observations before merge:
1. Test plan accuracy: The description says the new test file is "runnable on any
platform," but test_typescript_analyzer_accepts_backslash_paths is skipif sys.platform != "win32" (correctly — backslash absolute paths aren't meaningful on POSIX). So on
Linux/macOS CI, only 2 of 3 fixes get behavioral coverage. Worth either tightening the
description, or — more useful — adding a unit test that calls toPosixPath() directly with
a backslash input and asserts the result, so the normalisation function itself is verified
everywhere even if the end-to-end ts-morph test stays Windows-only.
2. Static regression scanner parity with #45: PR #45 introduced regex-based scanners
(test_no_bare_open, test_no_bare_pathlib_text_io, test_no_bare_text_mode_subprocess)
that prevent contributors from reintroducing the antipattern. This PR only ships
behavioral tests — they catch the three known sites but won't catch a new
path.relative() call later that forgets toPosixPath(), or a new .split('\n') on a file
list, or a fresh print("✓ ...") somewhere. Not a blocker (current behavioral tests are
real protection), but consider either (a) a comment block at toPosixPath stating the
contract for contributors, or (b) a follow-up issue for a lint rule. Worth flagging for
parity with the #45 pattern.
…and contributor contract Add test_to_posix_path_normalises_backslashes, a node-only (not Windows-only) test that calls the toPosixPath logic inline and asserts backslash→slash conversion. This gives Linux/macOS CI behavioral coverage of the normalisation function even though the end-to-end ts-morph test stays Windows-only. Update the module docstring to accurately describe per-platform coverage. Expand the toPosixPath JSDoc with an explicit contributor contract: every path bound for ts-morph or a functionId must go through toPosixPath(), and skipping it silently produces an empty result on Windows. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Adds test_no_bare_path_calls_in_typescript_analyzer, a cross-platform scanner that greps typescript_analyzer.js for any path.relative/resolve/join() call not accompanied by toPosixPath() within a ±6-line window. This mirrors the PR knostic#45 pattern (test_no_bare_open, test_no_bare_pathlib_text_io, etc.) that prevents contributors from reintroducing encoding antipatterns. Scoped to typescript_analyzer.js only — other JS parser files legitimately call path.X() without toPosixPath() since they do not interact with ts-morph. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
Thanks for the review! Both points addressed in 1. Test plan accuracy / cross-platform coverage of
2. Static regression scanner — parity with #45 Did both (a) and (b):
Sanity-checked the scanner by injecting a fake bare |
|
ok @ar7casper I think this one is ok now |
ReviewSolid, well-scoped fix. The three Windows failures are each diagnosed precisely (ts-morph's backslash-as-escape behaviour, CRLF leftover from One Medium and three Lows below. Nothing blocking. Product alignment
🟡 Medium — Testing ·
|
| Severity | Count |
|---|---|
| Critical | 0 |
| High | 0 |
| Medium | 1 |
| Low | 3 |
Verdict: Acceptable to merge. The Medium is a real testing-quality gap (reimplementing the function under test) but the surrounding suite is strong enough that the actual regression risk is low — the static scanner, CRLF round-trip, and (on Windows) the backslash-paths E2E test all exercise toPosixPath indirectly. Lows are easy follow-ups.
| # Inline the same logic used in typescript_analyzer.js. We use \x5c | ||
| # (the hex escape for backslash) in the JS literal to avoid any | ||
| # shell or Python string-escaping ambiguity across platforms. | ||
| script = r"function toPosixPath(p){return p.split('\x5c').join('/');} console.log(toPosixPath('C:\x5cUsers\x5cfoo\x5cbar.js'));" |
There was a problem hiding this comment.
🟡 Medium — Testing · This test doesn't actually exercise toPosixPath from typescript_analyzer.js
Issue: The line below reimplements the function inline as a separate JS literal:
function toPosixPath(p){return p.split('\x5c').join('/');}The live implementation in typescript_analyzer.js:53-55 is p.replace(/\\/g, "/"). Two different code paths that happen to produce the same output for this input. A regression in the live regex (someone drops the g flag, changes it to ^\\, etc.) wouldn't be caught — this test would still pass green.
Suggestion: Either (a) export toPosixPath from typescript_analyzer.js:
module.exports = { TypeScriptAnalyzer, toPosixPath };and have the test invoke it via a small Node bootstrap, or (b) extract the function into a tiny shared path_utils.js module that both the analyzer and the test consume. The current shape is essentially testing a reimplementation, not the bug fix.
There was a problem hiding this comment.
Went with option (b): extracted toPosixPath into a new path_utils.js module (no ts-morph dependency, so node_modules is not required to test it). typescript_analyzer.js now imports from there via require("./path_utils") instead of defining the function inline. The test require()s path_utils.js directly and calls the live implementation — a regression in the actual regex is now caught on every platform.
| # --------------------------------------------------------------------------- | ||
|
|
||
| # Match path.relative/resolve/join call sites (non-method-name contexts). | ||
| _BARE_PATH_CALL_RE = re.compile(r"\bpath\.(relative|resolve|join)\s*\(") |
There was a problem hiding this comment.
🟢 Low — Architecture · Static scanner only catches three of the unsafe path.* methods
Issue: This regex covers relative|resolve|join — exactly the three methods used today — but a future contributor adding path.dirname(), path.normalize(), or path.format() before handing the result to ts-morph would silently break Windows again. There's already one path.dirname(normalisedFilePath) call at typescript_analyzer.js:737 — it works only because the input is already normalised, which is a fragile invariant this scanner doesn't enforce.
Suggestion: Either expand the regex to catch every path.*( call (with allow-listing for safe methods like path.isAbsolute, path.basename), or add an explicit one-line note in the test docstring above: "This list intentionally omits dirname/basename/normalize because they preserve the input's separator style — the precondition is that the input itself is already normalised. If you add a path method that changes separators, add it here."
There was a problem hiding this comment.
Added a comment block above _BARE_PATH_CALL_RE explaining the exclusion rationale: dirname, basename, normalize, and isAbsolute preserve the separator style of their input rather than producing new separators, so they are safe as long as the input is already normalised. The comment also flags that any future method that does produce new separators (e.g. path.format) should be added to the regex.
| * extracted). Always normalise to forward slashes before handing paths | ||
| * to ts-morph or storing them as functionId components. | ||
| * | ||
| * UNC paths (`\\server\share\...`) are correctly converted to |
There was a problem hiding this comment.
🟢 Low — Documentation · UNC path claim is documented but not tested
Issue: This docstring promises that \\server\share\... becomes //server/share/..., but no test in test_windows_path_handling.py exercises that path shape. The behaviour is correct today purely as an emergent property of the global replace(/\\/g, "/") — but if someone refactors the regex to a more "specific" form (e.g. replace(/^\\\\/, "//").replace(/\\/g, "/")), UNC handling could silently change without test failure. That's exactly the kind of regression the rest of this PR's testing infrastructure exists to prevent.
Suggestion: Add a single-line check in test_to_posix_path_normalises_backslashes:
assert toPosixPath("\\\\server\\share\\foo.js") == "//server/share/foo.js"There was a problem hiding this comment.
Added a UNC assertion to the same test: \\server\share\foo.js → //server/share/foo.js. Both assertions run in a single node -e invocation via JSON.stringify.
…l function, add UNC + scanner note
Medium: Extract toPosixPath() into path_utils.js (no ts-morph dependency) so the
cross-platform unit test can require() and call the actual implementation rather than
an inlined reimplementation. A regression in the live regex (dropped `g` flag, wrong
pattern, etc.) is now caught on any platform. typescript_analyzer.js replaces the
inline definition with require("./path_utils").
Low (UNC): Add a UNC path assertion to test_to_posix_path_normalises_backslashes:
\\server\share\foo.js -> //server/share/foo.js, covering the contract documented in
the path_utils.js JSDoc.
Low (scanner): Add a docstring note above _BARE_PATH_CALL_RE explaining why
dirname/basename/normalize/isAbsolute are intentionally excluded — they preserve
separator style rather than producing new separators, so they are safe as long as
their input is already normalised.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
Thanks — all three actionable points addressed in Medium — test reimplements instead of calling real function Went with option (b): extracted Low — UNC path undocumented/untested Added a UNC assertion to the same test: Low — scanner regex excludes Added a comment block above |
Summary
Three Windows-only failures preventing OpenAnt from running correctly on Windows:
path.relative()produces backslashes on Windows; ts-morph treats them as escape characters when matching paths it has already added, so the JS parser silently emits 0 files. Fix: convert separators to forward slashes via atoPosixPath()helper before handing paths to ts-morph, and use POSIX-formrelativePathwhen buildingfunctionIdvalues.\r\n-tainted file lists — splitting a newline-delimited list on Windows leaves a trailing\ron every path, whichaddSourceFileAtPaththen fails to resolve. Fix: split the--files-fromcontent on/\r?\n/and.trim()each line.✓ ✗ →, which crash on cp1252-encoded stdouts (the Windows default). Fix: detect at import time whethersys.stdout.encodingcan encode those glyphs and fall back to ASCII (OK/FAIL/->) only when it cannot. UTF-8 terminals keep the prettier Unicode output — this is more nuanced than an unconditional ASCII downgrade.Stale Windows
xfail/skipmarkers intests/test_js_parser.pyandtests/test_go_cli.pyare removed since the underlying bugs are fixed.These are the parser path/encoding pieces of issue #16 item 8. The companion venv-binary path fix (
Scripts\python.exevsbin/pythoninapps/openant-cli/internal/python/runtime.go) is intentionally not included here — it's a separate concern and worth its own PR.Addresses item 8 from #16 (does not close the issue).
Test plan
New
libs/openant-core/tests/test_windows_path_handling.pycovers all three fixes, runnable on any platform:functionIdvalues that contain only forward slashes.pytest tests/→99 passed, 12 skipped(the 12 skips aretest_go_cli.pycases that need the compiled Go binary; pre-existing).openant parse <ts-repo>on Windows and confirm a non-zero file count is reported.