fix(sql): enforce UTF-8 when loading keyword resources #1260

Open
wants to merge 1 commit into
base: master

renechoi
Contributor

📝 Pull-Request Description

What & Why

Keywords.readLines loaded the SQL keyword lists using the JVM’s default charset.
On environments whose default charset is not UTF-8 (e.g. Windows with CP1252), any keyword containing non-ASCII characters was silently corrupted, which led to parsing errors in templates that rely on those lists.

This patch forces UTF-8 decoding for every /keywords/* resource, guaranteeing identical behaviour on all platforms.
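The failure mode described above can be reproduced without any Querydsl code. The following is a minimal sketch, not taken from the patch: the class name `CharsetMismatchDemo` and the helper `decode` are hypothetical, and it assumes the JVM ships the `windows-1252` charset (standard JDK builds do). It decodes the UTF-8 bytes of ÄÖÜ once as UTF-8 and once as windows-1252.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; not part of Querydsl or of this PR.
class CharsetMismatchDemo {

    // Decodes raw bytes with an explicit charset, which is what a reader does internally.
    static String decode(byte[] bytes, Charset charset) {
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        // The three umlauts as they are stored in a UTF-8 file: two bytes per character.
        byte[] utf8Bytes = "\u00C4\u00D6\u00DC".getBytes(StandardCharsets.UTF_8);

        String correct = decode(utf8Bytes, StandardCharsets.UTF_8);
        // Assumption: windows-1252 stands in for a non-UTF-8 platform default.
        String garbled = decode(utf8Bytes, Charset.forName("windows-1252"));

        System.out.println(correct.length()); // 3: one char per umlaut
        System.out.println(garbled.length()); // 6: every byte became its own char
        System.out.println(correct.equals(garbled)); // false
    }
}
```

Because the single-byte charset maps every byte to a character, no exception is thrown; the text is simply wrong, which is why the corruption goes unnoticed until a keyword comparison fails.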


Changes in this PR

| Type | Module / File | Summary |
| --- | --- | --- |
| 🛠 Bug-fix | querydsl-sql/src/main/java/com/querydsl/sql/Keywords.java | Passes StandardCharsets.UTF_8 to InputStreamReader, replacing reliance on the default charset. |
| ✅ Test | querydsl-sql/src/test/java/com/querydsl/sql/KeywordsEncodingTest.java | New regression test that loads a UTF-8 resource (encoding-test) and asserts the content is preserved. |
| 📦 Resource | querydsl-sql/src/test/resources/keywords/encoding-test | Minimal UTF-8 test asset (SELECT + ÄÖÜ) used by the new unit test. |
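For readers unfamiliar with the pattern, the sketch below shows how a line reader can pin its decoding to UTF-8. It is an illustration only and not the actual `Keywords.readLines` code: the class `Utf8LineReader`, its method signature, and the in-memory stand-in for the encoding-test resource are all assumptions.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper; the real Keywords class may be structured differently.
class Utf8LineReader {

    // Reads all lines from the stream, always decoding as UTF-8
    // regardless of the JVM's default charset.
    static List<String> readLines(InputStream in) {
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(in, StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return lines;
    }

    public static void main(String[] args) {
        // In-memory stand-in for a UTF-8 resource containing SELECT and three umlauts.
        byte[] resource = "SELECT\n\u00C4\u00D6\u00DC\n".getBytes(StandardCharsets.UTF_8);
        List<String> lines = readLines(new ByteArrayInputStream(resource));

        System.out.println(lines.size());            // 2
        System.out.println(lines.get(1).length());   // 3
    }
}
```

The only line that matters for the fix is the `InputStreamReader` constructor: the one-argument form falls back to the default charset, while the two-argument form used here makes the decoding independent of the platform.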

Compatibility

  • Non-breaking – internal implementation detail only; public API unchanged.

  • Applies uniformly to all dialects that depend on Keywords.


Tests & CI

  • New JUnit test verifies UTF-8 decoding.

  • All existing tests continue to pass locally.

  • The current CI failure resolving easy-jacoco-maven-plugin is unrelated to this change; if desired, I can follow up with a version pin or mirror configuration.


Related

  • Inspired by common cross-platform issues with default charset usage (no open upstream ticket).


🤝 Thanks for reviewing!

Keywords.readLines previously relied on the JVM default charset,
which could mis-parse the word list on non-UTF-8 systems (e.g. CP1252).

Changes:
  • Pass StandardCharsets.UTF_8 to InputStreamReader
  • Add KeywordsEncodingTest to guard against regressions

Cross-platform behaviour is now deterministic.