Improve Character emulation #10113

Open
wants to merge 4 commits into
base: main

Collaborator

@zbynek zbynek commented Apr 6, 2025

Fixes #9705
Fixes #1989 (not every method is covered; some are simply impossible to implement without bundling the Unicode database into the compiled code)

Implements several APIs using Unicode character class escapes (https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Regular_expressions/Unicode_character_class_escape), which should be well supported in all browsers.
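To illustrate the idea of classifying a code point with a Unicode property escape rather than a bundled table, here is a minimal sketch. It uses `java.util.regex` on a JVM as a stand-in for the browser regex engine, so it is an analogy and not the code in this PR; the class and method names are hypothetical.

```java
import java.util.regex.Pattern;

final class UnicodeClassEscapeSketch {
    // \p{L} matches any code point whose general category is a letter
    // (Lu, Ll, Lt, Lm or Lo). Java's regex engine matches by code point,
    // so supplementary characters are handled as a single unit.
    private static final Pattern LETTER = Pattern.compile("\\p{L}");

    /** Hypothetical helper: classify a code point with a regex, not a lookup table. */
    static boolean isLetterViaRegex(int codePoint) {
        String s = new String(Character.toChars(codePoint));
        return LETTER.matcher(s).matches();
    }

    public static void main(String[] args) {
        int[] samples = {'A', 'z', '5', ' ', 0x00E9, 0x4E2D, 0x1D400};
        for (int cp : samples) {
            // Print both answers side by side so any mismatch is visible.
            System.out.printf("U+%04X regex=%b builtin=%b%n",
                cp, isLetterViaRegex(cp), Character.isLetter(cp));
        }
    }
}
```

On a JVM both columns come from the same Unicode data, so they should agree; in the browser the regex engine's Unicode version is whatever the browser ships.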

The implementation is mostly similar to https://groups.google.com/g/google-web-toolkit-contributors/c/73-aScAShs4/m/gZAhlUXiBAAJ, but the fallbacks for old Edge versions are not included.

@zbynek zbynek marked this pull request as draft April 6, 2025 22:14
@zbynek zbynek requested a review from Copilot April 6, 2025 22:25
Copilot

This comment was marked as resolved.

@zbynek zbynek force-pushed the character-emul branch 6 times, most recently from 051a0d0 to 1ab960c Compare April 13, 2025 06:33
@zbynek zbynek force-pushed the character-emul branch 2 times, most recently from d3a8f64 to 1ad6bc2 Compare April 16, 2025 00:20
@zbynek zbynek marked this pull request as ready for review April 26, 2025 11:51
Comment on lines 252 to 253
// Known differences between Java 17 and Chrome 135
// 11f50 .. 11f59, 16ac0 .. 16ac9, 1e4f0 .. 1e4f9, 1fbf0 .. 1fbf9
Member

Should we leave this method out if it is still wrong? Which side do we consider to be "wrong" here, Java or Chrome? Would it make sense to have a test (possibly ignored) that shows this, so it is easier to reevaluate later?

Also, consider using hex in digit() above, so that it is the same convention as here, or change this to be decimal?

Collaborator Author

Maybe instead of this comment I should just link to the compatibility table https://docs.oracle.com/en/java/javase/24/docs/api/java.base/java/lang/Character.html#conformance
and mention that JRE behaviour on any Java < 24 won't match recent browser releases.
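A rough way to see where a given JRE stands is to probe the four ranges from the code comment directly. The sketch below assumes the method under discussion is `Character.isDigit`, which the excerpt does not state; the class and method names are hypothetical, and the result depends on the Unicode version of the runtime, so no expected output is given.

```java
final class DigitConformanceProbe {
    // Ranges copied from the code comment under review: {first, last}, inclusive.
    static final int[][] RANGES = {
        {0x11F50, 0x11F59}, {0x16AC0, 0x16AC9}, {0x1E4F0, 0x1E4F9}, {0x1FBF0, 0x1FBF9}
    };

    /** Total number of code points covered by RANGES. */
    static int totalCodePoints() {
        int total = 0;
        for (int[] range : RANGES) {
            total += range[1] - range[0] + 1;
        }
        return total;
    }

    /** How many of those code points the running JRE reports as digits. */
    static int countDigitsOnThisRuntime() {
        int count = 0;
        for (int[] range : RANGES) {
            for (int cp = range[0]; cp <= range[1]; cp++) {
                if (Character.isDigit(cp)) { // assumption: isDigit is the affected method
                    count++;
                }
            }
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println("java.version = " + System.getProperty("java.version"));
        System.out.println(countDigitsOnThisRuntime() + " of " + totalCodePoints()
            + " code points are digits on this runtime");
    }
}
```

Running this on several JDK releases would show which side of the mismatch each one falls on, which could back the wording of a replacement comment.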


/**
* Tests for java.lang.Character.
*/
@DoNotRunWith(Platform.HtmlUnitBug)
Member

Can the specific failing tests be annotated with this, each with a comment about why it fails, rather than skip existing tests that pass?

Collaborator Author

It's pretty much all the tests that call any Character.is* method. They fail because the regexp doesn't even parse in HtmlUnit 2-4. If we want any meaningful test coverage, we need #10115 plus a custom build of HtmlUnit that includes mozilla/rhino#1848.

}

// If String.toUpperCase produces more than 1 codepoint, Character.toUpperCase should
// act either as identity or title-case conversion (not supported in GWT).
Member

Maybe add a failing test and mark it ignored so we can see about restoring it when it works?
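For reference, the JVM behaviour that the code comment describes can be sketched as below. This is plain Java run on a JRE, not GWT-compiled code, and the class and helper names are hypothetical; U+00DF (ß) is the usual example of a character whose `String.toUpperCase` result has more than one code point.

```java
import java.util.Locale;

final class UpperCaseExpansionSketch {
    /** True if upper-casing the one-code-point string yields more than one code point. */
    static boolean expandsWhenUpperCased(int codePoint) {
        String upper = new String(Character.toChars(codePoint)).toUpperCase(Locale.ROOT);
        return upper.codePointCount(0, upper.length()) > 1;
    }

    /** Describes what Character.toUpperCase does; meant for expanding code points only. */
    static String describe(int codePoint) {
        int upper = Character.toUpperCase(codePoint);
        if (upper == codePoint) {
            return "identity";
        }
        if (upper == Character.toTitleCase(codePoint)) {
            return "title-case";
        }
        return "other";
    }

    public static void main(String[] args) {
        // U+00DF sharp s, U+FB00 ff ligature, U+1F80 Greek alpha with ypogegrammeni.
        int[] samples = {0x00DF, 0xFB00, 0x1F80};
        for (int cp : samples) {
            System.out.printf("U+%04X expands=%b Character.toUpperCase=%s%n",
                cp, expandsWhenUpperCased(cp), describe(cp));
        }
    }
}
```

A test along these lines could iterate over all code points where `expandsWhenUpperCased` is true and assert that `describe` never returns "other".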
