Skip to content

Update HtmlUnit to 4.11.1, update necessary dependencies #10115

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

zbynek
Copy link
Collaborator

@zbynek zbynek commented Apr 13, 2025

Fixes #9958

Related changes need to be merged in tools repo first, the workflow definitions here need to be reverted.

(I was hoping this would help with regex parsing for, #10113 but it's not sufficient.)

Copy link
Member

@niloc132 niloc132 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for working on this - I had tentatively planned on splitting this off until 2.14 (mavenize gwt itself first) and/or split into an external project (via #10058, so that htmlunit could be entirely external, not bring so many other dependencies to the gwt-dev classpath...

What do you think about us fixing 10058 and then offering a separate gwt-runstyle-htmlunit jar that provides -runStyle=htmlunit4, which then drops legacy dev mode support etc, changes the system tz, etc? I'd also love to expose the webClient.options somehow so that downstream projects could enable future polyfills as desired...

webClient.setActiveXObjectMap(Collections.singletonMap(
"GwtLegacyDevModeExceptionOrReturnValue", "java.lang.Object"));
// webClient.setActiveXObjectMap(Collections.singletonMap(
// "GwtLegacyDevModeExceptionOrReturnValue", "java.lang.Object"));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we're removing this, we should probably remove it outright, and the comment too.

That said, I think legacy dev mode tests work still with HtmlUnit, so this is probably required to keep? Unless we are actually outright removing that feature, in which case there is a lot more to remove too?

@zbynek
Copy link
Collaborator Author

zbynek commented Apr 13, 2025

I don't think maintaining 2 different runstyles based on HtmlUnit is worth it.
Testing so far shows only one problem, localization of dates, and for that it would be hopefully possible to provide some options like the browser timezone/language.

Doing this only after mavenization makes sense.

Btw. would it make sense to also replace Rhino classes for JSNI parsing with those bundled in HtmlUnit? With JSNI being a legacy feature ES6 support is not important, but if it decreases size of the codebase it might be worth trying.

@niloc132
Copy link
Member

Agreed, I don't want to continue to maintain the old one, just to make sure it is there for legacy dev mode until we're committed to removing it. Post-mavenization (or Gradle, if there were suddenly a push to have good Gradle support), I would want to have gwt-dev.jar to continue to exist in roughly its current form for legacy ("still productive and profitable, but without significant development resources") applications to continue to be maintained. Those either prepared to add gwt-runstyle-htmlunit4.jar, or switch outright to the specific jars they actually want (pretend these are gwt-compiler.jar, gwt-test.jar, gwt-jre.jar, etc... plus their desired gwt-runstyle-*.jars) then could have an easy way to track the latest htmlunit versions as they are available. Or, switch to the new jars, and leave out htmlunit entirely if desired, and get none of its classpath (which as of today, still includes jetty 9).

I'm not quite sure what you're asking about rhino - if there was an easy update that could be done for htmlunit to get es5 class, es6 classes etc, they would have already done it. Or are you asking if GWT's custom rhino could be replaced with htmlunit's implementation? If it is the latter, I think the answer is no, as JSNI doesn't actually support all of es3 - the with keyword has been altered to fail:

case TokenStream.WITH: {
// bruce: we don't support this is JSNI code because it's impossible
// to identify bindings even passably well
//
reportError(ts, "msg.jsni.unsupported.with");

JSNI also supports more than es3, the JSNI method reference syntax:

private int jsniMatchReference() throws IOException {
// First, read the type name whose member is being accessed.
if (!jsniMatchQualifiedTypeName('.', ':')) {
return ERROR;
}
// Now we must the second colon.
//
int c = in.read();
if (c != ':') {
in.unread();
reportSyntaxError("msg.jsni.expected.char", new String[] { ":" });
return ERROR;
}
addToString(c);
// Skip whitespace starting after ::.
skipWhitespace();
// Finish by reading the field or method signature.
if (!jsniMatchMethodSignatureOrFieldName()) {
return ERROR;
}
this.string = new String(stringBuffer, 0, stringBufferTop);
return NAME;
}
private boolean jsniMatchParamListSignature() throws IOException {
// Assume the opening '(' has already been read.
// Read param type signatures until we see a closing ')'.
skipWhitespace();
// First check for the special case of * as the parameter list, indicating
// a wildcard
if (in.peek() == '*') {
addToString(in.read());
if (in.peek() != ')') {
reportSyntaxError("msg.jsni.expected.char", new String[] { ")" });
}
addToString(in.read());
return true;
}
// Otherwise, loop through reading one param type at a time
do {
// Skip whitespace between parameters.
skipWhitespace();
int c = in.read();
if (c == ')') {
// Finished successfully.
//
addToString(c);
return true;
}
in.unread();
} while (jsniMatchParamTypeSignature());
// If we made it here, we can assume that there was an invalid type
// signature that was already reported and that the offending char
// was already unread.
//
return false;
}
private boolean jsniMatchParamTypeSignature() throws IOException {
int c = in.read();
switch (c) {
case 'Z':
case 'B':
case 'C':
case 'S':
case 'I':
case 'J':
case 'F':
case 'D':
// Primitive type id.
addToString(c);
return true;
case 'L':
// Class/Interface type prefix.
addToString(c);
return jsniMatchQualifiedTypeName('/', ';');
case '[':
// Array type prefix.
addToString(c);
return jsniMatchParamArrayTypeSignature();
default:
in.unread();
reportSyntaxError("msg.jsni.expected.param.type", null);
return false;
}
}
private boolean jsniMatchParamArrayTypeSignature() throws IOException {
// Assume the leading '[' has already been read.
// What follows must be another param type signature.
//
return jsniMatchParamTypeSignature();
}
private boolean jsniMatchMethodSignatureOrFieldName() throws IOException {
int c = in.read();
// We must see an ident start here.
//
if (!Character.isJavaIdentifierStart((char)c)) {
in.unread();
reportSyntaxError("msg.jsni.expected.identifier", null);
return false;
}
addToString(c);
for (;;) {
c = in.read();
if (Character.isJavaIdentifierPart((char)c)) {
addToString(c);
}
else if (c == '(') {
// This means we're starting a JSNI method signature.
//
addToString(c);
if (jsniMatchParamListSignature()) {
// Finished a method signature with success.
// Assume the callee unread the last char.
//
return true;
}
else {
// Assume the callee reported the error and unread the last char.
//
return false;
}
}
else {
// We don't know this char, so it finishes the token.
//
in.unread();
return true;
}
}
}
/**
* This method is called to match the fully-qualified type name that
* should appear after the '@' in a JSNI reference.
* @param sepChar the character that will separate the Java idents
* (either a '.' or '/')
* @param endChar the character that indicates the end of the
*/
private boolean jsniMatchQualifiedTypeName(char sepChar, char endChar)
throws IOException {
int c = in.read();
// Whether nested or not, we must see an ident start here.
//
if (!Character.isJavaIdentifierStart((char)c)) {
in.unread();
reportSyntaxError("msg.jsni.expected.identifier", null);
return false;
}
// Now actually add the first ident char.
//
addToString(c);
// And append any other ident chars.
//
for (;;) {
c = in.read();
if (Character.isJavaIdentifierPart((char)c)) {
addToString(c);
}
else {
break;
}
}
// Arrray-type reference
while (c == '[') {
if (']' == in.peek()) {
addToString('[');
addToString(in.read());
c = in.read();
} else {
break;
}
}
// We have a non-ident char to classify.
//
if (c == sepChar) {
addToString(c);
if (jsniMatchQualifiedTypeName(sepChar, endChar)) {
// We consumed up to the endChar, so we finished with total success.
//
return true;
} else {
// Assume that the nested call reported the syntax error and
// unread the last character.
//
return false;
}
} else if (c == endChar) {
// Matched everything up to the specified end char.
//
addToString(c);
return true;
} else {
// This is an unknown char that finishes the token.
//
in.unread();
return true;
}
}

@zbynek
Copy link
Collaborator Author

zbynek commented Apr 13, 2025

Or are you asking if GWT's custom rhino could be replaced with htmlunit's implementation? If it is the latter, I think the answer is no

That's what I meant, thanks. The reason I mentioned ES6 is that Rhino 1.8 (and its HtmlUnit fork) enable many ES6 features by default.

@niloc132
Copy link
Member

Oh that's a nice view - possibly worth a look when it gets closer (and there's a lot less red even in the last column), but I think we should be discouraging writing new JSNI, and instead encouraging jsinterop where possible?

Note that GWT doesn't use rhino to emit JS, so we won't get any extra generated JS features out of this, only parsing JSNI expressions.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Upgrade to HtmlUnit 3.9.0 or 4.1.0
2 participants