Skip to content

Conversation

@dependabot
Copy link
Contributor

@dependabot dependabot bot commented on behalf of github Jan 5, 2026

Bumps org.jsoup:jsoup from 1.21.2 to 1.22.1.

Release notes

Sourced from org.jsoup:jsoup's releases.

jsoup Java HTML Parser release 1.22.1

jsoup 1.22.1 is out now, adding support for the re2j regular expression engine for regex-based CSS selectors, a configurable maximum parser depth, and numerous bug fixes and improvements.

jsoup is a Java library for working with real-world HTML and XML. It provides a very convenient API for extracting and manipulating data, using the best of HTML5 DOM methods and CSS selectors.

Download jsoup now.

Improvements

  • Added support for using the re2j regular expression engine for regex-based CSS selectors (e.g. [attr~=regex], :matches(regex)), which ensures linear-time performance for regex evaluation. This allows safer handling of arbitrary user-supplied query regexes. To enable, add the com.google.re2j dependency to your classpath, e.g.:
  <dependency>
    <groupId>com.google.re2j</groupId>
    <artifactId>re2j</artifactId>
    <version>1.8</version>
  </dependency>

(If you already have that dependency in your classpath, but you want to keep using the Java regex engine, you can disable re2j via System.setProperty("jsoup.useRe2j", "false").) You can confirm that the re2j engine has been enabled correctly by calling Regex.usingRe2j(). #2407

  • Added an instance method Parser#unescape(String, boolean) that unescapes HTML entities using the parser's configuration (e.g. to support error tracking), complementing the existing static utility Parser.unescapeEntities(String, boolean). #2396
  • Added a configurable maximum parser depth (to limit the number of open elements on stack) to both HTML and XML parsers. The HTML parser now defaults to a depth of 512 to match browser behavior, and protect against unbounded stack growth, while the XML parser keeps unlimited depth by default, but can opt into a limit via Parser.setMaxDepth(). #2421
  • Build: added CI coverage for JDK 25 #2403
  • Build: added a CI fuzzer for contextual fragment parsing (in addition to existing full body HTML and XML fuzzers). [oss-fuzz #14041](google/oss-fuzz#14041)

Changes

  • Set a removal schedule of jsoup 1.24.1 for previously deprecated APIs.

Bug Fixes

  • Previously cached child Elements of an Element were not correctly invalidated in Node#replaceWith(Node), which could lead to incorrect results when subsequently calling Element#children(). #2391
  • Attribute selector values are now compared literally without trimming. Previously, jsoup trimmed whitespace from selector values and from element attribute values, which could cause mismatches with browser behavior (e.g. [attr=" foo "]). Now matches align with the CSS specification and browser engines. #2380
  • When using the JDK HttpClient, any system default proxy (ProxySelector.getDefault()) was ignored. Now, the system proxy is used if a per-request proxy is not set. #2388, #2390
  • A ValidationException could be thrown in the adoption agency algorithm with particularly broken input. Now logged as a parse error. #2393
  • Null characters in the HTML body were not consistently removed; and in foreign content were not correctly replaced. #2395
  • An IndexOutOfBoundsException could be thrown when parsing a body fragment with crafted input. Now logged as a parse error. #2397, #2406
  • When using StructuralEvaluators (e.g., a parent child selector) across many retained threads, their memoized results could also be retained, increasing memory use. These results are now cleared immediately after use, reducing overall memory consumption. #2411
  • Cloning a Parser now preserves any custom TagSet applied to the parser. #2422, #2423
  • Custom tags marked as Tag.Void now parse and serialize like the built-in void elements: they no longer consume following content, and the XML serializer emits the expected self-closing form. #2425
  • The <br> element is once again classified as an inline tag (Tag.isBlock() == false), matching common developer expectations and its role as phrasing content in HTML, while pretty-printing and text extraction continue to treat it as a line break in the rendered output. #2387, #2439
  • Fixed an intermittent truncation issue when fetching and parsing remote documents via Jsoup.connect(url).get(). On responses without a charset header, the initial charset sniff could sometimes (depending on buffering / available() behavior) be mistaken for end-of-stream and a partial parse reused, dropping trailing content. #2448
  • TagSet copies no longer mutate their template during lazy lookups, preventing cross-thread ConcurrentModificationException when parsing with shared sessions. #2453
  • Fixed parsing of <svg> foreignObject content nested within a <p>, which could incorrectly move the HTML subtree outside the SVG. #2452

Internal Changes

  • Deprecated internal helper org.jsoup.internal.Functions (for removal in v1.23.1). This was previously used to support older Android API levels without full java.util.function coverage; jsoup now requires core library desugaring so this indirection is no longer necessary. #2412

My sincere thanks to everyone who contributed to this release! If you have any suggestions for the next release, I would love to hear them; please get in touch via jsoup discussions, or with me directly.

You can also follow me (@[email protected]) on Mastodon / Fediverse to receive occasional notes about jsoup releases.

Changelog

Sourced from org.jsoup:jsoup's changelog.

1.22.1 (2026-Jan-01)

Improvements

  • Added support for using the re2j regular expression engine for regex-based CSS selectors (e.g. [attr~=regex], :matches(regex)), which ensures linear-time performance for regex evaluation. This allows safer handling of arbitrary user-supplied query regexes. To enable, add the com.google.re2j dependency to your classpath, e.g.:
  <dependency>
    <groupId>com.google.re2j</groupId>
    <artifactId>re2j</artifactId>
    <version>1.8</version>
  </dependency>

(If you already have that dependency in your classpath, but you want to keep using the Java regex engine, you can disable re2j via System.setProperty("jsoup.useRe2j", "false").) You can confirm that the re2j engine has been enabled correctly by calling org.jsoup.helper.Regex.usingRe2j(). #2407

  • Added an instance method Parser#unescape(String, boolean) that unescapes HTML entities using the parser's configuration (e.g. to support error tracking), complementing the existing static utility Parser.unescapeEntities(String, boolean). #2396
  • Added a configurable maximum parser depth (to limit the number of open elements on stack) to both HTML and XML parsers. The HTML parser now defaults to a depth of 512 to match browser behavior, and protect against unbounded stack growth, while the XML parser keeps unlimited depth by default, but can opt into a limit via org.jsoup.parser.Parser#setMaxDepth. #2421
  • Build: added CI coverage for JDK 25 #2403
  • Build: added a CI fuzzer for contextual fragment parsing (in addition to existing full body HTML and XML fuzzers). [oss-fuzz #14041](google/oss-fuzz#14041)

Changes

  • Set a removal schedule of jsoup 1.24.1 for previously deprecated APIs.

Bug Fixes

  • Previously cached child Elements of an Element were not correctly invalidated in Node#replaceWith(Node), which could lead to incorrect results when subsequently calling Element#children(). #2391
  • Attribute selector values are now compared literally without trimming. Previously, jsoup trimmed whitespace from selector values and from element attribute values, which could cause mismatches with browser behavior (e.g. [attr=" foo "]). Now matches align with the CSS specification and browser engines. #2380
  • When using the JDK HttpClient, any system default proxy (ProxySelector.getDefault()) was ignored. Now, the system proxy is used if a per-request proxy is not set. #2388, #2390
  • A ValidationException could be thrown in the adoption agency algorithm with particularly broken input. Now logged as a parse error. #2393
  • Null characters in the HTML body were not consistently removed; and in foreign content were not correctly replaced. #2395
  • An IndexOutOfBoundsException could be thrown when parsing a body fragment with crafted input. Now logged as a parse error. #2397, #2406
  • When using StructuralEvaluators (e.g., a parent child selector) across many retained threads, their memoized results could also be retained, increasing memory use. These results are now cleared immediately after use, reducing overall memory consumption. #2411
  • Cloning a Parser now preserves any custom TagSet applied to the parser. #2422, #2423
  • Custom tags marked as Tag.Void now parse and serialize like the built-in void elements: they no longer consume following content, and the XML serializer emits the expected self-closing form. #2425
  • The <br> element is once again classified as an inline tag (Tag.isBlock() == false), matching common developer expectations and its role as phrasing content in HTML, while pretty-printing and text extraction continue to treat it as a line break in the rendered output. #2387, #2439
  • Fixed an intermittent truncation issue when fetching and parsing remote documents via Jsoup.connect(url).get(). On responses without a charset header, the initial charset sniff could sometimes (depending on buffering / available() behavior) be mistaken for end-of-stream and a partial parse reused, dropping trailing content. #2448
  • TagSet copies no longer mutate their template during lazy lookups, preventing cross-thread ConcurrentModificationException when parsing with shared sessions. #2453
  • Fixed parsing of <svg> foreignObject content nested within a <p>, which could incorrectly move the HTML subtree outside the SVG. #2452

Internal Changes

  • Deprecated internal helper org.jsoup.internal.Functions (for removal in v1.23.1). This was previously used to support older Android API levels without full java.util.function coverage; jsoup now requires core library desugaring so this indirection is no longer necessary. #2412
Commits
  • 8dd66fe [maven-release-plugin] prepare release jsoup-1.22.1
  • d924385 Changelog prep for v1.22.1
  • 0f3100c Bump actions/upload-artifact from 5 to 6 (#2457)
  • cf6ac20 Bump org.apache.maven.plugins:maven-release-plugin from 3.3.0 to 3.3.1 (#2455)
  • 6bef938 Fix parsing of SVG foreignObject in paragraphs
  • 9b1c0fc Bump org.apache.maven.plugins:maven-release-plugin from 3.2.0 to 3.3.0 (#2450)
  • 1415e64 Bump actions/checkout from 5 to 6 (#2451)
  • 0e99fd9 Isolate TagSet copies to prevent shared mutation (#2453)
  • 90019cb Bump com.github.siom79.japicmp:japicmp-maven-plugin from 0.24.2 to 0.25.0 (#2...
  • 9395269 Don't preemptively close
  • Additional commits viewable in compare view

Dependabot compatibility score

Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.


Dependabot commands and options

You can trigger Dependabot actions by commenting on this PR:

  • @dependabot rebase will rebase this PR
  • @dependabot recreate will recreate this PR, overwriting any edits that have been made to it
  • @dependabot merge will merge this PR after your CI passes on it
  • @dependabot squash and merge will squash and merge this PR after your CI passes on it
  • @dependabot cancel merge will cancel a previously requested merge and block automerging
  • @dependabot reopen will reopen this PR if it is closed
  • @dependabot close will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
  • @dependabot show <dependency name> ignore conditions will show all of the ignore conditions of the specified dependency
  • @dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
  • @dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
  • @dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)

Summary by CodeRabbit

  • Chores

    • Bumped HTML parser library (jsoup) from 1.21.2 to 1.22.1.
    • Updated the stored checksum to match the new jsoup release.
  • Changelog

    • Added an Unreleased entry noting the jsoup dependency update and checksum change.

✏️ Tip: You can customize this high-level summary in your review settings.

Bumps [org.jsoup:jsoup](https://github.com/jhy/jsoup) from 1.21.2 to 1.22.1.
- [Release notes](https://github.com/jhy/jsoup/releases)
- [Changelog](https://github.com/jhy/jsoup/blob/master/CHANGES.md)
- [Commits](jhy/jsoup@jsoup-1.21.2...jsoup-1.22.1)

---
updated-dependencies:
- dependency-name: org.jsoup:jsoup
  dependency-version: 1.22.1
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <[email protected]>
@dependabot dependabot bot added dependabot PRs with auto version bumps from dependabot dependencies Pull requests that update a dependency file labels Jan 5, 2026
@dependabot dependabot bot requested a review from a team as a code owner January 5, 2026 13:02
@coderabbitai
Copy link
Contributor

coderabbitai bot commented Jan 5, 2026

📝 Walkthrough

Walkthrough

Bumped the ingest-attachment plugin's org.jsoup:jsoup dependency from 1.21.2 to 1.22.1 and replaced the corresponding license checksum file.

Changes

Cohort / File(s) Summary
Dependency update
plugins/ingest-attachment/build.gradle
Upgraded org.jsoup:jsoup from 1.21.21.22.1.
License checksum files
plugins/ingest-attachment/licenses/jsoup-1.21.2.jar.sha1, plugins/ingest-attachment/licenses/jsoup-1.22.1.jar.sha1
Removed SHA-1 file for jsoup-1.21.2; added SHA-1 for jsoup-1.22.1 (9085290f1032d8bdd4ab0a279d176a4bd9e282ca).
Changelog
CHANGELOG.md
Added dependency bump entry under Unreleased 3.x noting org.jsoup:jsoup version bump.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Suggested reviewers

  • prudhvigodithi

Poem

🐰 I hopped through Gradle, light and quick,
Swapped jsoup versions—click, click, click.
Old checksum tucked away, the new one sings,
A tiny nibble of joy that version brings. 🥕

Pre-merge checks

❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Description check ⚠️ Warning The PR description lacks the required template sections. It is missing: 'Description' section explaining what the change achieves, 'Related Issues' section with issue number, and incomplete 'Check List' (no checkboxes marked). However, it does include substantial release notes and changelog details from jsoup. Add a 'Description' section explaining the change, include 'Related Issues' with the GitHub issue number being resolved, and complete the 'Check List' by marking applicable checkboxes and confirming Apache 2.0 license compliance.
✅ Passed checks (2 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly and specifically describes the main change: bumping org.jsoup:jsoup dependency from version 1.21.2 to 1.22.1 in the specified plugin directory.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.

📜 Recent review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 3c80d71 and 367a9bb.

📒 Files selected for processing (1)
  • CHANGELOG.md
🚧 Files skipped from review as they are similar to previous changes (1)
  • CHANGELOG.md
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (21)
  • GitHub Check: gradle-check
  • GitHub Check: precommit (25, ubuntu-latest)
  • GitHub Check: precommit (21, macos-15)
  • GitHub Check: precommit (25, windows-latest)
  • GitHub Check: precommit (21, windows-2025, true)
  • GitHub Check: precommit (25, macos-15)
  • GitHub Check: precommit (25, ubuntu-24.04-arm)
  • GitHub Check: precommit (21, macos-15-intel)
  • GitHub Check: precommit (25, macos-15-intel)
  • GitHub Check: precommit (21, windows-latest)
  • GitHub Check: precommit (21, ubuntu-24.04-arm)
  • GitHub Check: precommit (21, ubuntu-latest)
  • GitHub Check: assemble (21, ubuntu-latest)
  • GitHub Check: assemble (25, ubuntu-24.04-arm)
  • GitHub Check: assemble (25, ubuntu-latest)
  • GitHub Check: assemble (21, ubuntu-24.04-arm)
  • GitHub Check: assemble (21, windows-latest)
  • GitHub Check: detect-breaking-change
  • GitHub Check: assemble (25, windows-latest)
  • GitHub Check: Analyze (java)
  • GitHub Check: dependabot

Comment @coderabbitai help to get the list of available commands and usage tips.

dependabot bot added 2 commits January 5, 2026 13:04
Signed-off-by: dependabot[bot] <[email protected]>
Signed-off-by: dependabot[bot] <[email protected]>
@github-actions
Copy link
Contributor

github-actions bot commented Jan 5, 2026

❌ Gradle check result for 1f4fffb: null

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@github-actions
Copy link
Contributor

github-actions bot commented Jan 5, 2026

❌ Gradle check result for 3c80d71: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@github-actions
Copy link
Contributor

github-actions bot commented Jan 6, 2026

❌ Gradle check result for 3c80d71: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@github-actions
Copy link
Contributor

github-actions bot commented Jan 6, 2026

❌ Gradle check result for 3c80d71: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@github-actions
Copy link
Contributor

github-actions bot commented Jan 6, 2026

❌ Gradle check result for 3c80d71: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@github-actions
Copy link
Contributor

github-actions bot commented Jan 6, 2026

❌ Gradle check result for 3c80d71: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@github-actions
Copy link
Contributor

github-actions bot commented Jan 7, 2026

❌ Gradle check result for 367a9bb: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@github-actions
Copy link
Contributor

github-actions bot commented Jan 7, 2026

❌ Gradle check result for 367a9bb: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@github-actions
Copy link
Contributor

github-actions bot commented Jan 7, 2026

❌ Gradle check result for 367a9bb: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@github-actions
Copy link
Contributor

github-actions bot commented Jan 7, 2026

❌ Gradle check result for 367a9bb: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@cwperks
Copy link
Member

cwperks commented Jan 7, 2026

Seeing a lot of this lately:

org.opensearch.cluster.coordination.NodeJoinLeftIT.testClusterStabilityWhenClusterStatePublicationLagsOnShardCleanup

java.lang.AssertionError: Cluster didn't have a node drop
	at __randomizedtesting.SeedInfo.seed([6FE736075EC75D45:FCB23544E7373C09]:0)
	at org.junit.Assert.fail(Assert.java:89)
	at org.junit.Assert.assertTrue(Assert.java:42)
	at org.junit.Assert.assertFalse(Assert.java:65)
	at org.opensearch.cluster.coordination.NodeJoinLeftIT.validateNodeDropDueToPublicationLag(NodeJoinLeftIT.java:460)
	at org.opensearch.cluster.coordination.NodeJoinLeftIT.testClusterStabilityWhenClusterStatePublicationLagsOnShardCleanup(NodeJoinLeftIT.java:365)
	at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:104)
	at java.base/java.lang.reflect.Method.invoke(Method.java:565)
	at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1750)
	at com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:938)
	at com.carrotsearch.randomizedtesting.RandomizedRunner$9.evaluate(RandomizedRunner.java:974)
	at com.carrotsearch.randomizedtesting.RandomizedRunner$10.evaluate(RandomizedRunner.java:988)
	at org.opensearch.test.OpenSearchTestClusterRule$1.evaluate(OpenSearchTestClusterRule.java:369)
	at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
	at org.junit.rules.RunRules.evaluate(RunRules.java:20)

ref: https://build.ci.opensearch.org/job/gradle-check/69731/

@github-actions
Copy link
Contributor

github-actions bot commented Jan 7, 2026

❌ Gradle check result for 367a9bb: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@github-actions
Copy link
Contributor

github-actions bot commented Jan 7, 2026

❌ Gradle check result for 367a9bb: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@github-actions
Copy link
Contributor

github-actions bot commented Jan 7, 2026

❕ Gradle check result for 367a9bb: UNSTABLE

Please review all flaky tests that succeeded after retry and create an issue if one does not already exist to track the flaky failure.

@codecov
Copy link

codecov bot commented Jan 7, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 73.29%. Comparing base (d866be8) to head (367a9bb).
⚠️ Report is 9 commits behind head on main.

Additional details and impacted files
@@             Coverage Diff              @@
##               main   #20368      +/-   ##
============================================
- Coverage     73.30%   73.29%   -0.01%     
- Complexity    71777    71788      +11     
============================================
  Files          5784     5784              
  Lines        328141   328153      +12     
  Branches      47269    47270       +1     
============================================
+ Hits         240531   240535       +4     
+ Misses        68329    68324       -5     
- Partials      19281    19294      +13     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@cwperks cwperks merged commit 9ba2492 into main Jan 7, 2026
41 of 53 checks passed
@dependabot dependabot bot deleted the dependabot/gradle/plugins/ingest-attachment/org.jsoup-jsoup-1.22.1 branch January 7, 2026 20:52
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

dependabot PRs with auto version bumps from dependabot dependencies Pull requests that update a dependency file

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants