nahsra
diff --git a/.github/workflows/codeql-analysis.yml b/.github/workflows/codeql-analysis.yml
@@ -29,19 +29,14 @@ jobs:
     strategy:
       fail-fast: false
       matrix:
-        # Override automatic language detection by changing the below list
-        # Supported options are ['csharp', 'cpp', 'go', 'java', 'javascript', 'python']
         language: ['java', 'javascript']
-        # Learn more...
-        # https://docs.github.com/en/github/finding-security-vulnerabilities-and-errors-in-your-code/configuring-code-scanning#overriding-automatic-language-detection
 
     steps:
     - name: Checkout repository
-      uses: actions/checkout@v3
+      uses: actions/checkout@v4
       with:
-        # We must fetch at least the immediate parents so that if this is
-        # a pull request then we can checkout the head.
-        fetch-depth: 2
+        # Change from default script: Perform a deep clone so origin/main can be found.
+        fetch-depth: 0
 
     # If this run was triggered by a pull request event, then checkout
     # the head of the pull request instead of the merge commit.
@@ -50,7 +45,7 @@ jobs:
 
     # Initializes the CodeQL tools for scanning.
     - name: Initialize CodeQL
-      uses: github/codeql-action/init@v2
+      uses: github/codeql-action/init@v3
       with:
         languages: ${{ matrix.language }}
         # If you wish to specify custom queries, you can do so here or in a config file.
@@ -61,7 +56,7 @@ jobs:
     # Autobuild attempts to build any compiled languages  (C/C++, C#, or Java).
     # If this step fails, then you should remove it and run the build manually (see below)
     - name: Autobuild
-      uses: github/codeql-action/autobuild@v2
+      uses: github/codeql-action/autobuild@v3
 
     # ℹ️ Command-line programs to run using the OS shell.
     # 📚 https://git.io/JvXDl
@@ -75,5 +70,5 @@ jobs:
     #   make release
 
     - name: Perform CodeQL Analysis
-      uses: github/codeql-action/analyze@v2
+      uses: github/codeql-action/analyze@v3
 
diff --git a/.github/workflows/maven.yml b/.github/workflows/maven.yml
@@ -10,11 +10,13 @@ jobs:
     runs-on: ubuntu-latest
 
     steps:
-    - uses: actions/checkout@v3
-    - name: Set up JDK 1.8
-      uses: actions/setup-java@v3
+    - uses: actions/checkout@v4
       with:
-        java-version: 8
+        fetch-depth: 0
+    - name: Set up JDK 11
+      uses: actions/setup-java@v4
+      with:
+        java-version: 11
         distribution: zulu
     - name: Run unit tests
       run: mvn test
diff --git a/.github/workflows/shiftleft-analysis.yml b/.github/workflows/shiftleft-analysis.yml
@@ -15,7 +15,7 @@ jobs:
     permissions:
       security-events: write
     steps:
-    - uses: actions/checkout@v3
+    - uses: actions/checkout@v4
     # Instructions
     # 1. Setup JDK, Node.js, Python etc depending on your project type
     # 2. Compile or build the project before invoking scan
@@ -35,6 +35,6 @@ jobs:
         # type: python
 
     - name: Upload report
-      uses: github/codeql-action/upload-sarif@v2
+      uses: github/codeql-action/upload-sarif@v3
       with:
         sarif_file: reports
diff --git a/README.md b/README.md
@@ -8,16 +8,27 @@ Another way of saying that could be: It's an API that helps you make sure that c
 
 Throughout the development of the 1.6.x series, we have identified and deprecated a number of features and APIs. All of these deprecated items have been removed in the 1.7.0 release. These changes were all tracked in ticket: https://github.com/nahsra/antisamy/issues/195. Each of the changes are described below:
 
-CssHandler had 2 constructors which dropped the LinkedList<URI> embeddedStyleSheets parameter. Both contructors now create an empty internal LinkedList<URI> and the method getImportedStylesheetsURIList() can be used to get a reference to it, if needed. This feature is rarely used, and in fact direct invocation of these constructors is also rare, so this change is unlikely to affect most users of AntiSamy. When used, normally an empty list is passed in as this parameter value and that list is never used again.
+`CssHandler` had 2 constructors which dropped the `LinkedList<URI> embeddedStyleSheets` parameter. Both constructors now create an empty internal `LinkedList<URI>` and the method `getImportedStylesheetsURIList()` can be used to get a reference to it, if needed. This feature is rarely used, and in fact direct invocation of these constructors is also rare, so this change is unlikely to affect most users of AntiSamy. When used, normally an empty list is passed in as this parameter value and that list is never used again.
 
- * The CssHandler(Policy, LinkedList\<URI\>, List\<String\>, ResourceBundle) was dropped
-   * It was replaced with: CssHandler(Policy, List\<String\>, ResourceBundle)
- * The CssHandler(Policy, LinkedList\<URI\>, List\<String\>, String, ResourceBundle) was dropped
-   * It was replaced with: CssHandler(Policy, List\<String\>, ResourceBundle, String). NOTE: The order of the last 2 parameters to this method was reversed.
+ * The `CssHandler(Policy, LinkedList<URI>, List<String>, ResourceBundle)` signature was dropped
+   * It was replaced with: `CssHandler(Policy, List<String>, ResourceBundle)`
+ * The `CssHandler(Policy, LinkedList<URI>, List<String>, String, ResourceBundle)` signature was dropped
+   * It was replaced with: `CssHandler(Policy, List<String>, ResourceBundle, String)`. NOTE: The order of the last 2 parameters to this method was reversed.
 
  * Support for XHTML was dropped. AntiSamy now only supports HTML. As we believe this was a rarely used feature, we don't expect this to affect many AntiSamy users.
  * XML Schema validation is now required on AntiSamy policy files and cannot be disabled. You must make your policy file schema compliant in order to use it with AntiSamy.
- * The policy directive 'noopenerAndNoreferrerAnchors' is now ON by default. If it is disabled, AntiSamy issues a nag, encouraging you to enable it.
+ * The policy directive `noopenerAndNoreferrerAnchors` is now ON by default. If it is disabled, AntiSamy issues a nag, encouraging you to enable it.
+
+## Output format disclaimers
+Over the course of AntiSamy's releases, the HTML parser dependency has changed, which can lead to differences in output depending on the use case. Keep this in mind if you see different output after upgrading from an older version.
+
+The same applies to the output serializer, which transforms the internal HTML representation into the tool's final text output.
+
+## Deprecating support for external stylesheets
+
+The AntiSamy team has decided that allowing embedded remote CSS is dangerous, so this feature is deprecated and will be removed in a future release. It is expected that there are very few, if any, users of this feature.
+
+AntiSamy now logs a WARNing if this feature is invoked. If you are using it, please disable it by switching to the primary `CssScanner` constructor, which does not enable this feature.
 
 ## How to Use
 
@@ -33,7 +44,7 @@ First, add the dependency from Maven:
 ```
 
 ### 2. Choosing a base policy file
-Chances are that your site’s use case for AntiSamy is at least roughly comparable to one of the predefined policy files. They each represent a “typical” scenario for allowing users to provide HTML (and possibly CSS) formatting information. Let’s look into the different policy files:
+Chances are that your site’s use case for AntiSamy is at least roughly comparable to one of the predefined policy files. They each represent a "typical" scenario for allowing users to provide HTML (and possibly CSS) formatting information. Let’s look into the different policy files:
 
 1) antisamy-slashdot.xml
 
@@ -56,11 +67,18 @@ We don’t know of a possible use case for this policy file. If you wanted to al
 ### Logging
 AntiSamy now includes the slf4j-simple library for its logging, but AntiSamy users can import and use an alternate slf4j compatible logging library if they prefer. They can also then exclude slf4j-simple if they want to.
 
-WARNING: AntiSamy's use of slf4j-simple, without any configuration file, logs messages in a buffered manner to standard output. As such, some or all of these log messages may get lost if an Exception, such as a PolicyException is thrown. This can likely be rectified by configuring slf4j-simple to log to standard error instead, or use an alternate slf4j logger that does so.
+WARNING: AntiSamy's use of slf4j-simple, without any configuration file, logs messages in a buffered manner to standard output. As such, some or all of these log messages may get lost if an `Exception`, such as a `PolicyException`, is thrown. This can likely be rectified by configuring slf4j-simple to log to standard error instead, or by using an alternate slf4j logger that does so.
 
 ### 3. Tailoring the policy file
 You may want to deploy AntiSamy in a default configuration, but it’s equally likely that a site may want to have strict, business-driven rules for what users can allow. The discussion that decides the tailoring should also consider attack surface - which grows in relative proportion to the policy file.
 
+Example policies can be adapted and tested based on the requirements for each tag. The supported tag actions that can be specified are:
+- `filter`: remove tags, but keep content.
+- `validate`: keep content as long as it passes rules.
+- `remove`: remove tag and contents.
+- `truncate`: remove the tag's attributes and all child tags, keeping only its text content, if any.
+- `encode`: similar to `filter`, but the tag is HTML-encoded so that it is preserved as raw text, and its children are moved up one level in the hierarchy.
+
 ### 4. Calling the AntiSamy API
 Using AntiSamy is easy. Here is an example of invoking AntiSamy with a policy file:
 
@@ -94,18 +112,19 @@ CleanResults cr = as.scan(dirtyInput, new File(policyFilePath));
 ### 5. Analyzing CleanResults
 The `CleanResults` object provides a lot of useful stuff.
 
- * `getErrorMessages()` - a list of String error messages -- *if this returns 0 that does not mean there were no attacks!*
  * `getCleanHTML()` - the clean, safe HTML output
  * `getCleanXMLDocumentFragment()` - the clean, safe `XMLDocumentFragment` which is reflected in `getCleanHTML()`
+ * `getErrorMessages()` - a list of String error messages -- *if this list is empty that does not mean there were no attacks!*
+ * `getNumberOfErrors()` - the number of error messages -- *Again, 0 does not mean the input was safe!*
  * `getScanTime()` - returns the scan time in seconds
 
-__Important Note__: There has been much confusion about the `getErrorMessages()` method. The `getErrorMessages()` method does not subtly answer the question "is this safe input?" in the affirmative if it returns an empty list. You must always use the sanitized input and there is no way to be sure the input passed in had no attacks.
+__Important Note__: There has been much confusion about the `getErrorMessages()` method. Neither `getErrorMessages()` returning an empty list nor `getNumberOfErrors()` returning 0 answers the question "is this safe input?" in the affirmative. You must always use the sanitized output, because there is no way to be sure the input passed in had no attacks.
 
-The serialization and deserialization process that is critical to the effectiveness of the sanitizer is purposefully lossy and will filter out attacks via a number of attack vectors. Unfortunately, one of the tradeoffs of this strategy is that we don't always know in retrospect that an attack was seen. Thus, the `getErrorMessages()` API is there to help users understand their well-intentioned input meet the requirements of the system, not help a developer detect if an attack was present.
+The serialization and deserialization process that is critical to the effectiveness of the sanitizer is purposefully lossy and will filter out attacks via a number of attack vectors. Unfortunately, one of the tradeoffs of this strategy is that AntiSamy doesn't always know in retrospect that an attack was seen. Thus, the `getErrorMessages()` and `getNumberOfErrors()` APIs are there to help users understand whether their well-intentioned input meets the requirements of the system, not to help a developer detect whether an attack was present.
 
 ## Other Documentation
 
-Additional documentation is available on this Github project's wiki page: https://github.com/nahsra/antisamy/wiki
+Additional documentation is available on this GitHub project's wiki page: https://github.com/nahsra/antisamy/wiki
 and the OWASP AntiSamy Project Page: https://owasp.org/www-project-antisamy/
 
 ## Contributing to AntiSamy

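The tag actions and the `noopenerAndNoreferrerAnchors` directive described in the README changes above are both set in the policy file. A minimal sketch of how they might appear, assuming the `<directive>` and `<tag>` element layout used by the predefined policy files. The tag names are illustrative only, and this is a fragment, not a complete schema-valid policy:

```xml
<!-- Fragment only: a real policy file needs the other sections required by the schema. -->
<directives>
    <!-- ON by default as of 1.7.0; shown explicitly for illustration. -->
    <directive name="noopenerAndNoreferrerAnchors" value="true"/>
</directives>

<tag-rules>
    <!-- remove: drop the tag and its contents -->
    <tag name="script" action="remove"/>
    <!-- filter: drop the tag but keep its content -->
    <tag name="span" action="filter"/>
    <!-- truncate: drop attributes and child tags, keep any text content -->
    <tag name="br" action="truncate"/>
    <!-- validate: keep the tag as long as it passes the rules -->
    <tag name="b" action="validate"/>
    <!-- encode: HTML-encode the tag so it is preserved as raw text -->
    <tag name="g" action="encode"/>
</tag-rules>
```

Since schema validation can no longer be disabled, any policy adapted this way still has to validate against the AntiSamy schema before it will load.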
diff --git a/SECURITY.md b/SECURITY.md
@@ -28,12 +28,16 @@ can understand what needs to be done to fix it.
 
 These are the known CVEs reported for AntiSamy:
 
-* AntiSamy CVE #1 - CVE-2016-10006: XSS Bypass in AntiSamy before v1.5.5 - https://www.cvedetails.com/cve/CVE-2016-10006
-* AntiSamy CVE #2 - CVE-2017-14735: XSS via HTML5 Entities in AntiSamy before v1.5.7 - https://www.cvedetails.com/cve/CVE-2017-14735
-* AntiSamy CVE #3 - CVE-2021-35043: XSS via HTML attributes using &#00058 as replacement for : character before v1.6.4 - https://www.cvedetails.com/cve/CVE-2021-35043
-* AntiSamy CVE #4 - CVE-2022-28367: AntiSamy before 1.6.6 allows XSS via HTML tag smuggling on STYLE content. https://www.cvedetails.com/cve/CVE-2022-28367. NOTE: This release only included a PARTIAL fix.
-* AntiSamy CVE #5 - CVE-2022-29577: AntiSamy before 1.6.7 allows XSS via HTML tag smuggling on STYLE content. - https://www.cvedetails.com/cve/CVE-2022-29577. This is the complete fix to the previous CVE.
+* AntiSamy CVE #1 - CVE-2016-10006: AntiSamy before 1.5.5 allows XSS Bypass - https://nvd.nist.gov/vuln/detail/CVE-2016-10006
+* AntiSamy CVE #2 - CVE-2017-14735: AntiSamy before 1.5.7 allows XSS via HTML5 Entities - https://nvd.nist.gov/vuln/detail/CVE-2017-14735
+* AntiSamy CVE #3 - CVE-2021-35043: AntiSamy before 1.6.4 allows XSS via HTML attributes using &#00058 as replacement for : character - https://nvd.nist.gov/vuln/detail/CVE-2021-35043
+* AntiSamy CVE #4 - CVE-2022-28367: AntiSamy before 1.6.6 allows XSS via HTML tag smuggling on STYLE content - https://nvd.nist.gov/vuln/detail/CVE-2022-28367. NOTE: This release only included a PARTIAL fix.
+* AntiSamy CVE #5 - CVE-2022-29577: AntiSamy before 1.6.7 allows XSS via HTML tag smuggling on STYLE content - https://nvd.nist.gov/vuln/detail/CVE-2022-29577. This is the complete fix to the previous CVE.
+* AntiSamy CVE #6 - CVE-2023-43643: AntiSamy before 1.7.4 is subject to mutation XSS (mXSS) when preserving comments - https://nvd.nist.gov/vuln/detail/CVE-2023-43643
+* AntiSamy CVE #7 - CVE-2024-23635: AntiSamy before 1.7.5 is subject to mXSS when preserving comments - https://nvd.nist.gov/vuln/detail/CVE-2024-23635
 
 CVEs in AntiSamy dependencies:
-* AntiSamy prior to 1.6.6 used the old CyberNeko HTML library v1.9.22, which is subject to https://www.cvedetails.com/cve/CVE-2022-28366 and no longer maintained. AntiSamy 1.6.6 upgraded to an active fork of CyberNeko called HtmlUnit-Neko which fixed this CVE in v2.27 of that library. AntiSamy 1.6.6 upgraded to version 2.60.0 of HtmlUnit-Neko.
-* AntiSamy 1.6.8 upgraded to HtmlUnit-Neko v2.61.0 because v2.60.0 is subject to https://www.cvedetails.com/cve/CVE-2022-29546
+* AntiSamy before 1.6.6 used the old CyberNeko HTML library net.sourceforge.nekohtml:nekohtml:1.9.22, which is subject to https://nvd.nist.gov/vuln/detail/CVE-2022-28366 and no longer maintained. AntiSamy 1.6.6 upgraded to an active fork of CyberNeko at net.sourceforge.htmlunit:neko-htmlunit which fixed this CVE in v2.27 of that library. AntiSamy 1.6.6 upgraded to net.sourceforge.htmlunit:neko-htmlunit:2.60.0
+* AntiSamy 1.6.8 upgraded to net.sourceforge.htmlunit:neko-htmlunit:2.61.0 because v2.60.0 is subject to https://nvd.nist.gov/vuln/detail/CVE-2022-29546
+* AntiSamy 1.7.3 upgraded to org.htmlunit:neko-htmlunit:3.1.0 because all versions of net.sourceforge.htmlunit:neko-htmlunit prior to 3.0.0 are subject to https://nvd.nist.gov/vuln/detail/CVE-2023-26119 (Note the group name change for neko-htmlunit starting with v3.0.0)
+* AntiSamy 1.7.4 upgraded to batik-css v1.17 because batik-css:1.16 is subject to https://nvd.nist.gov/vuln/detail/CVE-2022-44729