Comparing changes

base repository: nahsra/antisamy
base: v1.7.0
head repository: nahsra/antisamy
compare: main
Showing 49 changed files with 2,660 additions and 561 deletions.
  1. +6 −11 .github/workflows/codeql-analysis.yml
  2. +6 −4 .github/workflows/maven.yml
  3. +2 −2 .github/workflows/shiftleft-analysis.yml
  4. +31 −12 README.md
  5. +11 −7 SECURITY.md
  6. +114 −67 pom.xml
  7. +85 −24 src/main/java/org/owasp/validator/css/CssHandler.java
  8. +213 −57 src/main/java/org/owasp/validator/css/CssParser.java
  9. +26 −5 src/main/java/org/owasp/validator/css/CssScanner.java
  10. +104 −10 src/main/java/org/owasp/validator/css/CssValidator.java
  11. +28 −0 src/main/java/org/owasp/validator/css/media/CssMediaFeature.java
  12. +36 −0 src/main/java/org/owasp/validator/css/media/CssMediaQuery.java
  13. +44 −0 src/main/java/org/owasp/validator/css/media/CssMediaQueryList.java
  14. +25 −0 src/main/java/org/owasp/validator/css/media/CssMediaQueryLogicalOperator.java
  15. +25 −0 src/main/java/org/owasp/validator/css/media/CssMediaType.java
  16. +15 −0 src/main/java/org/owasp/validator/css/package.html
  17. +58 −45 src/main/java/org/owasp/validator/html/AntiSamy.java
  18. +75 −46 src/main/java/org/owasp/validator/html/CleanResults.java
  19. +9 −2 src/main/java/org/owasp/validator/html/InternalPolicy.java
  20. +16 −5 src/main/java/org/owasp/validator/html/Policy.java
  21. +1 −0 src/main/java/org/owasp/validator/html/model/Tag.java
  22. +14 −0 src/main/java/org/owasp/validator/html/model/package.html
  23. +12 −0 src/main/java/org/owasp/validator/html/package.html
  24. +159 −11 src/main/java/org/owasp/validator/html/scan/ASHTMLSerializer.java
  25. +36 −16 src/main/java/org/owasp/validator/html/scan/AbstractAntiSamyScanner.java
  26. +55 −33 src/main/java/org/owasp/validator/html/scan/AntiSamyDOMScanner.java
  27. +83 −24 src/main/java/org/owasp/validator/html/scan/AntiSamySAXScanner.java
  28. +56 −45 src/main/java/org/owasp/validator/html/scan/MagicSAXFilter.java
  29. +14 −0 src/main/java/org/owasp/validator/html/scan/package.html
  30. +10 −4 src/main/java/org/owasp/validator/html/util/ErrorMessageUtil.java
  31. +14 −0 src/main/java/org/owasp/validator/html/util/package.html
  32. +2 −2 src/main/resources/AntiSamy_en_CA.properties
  33. +2 −2 src/main/resources/AntiSamy_en_GB.properties
  34. +2 −2 src/main/resources/AntiSamy_en_US.properties
  35. +3 −3 src/main/resources/AntiSamy_es_MX.properties
  36. +4 −4 src/main/resources/AntiSamy_no_NB.properties
  37. +2 −2 src/main/resources/AntiSamy_ru_RU.properties
  38. +2 −2 src/main/resources/AntiSamy_zh_CN.properties
  39. +478 −3 src/main/resources/antisamy-anythinggoes.xml
  40. +3 −3 src/main/resources/antisamy-ebay.xml
  41. +3 −3 src/main/resources/antisamy-myspace.xml
  42. +21 −3 src/main/resources/antisamy.xml
  43. +7 −10 src/site/site.xml
  44. +218 −0 src/test/java/org/owasp/validator/css/CssHandlerTest.java
  45. +5 −1 src/test/java/org/owasp/validator/css/CssScannerTest.java
  46. +102 −0 src/test/java/org/owasp/validator/css/CssValidatorTest.java
  47. +421 −91 src/test/java/org/owasp/validator/html/test/AntiSamyTest.java
  48. +1 −0 src/test/java/org/owasp/validator/html/test/LiteralTest.java
  49. +1 −0 src/test/java/org/owasp/validator/html/test/TestPolicy.java
17 changes: 6 additions & 11 deletions .github/workflows/codeql-analysis.yml
@@ -29,19 +29,14 @@ jobs:
strategy:
fail-fast: false
matrix:
# Override automatic language detection by changing the below list
# Supported options are ['csharp', 'cpp', 'go', 'java', 'javascript', 'python']
language: ['java', 'javascript']
# Learn more...
# https://docs.github.com/en/github/finding-security-vulnerabilities-and-errors-in-your-code/configuring-code-scanning#overriding-automatic-language-detection

steps:
- name: Checkout repository
uses: actions/checkout@v3
uses: actions/checkout@v4
with:
# We must fetch at least the immediate parents so that if this is
# a pull request then we can checkout the head.
fetch-depth: 2
# Change from default script: Perform a deep clone so origin/main can be found.
fetch-depth: 0

# If this run was triggered by a pull request event, then checkout
# the head of the pull request instead of the merge commit.
@@ -50,7 +45,7 @@ jobs:

# Initializes the CodeQL tools for scanning.
- name: Initialize CodeQL
uses: github/codeql-action/init@v2
uses: github/codeql-action/init@v3
with:
languages: ${{ matrix.language }}
# If you wish to specify custom queries, you can do so here or in a config file.
@@ -61,7 +56,7 @@ jobs:
# Autobuild attempts to build any compiled languages (C/C++, C#, or Java).
# If this step fails, then you should remove it and run the build manually (see below)
- name: Autobuild
uses: github/codeql-action/autobuild@v2
uses: github/codeql-action/autobuild@v3

# ℹ️ Command-line programs to run using the OS shell.
# 📚 https://git.io/JvXDl
@@ -75,5 +70,5 @@ jobs:
# make release

- name: Perform CodeQL Analysis
uses: github/codeql-action/analyze@v2
uses: github/codeql-action/analyze@v3

10 changes: 6 additions & 4 deletions .github/workflows/maven.yml
@@ -10,11 +10,13 @@ jobs:
runs-on: ubuntu-latest

steps:
- uses: actions/checkout@v3
- name: Set up JDK 1.8
uses: actions/setup-java@v3
- uses: actions/checkout@v4
with:
java-version: 8
fetch-depth: 0
- name: Set up JDK 11
uses: actions/setup-java@v4
with:
java-version: 11
distribution: zulu
- name: Run unit tests
run: mvn test
4 changes: 2 additions & 2 deletions .github/workflows/shiftleft-analysis.yml
@@ -15,7 +15,7 @@ jobs:
permissions:
security-events: write
steps:
- uses: actions/checkout@v3
- uses: actions/checkout@v4
# Instructions
# 1. Setup JDK, Node.js, Python etc depending on your project type
# 2. Compile or build the project before invoking scan
@@ -35,6 +35,6 @@ jobs:
# type: python

- name: Upload report
uses: github/codeql-action/upload-sarif@v2
uses: github/codeql-action/upload-sarif@v3
with:
sarif_file: reports
43 changes: 31 additions & 12 deletions README.md
@@ -8,16 +8,27 @@ Another way of saying that could be: It's an API that helps you make sure that c

Throughout the development of the 1.6.x series, we have identified and deprecated a number of features and APIs. All of these deprecated items have been removed in the 1.7.0 release. These changes were all tracked in ticket: https://github.com/nahsra/antisamy/issues/195. Each of the changes is described below:

CssHandler had 2 constructors which dropped the LinkedList<URI> embeddedStyleSheets parameter. Both contructors now create an empty internal LinkedList<URI> and the method getImportedStylesheetsURIList() can be used to get a reference to it, if needed. This feature is rarely used, and in fact direct invocation of these constructors is also rare, so this change is unlikely to affect most users of AntiSamy. When used, normally an empty list is passed in as this parameter value and that list is never used again.
`CssHandler` had 2 constructors which dropped the `LinkedList<URI> embeddedStyleSheets` parameter. Both constructors now create an empty internal `LinkedList<URI>` and the method `getImportedStylesheetsURIList()` can be used to get a reference to it, if needed. This feature is rarely used, and in fact direct invocation of these constructors is also rare, so this change is unlikely to affect most users of AntiSamy. When used, normally an empty list is passed in as this parameter value and that list is never used again.

* The CssHandler(Policy, LinkedList\<URI\>, List\<String\>, ResourceBundle) was dropped
* It was replaced with: CssHandler(Policy, List\<String\>, ResourceBundle)
* The CssHandler(Policy, LinkedList\<URI\>, List\<String\>, String, ResourceBundle) was dropped
* It was replaced with: CssHandler(Policy, List\<String\>, ResourceBundle, String). NOTE: The order of the last 2 parameters to this method was reversed.
* The `CssHandler(Policy, LinkedList<URI>, List<String>, ResourceBundle)` signature was dropped
* It was replaced with: `CssHandler(Policy, List<String>, ResourceBundle)`
* The `CssHandler(Policy, LinkedList<URI>, List<String>, String, ResourceBundle)` signature was dropped
* It was replaced with: `CssHandler(Policy, List<String>, ResourceBundle, String)`. NOTE: The order of the last 2 parameters to this method was reversed.

* Support for XHTML was dropped. AntiSamy now only supports HTML. As we believe this was a rarely used feature, we don't expect this to affect many AntiSamy users.
* XML Schema validation is now required on AntiSamy policy files and cannot be disabled. You must make your policy file schema compliant in order to use it with AntiSamy.
* The policy directive 'noopenerAndNoreferrerAnchors' is now ON by default. If it is disabled, AntiSamy issues a nag, encouraging you to enable it.
* The policy directive `noopenerAndNoreferrerAnchors` is now ON by default. If it is disabled, AntiSamy issues a nag, encouraging you to enable it.

## Output format disclaimers
Over AntiSamy's upgrade lifecycle, the HTML parser dependency has changed, leading to some output differences that may surface depending on the use case. Keep this in mind if you get different outputs after upgrading from certain versions.

This also applies to the output serializer, which transforms the internal HTML representation into the tool's final text output.

## Deprecating support for external stylesheets

The AntiSamy team has decided that supporting embedded remote CSS is dangerous, so we are deprecating this feature and will remove it in a future release. We expect that there are very few, if any, users of this feature.

We have added a log WARNing if this feature is invoked. If you are using it, please disable/remove this feature by switching to the primary `CssScanner` constructor that does not enable this feature.

## How to Use

@@ -33,7 +44,7 @@ First, add the dependency from Maven:
```

### 2. Choosing a base policy file
Chances are that your site’s use case for AntiSamy is at least roughly comparable to one of the predefined policy files. They each represent a typical scenario for allowing users to provide HTML (and possibly CSS) formatting information. Let’s look into the different policy files:
Chances are that your site’s use case for AntiSamy is at least roughly comparable to one of the predefined policy files. They each represent a "typical" scenario for allowing users to provide HTML (and possibly CSS) formatting information. Let’s look into the different policy files:

1) antisamy-slashdot.xml

@@ -56,11 +67,18 @@ We don’t know of a possible use case for this policy file. If you wanted to al
### Logging
AntiSamy now includes the slf4j-simple library for its logging, but AntiSamy users can import and use an alternate slf4j compatible logging library if they prefer. They can also then exclude slf4j-simple if they want to.

WARNING: AntiSamy's use of slf4j-simple, without any configuration file, logs messages in a buffered manner to standard output. As such, some or all of these log messages may get lost if an Exception, such as a PolicyException is thrown. This can likely be rectified by configuring slf4j-simple to log to standard error instead, or use an alternate slf4j logger that does so.
WARNING: AntiSamy's use of slf4j-simple, without any configuration file, logs messages in a buffered manner to standard output. As such, some or all of these log messages may get lost if an `Exception`, such as a `PolicyException`, is thrown. This can likely be rectified by configuring slf4j-simple to log to standard error instead, or by using an alternate slf4j logger that does so.
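
As a minimal sketch of the standard-error option mentioned above: slf4j-simple documents an `org.slf4j.simpleLogger.logFile` setting that accepts the special value `System.err`. The class and method names below are hypothetical, and the sketch assumes the property is set before the first slf4j logger is created, since slf4j-simple reads its configuration at initialization.

```java
/** Hypothetical helper; not part of AntiSamy. */
class LoggingSetupSketch {

    /**
     * Asks slf4j-simple to write to standard error.
     * Assumption: this runs before any slf4j logger is created,
     * because slf4j-simple reads the property once at start-up.
     */
    static void routeSimpleLoggerToStderr() {
        System.setProperty("org.slf4j.simpleLogger.logFile", "System.err");
    }

    public static void main(String[] args) {
        routeSimpleLoggerToStderr();
        // Show the value that slf4j-simple would pick up.
        System.out.println(System.getProperty("org.slf4j.simpleLogger.logFile"));
    }
}
```

The same setting can instead be placed in a `simplelogger.properties` file on the classpath as `org.slf4j.simpleLogger.logFile=System.err`.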

### 3. Tailoring the policy file
You may want to deploy AntiSamy in a default configuration, but it’s equally likely that a site may want to have strict, business-driven rules for what users can allow. The discussion that decides the tailoring should also consider attack surface - which grows in relative proportion to the policy file.

Example policies can be adapted and tested based on the requirements for each tag. The supported tag actions that can be specified are:
- `filter`: remove tags, but keep content.
- `validate`: keep content as long as it passes rules.
- `remove`: remove tag and contents.
- `truncate`: remove tag attributes and all child tags except for its text content if any.
- `encode`: similar to `filter`, but it HTML-encodes the tag to preserve it as raw text, and its children are moved up one level in the hierarchy.
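
To make the difference between the actions concrete, here is a toy model written for illustration only. It is not AntiSamy's implementation: the `Node` class, the `apply` method, and the rendering are all hypothetical, and `validate` is left out because its result depends on the rules in the policy file.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Toy illustration of tag actions; hypothetical, not AntiSamy code. */
class TagActionSketch {

    enum Action { FILTER, REMOVE, TRUNCATE, ENCODE }

    /** A node is either a text node (name == null) or an element. */
    static final class Node {
        final String name;   // null for text nodes
        final String text;   // null for elements
        final Map<String, String> attrs = new LinkedHashMap<>();
        final List<Node> children = new ArrayList<>();

        private Node(String name, String text) { this.name = name; this.text = text; }
        static Node text(String t) { return new Node(null, t); }
        static Node element(String name, Node... kids) {
            Node n = new Node(name, null);
            for (Node k : kids) n.children.add(k);
            return n;
        }
        Node attr(String k, String v) { attrs.put(k, v); return this; }
        boolean isText() { return name == null; }
    }

    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    static String openTag(Node e) {
        StringBuilder sb = new StringBuilder("<").append(e.name);
        for (Map.Entry<String, String> a : e.attrs.entrySet()) {
            sb.append(' ').append(a.getKey()).append("=\"").append(a.getValue()).append('"');
        }
        return sb.append('>').toString();
    }

    static String render(List<Node> nodes) {
        StringBuilder sb = new StringBuilder();
        for (Node n : nodes) {
            if (n.isText()) {
                sb.append(escape(n.text));
            } else {
                sb.append(openTag(n)).append(render(n.children))
                  .append("</").append(n.name).append('>');
            }
        }
        return sb.toString();
    }

    /** Returns the nodes that take the place of element e under the action. */
    static List<Node> apply(Action action, Node e) {
        List<Node> out = new ArrayList<>();
        switch (action) {
            case FILTER:   // drop the tag, keep its content
                out.addAll(e.children);
                break;
            case REMOVE:   // drop the tag and its content
                break;
            case TRUNCATE: // keep the bare tag with only its direct text
                Node bare = Node.element(e.name);
                for (Node c : e.children) {
                    if (c.isText()) bare.children.add(c);
                }
                out.add(bare);
                break;
            case ENCODE:   // tag becomes escaped text, children move up
                out.add(Node.text(openTag(e)));
                out.addAll(e.children);
                out.add(Node.text("</" + e.name + ">"));
                break;
        }
        return out;
    }

    public static void main(String[] args) {
        Node b = Node.element("b", Node.text("hi "),
                Node.element("i", Node.text("there"))).attr("class", "x");
        for (Action a : Action.values()) {
            System.out.println(a + " -> " + render(apply(a, b)));
        }
    }
}
```

For the input `<b class="x">hi <i>there</i></b>`, this toy model yields `hi <i>there</i>` for `FILTER`, an empty string for `REMOVE`, and `<b>hi </b>` for `TRUNCATE`. The exact text AntiSamy emits for an encoded tag may differ from this sketch.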

### 4. Calling the AntiSamy API
Using AntiSamy is easy. Here is an example of invoking AntiSamy with a policy file:

@@ -94,18 +112,19 @@ CleanResults cr = as.scan(dirtyInput, new File(policyFilePath));
### 5. Analyzing CleanResults
The `CleanResults` object exposes the sanitized output along with diagnostics about the scan:

* `getErrorMessages()` - a list of String error messages -- *if this returns 0 that does not mean there were no attacks!*
* `getCleanHTML()` - the clean, safe HTML output
* `getCleanXMLDocumentFragment()` - the clean, safe `XMLDocumentFragment` which is reflected in `getCleanHTML()`
* `getErrorMessages()` - a list of String error messages -- *if this returns 0 that does not mean there were no attacks!*
* `getNumberOfErrors()` - the number of error messages -- *Again, 0 does not mean the input was safe!*
* `getScanTime()` - returns the scan time in seconds

__Important Note__: There has been much confusion about the `getErrorMessages()` method. The `getErrorMessages()` method does not subtly answer the question "is this safe input?" in the affirmative if it returns an empty list. You must always use the sanitized input and there is no way to be sure the input passed in had no attacks.
__Important Note__: There has been much confusion about the `getErrorMessages()` method. Neither `getErrorMessages()` nor `getNumberOfErrors()` subtly answers the question "is this safe input?" in the affirmative by returning an empty list or 0. You must always use the sanitized output, because there is no way to be sure the input passed in had no attacks.

The serialization and deserialization process that is critical to the effectiveness of the sanitizer is purposefully lossy and will filter out attacks via a number of attack vectors. Unfortunately, one of the tradeoffs of this strategy is that we don't always know in retrospect that an attack was seen. Thus, the `getErrorMessages()` API is there to help users understand their well-intentioned input meet the requirements of the system, not help a developer detect if an attack was present.
The serialization and deserialization process that is critical to the effectiveness of the sanitizer is purposefully lossy and will filter out attacks via a number of attack vectors. Unfortunately, one of the tradeoffs of this strategy is that AntiSamy doesn't always know in retrospect that an attack was seen. Thus, the `getErrorMessages()` and `getNumberOfErrors()` APIs are there to help users understand whether their well-intentioned input meets the requirements of the system, not to help a developer detect whether an attack was present.
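
The pitfall described here can be sketched without the library on the classpath. The `Results` interface below is a hypothetical stand-in that mirrors only the getter names listed above, and the stub values are invented for illustration; they are not the output of a real AntiSamy scan.

```java
import java.util.Collections;
import java.util.List;

/** Hypothetical sketch; Results stands in for the real CleanResults. */
class CleanResultsUsageSketch {

    /** Stand-in exposing only the getters discussed in this section. */
    interface Results {
        String getCleanHTML();
        List<String> getErrorMessages();
        int getNumberOfErrors();
    }

    /** Builds a stub result with invented values, for illustration only. */
    static Results stub(final String cleanHtml, final List<String> errors) {
        return new Results() {
            @Override public String getCleanHTML() { return cleanHtml; }
            @Override public List<String> getErrorMessages() { return errors; }
            @Override public int getNumberOfErrors() { return errors.size(); }
        };
    }

    /** Anti-pattern: treats "no errors" as proof that the input was safe. */
    static String trustIfNoErrors(String dirtyInput, Results r) {
        return r.getNumberOfErrors() == 0 ? dirtyInput : r.getCleanHTML();
    }

    /** Recommended: always use the sanitized output, whatever the error count. */
    static String alwaysUseCleanOutput(Results r) {
        return r.getCleanHTML();
    }

    public static void main(String[] args) {
        String dirty = "<b onclick=\"x()\">hi</b>";
        // Invented scenario: sanitized output differs, yet no errors are reported.
        Results r = stub("<b>hi</b>", Collections.<String>emptyList());
        System.out.println(trustIfNoErrors(dirty, r));   // leaks the dirty input
        System.out.println(alwaysUseCleanOutput(r));     // uses the sanitized output
    }
}
```

The error count is useful for giving feedback to well-intentioned users, but in this sketch it never decides which string is stored or rendered.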

## Other Documentation

Additional documentation is available on this Github project's wiki page: https://github.com/nahsra/antisamy/wiki
Additional documentation is available on this GitHub project's wiki page: https://github.com/nahsra/antisamy/wiki
and the OWASP AntiSamy Project Page: https://owasp.org/www-project-antisamy/

## Contributing to AntiSamy
18 changes: 11 additions & 7 deletions SECURITY.md
@@ -28,12 +28,16 @@ can understand what needs to be done to fix it.

These are the known CVEs reported for AntiSamy:

* AntiSamy CVE #1 - CVE-2016-10006: XSS Bypass in AntiSamy before v1.5.5 - https://www.cvedetails.com/cve/CVE-2016-10006
* AntiSamy CVE #2 - CVE-2017-14735: XSS via HTML5 Entities in AntiSamy before v1.5.7 - https://www.cvedetails.com/cve/CVE-2017-14735
* AntiSamy CVE #3 - CVE-2021-35043: XSS via HTML attributes using &#00058 as replacement for : character before v1.6.4 - https://www.cvedetails.com/cve/CVE-2021-35043
* AntiSamy CVE #4 - CVE-2022-28367: AntiSamy before 1.6.6 allows XSS via HTML tag smuggling on STYLE content. https://www.cvedetails.com/cve/CVE-2022-28367. NOTE: This release only included a PARTIAL fix.
* AntiSamy CVE #5 - CVE-2022-29577: AntiSamy before 1.6.7 allows XSS via HTML tag smuggling on STYLE content. - https://www.cvedetails.com/cve/CVE-2022-29577. This is the complete fix to the previous CVE.
* AntiSamy CVE #1 - CVE-2016-10006: AntiSamy before 1.5.5 allows XSS Bypass - https://nvd.nist.gov/vuln/detail/CVE-2016-10006
* AntiSamy CVE #2 - CVE-2017-14735: AntiSamy before 1.5.7 allows XSS via HTML5 Entities - https://nvd.nist.gov/vuln/detail/CVE-2017-14735
* AntiSamy CVE #3 - CVE-2021-35043: AntiSamy before 1.6.4 allows XSS via HTML attributes using &#00058 as replacement for : character - https://nvd.nist.gov/vuln/detail/CVE-2021-35043
* AntiSamy CVE #4 - CVE-2022-28367: AntiSamy before 1.6.6 allows XSS via HTML tag smuggling on STYLE content - https://nvd.nist.gov/vuln/detail/CVE-2022-28367. NOTE: This release only included a PARTIAL fix.
* AntiSamy CVE #5 - CVE-2022-29577: AntiSamy before 1.6.7 allows XSS via HTML tag smuggling on STYLE content - https://nvd.nist.gov/vuln/detail/CVE-2022-29577. This is the complete fix to the previous CVE.
* AntiSamy CVE #6 - CVE-2023-43643: AntiSamy before 1.7.4 is subject to mutation XSS (mXSS) when preserving comments - https://nvd.nist.gov/vuln/detail/CVE-2023-43643
* AntiSamy CVE #7 - CVE-2024-23635: AntiSamy before 1.7.5 is subject to mXSS when preserving comments - https://nvd.nist.gov/vuln/detail/CVE-2024-23635

CVEs in AntiSamy dependencies:
* AntiSamy prior to 1.6.6 used the old CyberNeko HTML library v1.9.22, which is subject to https://www.cvedetails.com/cve/CVE-2022-28366 and no longer maintained. AntiSamy 1.6.6 upgraded to an active fork of CyberNeko called HtmlUnit-Neko which fixed this CVE in v2.27 of that library. AntiSamy 1.6.6 upgraded to version 2.60.0 of HtmlUnit-Neko.
* AntiSamy 1.6.8 upgraded to HtmlUnit-Neko v2.61.0 because v2.60.0 is subject to https://www.cvedetails.com/cve/CVE-2022-29546
* AntiSamy before 1.6.6 used the old CyberNeko HTML library net.sourceforge.nekohtml:nekohtml:1.9.22, which is subject to https://nvd.nist.gov/vuln/detail/CVE-2022-28366 and no longer maintained. AntiSamy 1.6.6 upgraded to an active fork of CyberNeko at net.sourceforge.htmlunit:neko-htmlunit which fixed this CVE in v2.27 of that library. AntiSamy 1.6.6 upgraded to net.sourceforge.htmlunit:neko-htmlunit:2.60.0
* AntiSamy 1.6.8 upgraded to net.sourceforge.htmlunit:neko-htmlunit:2.61.0 because v2.60.0 is subject to https://nvd.nist.gov/vuln/detail/CVE-2022-29546
* AntiSamy 1.7.3 upgraded to org.htmlunit:neko-htmlunit:3.1.0 because all versions of net.sourceforge.htmlunit:neko-htmlunit prior to 3.0.0 are subject to https://nvd.nist.gov/vuln/detail/CVE-2023-26119 (Note the group name change for neko-htmlunit starting with v3.0.0)
* AntiSamy 1.7.4 upgraded to batik-css v1.17 because batik-css:1.16 is subject to https://nvd.nist.gov/vuln/detail/CVE-2022-44729