Skip to content

Commit

Permalink
Minor improvements to README. Additional enhancements to pom.xml,
Browse files Browse the repository at this point in the history
including adding some reporting plugins and some exclusions to
eliminate convergance issues with dependendcies. Additional test
cases, 3 of which that are failing currently.
  • Loading branch information
davewichers committed Mar 29, 2019
1 parent c5b1030 commit 2e57e52
Show file tree
Hide file tree
Showing 3 changed files with 201 additions and 46 deletions.
10 changes: 7 additions & 3 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -5,10 +5,14 @@ A library for performing fast, configurable cleansing of HTML coming from untrus

Another way of saying that could be: It's an API that helps you make sure that clients don't supply malicious cargo code in the HTML they supply for their profile, comments, etc.,
that get persisted on the server. The term "malicious code" in regards to web applications usually mean "JavaScript." Mostly, Cascading Stylesheets are only considered malicious
when they invoke the JavaScript. However, there are many situations where "normal" HTML and CSS can be used in a malicious manner.
when they invoke JavaScript. However, there are many situations where "normal" HTML and CSS can be used in a malicious manner.

How to Use
----------
More details on antisamy are available at: https://www.owasp.org/index.php/Category:OWASP_AntiSamy_Project. Particularly at: https://www.owasp.org/index.php/Category:OWASP_AntiSamy_Project#tab=How_do_I_get_started_3F.

There is also a legacy developers guide at: https://storage.googleapis.com/google-code-archive-downloads/v2/code.google.com/owaspantisamy/Developer%20Guide.pdf (not sure how long that will remain accessible).

How to Import
-------------
First, add the dependency from Maven:
```xml
<dependency>
Expand Down
149 changes: 108 additions & 41 deletions pom.xml
Original file line number Diff line number Diff line change
Expand Up @@ -55,56 +55,74 @@
<groupId>org.apache.xmlgraphics</groupId>
<artifactId>batik-css</artifactId>
<version>1.11</version>
<exclusions>
<!-- exclude this as batik-css 1.11 uses commons-logging 1.0.4 and we want to eliminate the convergence mismatch -->
<exclusion>
<groupId>commons-logging</groupId>
<artifactId>commons-logging</artifactId>
</exclusion>
</exclusions>
</dependency>
<dependency>
<dependency>
<groupId>net.sourceforge.nekohtml</groupId>
<artifactId>nekohtml</artifactId>
<version>1.9.22</version>
<exclusions>
<!-- exclude this as nekohtml 1.9.22 uses xercesImpl 2.11.0 and we want to eliminate the convergence mismatch -->
<exclusion>
<groupId>xerces</groupId>
<artifactId>xercesImpl</artifactId>
</exclusion>
</exclusions>
</dependency>
<dependency>
<groupId>junit</groupId>
<artifactId>junit</artifactId>
<type>jar</type>
<scope>test</scope>
<version>4.12</version>
</dependency>
<dependency>
<groupId>commons-codec</groupId>
<artifactId>commons-codec</artifactId>
<version>1.12</version>
<scope>test</scope>
</dependency>
<dependency>
<groupId>commons-io</groupId>
<artifactId>commons-io</artifactId>
<version>2.6</version>
<scope>test</scope>
</dependency>
<dependency>
<groupId>org.apache.httpcomponents</groupId>
<artifactId>httpclient</artifactId>
<version>4.5.7</version>
<exclusions>
<!-- exclude this as httpclient 4.5.7 uses commons-codec 1.11 and we want to eliminate the convergence mismatch -->
<exclusion>
<groupId>commons-codec</groupId>
<artifactId>commons-codec</artifactId>
</exclusion>
</exclusions>
</dependency>
<dependency>
<groupId>xerces</groupId>
<artifactId>xercesImpl</artifactId>
<version>2.12.0</version>
</dependency>
<dependency>
<groupId>org.eclipse.jetty</groupId>
<artifactId>jetty-server</artifactId>
<version>7.6.21.v20160908</version>
<scope>test</scope>
</dependency>
<dependency>
<groupId>org.eclipse.jetty</groupId>
<artifactId>jetty-servlet</artifactId>
<version>7.6.21.v20160908</version>
<groupId>commons-codec</groupId>
<artifactId>commons-codec</artifactId>
<version>1.12</version>
</dependency>

<!-- Test dependencies -->
<dependency>
<groupId>junit</groupId>
<artifactId>junit</artifactId>
<version>4.12</version>
<scope>test</scope>
</dependency>

</dependencies>
<build>
<pluginManagement>
<plugins>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-dependency-plugin</artifactId>
<version>3.1.1</version>
</plugin>
</plugins>
</pluginManagement>
<plugins>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-clean-plugin</artifactId>
<version>3.1.0</version>
</plugin>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-compiler-plugin</artifactId>
Expand All @@ -131,8 +149,14 @@
</manifest>
</archive>
</configuration>
</plugin>
</plugin>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-install-plugin</artifactId>
<version>2.5.2</version>
</plugin>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-javadoc-plugin</artifactId>
<version>3.0.1</version>
<executions>
Expand All @@ -142,18 +166,61 @@
<goals><goal>jar</goal></goals>
</execution>
</executions>
</plugin>
</plugin>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-site-plugin</artifactId>
<version>3.7.1</version>
</plugin>
<plugin>
<artifactId>maven-source-plugin</artifactId>
<version>3.0.1</version>
<executions>
<execution>
<id>attach-sources</id>
<phase>package</phase>
<goals><goal>jar-no-fork</goal></goals>
</execution>
</executions>
</plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-source-plugin</artifactId>
<version>3.0.1</version>
<executions>
<execution>
<id>attach-sources</id>
<phase>package</phase>
<goals><goal>jar-no-fork</goal></goals>
</execution>
</executions>
</plugin>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-surefire-plugin</artifactId>
<version>2.22.1</version>
</plugin>
</plugins>
</build>
<reporting>
<plugins>
<plugin>
<groupId>org.codehaus.mojo</groupId>
<artifactId>versions-maven-plugin</artifactId>
<version>2.5</version>
<reportSets>
<reportSet>
<reports>
<report>dependency-updates-report</report>
<report>plugin-updates-report</report>
</reports>
</reportSet>
</reportSets>
</plugin>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-project-info-reports-plugin</artifactId>
<version>3.0.0</version>
<reportSets>
<reportSet>
<reports>
<report>dependency-convergence</report>
</reports>
</reportSet>
</reportSets>
<configuration>
<dependencyLocationsEnabled>false</dependencyLocationsEnabled>
</configuration>
</plugin>
</plugins>
</reporting>
</project>
88 changes: 86 additions & 2 deletions src/test/java/org/owasp/validator/html/test/AntiSamyTest.java
Original file line number Diff line number Diff line change
Expand Up @@ -1284,7 +1284,6 @@ public void testGithubIssue23() throws ScanException, PolicyException {
// However, the test above can't replicate this misbehavior.
}


@Test
public void testGithubIssue24() throws ScanException, PolicyException {

Expand All @@ -1297,7 +1296,20 @@ public void testGithubIssue24() throws ScanException, PolicyException {
assertThat(as.scan(test24, revisedPolicy, AntiSamy.SAX).getCleanHTML(), containsString(email));
assertThat(as.scan(test24, revisedPolicy, AntiSamy.DOM).getCleanHTML(), containsString(email));
}


@Test
public void testGithubIssue26() throws ScanException, PolicyException {
// Potential bypass (False positive)
String test26 = "&#x22;&#x3E;&#x3C;&#x69;&#x6D;&#x67;&#x20;&#x73;&#x72;&#x63;&#x3D;&#x61;&#x20;&#x6F;&#x6E;&#x65;&#x72;&#x72;&#x6F;&#x72;&#x3D;&#x61;&#x6C;&#x65;&#x72;&#x74;&#x28;&#x31;&#x29;&#x3E;";
// Issue claims you end up with this:
// ><img src=a onerror=alert(1)>

assertThat(as.scan(test26, policy, AntiSamy.SAX).getCleanHTML(), not(containsString("<img src=a onerror=alert(1)>")));
assertThat(as.scan(test26, policy, AntiSamy.DOM).getCleanHTML(), not(containsString("<img src=a onerror=alert(1)>")));

// But you actually end up with this: &quot;&gt;&lt;img src=a onerror=alert(1)&gt; -- Which is as expected
}

@Test
public void testGithubIssue27() throws ScanException, PolicyException {
// This test doesn't cause an ArrayIndexOutOfBoundsException, as reported in this issue even though it
Expand All @@ -1307,4 +1319,76 @@ public void testGithubIssue27() throws ScanException, PolicyException {
assertThat(as.scan(test27, policy, AntiSamy.SAX).getCleanHTML(), containsString("test"));
}

static final String test33 = "<html>\n"
+ "<head>\n"
+ " <title>Test</title>\n"
+ "</head>\n"
+ "<body>\n"
+ " <h1>Tricky Encoding</h1>\n"
+ " <h2>NOT Sanitized by AntiSamy</h2>\n"
+ " <ol>\n"
+ " <li><a href=\"javascript&#00058x=alert,x%281%29\">X&#00058;x</a></li>\n"
+ " <li><a href=\"javascript&#00058y=alert,y%281%29\">X&#00058;y</a></li>\n"

+ " <li><a href=\"javascript&#58x=alert,x%281%29\">X&#58;x</a></li>\n"
+ " <li><a href=\"javascript&#58y=alert,y%281%29\">X&#58;y</a></li>\n"

+ " <li><a href=\"javascript&#x0003Ax=alert,x%281%29\">X&#x0003A;x</a></li>\n"
+ " <li><a href=\"javascript&#x0003Ay=alert,y%281%29\">X&#x0003A;y</a></li>\n"

+ " <li><a href=\"javascript&#x3Ax=alert,x%281%29\">X&#x3A;x</a></li>\n"
+ " <li><a href=\"javascript&#x3Ay=alert,y%281%29\">X&#x3A;y</a></li>\n"
+ " </ol>\n"
+ " <h1>Tricky Encoding with Ampersand Encoding</h1>\n"
+ " <p>AntiSamy turns harmless payload into XSS by just decoding the encoded ampersands in the href attribute</a>\n"
+ " <ol>\n"
+ " <li><a href=\"javascript&amp;#x3Ax=alert,x%281%29\">X&amp;#x3A;x</a></li>\n"
+ " <li><a href=\"javascript&AMP;#x3Ax=alert,x%281%29\">X&AMP;#x3A;x</a></li>\n"

+ " <li><a href=\"javascript&#38;#x3Ax=alert,x%281%29\">X&#38;#x3A;x</a></li>\n"
+ " <li><a href=\"javascript&#00038;#x3Ax=alert,x%281%29\">X&#00038;#x3A;x</a></li>\n"

+ " <li><a href=\"javascript&#x26;#x3Ax=alert,x%281%29\">X&#x26;#x3A;x</a></li>\n"
+ " <li><a href=\"javascript&#x00026;#x3Ax=alert,x%281%29\">X&#x00026;#x3A;x</a></li>\n"
+ " </ol>\n"
+ " <p><a href=\"javascript&#x3Ax=alert,x%281%29\">Original without ampersand encoding</a></p>\n"
+ "</body>\n"
+ "</html>";

@Test
public void testGithubIssue33a() throws ScanException, PolicyException {

// Potential bypass

// Issue claims you end up with this:
// javascript:x=alert and other similar problems (javascript&#00058x=alert,x%281%29) but can't replicate that.
//System.out.println(as.scan(test33, policy, AntiSamy.SAX).getCleanHTML());

assertThat(as.scan(test33, policy, AntiSamy.SAX).getCleanHTML(), not(containsString("javascript&#00058x=alert,x%281%29")));
assertThat(as.scan(test33, policy, AntiSamy.DOM).getCleanHTML(), not(containsString("javascript&#00058x=alert,x%281%29")));
}


@Test
public void testGithubIssue34a() throws ScanException, PolicyException {

// bypass stripNonValidXMLCharacters
// Issue indicates: "<div>Hello\\uD83D\\uDC95</div>" should be sanitized to: "<div>Hello</div>"

String test34a = "<div>Hello\uD83D\uDC95</div>";
assertEquals("<div>Hello</div>", as.scan(test34a, policy, AntiSamy.SAX).getCleanHTML());
assertEquals("<div>Hello</div>", as.scan(test34a, policy, AntiSamy.DOM).getCleanHTML());
}

@Test
public void testGithubIssue34b() throws ScanException, PolicyException {

// bypass stripNonValidXMLCharacters
// Issue indicates: "<div>Hello\\uD83D\\uDC95</div>" should be sanitized to: "<div>Hello</div>"

String test34b = "\uD888";
assertEquals("", as.scan(test34b, policy, AntiSamy.DOM).getCleanHTML());
assertEquals("", as.scan(test34b, policy, AntiSamy.SAX).getCleanHTML());
}

}

0 comments on commit 2e57e52

Please sign in to comment.