-# OWASP Java HTML Sanitizer
+# WAT
 
-[](https://github.com/OWASP/java-html-sanitizer/actions/workflows/maven.yml) [](https://coveralls.io/github/OWASP/java-html-sanitizer?branch=main) [](https://bestpractices.coreinfrastructure.org/projects/2602) [](https://search.maven.org/artifact/com.googlecode.owasp-java-html-sanitizer/owasp-java-html-sanitizer)
-
-
-A fast and easy to configure HTML Sanitizer written in Java which lets
-you include HTML authored by third-parties in your web application while
-protecting against XSS.
-
-The existing dependency is on JSR 305. The other jars
-are only needed by the test suite. The JSR 305 dependency is a
-compile-only dependency, only needed for annotations.
-
-This code was written with security best practices in mind, has an
-extensive test suite, and has undergone
-[adversarial security review](docs/attack_review_ground_rules.md).
-
-## Table Of Contents
-
-* [Getting Started](#getting-started)
-* [Prepackaged Policies](#prepackaged-policies)
-* [Crafting a policy](#crafting-a-policy)
-* [Custom policies](#custom-policies)
-* [Preprocessors](#preprocessors)
-* [Telemetry](#telemetry)
-* [Questions\?](#questions)
-* [Contributing](#contributing)
-* [Credits](#credits)
-
-## Getting Started
-
-[Getting Started](docs/getting_started.md) includes instructions on
-how to get started with or without Maven.
-
-## Prepackaged Policies
-
-You can use
-[prepackaged policies](https://static.javadoc.io/com.googlecode.owasp-java-html-sanitizer/owasp-java-html-sanitizer/20240325.1/org/owasp/html/Sanitizers.html):
-
-```Java
-PolicyFactory policy = Sanitizers.FORMATTING.and(Sanitizers.LINKS);
-String safeHTML = policy.sanitize(untrustedHTML);
-```
-
-## Crafting a policy
-
-The
-[tests](https://github.com/OWASP/java-html-sanitizer/blob/main/src/test/java/org/owasp/html/HtmlPolicyBuilderTest.java)
-show how to configure your own
-[policy](https://static.javadoc.io/com.googlecode.owasp-java-html-sanitizer/owasp-java-html-sanitizer/20240325.1/org/owasp/html/HtmlPolicyBuilder.html):
-
-```Java
-PolicyFactory policy = new HtmlPolicyBuilder()
-    .allowElements("a")
-    .allowUrlProtocols("https")
-    .allowAttributes("href").onElements("a")
-    .requireRelNofollowOnLinks()
-    .toFactory();
-String safeHTML = policy.sanitize(untrustedHTML);
-```
-
-## Custom Policies
-
-You can write
-[custom policies](https://static.javadoc.io/com.googlecode.owasp-java-html-sanitizer/owasp-java-html-sanitizer/20240325.1/org/owasp/html/ElementPolicy.html)
-to do things like changing `h1`s to `div`s with a certain class:
-
-```Java
-PolicyFactory policy = new HtmlPolicyBuilder()
-    .allowElements("p")
-    .allowElements(
-        (String elementName, List<String> attrs) -> {
-          // Add a class attribute.
-          attrs.add("class");
-          attrs.add("header-" + elementName);
-          // Return elementName to include, null to drop.
-          return "div";
-        }, "h1", "h2", "h3", "h4", "h5", "h6")
-    .toFactory();
-String safeHTML = policy.sanitize(untrustedHTML);
-```
-
-Please note that the elements "a", "font", "img", "input" and "span"
-need to be explicitly whitelisted using the `allowWithoutAttributes()`
-method if you want them to be allowed through the filter when these
-elements do not include any attributes.
-
-[Attribute policies](https://static.javadoc.io/com.googlecode.owasp-java-html-sanitizer/owasp-java-html-sanitizer/20240325.1/org/owasp/html/AttributePolicy.html) allow running custom code too. Adding an attribute policy will not water down any default policy like `style` or URL attribute checks.
-
-```Java
-new HtmlPolicyBuilder = new HtmlPolicyBuilder()
-    .allowElement("div", "span")
-    .allowAttributes("data-foo")
-        .matching(
-            (String elementName, String attributeName, String value) -> {
-              // Return value for the attribute or null to drop.
-            })
-        .onElements("div", "span")
-    .build()
-```
-
-## Preprocessors
-
-Preprocessors allow inserting text and large scale structural changes.
-
-```Java
-new HtmlPolicyBuilder = new HtmlPolicyBuilder()
-    // Use a preprocessor to be backwards compatible with the
-    // <plaintext> element which
-    .withPreprocessor(
-        (HtmlStreamEventReceiver r) -> {
-          // Provide user with info about links before they click.
-          // Before: <a href="https://example.com/...">
-          // After: (https://example.com) <a href="https://example.com/...">
-          return new HtmlStreamEventReceiverWrapper(r) {
-            @Override public void openTag(String elementName, List<String> attrs) {
-              if ("a".equals(elementName)) {
-                for (int i = 0, n = attrs.size(); i < n; i += 2) {
-                  if ("href".equals(attrs.get(i)) {
-                    String url = attrs.get(i + 1);
-                    String origin;
-                    try {
-                      URI uri = new URI(url);
-                      String scheme = uri.getScheme();
-                      String authority = uri.getRawAuthority();
-                      if (scheme == null && authority == null) {
-                        origin = null;
-                      } else {
-                        origin = (scheme != null ? scheme + ":" : "")
-                            + (authority != null ? "//" + authority : "");
-                      }
-                    } catch (URISyntaxException ex) {
-                      origin = "about:invalid";
-                    }
-                    if (origin != null) {
-                      text(" (" + origin + ") ");
-                    }
-                  }
-                }
-              }
-              super.openTag(elementName, attrs);
-            }
-          };
-        }
-    .allowElement("a")
-    ...
-    .build()
-
-```
-
-Preprocessing happens before a policy is applied, so cannot affect the security
-of the output.
-
-## Telemetry
-
-When a policy rejects an element or attribute it notifies an [HtmlChangeListener](https://static.javadoc.io/com.googlecode.owasp-java-html-sanitizer/owasp-java-html-sanitizer/20240325.1/org/owasp/html/HtmlChangeListener.html).
-
-You can use this to keep track of policy violation trends and find out when someone
-is making an effort to breach your security.
-
-```Java
-PolicyFactory myPolicyFactory = ...;
-// If you need to associate reports with some context, you can do so.
-MyContextClass myContext = ...;
-
-String sanitizedHtml = myPolicyFactory.sanitize(
-    unsanitizedHtml,
-    new HtmlChangeListener<MyContextClass>() {
-      @Override
-      public void discardedTag(MyContextClass context, String elementName) {
-        // ...
-      }
-      @Override
-      public void discardedAttributes(
-          MyContextClass context, String elementName, String... attributeNames) {
-        // ...
-      }
-    },
-    myContext);
-```
-
-**Note**: If a string sanitizes with no change notifications, it is not the case
-that the input string is necessarily safe to use. Only use the output of the sanitizer.
-
-The sanitizer ensures that the output is in a sub-set of HTML that commonly
-used HTML parsers will agree on the meaning of, but the absence of
-notifications does not mean that the input is in such a sub-set,
-only that it does not contain elements or attributes that were removed.
-
-See ["Why sanitize when you can validate"](https://github.com/OWASP/java-html-sanitizer/blob/main/docs/html-validation.md) for more on this topic.
-
-## Questions?
-
-If you wish to report a vulnerability, please see
-[AttackReviewGroundRules](docs/attack_review_ground_rules.md).
-
-Subscribe to the
-[mailing list](http://groups.google.com/group/owasp-java-html-sanitizer-support)
-to be notified of known [Vulnerabilities](docs/vulnerabilities.md) and important updates.
-
-## Contributing
-
-If you would like to contribute, please ping [@mvsamuel](https://twitter.com/mvsamuel) or [@manicode](https://twitter.com/manicode).
-
-We welcome [issue reports](https://github.com/OWASP/java-html-sanitizer/issues) and PRs.
-PRs that change behavior or that add functionality should include both positive and
-[negative tests](https://www.guru99.com/negative-testing.html).
-
-Please be aware that contributions fall under the [Apache 2.0 License](https://github.com/OWASP/java-html-sanitizer/blob/main/COPYING).
-
-## Credits
-
-[Thanks to everyone who has helped with criticism and code](docs/credits.md)
+Integration branch of https://github.com/OWASP/java-html-sanitizer with out patches applied
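
The origin-label logic inside the removed preprocessor example can be tried on its own with only the JDK. What follows is a minimal sketch rather than the sanitizer's API: `OriginLabel`, `originOf`, and `labelsFor` are hypothetical names introduced for illustration, and the flat `[name, value, name, value, ...]` list is an assumption that mirrors how the example steps through `attrs` two entries at a time.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of the sanitizer library.
final class OriginLabel {

  // Returns "scheme://authority" for a URL, null when the URL has neither
  // a scheme nor an authority (a relative URL), and "about:invalid" when
  // java.net.URI cannot parse it.
  static String originOf(String url) {
    try {
      URI uri = new URI(url);
      String scheme = uri.getScheme();
      String authority = uri.getRawAuthority();
      if (scheme == null && authority == null) {
        return null;
      }
      return (scheme != null ? scheme + ":" : "")
          + (authority != null ? "//" + authority : "");
    } catch (URISyntaxException ex) {
      return "about:invalid";
    }
  }

  // Walks a flat attribute list (assumed layout: name, value, name, value, ...)
  // and builds the " (origin) " label for every href that has an origin.
  static List<String> labelsFor(List<String> attrs) {
    List<String> labels = new ArrayList<>();
    for (int i = 0, n = attrs.size(); i + 1 < n; i += 2) {
      if ("href".equals(attrs.get(i))) {
        String origin = originOf(attrs.get(i + 1));
        if (origin != null) {
          labels.add(" (" + origin + ") ");
        }
      }
    }
    return labels;
  }

  public static void main(String[] args) {
    List<String> attrs =
        Arrays.asList("href", "https://example.com/a/b?q=1", "title", "x");
    System.out.println(labelsFor(attrs));
  }
}
```

In the removed example the label is emitted with `text(...)` before `super.openTag(...)`, so it lands ahead of the link. The sketch only computes the label strings and leaves emission to the caller.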