OWASP Java HTML Sanitizer Change Log
- Cross-licensed under BSD 3 and Apache 2 Licenses.
- Fixed bug: allowWithoutAttributes(true) was being ignored for
a subset of elements when policies were ANDed.
- Fixed bug: case-sensitivity of URL protocols was ignored
when a set of protocols other than the standard set was used.
- CssSchema now allows
users to extend the default property white-list.
- Replaced CSS sanitizer with one that does token-level
filtering, and replaced the old CSS lexer that used regular
expressions with one that doesn't back-track or behave
quadratically on crafted inputs.
- Fixed bug: tag balancer allowed </p> to close a table.
Rewrote the tag balancer to recognize scoping elements per HTML5.
- Fixed bug: missing bit in HTML schema led to text in
<option> elements being elided even when
the elements themselves were white-listed.
- Fixed bug: the a element was being allowed implicitly.
Changed this to be consistent with the documentation: no elements
are allowed unless the policy explicitly allows them.
- Added methods to policy builder to specify which
elements are allowed to contain text, and changed the default to
disallow text in CDATA elements whose content is often not plain text.
If custom element policies that change the element type now fail,
make sure the policy allows the output element type.
- Restricted where text nodes can validly appear in output
per HTML5 rules and changed the tag balancer to do better error
recovery on misplaced phrasing content.
- Changed rendering to ensure that the output HTML is
valid XML when the policy prohibits HTML raw text & RCDATA
elements, as is almost always the case.
- Changed lexer to treat processing instructions
using the HTML5 bogus comment state grammar which agrees with XML's
processing instruction production. Previously, the token ended at
"?>" or end-of-file.
- Fixed problem with URL protocol white-listing that
caused legitimate URLs to be rejected.
- Cleaned up raw-text tag handling. XMP, LISTING, and
PLAINTEXT are now handled by substitution in the renderer.
Changed NOSCRIPT and friends so they are treated the same
whether elided or present in output. Added workaround for
IE8 innerHTML weirdness.
- Prevent DoS of browsers via extremely deeply nested
tags. In sanitized CSS, allow CSS property
- Added convenient pre-packaged policies in Sanitizers.
Fixed bug in how warnings are reported via the badHtml Handler.
- Better handling of supplementary codepoints to avoid
UTF-16/UCS-2 confusion in browsers.
- Added new HTML5 URL attributes to the list used to
safeguard URL attributes.
HtmlSanitizer.sanitize now allows
null as a valid value for the HTML snippet.
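
As a rough illustration of the URL protocol entries in this log, the
sketch below shows one way a case-insensitive protocol white-list check
could work. The class ProtocolAllowList and its permits method are
hypothetical names invented for this example. They are not part of the
sanitizer's API and do not reflect its actual implementation.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

/** Hypothetical helper for illustration only; not the sanitizer's API. */
final class ProtocolAllowList {
  private final Set<String> allowed = new HashSet<>();

  ProtocolAllowList(String... protocols) {
    // Normalize once so that later lookups are case-insensitive.
    for (String p : protocols) {
      allowed.add(p.toLowerCase(Locale.ROOT));
    }
  }

  /**
   * Returns true if the URL is relative (no scheme) or if its scheme,
   * compared case-insensitively, is in the allowed set.
   */
  boolean permits(String url) {
    for (int i = 0; i < url.length(); i++) {
      char c = url.charAt(i);
      if (c == ':') {
        // Everything before the first ':' is treated as the scheme.
        String scheme = url.substring(0, i).toLowerCase(Locale.ROOT);
        return allowed.contains(scheme);
      }
      if (c == '/' || c == '?' || c == '#') {
        // A ':' after a path, query, or fragment delimiter does not
        // introduce a scheme, so the URL is relative.
        break;
      }
    }
    return true;
  }
}
```

With this sketch, a list built from "http", "https", and "mailto"
accepts "HTTPS://example.com/" and relative URLs such as "images/a.png",
and rejects "JavaScript:alert(1)". Anything before the first colon that
is not an exact match, including a scheme padded with whitespace, is
rejected, so the check fails closed.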