OWASP Java HTML Sanitizer Change Log

  1. Fixed bug: the r239 fix was order-dependent. Maven jar manifests now include the Implementation-Version property and related properties.
  2. Fixed bug: Sanitizers.STYLES.and(...) dropped style="..." attributes.
  3. Fixed bug: allowWithoutAttributes(true) was ignored for a subset of elements when policies were combined with and(...).
  4. Fixed bug: case-sensitivity of URL protocols was ignored when a set of protocols other than the standard set was used.
  5. Reworked CssSchema to allow users to extend the default property white-list.
  6. Replaced the CSS sanitizer with one that does token-level filtering, and replaced the old regular-expression-based CSS lexer with one that neither backtracks nor behaves quadratically on crafted inputs.
  7. Fixed bug: the tag balancer allowed </p> to close a table. Rewrote the tag balancer to recognize scoping elements per HTML5.
  8. Fixed bug: missing bit in HTML schema led to text in <option> elements being elided even when the elements themselves were white-listed.
  9. Fixed bug: requireRelNoFollowOnLinks() was implicitly allowing the a element. Changed this to be consistent with the documentation: no elements are allowed that do not appear in a call to allowElements.
  10. Added methods to the policy builder to specify which elements are allowed to contain text, and changed the default to disallow text in CDATA elements whose content is often not plain text. If custom element policies that change the element type fail, make sure the policy allows the output element type.
  11. Restricted where text nodes can validly appear in output per HTML5 rules, and changed the tag balancer to do better error recovery on misplaced phrasing content.
  12. Changed rendering to ensure that the output HTML is valid XML when the policy prohibits HTML raw text and RCDATA elements, as is almost always the case.
  13. Changed lexer to treat <?…> using the HTML5 bogus comment state grammar which agrees with XML's processing instruction production. Previously, the token ended at the first "?>" or end-of-file instead of the first ">".
  14. Fixed problem with URL protocol white-listing that caused legitimate URLs to be rejected.
  15. Cleaned up raw-text tag handling. XMP, LISTING, and PLAINTEXT are now handled by substitution in the renderer, and NOSCRIPT and friends are now treated the same whether they are elided or present in output. Added workaround for IE8 innerHTML weirdness.
  16. Prevented DoS of browsers via extremely deeply nested tags. In sanitized CSS, allowed the CSS property background-color and font sizes specified in px.
  17. Added convenient pre-packaged policies in Sanitizers. Fixed bug in how warnings are reported via the badHtml Handler.
  18. Better handling of supplementary codepoints to avoid UTF-16/UCS-2 confusion in browsers.
  19. Added new HTML5 URL attributes to the list used to safeguard URL attributes in HtmlPolicyBuilder.
  20. Changed HtmlSanitizer.sanitize to allow null as a valid value for the HTML snippet.
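
Illustration for entry 18: the sketch below shows one way to avoid UTF-16/UCS-2 confusion, by walking a Java string by code point and writing each supplementary code point as a single numeric character reference instead of a surrogate pair. It is not the sanitizer's actual encoder. The class CodePointSketch and the method escapeSupplementary are hypothetical names used only here, and the choice of hexadecimal references is an assumption.

```java
// Hypothetical sketch, not part of the OWASP Java HTML Sanitizer API.
final class CodePointSketch {
    private CodePointSketch() {}

    /**
     * Returns text with every supplementary code point (above U+FFFF)
     * replaced by a hexadecimal numeric character reference.
     * Code points in the Basic Multilingual Plane are copied unchanged;
     * an unpaired surrogate is also copied unchanged, because
     * codePointAt returns it as a single BMP value.
     */
    static String escapeSupplementary(String text) {
        StringBuilder out = new StringBuilder(text.length());
        int i = 0;
        while (i < text.length()) {
            // Reads a whole surrogate pair as one code point when present.
            int cp = text.codePointAt(i);
            // Advance by 1 or 2 chars depending on the code point.
            i += Character.charCount(cp);
            if (Character.isSupplementaryCodePoint(cp)) {
                out.append("&#x").append(Integer.toHexString(cp)).append(';');
            } else {
                out.appendCodePoint(cp);
            }
        }
        return out.toString();
    }
}
```

With this sketch, escapeSupplementary("a\uD83D\uDE00b") returns "a&#x1f600;b", so a consumer that indexes text by 16-bit units never sees half of a surrogate pair.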