HTML cleaning

The contents of HTML fields can be cleaned on both the client side and the server side.

Client-side

Client-side HTML cleaning is done by CKEditor itself. This feature is called Advanced Content Filter (ACF). Each plugin and command added to or removed from CKEditor influences the allowed HTML. For example, when there is no plugin to add an image, <img> tags will be removed automatically. This filtering also applies to attributes, which can, for instance, be allowed or required.

ACF can also be controlled per editor instance via the configuration property extraAllowedContent. Note that since Bloomreach Experience Manager 12, extraAllowedContent must be specified in JSON object format. For example:

{
  extraAllowedContent: {q: {}, cite: {classes: 'myclass'}}
}

More information on ACF and how to configure it can be found at the CKEditor documentation website.

Disable client-side HTML cleaning

ACF is enabled by default. To disable ACF, set the CKEditor property allowedContent to true:

ckeditor.config.overlayed.json:

{
  allowedContent: true
}

Server-side

Server-side HTML cleaning is done by an HTML-processor. The HTML-processor checks, cleans, and corrects the output of rich-text fields, and manages internal links and images. It is configured with an allowlist that defines which elements are allowed and which attributes they may contain. If an attribute is not configured as allowed, it is stripped from the output (text nodes from elements are preserved).
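
The allowlist idea can be pictured with a small, self-contained Python sketch. It is not the HTML-processor's implementation: the allowlist contents, the class name AllowlistFilter, and the function clean are hypothetical, and the treatment of elements that are not on the allowlist (tag dropped, text kept) is an assumption made for illustration.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical allowlist: element name -> attribute names allowed on it.
ALLOWLIST = {
    "p": set(),
    "strong": set(),
    "a": {"href", "title"},
}


class AllowlistFilter(HTMLParser):
    """Keeps allowlisted elements and attributes; other tags are dropped, their text is kept."""

    def __init__(self, allowlist):
        super().__init__(convert_charrefs=True)
        self.allowlist = allowlist
        self.out = []

    def handle_starttag(self, tag, attrs):
        if tag not in self.allowlist:
            return  # element not allowed: skip the tag itself
        allowed = self.allowlist[tag]
        kept = "".join(
            f' {name}="{escape(value or "", quote=True)}"'
            for name, value in attrs
            if name in allowed
        )
        self.out.append(f"<{tag}{kept}>")

    def handle_endtag(self, tag):
        if tag in self.allowlist:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        # Text nodes are always emitted, re-escaped for safe output.
        self.out.append(escape(data, quote=False))


def clean(html, allowlist=ALLOWLIST):
    parser = AllowlistFilter(allowlist)
    parser.feed(html)
    parser.close()
    return "".join(parser.out)
```

With this sketch, clean('<p onclick="x()">Hi <blink>there</blink></p>') keeps the p element, drops its onclick attribute, and keeps the text of the blink element without the tag.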

By default, server-side HTML cleaning also removes any usage of the javascript: protocol and the data: protocol within <a> href and <object> data attributes. This security feature can be disabled by setting the omitJavascriptProtocol configuration property to false (see Configuration below).

Since Bloomreach Experience Manager 14.3.0, the removal of the javascript: and data: protocols is no longer confined to the <a> href and <object> data attributes, but is applied to all attributes. This can be fine-tuned by setting the omitJavascriptProtocol and omitDataProtocol configuration properties per element.
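
As a rough illustration of what removing these protocols from an attribute value means, here is a minimal Python sketch. The function name strip_unsafe_protocol is hypothetical, its two flags merely mirror the omitJavascriptProtocol and omitDataProtocol properties, and replacing an unsafe value with an empty string is an assumption: this text does not specify what the HTML-processor emits in that case.

```python
def strip_unsafe_protocol(value, omit_javascript=True, omit_data=True):
    """Return an empty string if the attribute value uses a protocol that must be omitted."""
    # Drop spaces and control characters and lowercase the rest, so that
    # variants such as " JavaScript:" are recognised as well.
    normalized = "".join(ch for ch in value if ch > " ").lower()
    if omit_javascript and normalized.startswith("javascript:"):
        return ""
    if omit_data and normalized.startswith("data:"):
        return ""
    return value
```

Calling strip_unsafe_protocol(value, omit_data=False) corresponds to an element for which data: URIs are deliberately permitted, for example inline images.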

Configuration

A CKEditor field is configured with an HTML-processor by setting the configuration property htmlprocessor.id. This property can either be specified in the cluster.options node of a field of a specific document-type, or globally (i.e. for all formatted and/or richtext fields). The value of this property should correspond to the name of the HTML-processor configuration node as defined in the HTML-processor module, which is located at:

/hippo:configuration/hippo:modules/htmlprocessor/hippo:moduleconfig

By default, the CMS is bootstrapped with the following HTML-processor configurations:

  1. formatted: contains an allowlist of elements used in Formatted fields.
  2. richtext: contains an allowlist of elements used in Rich Text fields and manages internal links and images.
  3. no-filter: contains an empty allowlist but does manage internal links and images when applied to Rich Text fields.

The configuration node of an HTML-processor is of nodetype hipposys:moduleconfig and has the following properties available:

  • charset: the character set of the output. Defaults to UTF-8.
  • serializer: the type of serializer to use. Valid values are pretty, compact, and simple. Defaults to simple.
  • convertLineEndings: whether to convert CRLF to LF when storing HTML, and vice versa when reading HTML. Defaults to true.
  • omitComments: whether to strip comments from the HTML. Defaults to false.
  • omitJavascriptProtocol: whether usage of the javascript: protocol is removed from the HTML. Defaults to true.
  • omitDataProtocol: whether usage of the data: protocol is removed from the HTML. Defaults to true. Available since 14.3.0.
  • filter: whether to apply allowlist filtering. Defaults to true.
  • secureTargetBlankLinks: whether external links that open in a new tab or window should be secured using the attribute rel="noopener noreferrer". For more information, see https://web.dev/external-anchors-use-rel-noopener/. Defaults to true. Available since 14.7.0.
  • allowStyleElements: whether to allow <style> elements. Defaults to false because of the HTML5 specification. For configurations that have filter=true, such as formatted and richtext, a child node named style must also be added so that the element is on the filter's allowlist. Available since 15.5.0. Setting it to true restores the behavior from before 15.3.0, which changed due to an upgrade of the third-party library HtmlCleaner; see its release notes.
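
To illustrate the secureTargetBlankLinks option from the list above, the following Python sketch shows the kind of rewrite it describes. The function name secure_target_blank is hypothetical, and treating only absolute http and https URLs as external is an assumption; the HTML-processor may decide this differently.

```python
from urllib.parse import urlparse


def secure_target_blank(attrs):
    """Return a copy of an anchor's attributes with rel set on external target=_blank links."""
    result = dict(attrs)
    href = result.get("href", "")
    # Assumption: a link is external when it has an absolute http(s) URL.
    is_external = urlparse(href).scheme in ("http", "https")
    if result.get("target") == "_blank" and is_external:
        # Any existing rel value is overwritten in this simplified sketch.
        result["rel"] = "noopener noreferrer"
    return result
```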

Allowed HTML elements are defined as child nodes of nodetype hipposys:moduleconfig. The name of such a node corresponds to the name of the allowed element. These element nodes may contain a multi-valued property called attributes that lists the HTML attributes allowed on the element.

Since Bloomreach Experience Manager 14.3.0, the following two configuration options can also be specified per element:

  • omitJavascriptProtocol: whether usage of the javascript: protocol is removed from the HTML. Defaults to the value of the global setting.
  • omitDataProtocol: whether usage of the data: protocol is removed from the HTML. Defaults to the value of the global setting.
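
The fallback from a per-element option to the global setting can be sketched in Python as follows. The dictionary layout (including the "elements" key) and the function name effective_setting are hypothetical stand-ins for the repository nodes and properties; only the property names and their defaults come from the text above.

```python
# Defaults of the global (processor-level) properties, as listed above.
GLOBAL_DEFAULTS = {"omitJavascriptProtocol": True, "omitDataProtocol": True}

# Hypothetical in-memory picture of one HTML-processor configuration.
example_config = {
    "omitDataProtocol": True,
    "elements": {
        "a": {"attributes": ["href", "title"]},
        "img": {"attributes": ["src", "alt"], "omitDataProtocol": False},
    },
}


def effective_setting(name, processor_config, element_name):
    """Resolve a setting: the element value wins, then the global value, then the default."""
    element_config = processor_config.get("elements", {}).get(element_name, {})
    if name in element_config:
        return element_config[name]
    return processor_config.get(name, GLOBAL_DEFAULTS[name])
```

In this example, data: URIs would be kept on img elements only, while every other element falls back to the global value.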

The pretty and compact serializers add whitespace characters to the HTML source to make it human-readable. This may result in unwanted spacing around superscripts or subscripts. For this reason, the default serializer is simple.

Disable server-side HTML cleaning

Change the configuration property htmlprocessor.id to no-filter.

Configuration in delivery tier 

HTML cleaning is also used in the delivery tier, notably in the Page Model API and in the default REST API.

Since 15.5.0, the HST properties file supports the configuration option htmlcleaner.allowStyleElements, which defaults to false. Setting it to true renders <style> elements that are part of an HTML field in the content to the API output. See also the allowStyleElements option above in the htmlprocessor module configuration.

Setting this option to true restores the behavior from before 15.3.0, which changed due to an upgrade of the third-party library HtmlCleaner; see its release notes.
