Derived Data
Description of the problem
Derived data are properties that are automatically calculated and set on a document during a session save. An example of derived data is the size of some (e.g. binary) property of a node. Such derived data might have to be stored on the node itself.
Since you don't control all the places where such properties might be set, the derived data needs to be calculated and set implicitly to prevent inconsistent data.
Another reason for this functionality is that the available query languages do not allow you to express all types of realistic queries. For example, XPath does not allow you to query for documents that have two properties that are equal to each other. Naively this could be written as //*[@a=@b], but this yields no results even when matching documents exist. Certain other queries are possible but have a huge performance impact. These are deliberate limitations of the query languages XPath and JCR-SQL, not bugs.
Facility offered
As a solution for expressing efficient queries and for accessing information about the content without having to know and execute the procedure to obtain the data, the content repository has the capability of triggering derived data functions. A derived data function computes properties that are derived from other properties of the document. Derived properties may be put on and computed from the JCR node that represents a document, or on any descendant node in a document.
When editing a document that should contain such a derived property, you should not set the value of the derived property yourself. Instead, the repository automatically computes the value of the property during save(). Because the repository guarantees to recompute the property upon a save, the data will always be up to date.
In order for the repository to do this, it must be informed when and how to compute the properties.
- "when" is determined by the JCR nodetype of the data. The repository can be configured to compute a property of a certain node type.
- "how" to compute a property is to be implemented by a class implementing the derived data function interface.
Usage
We will outline how to define, configure and use derived data functions based on a simple example that applies the Pythagorean theorem.
Defining the data for which to compute properties
We define a document type that is a core shape definition:
[sample:shape] > hippo:document
- sample:a (double)
- sample:b (double)
Subsequently we define a type that can be added as a mixin to the shape definition to indicate that the shape is a triangle:
[sample:triangle] > hippo:derived mixin
- sample:c (double)
To indicate that certain properties of the type sample:triangle are to be computed by the derived data mechanism, the type must extend the hippo:derived mixin node type.
Configuring the repository to compute derived properties for this data
Now we need to configure in the repository how to compute the derived property field of sample:triangle. These procedures are defined in the JCR repository under /hippo:configuration/hippo:derivatives. To compute the c property we can enter the following JCR definition:
/hippo:configuration:
  /hippo:derivatives:
    jcr:primaryType: hipposys:derivativesfolder
    /pythagorean:
      jcr:primaryType: hipposys:deriveddefinition
      hipposys:nodetype: sample:triangle
      hipposys:classname: sample.PythagoreanTheorem
      hipposys:serialver: 1
      /hippo:accessed:
        jcr:primaryType: hipposys:propertyreferences
        /a:
          jcr:primaryType: hipposys:relativepropertyreference
          hipposys:relPath: sample:a
        /b:
          jcr:primaryType: hipposys:relativepropertyreference
          hipposys:relPath: sample:b
      /hippo:derived:
        jcr:primaryType: hipposys:propertyreferences
        /c:
          jcr:primaryType: hipposys:relativepropertyreference
          hipposys:relPath: sample:c
First, the hipposys:nodetype property defines the nodetype which contains the properties that should be derived. For any change to nodes of this type, this derived data definition indicates the function to be executed.
The hipposys:classname property contains the name of the class, which must extend the base class org.hippoecm.repository.ext.DerivedDataFunction. The class PythagoreanTheorem must have a public no-argument constructor. The number stated in the hipposys:serialver property should match the serialVersionUID field in the implementing class sample.PythagoreanTheorem.
The definitions in the hippo:accessed and hippo:derived node structures indicate the input and output parameters of the derived data function. Here we indicate that, relative to the node of type sample:triangle, there are two input properties: sample:a and sample:b. The hipposys:relPath properties give the path of each property relative to the subject node for which the computation takes place. The values of these two properties are entered under the keys "a" and "b" (the names of the hipposys:relativepropertyreference nodes) in a Map that the compute method implemented by PythagoreanTheorem takes as input:
public Map<String,Value[]> compute(Map<String,Value[]> parameters);
As a result, the compute method should return a map in which the value for the derived property sample:c can be found under the key "c". The definition also lists the (possibly multiple) results computed by the function as nodes under hippo:derived. The hipposys:relPath again indicates the relative path to the property. It may point to any property below the document for which properties are computed, but it may not contain references to other documents.
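The way the reference node names act as map keys can be illustrated with a small plain-Java model. This is only a sketch under stated assumptions, not the repository's engine: the class DerivedDataFlowSketch and its derive method are hypothetical, a node is modelled as a map from property name to double[] instead of javax.jcr.Value[], and writing back only the keys declared as derived is an assumption of the sketch.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.UnaryOperator;

/** Hypothetical stand-in for the derived data flow; not the repository's actual engine. */
public class DerivedDataFlowSketch {

    /**
     * Runs one function against a "node", modelled as a map from property
     * name (the relPath) to its values.
     */
    public static void derive(Map<String, double[]> node,
                              Map<String, String> accessed,  // reference node name -> relPath
                              Map<String, String> derived,   // reference node name -> relPath
                              UnaryOperator<Map<String, double[]>> function) {
        // Inputs are keyed by the reference node name ("a", "b"), not by the property name.
        Map<String, double[]> parameters = new HashMap<>();
        for (Map.Entry<String, String> ref : accessed.entrySet()) {
            double[] values = node.get(ref.getValue());
            if (values != null) {
                parameters.put(ref.getKey(), values);
            }
        }

        Map<String, double[]> result = function.apply(parameters);

        // Assumption of this sketch: only keys declared as derived are written back.
        for (Map.Entry<String, String> ref : derived.entrySet()) {
            double[] values = result.get(ref.getKey());
            if (values != null) {
                node.put(ref.getValue(), values);
            }
        }
    }

    /** Same arithmetic as the PythagoreanTheorem example, on plain doubles. */
    public static Map<String, double[]> pythagorean(Map<String, double[]> parameters) {
        double a = parameters.get("a")[0];
        double b = parameters.get("b")[0];
        parameters.put("c", new double[] { Math.sqrt(a * a + b * b) });
        return parameters;
    }

    public static void main(String[] args) {
        Map<String, double[]> triangle = new HashMap<>();
        triangle.put("sample:a", new double[] { 3.0 });
        triangle.put("sample:b", new double[] { 4.0 });

        derive(triangle,
               Map.of("a", "sample:a", "b", "sample:b"),
               Map.of("c", "sample:c"),
               DerivedDataFlowSketch::pythagorean);

        System.out.println(triangle.get("sample:c")[0]); // prints 5.0
    }
}
```

The point of the model is the indirection: the function only ever sees the keys "a", "b" and "c", while the definition decides which properties those keys stand for.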
Supplying the method that computes the derived property
The configuration indicates which class should be used to compute the data. This class must extend the org.hippoecm.repository.ext.DerivedDataFunction base class and implement the compute method. Since derived data is a repository function, add this class to the cms module of your project and not the site module.
package sample;

import java.util.Map;
import javax.jcr.RepositoryException;
import javax.jcr.Value;
import org.hippoecm.repository.ext.DerivedDataFunction;

public class PythagoreanTheorem extends DerivedDataFunction {
    // Should match hipposys:serialver in the derived data definition.
    static final long serialVersionUID = 1;

    @Override
    public Map<String,Value[]> compute(Map<String,Value[]> parameters) {
        try {
            double a = parameters.get("a")[0].getDouble();
            double b = parameters.get("b")[0].getDouble();
            double c = Math.sqrt(a * a + b * b);
            parameters.put("c", new Value[] { getValueFactory().createValue(c) });
        } catch (RepositoryException e) {
            // Value.getDouble() declares checked JCR exceptions;
            // in that case no value for "c" is added to the result.
        }
        return parameters;
    }
}
This class can be packaged in a normal plug-in. The properties are recomputed upon any change, with one exception due to a current limitation: imported data is not recomputed and must already be correct.
Deriving Data From Another Node
As stated above derived properties may be put on and computed from the JCR node that represents a document, or on any descendant node in a document. In some use cases this is not sufficient. Take for example the following typical node structure representing a document:
/document:
  jcr:primaryType: hippo:handle
  hippo:name: "Pretty Name"
  /document:
    jcr:primaryType: myproject:newsdocument
    myproject:title: "Pretty Name"
    hippostd:state: draft
(nodes and properties not relevant to the example are left out)
There is a hippo:handle node named document with one myproject:newsdocument child node of the same name, representing the draft variant of the document. In addition, the hippo:handle node has a property hippo:name (from the hippo:named mixin) holding the "pretty name" of the document.
The document's pretty name is entered by the user in the new document dialog when creating a document. Suppose you want to store the same pretty name in the myproject:title property of the new document draft so that the user does not have to enter it again. A derived data function would be a convenient way to implement this. However, the pretty name is not stored on the document node or one of its descendants, but rather on its parent node (the handle), so a regular hipposys:relativepropertyreference node can't be used. In such a use case you can use a hipposys:resolvepropertyreference node and reference the parent node's property as ../hippo:name.
/hippo:configuration:
  /hippo:derivatives:
    jcr:primaryType: hipposys:derivativesfolder
    /title:
      jcr:primaryType: hipposys:deriveddefinition
      hipposys:nodetype: myproject:newsdocument
      hipposys:classname: org.example.NewsDocumentTitle
      hipposys:serialver: 1
      /hippo:accessed:
        jcr:primaryType: hipposys:propertyreferences
        /message:
          jcr:primaryType: hipposys:resolvepropertyreference
          hipposys:relPath: ../hippo:name
      /hippo:derived:
        jcr:primaryType: hipposys:propertyreferences
        /title:
          jcr:primaryType: hipposys:relativepropertyreference
          hipposys:relPath: myproject:title
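The class org.example.NewsDocumentTitle itself is not shown here. A minimal plain-Java model of the copy logic it would need might look as follows. The class name NewsDocumentTitleSketch is hypothetical and String[] stands in for javax.jcr.Value[]; a real implementation would extend org.hippoecm.repository.ext.DerivedDataFunction in the same way as the PythagoreanTheorem example.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical plain-Java model of the title copy logic.
 * String[] is used here in place of javax.jcr.Value[].
 */
public class NewsDocumentTitleSketch {

    public static Map<String, String[]> compute(Map<String, String[]> parameters) {
        // "message" is the name of the accessed reference node that resolves ../hippo:name.
        String[] name = parameters.get("message");
        if (name != null && name.length > 0) {
            // "title" is the name of the derived reference node that maps to myproject:title.
            parameters.put("title", new String[] { name[0] });
        }
        return parameters;
    }

    public static void main(String[] args) {
        Map<String, String[]> parameters = new HashMap<>();
        parameters.put("message", new String[] { "Pretty Name" });
        System.out.println(compute(parameters).get("title")[0]); // prints Pretty Name
    }
}
```

Because derived properties are recomputed on save, a function along these lines would presumably keep myproject:title in step with the handle's hippo:name rather than only pre-filling it once.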
Enforce Generating Multiple-Value Properties
When a derived data function runs for the first time on a document (for example when a new document is saved for the first time), new, derived properties will be written to the document node. The derived data function returns, for each such property, an array of Value instances, even for properties intended to be single-valued.
By default, the derived data engine assumes that the new property is intended to be single-valued, so it creates a single-valued property and stores only the first Value instance from the returned array.
It has always been possible to force the created property to be multiple via the (document) node type definition in the CND. Specifically, the derived properties can be registered with the 'multiple' modifier to mark them as multiple:
[myproject:mytypewithderiveddata] > hippo:document, hippostd:relaxed
- myproject:mymultiplederivedproperty (string) multiple
This approach is not recommended, however, especially when using relaxed CNDs. In general it is advisable to keep node type definitions as simple as possible.
Since 14.2.0, 14.1.1, and 13.4.3, a new approach exists for marking derived properties as multiple. The boolean property hipposys:multivalue is available, which can be used on output (under /hipposys:derived) hipposys:relativepropertyreference nodes:
/myderiveddatafunction:
  ...
  /hipposys:accessed:
    ...
  /hipposys:derived:
    jcr:primaryType: hipposys:propertyreferences
    /mymultiplederivedproperty:
      jcr:primaryType: hipposys:relativepropertyreference
      hipposys:relPath: myproject:mymultiplederivedproperty
      hipposys:multivalue: true
The new configuration property is only applicable for new properties created via the derived data function. In other words, adding or changing the multivalue flag, during the lifetime of a project, will not result in changing existing properties on documents (from single to multiple or vice versa). This is also the case when the CND approach is used. In general, changing this behaviour in a running environment (via either approach) can result in inconsistencies in a project's content, where the same derived property is single in some documents and multiple in others.
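The rules above can be summarised in a small plain-Java model. It is a sketch, not the engine's implementation: the class MultiValueSketch and its store method are hypothetical, a single-valued property is modelled as a String and a multiple one as a String[], and the way an existing property keeps its single or multiple nature on later saves is an assumption based on the description above.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical model of how a derived result is stored as single or multiple. */
public class MultiValueSketch {

    /**
     * Stores derived values on a "node", modelled as property name -> String
     * (single-valued) or String[] (multiple).
     */
    public static void store(Map<String, Object> node, String name,
                             String[] values, boolean multivalue) {
        Object existing = node.get(name);
        // Assumption: an existing property keeps its nature; the flag only
        // decides for properties that are created now.
        boolean multiple = (existing != null) ? (existing instanceof String[]) : multivalue;
        if (multiple) {
            node.put(name, values.clone());
        } else if (values.length > 0) {
            // Single-valued: only the first returned value is stored.
            node.put(name, values[0]);
        }
    }

    public static void main(String[] args) {
        String name = "myproject:mymultiplederivedproperty";
        String[] result = { "a", "b" };

        Map<String, Object> oldDocument = new HashMap<>();
        store(oldDocument, name, result, false);    // created before the flag was set
        store(oldDocument, name, result, true);     // flag added later
        System.out.println(oldDocument.get(name));  // prints a

        Map<String, Object> newDocument = new HashMap<>();
        store(newDocument, name, result, true);     // created with the flag set
        System.out.println(Arrays.toString((String[]) newDocument.get(name))); // prints [a, b]
    }
}
```

The two documents end up with the same derived property in different shapes, which is the kind of inconsistency that changing the flag in a running environment can cause.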