Write an Updater Script
Introduction
Goal
Write a Groovy Updater Script to perform bulk changes to repository content.
Background
To perform bulk changes to existing content in a running repository, developers can write updater scripts in the Groovy language. Updater scripts have access to the full JCR API.
With Great Power Comes Great Responsibility
Updater scripts can modify large parts of your repository. Use them with care.
Security
The scripts are executed by a custom Groovy ClassLoader that protects against obvious, trivial mistakes and misuse (for example, invoking System.exit()). However, it is not intended to provide a fully protected Groovy sandbox. Technically, Groovy updater scripts can still be used to execute external programs, possibly compromising the server environment. Therefore, access to and use of Groovy updater scripts must be limited to trusted developers and administrators only.
Create a New Script
Log into the CMS as admin.
Browse to Setup, then System, select Updater Editor, and click on the New button.
Enter a Name for the script.
All other options are execution options; see Run an Updater Script for more information.
Implement NodeUpdateVisitor
Updater scripts are written in Groovy and must implement the interface NodeUpdateVisitor.
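The outline below collects the four methods this page mentions (initialize, doUpdate, undoUpdate, destroy) into one interface. It is a reconstruction from this page, not the library declaration. The name NodeUpdateVisitorOutline is hypothetical, and the exact signatures and declared exceptions should be checked against the org.onehippo.repository.update.NodeUpdateVisitor Javadoc for your version.

```groovy
import javax.jcr.Node
import javax.jcr.RepositoryException
import javax.jcr.Session

// Outline only (hypothetical name): the real interface is
// org.onehippo.repository.update.NodeUpdateVisitor
interface NodeUpdateVisitorOutline {

  // Called once before any node is visited
  void initialize(Session session) throws RepositoryException

  // Called for each visited node; return true if the node was modified
  boolean doUpdate(Node node) throws RepositoryException

  // Called to revert the changes that doUpdate made to a node (see Undo)
  boolean undoUpdate(Node node) throws RepositoryException, UnsupportedOperationException

  // Called once after the last node has been visited
  void destroy()
}
```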
Most scripts will extend the base class BaseNodeUpdateVisitor, which provides a logger and default (no-op) implementations of the methods initialize and destroy.
The updater engine uses the visitor pattern. For each visited node, the updater engine will call the script method doUpdate. When the script modifies the node in any way, it should notify the updater engine by returning true from that method.
The default updater script only logs the paths of all visited nodes.
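Such a default script might look like the sketch below. It assumes the script extends BaseNodeUpdateVisitor and uses the log field that base class provides. The template shipped with your version may differ in details such as a package declaration or the log level.

```groovy
import org.onehippo.repository.update.BaseNodeUpdateVisitor
import javax.jcr.Node

class UpdaterTemplate extends BaseNodeUpdateVisitor {

  boolean doUpdate(Node node) {
    // Only log the path of the visited node; nothing is modified
    log.debug "Updating node ${node.path}"
    // Returning false tells the updater engine that the node was not modified
    return false
  }

  boolean undoUpdate(Node node) {
    // This script makes no changes, so there is nothing to undo
    throw new UnsupportedOperationException('Updater does not implement undoUpdate method')
  }
}
```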
The node parameter is a javax.jcr.Node object, which gives your script full JCR access to the repository.
See example 1 (Add a property) at Groovy Updater Scripts Examples for a basic implementation.
Implement Optional Features
Parameters
If your updater script is meant to be reused without modifying its source, it is useful to define parameters and let the script read them instead of using hard-coded values.
Parameters can be specified in the execution options as a valid JSON string that defines a map of parameter names (String) to parameter values (Object).
In your script, you can access the parameters through the parametersMap variable. For example, if you set Parameters to { "basePath": "/content/documents/myproject/news", "tag" : "gogreen" }, you can read those values anywhere in your updater script (for example, in the #initialize(Session) or #doUpdate(Node) method) with parametersMap["basePath"] and parametersMap["tag"].
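The sketch below reads the two example parameters once in initialize and then uses them in doUpdate. It assumes parametersMap is the field that BaseNodeUpdateVisitor fills from the Parameters JSON. The class name, the fallback value and the log-only behavior are illustrative choices, not part of the documented API.

```groovy
import org.onehippo.repository.update.BaseNodeUpdateVisitor
import javax.jcr.Node
import javax.jcr.RepositoryException
import javax.jcr.Session

class ParameterizedUpdater extends BaseNodeUpdateVisitor {

  String basePath
  String tag

  void initialize(Session session) throws RepositoryException {
    // Read the parameters once; fall back to the root path if basePath is missing (illustrative choice)
    basePath = (parametersMap?.get("basePath") as String) ?: "/"
    tag = parametersMap?.get("tag") as String
    log.info "Running with basePath=${basePath}, tag=${tag}"
  }

  boolean doUpdate(Node node) {
    // Only consider nodes below the configured base path
    if (!node.path.startsWith(basePath)) {
      return false
    }
    log.info "Would tag ${node.path} with '${tag}'"
    return false
  }

  boolean undoUpdate(Node node) {
    throw new UnsupportedOperationException('Updater does not implement undoUpdate method')
  }
}
```

Reading the values in initialize keeps doUpdate free of map lookups and validates them once per run instead of once per node.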
Undo
An updater script can support easy undo of its modifications by implementing the undoUpdate method. That method should revert a node back to the state before doUpdate was called.
Example 1 (Add a property) at Groovy Updater Scripts Examples implements undoUpdate.
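The following is a minimal sketch of a matching doUpdate/undoUpdate pair that adds a boolean property and removes it again. The property name myproject:reviewed is hypothetical. The sketch assumes the node types being visited permit that property, and that undo is only run on the nodes the original run reported as updated. For that reason, undoUpdate can simply remove the property.

```groovy
import org.onehippo.repository.update.BaseNodeUpdateVisitor
import javax.jcr.Node

class AddPropertyUpdater extends BaseNodeUpdateVisitor {

  // Hypothetical property name, used for illustration only
  static final String PROP = "myproject:reviewed"

  boolean doUpdate(Node node) {
    if (node.hasProperty(PROP)) {
      return false            // already present: nothing changed, node is skipped
    }
    node.setProperty(PROP, true)
    return true               // node modified
  }

  boolean undoUpdate(Node node) {
    if (!node.hasProperty(PROP)) {
      return false            // nothing to revert
    }
    node.getProperty(PROP).remove()
    return true               // node reverted to its state before doUpdate
  }
}
```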
Custom Node Visiting Logic
Typically, the nodes visited by the script are specified in the execution options, either by an XPath query or by a repository path. Alternatively, an updater script can provide its own logic for selecting one or more nodes to visit, by overriding two methods provided by BaseNodeUpdateVisitor, the base class of the UpdaterTemplate script.
A contrived example visits all nodes of type hippo:document, which is similar to simply specifying the XPath query //element(*, hippo:document).
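That contrived example could be sketched as follows. The sketch assumes the two overridable base-class methods are firstNode(Session), called once to obtain the first node to visit, and nextNode(), called to obtain each subsequent node, with null meaning there are no more nodes. Check the BaseNodeUpdateVisitor API of your version to confirm these names. javax.jcr.query is imported explicitly because it is not among the default imports.

```groovy
import org.onehippo.repository.update.BaseNodeUpdateVisitor
import javax.jcr.Node
import javax.jcr.NodeIterator
import javax.jcr.RepositoryException
import javax.jcr.Session
import javax.jcr.query.Query

class VisitAllDocumentsUpdater extends BaseNodeUpdateVisitor {

  // Result iterator of the custom query, consumed lazily by nextNode()
  NodeIterator nodeIterator

  Node firstNode(Session session) throws RepositoryException {
    def queryManager = session.workspace.queryManager
    def query = queryManager.createQuery("//element(*, hippo:document)", Query.XPATH)
    nodeIterator = query.execute().nodes
    return nextNode()
  }

  Node nextNode() throws RepositoryException {
    // Hand out one node at a time; null means there is nothing left to visit
    return nodeIterator?.hasNext() ? nodeIterator.nextNode() : null
  }

  boolean doUpdate(Node node) {
    log.debug "Visiting ${node.path}"
    return false
  }

  boolean undoUpdate(Node node) {
    throw new UnsupportedOperationException('Updater does not implement undoUpdate method')
  }
}
```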
The difference from a repository-path or XPath-query based updater is that those first query or iterate through all nodes to be visited before calling the doUpdate(Node) method. With custom node visiting logic, doUpdate(Node) is invoked during the query iteration, which may be more efficient in some use cases. In addition, a long-running updater script can then be cancelled both during the query iteration and during the node update process. Otherwise, cancellation is only possible during the node update process.
A different approach, sometimes used but not advisable, is to select only the rep:root node with an XPath query and implement all custom processing within the single doUpdate method call. This works, but it cannot be cancelled!
Override Default Behavior
BaseNodeUpdateVisitor provides two boolean methods whose default behavior can sometimes be worth overriding:
skipCheckoutNodes(): by default (returning false), each node is checked out if necessary before it is passed to the doUpdate method, to ensure that updating the node is actually allowed. However, if the updater script is only used for querying and reporting, or performs updates unrelated to versionable content, unnecessarily checking out nodes can cause substantial overhead. In that case, override this method to return true instead.
logSkippedNodePaths(): by default (returning true), the paths of all visited nodes for which the doUpdate method returned false are also logged as a separate audit trail in the repository. If a substantial number of nodes is skipped and the audit trail is not needed, override this method to return false instead.
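For example, a script that only reports on nodes and never modifies them could override both methods, as in the sketch below. It assumes both are no-argument methods returning boolean, as described above. The class name and the log message are illustrative.

```groovy
import org.onehippo.repository.update.BaseNodeUpdateVisitor
import javax.jcr.Node

class ReportOnlyUpdater extends BaseNodeUpdateVisitor {

  // This script never modifies nodes, so checking them out first would be wasted work
  boolean skipCheckoutNodes() {
    return true
  }

  // Every node ends up "skipped" (doUpdate returns false), so do not record each path
  boolean logSkippedNodePaths() {
    return false
  }

  boolean doUpdate(Node node) {
    log.info "Found ${node.path}"
    return false
  }

  boolean undoUpdate(Node node) {
    throw new UnsupportedOperationException('Updater does not implement undoUpdate method')
  }
}
```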
Manually Report Updated/Skipped/Failed Nodes
By default, the updater engine automatically records the updated, skipped, or failed count on every invocation of the #doUpdate(Node) method. So if each unit of work in your updater script corresponds to one node iteration based on the path or query configuration, the automatic recording and batch processing by the updater engine should be sufficient.
However, your updater script may not follow the node iteration based on the path or query configuration, and instead run its own query and iterate over nodes manually. In that case, the generated report does not reflect what the script really did. Such a script cannot take advantage of the 'Dry run' option, and its execution is not controlled by the updater engine's batch processing according to the batch size configuration either. Even worse, uncontrolled batch updates may cause significant system overhead (for example, consuming too much memory).
To address this problem, an updater script can report updated, skipped, or failed nodes manually through the visitorContext variable (of type org.onehippo.repository.update.NodeUpdateVisitorContext).
For example, a script can use visitorContext to report the number of updated news documents after changing a field in a manual node iteration.
In such a script, visitorContext.reportUpdated(path) is invoked after setting the "demosite:date" property of each news document. This way, the updater engine knows how many nodes were updated and can do the batch processing (either saving or discarding the session) properly, based on the batch size configuration.
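A minimal sketch of such a manual iteration follows. It assumes that visitorContext is the field BaseNodeUpdateVisitor exposes, that it provides reportUpdated(String) and reportFailed(String), and that the execution options select a single starting node so the manual iteration runs once. The query path, the demosite:newsdocument type and setting demosite:date to the current time are hypothetical stand-ins for a demo project's news documents.

```groovy
import org.onehippo.repository.update.BaseNodeUpdateVisitor
import javax.jcr.Node
import javax.jcr.RepositoryException
import javax.jcr.query.Query

class UpdateNewsDatesUpdater extends BaseNodeUpdateVisitor {

  boolean doUpdate(Node node) {
    // Hypothetical query: all news documents below the demo site's news folder
    def queryManager = node.session.workspace.queryManager
    def query = queryManager.createQuery(
        "/jcr:root/content/documents/demosite/news//element(*, demosite:newsdocument)", Query.XPATH)
    def docs = query.execute().nodes

    while (docs.hasNext()) {
      def doc = docs.nextNode()
      try {
        doc.setProperty("demosite:date", Calendar.getInstance())
        // Tell the engine one node was updated, so it can save or discard in batches
        visitorContext.reportUpdated(doc.path)
      } catch (RepositoryException e) {
        log.error "Failed to update ${doc.path}: ${e.message}"
        visitorContext.reportFailed(doc.path)
      }
    }

    // The starting node itself was not changed
    return false
  }

  boolean undoUpdate(Node node) {
    throw new UnsupportedOperationException('Updater does not implement undoUpdate method')
  }
}
```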
Remarks
Default Imports
By default, the script classloader already imports all of the main JCR API packages: javax.jcr, javax.jcr.nodetype, javax.jcr.security, and javax.jcr.version. You do not need to import members of these packages explicitly.
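The sketch below relies on these default imports. It uses Node (javax.jcr), NodeType (javax.jcr.nodetype) and VersionManager (javax.jcr.version) without import statements and imports only the Hippo base class. The class name and the logged details are illustrative.

```groovy
import org.onehippo.repository.update.BaseNodeUpdateVisitor

class DefaultImportsUpdater extends BaseNodeUpdateVisitor {

  boolean doUpdate(Node node) {                  // javax.jcr.Node: no import needed
    NodeType type = node.primaryNodeType          // javax.jcr.nodetype.NodeType: no import needed
    if (node.isNodeType("mix:versionable")) {
      // javax.jcr.version.VersionManager: no import needed
      VersionManager versionManager = node.session.workspace.versionManager
      log.debug "${node.path} (${type.name}) checked out: ${versionManager.isCheckedOut(node.path)}"
    }
    return false
  }

  boolean undoUpdate(Node node) {
    throw new UnsupportedOperationException('Updater does not implement undoUpdate method')
  }
}
```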
Restrictions
Some basic restrictions apply to the calls you can make and the classes you can use from your script. Interaction with the local filesystem has been disabled. The following classes cannot be used: java.io.File, java.io.FileDescriptor, java.io.FileInputStream, java.io.FileOutputStream, java.io.FileWriter, and java.io.FileReader, along with the following packages: java.nio.file, java.net, javax.net, and javax.net.ssl. It is also not possible to use reflection: calling Class.forName is illegal, and you cannot use the package java.lang.reflect. Calling System.exit is also prevented.
There can be additional limitations on the accessible classpath when an updater script is executed automatically at startup (see Run an Updater Script), depending on the environment in which it is executed.
In a delivery-tier-only environment, only the functionality provided by the Hippo Repository might be available on the classpath.
Portability
When executed from within the Updater Editor, scripts use a classloader in the CMS application context. Therefore, all libraries packaged with your CMS application are available to your script. However, if you want to develop scripts that can be reused in multiple projects, take care not to use libraries that are only packaged with one particular project. The safest bet is to use only libraries and APIs that are available in the shared classloader, although libraries such as commons-collections and Guava can be relied on with some confidence as well.
Furthermore, for scripts executed automatically at startup (see Run an Updater Script), possibly only classes in the Repository context are available in a delivery-tier-only environment.