Check and Fix Repository Inconsistencies
Introduction
Goal
Fix inconsistencies between the database backing the content repository and the JCR model.
Background
Bloomreach Experience Manager is backed by a JCR repository based on Apache Jackrabbit. Under certain conditions the database backing the repository may become inconsistent with the JCR model. This may lead to errors in the CMS or site and can even prevent the repository from starting up. To analyse and automatically fix such a database, you can run a consistency check on it.
Stand-Alone Consistency Checker vs. Embedded Consistency Checker
A consistency check should be run using the stand-alone Checker tool. Apache Jackrabbit provides built-in support for running a consistency check during repository startup, but the stand-alone checker gives you more control.
Running the Checker in Check Mode
Download and configure the Checker tool.
To run the checker in check mode, run:
java -jar hippo-addon-checker-<version>.jar check
This checks the consistency of the entire workspace or set of workspaces you have configured in the checker.properties file. If you have set the property check.history, the consistency of the version history is checked as well. If you have set the property check.references, referential integrity is also checked for all these workspaces.
It may be preferable to initially run the consistency check on the default workspace only, and to check the version history and the referential integrity separately, if that is needed at all. The default workspace consistency check is the most important check to run. It is also the fastest. The version history can become quite large compared to the workspace, and referential integrity checking requires loading both the workspace and the version history at once, because workspaces may have references to nodes in the version history. No referential integrity check is done for the nodes in the version history itself, because the JCR specification lifts the referential integrity requirement for such nodes.
You may optionally check the consistency of individual nodes by supplying any number of UUIDs after the check command:
java -jar hippo-addon-checker-<version>.jar check [UUID1 UUID2 UUIDn] [--recursive]
When supplying UUIDs like this you may also specify the option --recursive to let the checker recursively check all the descendants of the specified nodes as well.
The same applies when running the checker in fix mode.
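When the checker is invoked from a script, the command lines shown in this document can be assembled programmatically. The following is a minimal sketch, not part of the checker tool: the function build_checker_command and its parameters are hypothetical, the jar path is passed in by the caller, and the rule that --recursive needs at least one UUID is an assumption based on the description above.

```python
from typing import List, Sequence


def build_checker_command(jar_path: str, mode: str,
                          uuids: Sequence[str] = (),
                          recursive: bool = False) -> List[str]:
    """Assemble the argument list for one checker run (hypothetical helper)."""
    if mode not in ("check", "fix"):
        raise ValueError("mode must be 'check' or 'fix'")
    if recursive and not uuids:
        # Assumption: --recursive is only described in combination with UUIDs.
        raise ValueError("--recursive requires at least one UUID")
    command = ["java", "-jar", jar_path, mode, *uuids]
    if recursive:
        command.append("--recursive")
    return command


if __name__ == "__main__":
    # The UUID below is made up for illustration; the jar name keeps its placeholder.
    print(" ".join(build_checker_command(
        "hippo-addon-checker-<version>.jar", "check",
        ["11111111-2222-3333-4444-555555555555"], recursive=True)))
```

The returned list can be handed to subprocess.run once the real jar path has been filled in.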
Running the Checker in Fix Mode
Before running the checker in fix mode against a database that already backs one or more running repository instances, consider the following precaution. Fixing inconsistencies in a live cluster is supported from version 1.02.00 of the checker tool onwards. If you use an older version you must first bring down the entire cluster before running a consistency fix.
To fix a database in a live cluster you need to configure clustering for both the checker and the other repository instances backed by the same database. The sample checker-repository.xml file created per the generic configuration instructions contains an example cluster configuration that should work. The cluster configuration is needed so that the checker can notify the other repository instances of the changes it makes during repairs. If the other repository instances are not made aware of these changes, the items in their local caches will not be updated, and this will cause those instances to overwrite the changes made by the checker.
You can now run the consistency checker in fix mode:
java -jar hippo-addon-checker-<version>.jar fix [UUID1 UUID2 UUIDn] [--recursive]
To find out what causes data corruption in Jackrabbit and learn the details of the different types of inconsistencies that may occur, read on.
The Bundle Table
Each individual workspace and the version history store their nodes in a separate bundle table. This is a very simple table with a two-column layout: a column for the node id and a blob column for the bundle data. The bundle data contains such things as property names and values, type information, and structural information like the parent id and the child node ids. Information about the structure of the repository is therefore stored twice: a parent bundle stores the ids of its child nodes, and each child bundle stores the id of its parent. This is why the database can become inconsistent: the child node entries of one bundle may no longer agree with the parent node ids in other bundles.
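The double bookkeeping can be illustrated with a small in-memory model. This is a sketch under simplifying assumptions, not the Jackrabbit storage format: the Bundle class, the short node ids and the is_consistent function are invented for illustration, and properties and type information are left out.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Bundle:
    """Simplified stand-in for the blob column of one bundle table row."""
    parent_id: Optional[str]  # None for the root node
    children: List[Tuple[str, str]] = field(default_factory=list)  # (name, child id)


def is_consistent(table: Dict[str, Bundle]) -> bool:
    """Return True when every parent/child link is recorded on both sides."""
    for node_id, bundle in table.items():
        # Downward: each child entry must point at a row that points back.
        for _name, child_id in bundle.children:
            child = table.get(child_id)
            if child is None or child.parent_id != node_id:
                return False
        # Upward: the parent id must point at a row that lists this node.
        if bundle.parent_id is not None:
            parent = table.get(bundle.parent_id)
            if parent is None or all(cid != node_id for _n, cid in parent.children):
                return False
    return True


table = {
    "root": Bundle(None, [("content", "n1")]),
    "n1": Bundle("root", [("doc", "n2")]),
    "n2": Bundle("n1"),
}
print(is_consistent(table))  # True

# Simulate an interrupted write: the child entry is gone, the child row is not.
table["n1"].children.clear()
print(is_consistent(table))  # False
```

Because each link lives in two rows, a write that updates only one of them leaves the table in a state that neither row can detect on its own.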
Orphaned Nodes
Orphans are nodes whose parent no longer exists in the bundle table. For instance, the parent was removed but the remove operation did not complete, so a child node is still in the database. The checker is able to move orphans to a dedicated folder if so instructed. To enable this, you need to create a lost+found node beforehand and specify its UUID in checker.properties. There should be a lost+found node for each separate workspace, and it must be of type nt:unstructured. No mixins are necessary on this node. The property you then need to set is check.default.lostnfound=uuid, where default is the workspace name as configured in the property check.workspaces. Errors of this type are reported by the checker with the message "NodeState '{nodeId}' references inexistent parent id '{parentNodeId}'".
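The orphan rule can be sketched on a simplified in-memory table. This is an illustration only and not the checker's implementation: the table layout, the function names, and the "orphan-" placeholder name given to moved nodes are all assumptions.

```python
from typing import Dict, List

# node id -> {"parent": id or None, "children": [(name, child id), ...]}
Table = Dict[str, dict]


def find_orphans(table: Table) -> List[str]:
    """Ids of nodes whose parent id has no row in the table."""
    return [node_id for node_id, bundle in table.items()
            if bundle["parent"] is not None and bundle["parent"] not in table]


def move_to_lost_and_found(table: Table, lost_found_id: str) -> List[str]:
    """Re-parent every orphan under the pre-created lost+found node."""
    moved = find_orphans(table)
    for node_id in moved:
        table[node_id]["parent"] = lost_found_id
        # Assumption: the original name is unknown, so a placeholder is used.
        table[lost_found_id]["children"].append(("orphan-" + node_id, node_id))
    return moved


table = {
    "root": {"parent": None, "children": [("lost+found", "lf")]},
    "lf": {"parent": "root", "children": []},
    "x": {"parent": "gone", "children": []},  # parent row no longer exists
}
print(move_to_lost_and_found(table, "lf"))  # ['x']
print(table["lf"]["children"])              # [('orphan-x', 'x')]
```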
Abandoned Nodes
These are nodes that have a correct reference to a parent node, i.e. the parent id of the node references an existing parent, but the parent does not have a corresponding child node entry. This can be the result of an incomplete remove operation: the child node entry was successfully removed from the parent, but the node itself was never deleted from the table. In these cases, the checker fixes the problem by putting a child node entry back on the parent. Because the name of a node is not part of the bundle information of the node itself, the name was lost when the child node entry was removed and is no longer available. Therefore a node name is generated for that node. Errors of this type are reported by the checker with the message "NodeState '{nodeState}' is not referenced by its parent node '{parentNodeId}'".
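A minimal sketch of this repair on a simplified in-memory table follows. The table layout and function names are hypothetical, and the "restored-" naming scheme is an assumption; the text above only states that a name is generated.

```python
from typing import Dict, List

# node id -> {"parent": id or None, "children": [(name, child id), ...]}
Table = Dict[str, dict]


def find_abandoned(table: Table) -> List[str]:
    """Ids of nodes whose existing parent has no child entry for them."""
    abandoned = []
    for node_id, bundle in table.items():
        parent = table.get(bundle["parent"])  # None for the root or for orphans
        if parent is not None and all(cid != node_id for _n, cid in parent["children"]):
            abandoned.append(node_id)
    return abandoned


def reattach_abandoned(table: Table) -> List[str]:
    """Put a child entry with a generated name back on each parent."""
    fixed = find_abandoned(table)
    for node_id in fixed:
        parent = table[table[node_id]["parent"]]
        # Assumption: the generated name is derived from the node id.
        parent["children"].append(("restored-" + node_id, node_id))
    return fixed


table = {
    "root": {"parent": None, "children": []},      # child entry was removed
    "n1": {"parent": "root", "children": []},      # row was never deleted
}
print(reattach_abandoned(table))   # ['n1']
print(table["root"]["children"])   # [('restored-n1', 'n1')]
```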
Missing Nodes
The inverse of an orphaned node is a node with a nonexistent child instead of a nonexistent parent: the node has a child node entry for a node that no longer exists. As with abandoned nodes, this can be the result of an incomplete remove operation, although there may be other causes. The checker simply removes the child node entry from the node. Errors of this type are reported by the checker with the message "NodeState '{nodeId}' references inexistent child '{childNodeId}'".
Disconnected Nodes
The final category is when a node has a child node entry for a node that exists, but that child does not refer back to the node. The child may be an orphan as well, or it may have an existing parent that may or may not refer back to the child. This type of inconsistency may be caused by an incomplete move operation in which the child and the new parent were updated correctly but the old parent was not. The fix the checker performs for these nodes is the same as for missing nodes: it removes the child node entry from the parent.
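Since missing and disconnected nodes receive the same fix, both can be sketched in one pass over a simplified in-memory table. The table layout and the function prune_child_entries are hypothetical and only illustrate the rule; they are not the checker's code.

```python
from typing import Dict, List, Tuple

# node id -> {"parent": id or None, "children": [(name, child id), ...]}
Table = Dict[str, dict]


def prune_child_entries(table: Table) -> List[Tuple[str, str, str]]:
    """Drop child entries that point at missing or disconnected nodes.

    Returns (parent id, child id, category) for every removed entry.
    """
    removed = []
    for node_id, bundle in table.items():
        kept = []
        for name, child_id in bundle["children"]:
            child = table.get(child_id)
            if child is None:
                removed.append((node_id, child_id, "missing"))
            elif child["parent"] != node_id:
                removed.append((node_id, child_id, "disconnected"))
            else:
                kept.append((name, child_id))
        bundle["children"] = kept
    return removed


table = {
    "root": {"parent": None,
             "children": [("a", "a1"), ("ghost", "g1"), ("moved", "m1"), ("new", "new")]},
    "a1": {"parent": "root", "children": []},
    "new": {"parent": "root", "children": [("moved", "m1")]},
    "m1": {"parent": "new", "children": []},  # was moved; old parent not updated
}
print(prune_child_entries(table))
# [('root', 'g1', 'missing'), ('root', 'm1', 'disconnected')]
```

In this example the moved node keeps its entry under the new parent; only the stale entry on the old parent is dropped.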
To complete the analysis, orphans and missing nodes may be instances of second order corruption. Missing nodes may have first become disconnected from a parent node and afterwards deleted. Orphaned nodes may first have been abandoned before the parent node was deleted.
The types of inconsistency that cause the most immediate problems to the repository are missing and disconnected nodes. This is because the repository will fail in its traversal of the (former) parent in these cases. Disconnected nodes are reported by the checker with the message "Node has invalid parent id: '{parentNodeId}' (instead of '{nodeId}')".
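The four messages quoted in this document can be used to summarize checker output by category. The sketch below assumes the output contains those messages with the placeholders replaced by ids, and that the "invalid parent id" message corresponds to disconnected nodes; the exact wording may differ between checker versions, and the sample lines are made up.

```python
import re
from collections import Counter
from typing import Iterable, Optional

# Patterns derived from the messages quoted in this document.
PATTERNS = {
    "orphaned": re.compile(r"NodeState '[^']*' references inexistent parent id '[^']*'"),
    "abandoned": re.compile(r"NodeState '[^']*' is not referenced by its parent node '[^']*'"),
    "missing": re.compile(r"NodeState '[^']*' references inexistent child '[^']*'"),
    "disconnected": re.compile(r"Node has invalid parent id: '[^']*' \(instead of '[^']*'\)"),
}


def classify(line: str) -> Optional[str]:
    """Return the inconsistency category for one output line, or None."""
    for category, pattern in PATTERNS.items():
        if pattern.search(line):
            return category
    return None


def summarize(lines: Iterable[str]) -> Counter:
    """Count reported inconsistencies per category."""
    return Counter(c for c in map(classify, lines) if c is not None)


# Made-up sample lines with short ids, for illustration only.
sample = [
    "NodeState 'n7' references inexistent parent id 'p3'",
    "NodeState 'n2' references inexistent child 'c9'",
    "Node has invalid parent id: 'p1' (instead of 'n4')",
    "some unrelated line",
]
print(summarize(sample))
```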
Manual Inspection After Fix
As mentioned, orphaned nodes are moved to the lost+found node. If the checker reported and fixed any orphaned nodes, you can now inspect them by browsing to that node using the console. Similarly, abandoned nodes are reattached to their parent node, but they are given a generated name. In some cases this can cause problems, for instance because the reattached node is a handle or document node, which needs a corresponding name. Such fixed abandoned nodes should be inspected after running the checker and adjusted accordingly.