Clean Up Version History

Introduction

Goal

Clean up version histories in the content repository.

Use Cases

Hippo Repository maintains a history of versions of all documents in Bloomreach Experience Manager. Each time a document is published, a copy of its current state is stored as a new version. While this feature enables users to restore any previously published version of their documents, it comes at the cost of ever-increasing version history storage.

Bloomreach offers tools to purge the version history and keep it at a manageable size:

  • To remove orphaned version histories (collections of versions whose original document has been deleted), use the version history cleanup feature built into the Checker tool.
  • To truncate the version histories of existing documents, use the special-purpose VersionsCleaner updater script.

Truncate Version Histories of Existing Documents using the Groovy Script

Warning: the VersionsCleaner script always removes versions, even in "dry run" mode!
This follows from the VersionManager specification in the JCR 2.0 API, which states that version changes are dispatched immediately, without the session having to be saved.
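A toy model makes the warning concrete. The sketch below is hypothetical plain Java, not the JCR or updater API: `ToySession` and its methods are invented for illustration, and it assumes that a dry run works by discarding unsaved session changes instead of saving them. The property change is discarded, but the version removal has already taken effect.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Toy model (not the JCR API) of why a dry run cannot protect versions.
 * Ordinary property changes stay pending until save(); version removal is
 * applied at once, mirroring the JCR 2.0 behaviour described in the warning.
 */
final class DryRunPitfall {

    static final class ToySession {
        final List<String> savedProperties = new ArrayList<>();
        final List<String> pendingProperties = new ArrayList<>();
        final List<String> versions = new ArrayList<>(List.of("1.0", "1.1", "1.2"));

        /** Transient change: only persisted when save() is called. */
        void setProperty(String name) {
            pendingProperties.add(name);
        }

        /** Immediate change: takes effect without any save(). */
        void removeVersion(String versionName) {
            versions.remove(versionName);
        }

        void save() {
            savedProperties.addAll(pendingProperties);
            pendingProperties.clear();
        }

        /** What the assumed dry run does instead of save(); comparable to Session.refresh(false). */
        void discardPendingChanges() {
            pendingProperties.clear();
        }
    }

    public static void main(String[] args) {
        ToySession session = new ToySession();
        session.setProperty("cleanedUp");
        session.removeVersion("1.0");
        session.discardPendingChanges(); // "dry run": save() is never called
        System.out.println("saved properties:   " + session.savedProperties);
        System.out.println("remaining versions: " + session.versions);
    }
}
```

In this model, a genuine preview would have to skip the removeVersion call itself rather than rely on not saving the session.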

The VersionsCleaner script truncates the version histories of existing documents. It visits all versionable nodes, inspects their version histories, and removes the versions that qualify for removal.

  1. In the CMS UI, browse to the updater execution panel.
  2. In the registry folder, locate the VersionsCleaner script and open it.
  3. Adjust the values of the retainCount and daysToKeep variables inside the script. The former determines how many versions of each document are retained; the latter determines the minimum number of days that versions are kept in the history.
  4. Save the changes.
  5. Click Execute to run the script.
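The retention rule in step 3 can be sketched as follows. This is a minimal sketch under the assumption that a version is removed only when it is both outside the newest retainCount versions and older than daysToKeep days; the actual VersionsCleaner script may implement the rule differently. `Version` and `selectForRemoval` are hypothetical names, and the history list is assumed to exclude jcr:rootVersion.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

final class VersionRetention {

    /** Hypothetical stand-in for a version: its name and creation time (jcr:created). */
    record Version(String name, Instant created) {}

    /**
     * Returns the versions that qualify for removal under the assumed rule:
     * not among the newest retainCount versions AND older than daysToKeep days.
     */
    static List<Version> selectForRemoval(List<Version> history, int retainCount,
                                          int daysToKeep, Instant now) {
        List<Version> newestFirst = new ArrayList<>(history);
        newestFirst.sort(Comparator.comparing(Version::created).reversed());
        Instant cutoff = now.minus(Duration.ofDays(daysToKeep));
        List<Version> removable = new ArrayList<>();
        // The newest retainCount versions are always kept; older ones go only if past the cutoff.
        for (int i = Math.max(0, retainCount); i < newestFirst.size(); i++) {
            Version v = newestFirst.get(i);
            if (v.created().isBefore(cutoff)) {
                removable.add(v);
            }
        }
        return removable;
    }

    public static void main(String[] args) {
        Instant now = Instant.parse("2024-06-01T00:00:00Z");
        List<Version> history = List.of(
                new Version("1.0", now.minus(Duration.ofDays(90))),
                new Version("1.1", now.minus(Duration.ofDays(60))),
                new Version("1.2", now.minus(Duration.ofDays(10))),
                new Version("1.3", now.minus(Duration.ofDays(2))));
        // retainCount = 1, daysToKeep = 30
        for (Version v : selectForRemoval(history, 1, 30, now)) {
            System.out.println("remove " + v.name());
        }
    }
}
```

With retainCount set to 1, version 1.2 still survives because it is younger than 30 days, so only 1.1 and 1.0 are reported for removal.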

Remove Orphaned Version Histories using the Checker Tool

When a node is deleted from the repository, its version history loses its regular JCR access point: the versionable node whose versions it records. Deleting a document in the regular way using the CMS UI does not cause this: the node representing the document is only moved into the attic, so that it can be restored in case of a mistake. When the attic is cleaned out, however, the version histories of the removed nodes become orphaned. Finding and removing these orphaned version histories requires traversing the version storage, and for that purpose the Checker tool includes an orphaned version history cleanup feature.
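The orphan check can be pictured with a toy model. In JCR 2.0 every version history records the identifier of its versionable node (the jcr:versionableUuid property, returned by VersionHistory.getVersionableIdentifier()); a history is orphaned when no node with that identifier exists any more. The sketch below replaces the repository with plain Java collections; `VersionHistoryEntry` and `findOrphans` are hypothetical names, not part of the Checker tool.

```java
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

/**
 * Toy model of orphan detection: walk the version storage and keep every
 * history whose versionable node identifier no longer resolves to a node.
 */
final class OrphanFinder {

    /** Hypothetical stand-in for one version history in the version storage. */
    record VersionHistoryEntry(String historyId, String versionableId) {}

    static List<VersionHistoryEntry> findOrphans(List<VersionHistoryEntry> versionStorage,
                                                 Set<String> existingNodeIds) {
        return versionStorage.stream()
                .filter(h -> !existingNodeIds.contains(h.versionableId()))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // "doc-a" still exists (possibly in the attic);
        // "doc-b" was removed when the attic was cleaned out.
        Set<String> existing = Set.of("doc-a");
        List<VersionHistoryEntry> storage = List.of(
                new VersionHistoryEntry("vh-1", "doc-a"),
                new VersionHistoryEntry("vh-2", "doc-b"));
        for (VersionHistoryEntry orphan : findOrphans(storage, existing)) {
            System.out.println("orphaned: " + orphan.historyId());
        }
    }
}
```

Against a real repository, the existence test would be a lookup such as Session.getNodeByIdentifier(), which throws ItemNotFoundException when the node is gone.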

  1. Download and configure the Checker Tool.
  2. In addition to the generic configuration, set the property cleanvh.nonempty in the checker.properties file to true or false. With true, non-empty orphaned version histories are cleaned up as well; with false, only empty ones are removed.
  3. Run the cleanup:
    java -jar hippo-addon-checker-<version>.jar cleanvh
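The effect of cleanvh.nonempty can be sketched as a simple filter over the orphaned histories found. The sketch assumes that an "empty" version history is one that holds only its root version (every JCR version history contains jcr:rootVersion); how the Checker tool actually decides emptiness is not described on this page. `OrphanedHistory` and `selectForCleanup` are hypothetical names.

```java
import java.util.List;
import java.util.stream.Collectors;

/** Toy model of the cleanvh.nonempty switch applied to orphaned version histories. */
final class OrphanCleanupSelection {

    /** versionCount includes the root version, so a count of 1 is treated as "empty". */
    record OrphanedHistory(String historyId, int versionCount) {
        boolean isEmpty() {
            return versionCount <= 1;
        }
    }

    /** cleanNonEmpty mirrors cleanvh.nonempty: true cleans everything, false only empty histories. */
    static List<OrphanedHistory> selectForCleanup(List<OrphanedHistory> orphans, boolean cleanNonEmpty) {
        return orphans.stream()
                .filter(h -> cleanNonEmpty || h.isEmpty())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<OrphanedHistory> orphans = List.of(
                new OrphanedHistory("vh-1", 1),  // root version only
                new OrphanedHistory("vh-2", 4)); // root version plus three versions
        System.out.println("cleanvh.nonempty=false -> " + selectForCleanup(orphans, false));
        System.out.println("cleanvh.nonempty=true  -> " + selectForCleanup(orphans, true));
    }
}
```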

Removing some of the orphaned version histories found may fail because of broken references in the reference table. In that case, first run a consistency check and fix with the option check.references set to true, and then run the cleanup again to remove all orphaned version histories.

This consistency check and fix must also be run after a version history cleanup. The cleanup removes version histories from the version history table, but the references between the nodes contained in those version histories, and between the version histories themselves, are not removed from the reference table.
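The following toy model illustrates why a reference check is needed after the cleanup. It is a hypothetical sketch, not the Checker's implementation or the actual table layout: each reference table entry is modelled as a `Reference` from a source node to a target node, and an entry counts as stale when either end has been removed.

```java
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

/**
 * Toy model: deleting version histories from the version history table alone
 * leaves reference table entries whose source or target no longer exists.
 */
final class StaleReferenceCheck {

    /** Hypothetical reference table entry: a reference on sourceNodeId pointing at targetNodeId. */
    record Reference(String sourceNodeId, String targetNodeId) {}

    /** An entry is stale when either end is missing from the set of existing nodes. */
    static List<Reference> findStale(List<Reference> referenceTable, Set<String> existingNodeIds) {
        return referenceTable.stream()
                .filter(r -> !existingNodeIds.contains(r.sourceNodeId())
                        || !existingNodeIds.contains(r.targetNodeId()))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Version history vh-2, including its node fn-2, was just removed by the cleanup.
        Set<String> existing = Set.of("doc-a", "fn-1", "vh-1");
        List<Reference> table = List.of(
                new Reference("fn-1", "doc-a"),  // both ends still exist
                new Reference("fn-2", "doc-a"),  // source removed together with vh-2
                new Reference("vh-1", "vh-2"));  // target history removed
        findStale(table, existing).forEach(r -> System.out.println("stale: " + r));
    }
}
```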

The MySQL InnoDB engine does not shrink its data files when rows are deleted. To reclaim the space freed by the cleanup tools, run an OPTIMIZE TABLE operation on the affected tables afterwards.