Repository Maintenance
Removing old revisions from the repository journal
Jackrabbit 2 uses a journal in the database for cluster messaging. This journal can grow quite large over time and it can be useful to clean the old journal entries periodically. How fast the table grows depends on the amount of writes that are done to the repository.
The table REPOSITORY_LOCAL_REVISIONS holds the information up to which journal entry the node in the cluster has caught up. Entries older than the lowest revision in that table can be safely deleted from the REPOSITORY_JOURNAL tables.
Show revisions in the database:
mysql> SELECT * FROM REPOSITORY_LOCAL_REVISIONS; +---------------+-------------+ | JOURNAL_ID | REVISION_ID | +---------------+-------------+ | cms-node1 | 2302839 | | cms-node2 | 2302839 | | site-node1 | 2302839 | | site-node2 | 2302839 | +---------------+-------------+ 4 rows in set (0.00 sec) mysql> SELECT * FROM REPOSITORY_GLOBAL_REVISION; +-------------+ | REVISION_ID | +-------------+ | 2302839 | +-------------+ 1 row in set (0.00 sec)
The entries in the JOURNAL_ID correspond to the jackrabbit cluster id of the repository nodes.
There are two ways to periodically clean the old entries: using sql or let jackrabbit handle the cleanup.
Remove old revisions using SQL
If you want to safely remove old entries from the journal table you can run the following query:
MySQL REPOSITORY_JOURNAL table cleanup:
mysql> DELETE FROM REPOSITORY_JOURNAL WHERE REVISION_ID < ANY (SELECT min(REVISION_ID) FROM REPOSITORY_LOCAL_REVISIONS); Query OK, 0 rows affected (0.00 sec) mysql> OPTIMIZE TABLE REPOSITORY_JOURNAL; +-----------------------------+----------+----------+----------------------------+ | Table | Op | Msg_type | Msg_text | +-----------------------------+----------+----------+----------------------------+ | hippocms.REPOSITORY_JOURNAL | optimize | note | Table does not support | | | | | optimize, doing recreate + | | | | | analyze instead | | hippocms.REPOSITORY_JOURNAL | optimize | status | OK | +-----------------------------+----------+----------+----------------------------+ 2 rows in set (0.03 sec)
Caveats
Adding a node to the cluster
When a new node is added to the cluster it will try to read the entire REPOSITORY_JOURNAL table. If the table is very large this can lead to out of memory errors on startup. To safely add a new node to a cluster it is advisable to clean up the old revisions first.
Removing a node from the cluster
When a cluster node is removed from the cluster, the entry in the REPOSITORY_LOCAL_REVISIONS is not removed for that node. This has to be done manually if you want to safely remove the old revisions:
DELETE FROM REPOSITORY_LOCAL_REVISIONS WHERE JOURNAL_ID NOT IN (%s) AND JOURNAL_ID NOT LIKE '_HIPPO_EXTERNAL%%'
In the query above, replace ā%sā with the array of active JCR cluster node IDs (comma separated).
Lucene index export
Running the above query to remove old revisions will invalidate all but the latest Lucene index export files. The latest export will remain valid due to a dedicated entry '_HIPPO_EXTERNAL_REPO_SYNC_index-backup' in the local revisions table.