Check and Fix Search Index Inconsistencies

Introduction

Goal

Fix inconsistencies between the search index and the content repository.

Use Cases

In Hippo Repository, nodes and properties are indexed by Lucene. The index allows efficient query execution. Whenever a change is made to the repository, the index is updated. For optimal performance, these indexes are not shared between the nodes of a cluster but are stored on the local filesystem of each cluster node.

There are circumstances under which the index can get out of sync with the contents of the repository. This causes problems when queries and searches fail to find content that is in the repository, or return hits for content that is no longer in the repository. We have established that index synchronization problems are especially likely when the repository is not shut down correctly or the corresponding Java process is killed unexpectedly, for example because of an out-of-memory error.

Another known cause is too many saves being made to the repository too quickly, so that the index can't keep up. That is also why the Groovy Updater has batch size and throttle settings.

Enable the Search Index Consistency Check  

To enable the search index consistency check on repository startup, specify the following parameters in the SearchIndex section of the workspace configuration:

<param name="enableConsistencyCheck" value="true" />
<param name="autoRepair" value="true" />
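For orientation, the fragment below sketches where these two parameters might sit inside the SearchIndex element. It is a minimal illustration, not a complete configuration: the class attribute and the path parameter are assumptions taken from a stock Jackrabbit workspace configuration, and your Hippo installation will most likely specify a different class and additional parameters. Keep whatever your existing SearchIndex element already contains and add only the two consistency check parameters.

```xml
<!-- Sketch only: the class and path values are assumptions; keep the ones from your existing configuration -->
<SearchIndex class="org.apache.jackrabbit.core.query.lucene.SearchIndex">
  <!-- Existing parameters stay as they are -->
  <param name="path" value="${wsp.home}/index" />
  <!-- Run the consistency check on repository startup -->
  <param name="enableConsistencyCheck" value="true" />
  <!-- Repair any inconsistencies that the check finds -->
  <param name="autoRepair" value="true" />
</SearchIndex>
```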

Verify that conf/log4j.xml contains the following entries, and add them if needed. The log levels shown are the recommended ones for these two categories; adjust them as experience dictates:

<logger name="org.apache.jackrabbit.core.query.lucene.ConsistencyCheck">
  <level value="info"/>
</logger>
<logger name="org.hippoecm.repository.jackrabbit.RepositoryImpl">
  <level value="warn"/>
</logger>

Clustering Support

Thanks to double checks for false positives, search index consistency checks and repairs can now be executed efficiently, reliably, and completely during repository startup, even in live and busy clustered setups.

Impact and Recommendations

The search index consistency check is disabled by default. Enabling the check makes application startup take considerably longer. This is its only significant disadvantage; running the check is otherwise safe.
Normally there is no reason to enable the check. We recommend enabling it only when an inconsistency in the search index is suspected. Once the check has finished and any inconsistencies have been fixed, we recommend disabling the check again.
