Repository Assets Performance tuning

Do not allow very large assets

If you are using MySQL, the max_allowed_packet variable of your database server can be set to a larger value to allow for larger assets (see "increase packet size"). However, this will reduce performance.
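One way to keep very large assets out of the repository is to check the size in whatever code accepts uploads before the binary is stored. The following is only a minimal sketch under assumptions: AssetSizeGuard is a hypothetical helper, not part of the Hippo API, and the limit you choose is up to you (it should stay below what your database accepts).

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper (not a Hippo class): rejects uploads above a size limit.
final class AssetSizeGuard {

    private final long maxBytes;

    AssetSizeGuard(long maxBytes) {
        if (maxBytes <= 0) {
            throw new IllegalArgumentException("maxBytes must be positive");
        }
        this.maxBytes = maxBytes;
    }

    /**
     * Reads the stream into memory and fails as soon as more than
     * maxBytes have been read, so an oversized upload is never fully buffered.
     */
    byte[] readWithinLimit(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[8192];
        long total = 0;
        int n;
        while ((n = in.read(buffer)) != -1) {
            total += n;
            if (total > maxBytes) {
                throw new IOException("Asset exceeds the limit of " + maxBytes + " bytes");
            }
            out.write(buffer, 0, n);
        }
        return out.toByteArray();
    }
}
```

An importer could call readWithinLimit on the incoming stream and only create the asset node when the call returns normally.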

Storing PDF assets

When a PDF is uploaded via the CMS, the extracted text of the PDF is also stored in the binary property hippo:text on the hippo:resource node that contains the PDF binary. The reason is that extracting the text from a PDF for (Lucene) indexing with Apache Tika is very CPU intensive, since the PDF needs to be parsed. Hippo Repository, however, only extracts the text from a PDF if there is not already a hippo:text binary. If that property is available, its text is used for indexing. The advantages of storing a hippo:text binary are:

  1. Only one repository cluster node needs to do the expensive PDF text extraction: the other cluster nodes use the extracted text in the hippo:text binary.
  2. Reindexing PDF assets is much cheaper and faster.

If you upload PDF files yourself without the CMS UI, for example via an importer tool or a REST endpoint, make sure to also store the extracted text in a hippo:text property for best performance.
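The rule described above can be sketched in plain Java. This is an illustration under assumptions, not the repository's actual implementation: a Map stands in for the properties of the hippo:resource node, the extractor function stands in for a Tika-based text extractor, and storing the text as UTF-8 bytes is an assumption. PdfResourceSketch and its method names are hypothetical. In a real importer you would set the same properties on the JCR node instead.

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

// Stand-in model: a Map plays the role of the hippo:resource node's properties.
final class PdfResourceSketch {

    static final String DATA = "jcr:data";
    static final String MIME_TYPE = "jcr:mimeType";
    static final String TEXT = "hippo:text";

    /** Builds the properties an importer would set, including the extracted text. */
    static Map<String, Object> buildResource(byte[] pdf, Function<byte[], String> extractor) {
        Map<String, Object> props = new LinkedHashMap<>();
        props.put(DATA, pdf);
        props.put(MIME_TYPE, "application/pdf");
        // Extract once at import time and keep the result as a binary value
        // (UTF-8 is an assumption made for this sketch).
        props.put(TEXT, extractor.apply(pdf).getBytes(StandardCharsets.UTF_8));
        return props;
    }

    /** Uses the stored hippo:text if present; otherwise falls back to extraction. */
    static String textForIndexing(Map<String, Object> props, Function<byte[], String> extractor) {
        Object stored = props.get(TEXT);
        if (stored instanceof byte[]) {
            return new String((byte[]) stored, StandardCharsets.UTF_8);
        }
        // No stored text: this is the expensive path the stored property avoids.
        return extractor.apply((byte[]) props.get(DATA));
    }
}
```

With the text stored at import time, textForIndexing never calls the extractor again, which is the effect you want on every cluster node and on every reindex.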

 

 
