Visitor Service
The Visitor Service
The VisitorService retrieves visitors from the visitors data store (targeting:targetingdata). A returned Visitor object can be written back to that data store via Visitor#save(). Both the retrieval methods on the VisitorService and Visitor#save() are implemented in VisitorServiceImpl.
During request processing, the retrieval of a visitor is synchronous, because the visitor data must be present to serve a personalized page. Updated visitor data is persisted asynchronously, so that storing it does not add to the time it takes to serve the HTTP response to the client.
A few operational parameters can be tuned in the visitor service configuration. The default bootstrapped configuration is as follows:
/targeting:targeting:
  /targeting:services:
    /visitors:
      maxRetrieveThreads: 10
      maxStoreThreads: 10
      retrieveQueueSize: 20
      retrieveTimeout: 50
In general, these defaults work well. If you find that many requests in your production setup result in a retrieve timeout, you can tune the parameters.
Parameter explanation
maxRetrieveThreads
The maximum number of threads used for retrieving visitor data. To make sure concurrent HTTP (page) requests do not block each other while their visitor data is being fetched, we advise not setting maxRetrieveThreads lower than 10. With a lower value and many concurrent HTTP requests, it becomes likely that the retrieve queue fills up and/or that more HTTP requests are processed without relevance because fetching the visitor data times out.
maxStoreThreads
The maximum number of threads used for concurrently storing visitor data. It is unlikely that you will ever need to adjust this value. Visitor data is stored in batches: a batch accumulates the data from HTTP requests for 1 second, or until it contains 100 visitor data objects, and is then stored. More than one thread is used only if HTTP (page) requests arrive faster than the batches can be stored.
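The batching rule described above (flush after 1 second or after 100 objects, whichever comes first) can be sketched roughly as follows. This is a minimal illustration and not the VisitorServiceImpl code: the class StoreBatcher, the method nextBatch, and the choice to start the time window when the method is called are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;

/** Hypothetical sketch of size-or-time batching; names are illustrative only. */
final class StoreBatcher {

    /**
     * Collects items from the pending queue until maxBatchSize items are
     * gathered or maxWaitMillis has elapsed since this call started,
     * whichever comes first. The returned batch may be empty.
     */
    static <T> List<T> nextBatch(BlockingQueue<T> pending, int maxBatchSize, long maxWaitMillis) {
        List<T> batch = new ArrayList<>(maxBatchSize);
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(maxWaitMillis);
        try {
            while (batch.size() < maxBatchSize) {
                long remaining = deadline - System.nanoTime();
                if (remaining <= 0) {
                    break; // the time window has closed
                }
                T item = pending.poll(remaining, TimeUnit.NANOSECONDS);
                if (item == null) {
                    break; // nothing more arrived before the deadline
                }
                batch.add(item);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for the caller
        }
        return batch;
    }
}
```

With the values from the text, a storing thread would call nextBatch(pending, 100, 1000) in a loop and write each non-empty batch to the data store. A second thread only becomes useful when one such loop cannot keep up with the incoming data.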
retrieveQueueSize
The maximum number of visitor data retrieve jobs that are queued before the Visitor Service rejects new visitor data requests. A rejection causes the HTTP (page) request to continue without relevance. When the server processes many concurrent HTTP requests, this limit prevents too many requests from being held up while they wait for their visitor data. It is best seen as an automatic scale-down of functionality in case the server cannot keep up with the incoming requests: the application falls back to serving the response without relevance enabled.
If you increase the retrieveQueueSize, make sure it always stays smaller (say, 20% smaller) than the maximum number of requests the container processes concurrently. For example, if the container (say, Tomcat) is configured to process at most 100 requests concurrently, do not set the retrieveQueueSize higher than 80. The reason is simple: the queue limit is an automatic scale-down protection for the application. It is better to serve pages without relevance than to have a starving application or container that returns page errors because the acceptCount is exceeded. If the container allows fewer concurrent requests than the retrieveQueueSize, the queue can never fill up completely, and the automatic scale-down protection is in effect disabled. See the Tomcat 8.0 documentation for the Tomcat-specific settings maxThreads and acceptCount.
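To illustrate the rejection behaviour, the sketch below uses a standard java.util.concurrent.ThreadPoolExecutor with a bounded queue. It assumes, without confirmation from the product source, that the retrieve pool behaves like a fixed pool of maxRetrieveThreads threads in front of a queue of retrieveQueueSize slots. The class BoundedRetrievePool and the method countRejections are hypothetical names.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

/** Hypothetical sketch of a bounded retrieve pool; not the actual implementation. */
final class BoundedRetrievePool {

    /**
     * Submits the given number of jobs, each blocking until released, and
     * returns how many were rejected because all threads were busy and the
     * queue was full.
     */
    static int countRejections(int maxRetrieveThreads, int retrieveQueueSize, int jobs) {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                maxRetrieveThreads, maxRetrieveThreads,
                0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<>(retrieveQueueSize),
                new ThreadPoolExecutor.AbortPolicy()); // reject instead of blocking the caller
        CountDownLatch release = new CountDownLatch(1);
        int rejected = 0;
        try {
            for (int i = 0; i < jobs; i++) {
                try {
                    pool.execute(() -> {
                        try {
                            release.await(); // simulates a slow data store lookup
                        } catch (InterruptedException e) {
                            Thread.currentThread().interrupt();
                        }
                    });
                } catch (RejectedExecutionException e) {
                    rejected++; // here the page request would continue without relevance
                }
            }
        } finally {
            release.countDown(); // let the simulated lookups finish
            pool.shutdown();
        }
        return rejected;
    }
}
```

Under these assumptions and the default values, 10 jobs run, 20 wait in the queue, and every further concurrent job is rejected, so countRejections(10, 20, 35) yields 5.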
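The rule of thumb above can be written down as a small check. The helper below is purely illustrative: QueueSizing and its methods are hypothetical names, and the 20% margin is the example figure from the text, not a fixed product requirement.

```java
/** Hypothetical helper for the sizing rule of thumb; not part of the product API. */
final class QueueSizing {

    /** Largest queue size that stays marginPercent below the container's concurrent request limit. */
    static int maxRecommendedQueueSize(int containerMaxThreads, int marginPercent) {
        return containerMaxThreads * (100 - marginPercent) / 100;
    }

    /** True if the configured queue size respects the margin. */
    static boolean isWithinMargin(int retrieveQueueSize, int containerMaxThreads, int marginPercent) {
        return retrieveQueueSize <= maxRecommendedQueueSize(containerMaxThreads, marginPercent);
    }
}
```

For a container limited to 100 concurrent requests and a 20% margin, maxRecommendedQueueSize(100, 20) gives 80, so the default retrieveQueueSize of 20 is within the margin while a value of 120 is not.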
retrieveTimeout
The retrieveTimeout specifies the maximum time, in milliseconds, that the Visitor Service waits for the visitor data to be returned. If retrieval takes longer, the HTTP (page) request continues without visitor data. This protects the application from waiting too long when, for example, the database connection has a hiccup: it is better to serve a page without relevance than to serve an error.
Of all the parameters, the retrieveTimeout is the one you are most likely to tune, for example in a production scenario where the connection to the backing visitor data store is slow, or where the retrieveQueueSize is set to a large value and many concurrent requests are waiting in the queue. Keep in mind that the higher you set the value, the longer the application waits for the visitor data to become available, and thus the longer the delay in serving a page when visitor data retrieval is slow.
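The timeout-and-fall-back behaviour can be sketched with a plain java.util.concurrent.Future. This is a minimal illustration, assuming the retrieval runs on a separate thread pool. TimedRetrieve and its retrieve method are hypothetical names and do not reflect the real VisitorService API.

```java
import java.util.Optional;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical sketch of a bounded wait with fallback; not the actual implementation. */
final class TimedRetrieve {

    /**
     * Runs the loader on the pool and waits at most retrieveTimeoutMillis.
     * An empty result means "continue the request without visitor data".
     */
    static <T> Optional<T> retrieve(ExecutorService pool, Callable<T> loader, long retrieveTimeoutMillis) {
        Future<T> future = pool.submit(loader);
        try {
            return Optional.ofNullable(future.get(retrieveTimeoutMillis, TimeUnit.MILLISECONDS));
        } catch (TimeoutException e) {
            future.cancel(true); // stop waiting and free the retrieve thread if possible
            return Optional.empty();
        } catch (ExecutionException e) {
            return Optional.empty(); // the lookup itself failed
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return Optional.empty();
        }
    }
}
```

A caller would personalize the page only when the returned Optional is present. In this sketch a timeout, a failed lookup, and an interrupted wait all lead to the same non-personalized response.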
JMX monitoring
The visitor service and the stores that implement it can be monitored using JMX MBeans.
If the VisitorStat MBean shows a high value for RetrieveVisitorDataTimeoutCounter, consider experimenting with the retrieveTimeout setting. See the retrieveTimeout section above for some considerations.