Experiments and Trends dataflow

Dataflow

The Experiments and Trends functionality is based on the concept of a visit. Requests for a visitor are aggregated, and when a period of inactivity is detected (30 minutes by default), the visit is considered complete. A subsequent request from that visitor causes a new visit to be created. For Experiments, the completed visits are used to determine whether the visitor converted on the goal of one or more running experiments. The outcomes are used to update the statistical models of the experiments, adjusting the rate at which their variants are served to visitors. For Trends, all visits are stored in Elasticsearch, and the Trends panel allows CMS users to mine them for insights into visitor behavior.
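
The visit concept described above can be illustrated with a minimal sketch. It is not the module's actual implementation: the function name `split_into_visits` and the list-of-timestamps representation are assumptions made for illustration. Only the 30-minute default comes from the text.

```python
from datetime import datetime, timedelta

def split_into_visits(timestamps, idle_minutes=30.0):
    """Group one visitor's request timestamps into visits.

    A new visit starts whenever the gap since the previous request
    exceeds idle_minutes (hypothetical helper, for illustration only).
    """
    idle = timedelta(minutes=idle_minutes)
    visits = []
    for ts in sorted(timestamps):
        if visits and ts - visits[-1][-1] <= idle:
            visits[-1].append(ts)   # still within the current visit
        else:
            visits.append([ts])     # inactivity gap exceeded: start a new visit
    return visits

start = datetime(2024, 1, 1, 9, 0)
requests = [start, start + timedelta(minutes=5), start + timedelta(minutes=50)]
visits = split_into_visits(requests)
print(len(visits))  # the third request arrives 45 minutes after the second
```

In this example the first two requests fall into one visit, and the third request opens a second visit because it follows more than 30 minutes of inactivity.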

Configuration

The Relevance Module periodically runs two jobs: the Visits Aggregator and the Model Trainer. The Visits Aggregator collects all newly arrived request log entries in the Request Log Store and creates or updates the corresponding visit records in the Visit Store, which is always in Elasticsearch. The aggregated visits can be queried via Trends. If experiments are present, the visits are also used for training: the Model Trainer job looks for visits that have not been updated for a while and uses them to train the statistical "bandit" models of any running experiments.
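
As a rough sketch of how a trainer might pick out visits that "have not been updated for a while", the snippet below filters visit records by their last update time. The record layout (`id`, `last_updated`), the function name `completed_visits`, and the use of the idle timeout as the threshold are all assumptions for illustration. The source does not specify how the Model Trainer makes this selection.

```python
from datetime import datetime, timedelta

def completed_visits(visits, now, idle_minutes=30.0):
    """Return visits whose last update is older than the idle timeout.

    Hypothetical selection rule; the real job's criteria may differ.
    """
    cutoff = now - timedelta(minutes=idle_minutes)
    return [v for v in visits if v["last_updated"] < cutoff]

now = datetime(2024, 1, 1, 12, 0)
visit_store = [
    {"id": "a", "last_updated": now - timedelta(minutes=45)},  # idle long enough
    {"id": "b", "last_updated": now - timedelta(minutes=10)},  # still active
]
ready = completed_visits(visit_store, now)
print([v["id"] for v in ready])
```

Only visit "a" is selected here, since visit "b" was updated within the last 30 minutes and may still receive requests.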

If you only want to see trends and will never set up experiments, the Model Trainer job does not have to run. However, since the overhead of running the Model Trainer in the absence of experiments is insignificant, it is best to keep it running.

Parameters on /targeting:targeting:

Parameter                          Default        Description
targeting:newVisitIdleTimeMinutes  30.0 (Double)  Maximum period of inactivity in a visit, in minutes.

The Visits Aggregator and Model Trainer jobs can be configured at /targeting:targeting/targeting:dataflow/visitsAggregator and /targeting:targeting/targeting:dataflow/modelTrainer, respectively. Possible parameters are:

Parameter       Default          Description
running         false (Boolean)  Whether the job should run. (See Disable Relevance for how to switch it off on a single cluster node.)
processedUntil  -                Updated by the job itself to keep track of where to continue on the next run.

The Model Trainer and Visits Aggregator should not be set to 'running' before the data stores have been properly configured. The 'processedUntil' value is updated automatically and never requires manual intervention, unless request aggregation needs to be redone for some reason.
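
The role of 'processedUntil' can be pictured as a simple checkpoint. The sketch below is an illustration under assumptions, not the module's code: the function `run_job`, the integer timestamps, and the dictionary entries are hypothetical. It shows a job that processes only entries newer than the checkpoint and then advances it, so that resetting the checkpoint causes earlier entries to be processed again.

```python
def run_job(entries, processed_until):
    """Process entries newer than processed_until.

    Returns the processed batch and the new checkpoint value
    (hypothetical helper, for illustration only).
    """
    batch = sorted(
        (e for e in entries if e["timestamp"] > processed_until),
        key=lambda e: e["timestamp"],
    )
    if batch:
        processed_until = batch[-1]["timestamp"]  # advance the checkpoint
    return batch, processed_until

log = [{"timestamp": 100}, {"timestamp": 200}, {"timestamp": 300}]

first, checkpoint = run_job(log, processed_until=150)          # picks up 200 and 300
second, checkpoint2 = run_job(log, processed_until=checkpoint)  # nothing new to do
redo, _ = run_job(log, processed_until=0)                       # reset: everything again
```

A second run with an unchanged log finds nothing to process, while a run with the checkpoint reset to an earlier value reprocesses the older entries, which corresponds to redoing request aggregation.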
