Experiments and Trends dataflow
Dataflow
The Experiments and Trends functionality is based on the concept of a visit. Requests from a visitor are aggregated, and when a period of inactivity is detected (30 minutes by default), the visit is considered complete. A subsequent request from the same visitor causes a new visit to be created. For Experiments, the completed visits are used to determine whether the visitor converted on the goal of one or more running experiments. The outcomes are used to update the statistical models of the experiments, adjusting the rate at which their variants are served to visitors. For Trends, all visits are stored in Elasticsearch, and the Trends panel allows CMS users to mine them for insights into visitor behavior.
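The visit concept can be illustrated with a minimal sketch that splits one visitor's request timestamps into visits based on the idle period. The function names and the in-memory list representation are hypothetical and chosen only for illustration. The sketch also assumes that a gap of exactly 30 minutes still belongs to the same visit; the actual implementation may differ.

```python
from datetime import datetime, timedelta

# Assumed default, mirroring the 30-minute idle period described above.
IDLE_TIMEOUT = timedelta(minutes=30)


def group_into_visits(timestamps, idle_timeout=IDLE_TIMEOUT):
    """Split one visitor's request timestamps into visits (hypothetical helper)."""
    visits = []
    for ts in sorted(timestamps):
        # Start a new visit when there is none yet, or when the gap since
        # the previous request exceeds the idle timeout.
        if not visits or ts - visits[-1][-1] > idle_timeout:
            visits.append([ts])
        else:
            visits[-1].append(ts)
    return visits


def is_complete(visit, now, idle_timeout=IDLE_TIMEOUT):
    """A visit counts as complete once it has been idle longer than the timeout."""
    return now - visit[-1] > idle_timeout
```

For example, requests at 12:00, 12:10 and 12:50 would yield two visits under these assumptions, because the 40-minute gap before the last request exceeds the idle period.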
Configuration
The Relevance Module periodically runs two jobs: the Visits Aggregator and the Model Trainer. The Visits Aggregator collects all newly arrived request log entries in the Request Log Store and creates or updates the corresponding visit records in the Visit Store. The Visit Store is always in Elasticsearch. The aggregated visits can be queried via Trends. If experiments are present, the visits are also used to train the experiments' statistical models. The Model Trainer job looks for visits that have not been updated for a while and uses them to train the statistical "bandit" models of any running experiments.
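The Model Trainer step can be sketched roughly as follows. This is not the module's actual code: the visit fields (`variant`, `converted`, `last_updated`, `trained`) are hypothetical, "not updated for a while" is assumed to mean the 30-minute idle period, each visit is assumed to belong to a single variant, and plain per-variant counters stand in for the real statistical model, whose details are not described here.

```python
from collections import defaultdict

# Assumption: "not updated for a while" is modelled as the idle period, in seconds.
IDLE_SECONDS = 30 * 60


def new_model():
    # Per-variant counters; a placeholder for the real statistical "bandit" model.
    return defaultdict(lambda: {"visits": 0, "conversions": 0})


def train_on_idle_visits(visits, model, now):
    """Feed every untrained, idle visit into the model exactly once."""
    trained = 0
    for visit in visits:
        if visit.get("trained"):
            continue  # already counted in an earlier run
        if now - visit["last_updated"] <= IDLE_SECONDS:
            continue  # visit may still receive requests
        stats = model[visit["variant"]]
        stats["visits"] += 1
        stats["conversions"] += int(visit["converted"])
        visit["trained"] = True  # never train on the same visit twice
        trained += 1
    return trained
```

Marking a visit as trained keeps repeated job runs from counting the same visit more than once, which matters because the job runs periodically over the same store.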
If you only want to see trends and will never set up experiments, the Model Trainer job does not have to run. However, since the overhead of running the Model Trainer in the absence of experiments is insignificant, it is best to keep it running.
Parameters on /targeting:targeting:
Parameter | Default | Description |
---|---|---|
targeting:newVisitIdleTimeMinutes | 30.0 (Double) | Maximum period of inactivity in a visit, in minutes. |
The Visits Aggregator and Model Trainer jobs can be configured at /targeting:targeting/targeting:dataflow/visitsAggregator and /targeting:targeting/targeting:dataflow/modelTrainer, respectively. Possible parameters are:
Parameter | Default | Description |
---|---|---|
running | false (Boolean) | Whether the job should run. (See Disable Relevance for how to switch it off on a single cluster node.) |
processedUntil | - | Updated by the job itself to keep track of where to continue on the next run. |
The Model Trainer and Visits Aggregator should not be set to 'running' before the data stores have been properly configured. The 'processedUntil' value is updated automatically and requires manual intervention only if request aggregation needs to be redone.
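The role of 'running' and 'processedUntil' can be illustrated with a small checkpointing sketch. It assumes that 'processedUntil' holds the timestamp of the last processed entry, which is not confirmed here. The `run_job` function, the `state` dictionary, and the `handle` callback are hypothetical names used only for this example.

```python
def run_job(log_entries, state, handle):
    """Process entries newer than the checkpoint, then advance it (sketch)."""
    if not state.get("running", False):
        return 0  # job is switched off, matching the default of false

    checkpoint = state.get("processedUntil")
    pending = sorted(
        (e for e in log_entries if checkpoint is None or e["time"] > checkpoint),
        key=lambda e: e["time"],
    )
    for entry in pending:
        handle(entry)

    if pending:
        # Remember where to continue on the next run.
        state["processedUntil"] = pending[-1]["time"]
    return len(pending)
```

Under these assumptions, clearing `state["processedUntil"]` makes the next run process every entry again, which corresponds to redoing request aggregation.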