Varun Wachaspati J

Feb 1, 2019

How Bloomreach Built a Scalable Configuration Management System

Written by Amit Kumar, Naveen Pai, Neel Choudhury, Vaibhav Rastogi, Varun Wachaspati J.

 

Motivation

Bloomreach offers a next-generation digital experience platform (DXP) to eCommerce businesses of all sizes across the globe. Its products include a personalized, self-learning site-search solution, an automated SEO solution, and a merchandising analytics solution, among others.

In eCommerce, a single holiday season can bring more business than the rest of the year combined, and every eCommerce technology organization wants that season to be seamless, error-free and scalable. A failed change in production behavior can be enormously costly. Bloomreach previously went through a few outage-like situations due to the lack of a system that could change the behavior of production environments with ease, consistency and scale. Imagine thousands of rules and hundreds of customers who need changes applied quickly. To effect these changes, the serving layer of the Bloomreach platform and the backend pipelines needed a robust Configuration Management System.

 

Previously

The requirement for a stable cross-org configuration management system was evident very early on, and we built a system, based on MarkLogic, which performed the role of a config store. However, as Bloomreach grew, we began to outgrow the system, and the config store ended up being a bottleneck for many of our use cases. As MarkLogic is proprietary software, extending it for our use cases was not feasible. We needed a robust Configuration Management (CM) System that could manage the complexity of storing thousands of configurations while keeping reads, searches and writes scalable and providing features like revision history of writes.

To meet these goals, we first explored available open source technologies. There are multiple open source systems that meet the goals, albeit in pieces. Key-value stores like Redis, NoSQL databases like Cassandra, or document stores like MongoDB are good for reading configurations at very low latency. Git or SVN are good for maintaining revision history. Open source libraries like Archaius or Typesafe Config support reading configurations from files on disk, from a database, or via REST APIs. But none of these systems and libraries provides a complete, comprehensive solution. So we decided to use them as building blocks for a best-of-all-worlds config management system.

Breaking it Down

So what does best-of-all-worlds mean to us? Put simply, it means a system that fulfils the following requirements:

High Availability Read/Search at Scale:

The config management system is a critical dependency for both the serving infrastructure and the backend pipelines at Bloomreach. Since production systems read configurations at runtime, it is crucial to have a system with high availability (> 99.95%).

Low Latency Read/Search:  

For read/search operations, the main requirement was speed. We wanted sub-50 ms reads and sub-100 ms searches to ensure that the config management system would not be a bottleneck for any system depending on it.

Consistent Writes/Updates:

A particular configuration value can be modified by multiple users. So for writes, consistency and atomicity were the primary requirements.

Revision History and Restore:

Granular, per-config history is an invaluable tool for debugging and record keeping. The ability to revert configs to a previous state also gives us much more flexibility.

Read/Write Flexibility:

Many of the configs we use internally follow a nested structure, so we wanted the flexibility to read and write configs at any nesting level: the system should allow reads and updates of a single config value, or of a set of values identified by a JSONPath or XPath.
For example, both of the following API requests are valid:

GET /cms/merchant1/service/api_params/api_response_type

{"api_response_type" : "json"}

GET /cms/merchant1/service

{
 "api_params": {
  "api_response_type": "json",
  "api_host": "us-east"
 },
 "config_params": {
  "sort": "default"
 }
}

 

Architecture and Design

Technologies Used

We decided to use a set of tried and battle-tested components as building blocks for the system. All configurations are stored as JSON key-value pairs backed by a defined schema: the key is a string, while the value is a JSON document. For the system infrastructure, we used:

• Redis as a key/value store, with values stored as JSON documents. We built a wrapper on top of base Redis that can read a key at a given path (XPath or JSONPath) and can search inside a JSON document, or across several JSON documents.

• The ever-so-stable Git to store configurations on the filesystem. The structure of the JSON documents is mimicked by nested files and folders.

• Play Framework for the API servers. Play Framework provides a very simple interface for developing REST APIs and is based on Netty, which meets the high-scalability requirement. Since we already run production servers on Play, the framework was a natural choice for this system as well.

 

Putting it All Together

Each CMS slave consists of an API Server (AS) module and a Redis Slave (RS) sitting behind a load balancer. The Redis slaves are read-only replicas, while the API server module is a wrapper that processes API requests and communicates with the Redis slave.

The CMS master consists of the Redis Master and Git Server modules. It exposes an API to the CMS slaves to accept modifications to JSON documents.

The CMS master is responsible for maintaining the consistency of the system by synchronizing changes to both Git and the Redis master; the changes then flow to the Redis slaves. We use an Akka-based ActorSystem to synchronize the updates.

On the Redis master, we use Redis pipelining to speed up query execution and Redis transactions (via the Jedis client) to make sure that either all the key-value pairs in Redis are updated successfully or none of them are, thus fulfilling our requirement for atomicity.
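The all-or-nothing write semantics can be sketched with a small in-memory stand-in (illustrative only; the real system relies on Redis MULTI/EXEC transactions through the Jedis client, and the None-means-invalid check below is an assumption made for the sketch):

```python
def apply_atomically(store: dict, updates: dict) -> bool:
    """Apply every update or none of them, mirroring the all-or-nothing
    semantics of a Redis MULTI/EXEC transaction."""
    staged = dict(store)                 # the "transaction buffer"
    for key, value in updates.items():
        if value is None:                # stand-in for a failed validation
            return False                 # discard the buffer, nothing applied
        staged[key] = value
    store.clear()
    store.update(staged)                 # commit: all updates appear at once
    return True
```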

How it works

Writes:
As shown below, the API server validates the params and data of the write call and forwards the request to the CMS master. We store JSON values in a Git repository by mimicking the JSON structure with nested folders. Suppose that for key “K”, we have the following JSON document as the value:

{
 "a": {
  "b": [{
    "c": "v1"
   },
   {
    "c": "v2"
   }
  ],
  "d": "val"
 }
}

This would be represented on the filesystem as:

K 
|---a/
|   |
|   |---b/
|   |   |---b0/
|   |   |     |---{"c": "v1"}
|   |   |
|   |   |
|   |   |---b1/
|   |   |     |---{"c": "v2"}
|   |   | 
|   |
|   |---{ "d" : "val" }

For the same document, the following keys would be populated into the Redis master and propagated to the Redis slaves to accomplish read/write flexibility:

"K" : {"a": {"b" : [{"c": "v1"}, {"c": "v2"}]}}

"K/a" : {"b" : [{"c": "v1"}, {"c": "v2"}]}

"K/a/b" : [{"c": "v1"}, {"c": "v2"}]
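Enumerating these keys amounts to a simple recursive flattening. This sketch materializes one Redis entry per nesting level of the document (whether scalar leaves also get their own keys is an assumption; the post only shows keys down to the list):

```python
def flatten_keys(key: str, value) -> dict:
    """Produce every (path, subdocument) pair to be written to Redis,
    so a client can GET a config at any nesting level directly."""
    entries = {key: value}
    if isinstance(value, dict):
        for k, v in value.items():
            entries.update(flatten_keys(f"{key}/{k}", v))
    return entries
```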

Reads:
Since the write API takes care of most of the heavy lifting, reads are fairly straightforward. The API server reads the JSON document for a given key from the Redis slave. If a key is not directly present in Redis, the value of its nearest parent key is fetched, and a JSONPath library is then used to extract the value for the desired key.

For example, using the same schema as above, the following API requests are possible:

GET /cms/merchant1/K/a :
For this request, Redis will directly return the value of the key and the API server will forward it back as the response.

GET /cms/merchant1/K/a/b[0]/c :
This key is not present in Redis, so the API server reads the value for “K/a/b” and then applies the JSONPath library to extract the value for the exact key.
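The fallback logic can be sketched as follows (a plain dict stands in for the Redis slave, and simple path traversal stands in for the JSONPath library):

```python
import re

def parse_path(path: str) -> list:
    """Split 'K/a/b[0]/c' into steps: ['K', 'a', 'b', 0, 'c']."""
    steps = []
    for seg in path.split("/"):
        m = re.match(r"(\w+)\[(\d+)\]$", seg)
        if m:                                # 'b[0]' -> field 'b', index 0
            steps.append(m.group(1))
            steps.append(int(m.group(2)))
        else:
            steps.append(seg)
    return steps

def read_config(redis_store: dict, path: str):
    """Try the exact key first, then fall back to the nearest stored
    parent and drill into its JSON value for the remaining steps."""
    steps = parse_path(path)
    for cut in range(len(steps), 0, -1):
        prefix = steps[:cut]
        if any(isinstance(s, int) for s in prefix):
            continue                         # list indices never form Redis keys
        key = "/".join(prefix)
        if key in redis_store:
            value = redis_store[key]
            for step in steps[cut:]:         # dict field or list index
                value = value[step]
            return value
    raise KeyError(path)
```

With the keys from the previous section loaded, a read of "K/a/b" is a direct hit, while "K/a/b[0]/c" falls back to the parent "K/a/b" and extracts the remainder.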

Search:
The API server also provides search functionality with which documents that satisfy a given condition can be selected.
The search API call supports two parameters:

  1. Condition:
    A combination of conjunctions and disjunctions of expressions, like {k1 : v1} AND {k2 : v2} OR ... {kn : vn}
  2. Fields:
    Represented as fields={f1, f2 ... , fn}, where f1, f2, ... fn are the fields/keys to be projected.

 

{
    "k1": {
        "f1": 1,
        "k2": 10
    }
} 

 

{
    "k1": {
        "f1": 2,
        "k2": 10
    }
}

 

{
    "k1": {
        "f1": 3,
        "k2": 25
    }
}

 

For this, the API server has a custom-built search library. As an example, consider the three JSON documents shown above.
To search for values of “f1” when the value of field “k2” == 10, the search API query would be:

GET /cms/api/search?condition=k1/k2:10&fields=f1

{
  result: [
           {“f1”: 1},
           {“f1”: 2}
          ]
}
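The heart of this search, filtering documents on a path condition and projecting fields, can be sketched in Python (the tuple-based condition and the helper names are illustrative; the real library also evaluates AND/OR combinations):

```python
def find_field(doc: dict, field: str):
    """Depth-first lookup of the first occurrence of `field` in a nested dict."""
    if field in doc:
        return doc[field]
    for v in doc.values():
        if isinstance(v, dict):
            found = find_field(v, field)
            if found is not None:
                return found
    return None

def search(documents: list, condition: tuple, fields: list) -> list:
    """Select documents where the value at a 'k1/k2' path equals the
    expected value, projecting only the requested fields."""
    path, expected = condition
    results = []
    for doc in documents:
        value = doc
        for step in path.split("/"):
            value = value.get(step, {}) if isinstance(value, dict) else {}
        if value == expected:
            results.append({f: find_field(doc, f) for f in fields})
    return results
```

Run against the three documents above with condition ("k1/k2", 10) and fields ["f1"], this returns [{"f1": 1}, {"f1": 2}], matching the API response shown.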

Versioning:
Since we use directory-based nesting of data within the CMS Git repository, versioning data for a parent key is equivalent to checking the Git log at the directory path of that key. When a key is added or updated, the CMS master updates the Git repository by changing the necessary files or directories for that key.

We use the standard JGit library to do this. This structure also lets us view the history of a config and revert to a given version of a single config, or even a set of configs.

Similarly, we can also promote config changes, ranging from a single config to a full promotion from one realm to another, since this simply involves copying files and directories from one location to another. This has become a mainstay of our release process.
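Because a realm is just a directory tree, a promotion reduces to a copy. A sketch (the realm-as-top-level-directory layout and the function name are assumptions; in the real system the copy is then committed via JGit so the promotion itself is versioned):

```python
import shutil
from pathlib import Path

def promote(repo: Path, key: str, src_realm: str, dst_realm: str) -> None:
    """Promote a config subtree from one realm to another by copying
    its directory; committing the result to Git records the promotion."""
    src = repo / src_realm / key
    dst = repo / dst_realm / key
    if dst.exists():
        shutil.rmtree(dst)                  # replace any stale copy
    shutil.copytree(src, dst)
```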

Conclusion

Using well-known, battle-tested building blocks like Redis, Git and Play Framework, we built a highly available, scalable, low-latency configuration management system. The system has been running in production for quite some time now and has taken the responsibility of configuration storage away from each component of Bloomreach, while also providing a single source of truth for all configurations. This has allowed us to iterate at a much faster pace, with faith that every change made is validated, tracked, and reversible if required.

We are looking forward to open-sourcing our CM System soon. Follow our GitHub for updates.