Paul Edwards

Jun 15, 2021

Natural Language Price Range Queries on the Bloomreach Intelligent Index using Duckling

Here we will dive into how to write a query pre-processor using Duckling by Wit.ai / Facebook in order to enable natural language price range queries.

The idea...

Have you ever woken up thinking: Wouldn’t it be really great if you could integrate the Bloomreach Discovery engine into your website such that your customers could make price-based searches using natural language? I did.

For example, being able to say “Jeans under 50 Pounds” or “t-shirts between 20 and 30 pounds” sounds awesome, right?

Note: This guide takes the form of a Node.js query pre-processor that you can build yourself on top of Bloomreach and Duckling.

How can Bloomreach Enable this?

As you may already be aware, the Bloomreach search API supports range based queries - it’s quite simple to specify that an attribute should have a value of something between x and y using the following query parameter:

&fq=<attribute>:[<over> TO <under>]

In other words, if we wanted to say that a pair of jeans should cost between 20 and 30 pounds, we would add:

&fq=sale_price:[20 TO 30]
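
As a minimal sketch of that syntax, a small helper can assemble the filter string from optional bounds. The name buildRangeFilter is hypothetical and not part of any Bloomreach SDK; a missing bound is left open with *, which is the form used for open-ended ranges further down in this post.

```javascript
// Hypothetical helper (not part of any Bloomreach SDK): builds a range
// filter for the fq parameter. A missing bound is left open with "*".
function buildRangeFilter(attribute, over, under) {
  const lower = over === undefined || over === null ? "*" : over;
  const upper = under === undefined || under === null ? "*" : under;
  return "fq=" + attribute + ":[" + lower + " TO " + upper + "]";
}

console.log(buildRangeFilter("sale_price", 20, 30)); // fq=sale_price:[20 TO 30]
console.log(buildRangeFilter("sale_price", 100));    // fq=sale_price:[100 TO *]
```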

The real trick here is in working out that the query includes range information and which attribute to apply that range to.

Enter the Duckling

Duckling is a Haskell library by Wit.ai / Facebook that parses text into structured data. It very conveniently comes pre-packaged with a web server that returns a JSON array. Let’s have a quick look at the output for the query “chair over 100 dollars”:

[
 {
   "body": "over 100 dollars",
   "start": 6,
   "value": { "from": { "value": 100, "unit": "$" }, "type": "interval" },
   "end": 22,
   "dim": "amount-of-money",
   "latent": false
 }
]

As we can already see, this output is super useful. It tells us:

• There is an amount of money in the query

• The phrase that represented the money interval - in this case “over 100 dollars”

• The lower bound of the interval is 100 (from.value = 100)

• The unit is dollars (from.unit = “$”)

Armed with this information, it’s simple now to modify the query being fired into the Bloomreach engine to implement a facet. Ideally, the resultant query should look like:

&q=chair&fq=sale_price:[100 TO *]
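
The mapping from the Duckling output to that query can be sketched as a pure function. This is only an illustration under assumptions: splitPriceRange is a hypothetical name, sale_price is assumed to be the attribute that holds the price, and the matched phrase is cut out using the start and end offsets from the output above.

```javascript
// Hypothetical sketch: turn a query plus Duckling's parsed entities into
// the search text and an optional price filter. Assumes `sale_price` is
// the attribute that holds the price.
function splitPriceRange(query, entities) {
  const money = entities.find(
    (e) => e.dim === "amount-of-money" && e.value.type === "interval"
  );
  if (!money) {
    return { q: query, fq: null };
  }
  // Open-ended bounds are written as "*".
  const over = money.value.from ? money.value.from.value : "*";
  const under = money.value.to ? money.value.to.value : "*";
  // Cut the matched phrase out of the query using its character offsets,
  // then tidy up any whitespace left behind.
  const q = (query.slice(0, money.start) + query.slice(money.end))
    .replace(/\s+/g, " ")
    .trim();
  return { q, fq: "sale_price:[" + over + " TO " + under + "]" };
}

// The Duckling output shown above for "chair over 100 dollars".
const sample = [
  {
    body: "over 100 dollars",
    start: 6,
    value: { from: { value: 100, unit: "$" }, type: "interval" },
    end: 22,
    dim: "amount-of-money",
    latent: false,
  },
];

console.log(splitPriceRange("chair over 100 dollars", sample));
// { q: 'chair', fq: 'sale_price:[100 TO *]' }
```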

Ok - let’s get coding...

Creating a Duckling Module

You can find out how to get the Duckling server up and running by installing the project, which is available at https://github.com/facebook/duckling. Once you’ve got it up and running, you are going to want to call its API. Assuming we are using a Node.js environment (as I always do), by far the easiest way is to follow the instructions and do something like this:

const runDuckling = (thisContext) => {
 return new promise(function (resolve, reject) {
   try {
     global.console.log(
       "duckling: we are searching for: " + thisContext.thisInput
     );

     let receivedData = "";
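     // Pass the arguments straight to curl (no shell), so quotes or
     // ampersands in the query cannot break the command.
     let thisProcess = spawn("curl", [
       "-sS",
       "-XPOST",
       "http://127.0.0.1:8000/parse",
       "--data",
       "locale=en_GB",
       "--data-urlencode",
       "text=" + thisContext.thisInput,
     ]);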
 

     thisProcess.stdout.on("data", (data) => {
       receivedData += data;
     });
 
     thisProcess.stderr.on("data", (data) => {
       console.log("got data:" + data);
     });

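     thisProcess.on("exit", (code, signal) => {
       // Errors thrown inside this callback are not seen by the outer
       // try/catch, so handle them here and reject explicitly.
       try {
         thisContext.duckling = JSON.parse(receivedData);
       } catch (err) {
         return reject(err);
       }
       processDucklingElements(thisContext).then(resolve, reject);
     });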
   } catch (err) {
     console.log("duckling error: " + err);
     reject(err);
   }
 });
};

Essentially, we are telling Node.js to spawn a process and curl the server with whatever our query is, then listen to the output of the process until it has finished. It’s not pretty, but it gets the job done (and, importantly, it gets it done quickly)...
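
If spawning curl feels too heavy, the same request can be sketched with Node's built-in http module. This is an alternative to the approach above, not the code used in this post. It assumes the Duckling server is listening on 127.0.0.1:8000 as in the curl command, and buildDucklingBody and parseWithDuckling are hypothetical names.

```javascript
const http = require("http");

// Form-encode the request body. URLSearchParams escapes quotes, ampersands
// and other characters that would otherwise need shell escaping.
function buildDucklingBody(text, locale = "en_GB") {
  return new URLSearchParams({ locale, text }).toString();
}

// POST the text to a local Duckling server and resolve with the parsed JSON.
function parseWithDuckling(text) {
  return new Promise((resolve, reject) => {
    const body = buildDucklingBody(text);
    const req = http.request(
      {
        hostname: "127.0.0.1", // assumed host and port, as in the curl call
        port: 8000,
        path: "/parse",
        method: "POST",
        headers: {
          "Content-Type": "application/x-www-form-urlencoded",
          "Content-Length": Buffer.byteLength(body),
        },
      },
      (resp) => {
        let data = "";
        resp.on("data", (chunk) => (data += chunk));
        resp.on("end", () => {
          try {
            resolve(JSON.parse(data));
          } catch (err) {
            reject(err);
          }
        });
      }
    );
    // Connection problems (for example, Duckling not running) end up here.
    req.on("error", reject);
    req.end(body);
  });
}
```

With a Duckling server running, parseWithDuckling("chair over 100 dollars").then(console.log) should print the same kind of array as shown earlier.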

As you can see above, the thisProcess.on("exit", ...) handler is called when the spawned process finishes. It parses the output and hands it to a function called processDucklingElements, which generates filter parameters whenever it detects money:

const processDucklingElements = (thisContext) => {
 return new promise(function (resolve, reject) {
   try {
     for (let i = 0; i < thisContext.duckling.length; i++) {
       let thisDucklingElement = thisContext.duckling[i];
       if (
         thisDucklingElement.value.type == "interval" &&
         thisDucklingElement.dim == "amount-of-money"
       ) {
         let unitsUnknown = false;
         if (thisDucklingElement.value.from) {
           if (thisDucklingElement.value.from.unit == "unknown") {
             unitsUnknown = true;
           }
         }
 
         if (thisDucklingElement.value.to) {
           if (thisDucklingElement.value.to.unit == "unknown") {
             unitsUnknown = true;
           }
         } 

         if (unitsUnknown == false) {
           //we are searching for a sale-price range.
           let under = "*";
           if (thisDucklingElement.value.to) {
             under = thisDucklingElement.value.to.value;
           }

           let over = "*";
           if (thisDucklingElement.value.from) {
             over = thisDucklingElement.value.from.value;
           }
 
           let thisSearchParam =
             "fq=sale_price:[" + over + " TO " + under + "]";
           thisContext.queryParameters.push(thisSearchParam);
           // Remove the price phrase and any whitespace it leaves behind.
           thisContext.searchText = thisContext.searchText.replace(thisDucklingElement.body, "").trim();
         }
       }
     }

     resolve(thisContext);
   } catch (err) {
     console.log("duckling element processing error: " + err);
     reject(err);
   }
 });
};

This function checks that we are definitely talking about money and, if so, builds the query parameter and stores it inside thisContext. Importantly, it then removes the price phrase from the search text so that it doesn’t introduce any noise into the actual search query.

The Full Duckling Module:

For interest, I’ve included the full Duckling module below:

const util = require("util"),
 https = require("https"),
 promise = require("promise"),
 { spawn } = require("child_process"); 

const runDuckling = (thisContext) => {
 return new promise(function (resolve, reject) {
   try {
     global.console.log(
       "duckling: we are searching for: " + thisContext.thisInput
     );

     let receivedData = "";
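     // Pass the arguments straight to curl (no shell), so quotes or
     // ampersands in the query cannot break the command.
     let thisProcess = spawn("curl", [
       "-sS",
       "-XPOST",
       "http://127.0.0.1:8000/parse",
       "--data",
       "locale=en_GB",
       "--data-urlencode",
       "text=" + thisContext.thisInput,
     ]);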

     thisProcess.stdout.on("data", (data) => {
       receivedData += data;
     });

     thisProcess.stderr.on("data", (data) => {
       console.log("got data:" + data);
     }); 

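     thisProcess.on("exit", (code, signal) => {
       // Errors thrown inside this callback are not seen by the outer
       // try/catch, so handle them here and reject explicitly.
       try {
         thisContext.duckling = JSON.parse(receivedData);
       } catch (err) {
         return reject(err);
       }
       processDucklingElements(thisContext).then(resolve, reject);
     });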
   } catch (err) {
     console.log("duckling error: " + err);
     reject(err);
   }
 });
};

const processDucklingElements = (thisContext) => {
 return new promise(function (resolve, reject) {
   try {
     for (let i = 0; i < thisContext.duckling.length; i++) {
       let thisDucklingElement = thisContext.duckling[i];
       if (
         thisDucklingElement.value.type == "interval" &&
         thisDucklingElement.dim == "amount-of-money"
       ) {
         let unitsUnknown = false;
         if (thisDucklingElement.value.from) {
           if (thisDucklingElement.value.from.unit == "unknown") {
             unitsUnknown = true;
           }
         }

         if (thisDucklingElement.value.to) {
           if (thisDucklingElement.value.to.unit == "unknown") {
             unitsUnknown = true;
           }
         }
 
         if (unitsUnknown == false) {
           //we are searching for a sale-price range.
           let under = "*";
           if (thisDucklingElement.value.to) {
             under = thisDucklingElement.value.to.value;
           }

           let over = "*";
           if (thisDucklingElement.value.from) {
             over = thisDucklingElement.value.from.value;
           }

           let thisSearchParam =
             "fq=sale_price:[" + over + " TO " + under + "]";

           thisContext.queryParameters.push(thisSearchParam);
           // Remove the price phrase and any whitespace it leaves behind.
           thisContext.searchText = thisContext.searchText.replace(thisDucklingElement.body, "").trim();
         }
       }
     }

     resolve(thisContext);
   } catch (err) {
     console.log("duckling element processing error: " + err);
     reject(err);
   }
 });
};

module.exports = {
 runDuckling,
};

The Search Module:

So now that we have a module which enables us to pre-facet our query, we need to be able to run queries against the Bloomreach intelligent index. This is very simple to do: just hit the API endpoint in the same way as we could directly from the browser.

To do this, I am using the Node https module, which allows us to make all kinds of HTTPS requests. In this case, I am going to perform a GET against https://core.dxpapi.com/api/v1/core, which is the endpoint for search and category requests.

In the previous module, we built out our additional query parameters into thisContext.queryParameters. In this module, we will iterate over each of those and make sure that they are URI encoded before appending them to the query:


let extraParams = ""

for (let i = 0; i < thisContext.queryParameters.length; i++) {
  extraParams += "&" + encodeURI(thisContext.queryParameters[i])
}

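We will do the same for the revised search query, this time with encodeURIComponent so that characters such as & in the search text cannot break the query string:

thisContext.searchText = encodeURIComponent(thisContext.searchText)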
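
To make the effect of the encoding concrete, here is a small standalone sketch. It also shows one caveat worth knowing: encodeURI leaves characters such as & untouched, so encodeURIComponent is the safer choice for free-text values. The example strings are made up for illustration.

```javascript
// Standalone illustration of how the filter and search text are encoded.
const filter = "fq=sale_price:[100 TO *]";
console.log(encodeURI(filter));
// fq=sale_price:%5B100%20TO%20*%5D

// encodeURI keeps "&" as-is, which would split a free-text value into two
// query parameters; encodeURIComponent escapes it.
const text = "salt & pepper";
console.log(encodeURI(text));          // salt%20&%20pepper
console.log(encodeURIComponent(text)); // salt%20%26%20pepper
```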

And finally, construct the rest of the query - for the sake of simplicity, I’ve hardcoded many of the search parameters found in the request path as they are not the focus of this post:

let reqPath =
  "/api/v1/core/?account_id=" +
  settings.account_id +
  "&auth_key=" +
  settings.auth_key +
  "&domain_key=" +
  settings.domain_key +
  "&url=www.mysite.com&ref_url=www.mysite.com&request_type=search&rows=20&start=0&fl=pid%2Ctitle%2Csale_price%2Cthumb_image%2Curl%2Cdescription%2CRating&q=" +
  thisContext.searchText +
  "&search_type=keyword" +
  extraParams;

let options = {
  hostname: "core.dxpapi.com",
  method: "GET",
  path: reqPath,
  port: 443,
};

Making the actual call to the Bloomreach API is simple from here:

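const req = https.request(options, (resp) => {
  let data = ""
  // A chunk of data has been received.
  resp.on("data", (chunk) => {
    data += chunk
  })

  // The whole response has been received.
  resp.on("end", () => {
    console.log(data)
    try {
      resolve(JSON.parse(data))
    } catch (err) {
      reject(err)
    }
  })
})

// Network-level failures (DNS, refused connection) surface here.
req.on("error", reject)
req.end();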

There are countless excellent explanations of the above on the internet. Essentially, you create the req object and then listen for the response to emit events: each time a ‘data’ event arrives, append the chunk to the data received so far, and when the ‘end’ event fires, resolve with the data you have, ideally parsed as JSON.
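
One thing the snippet above does not do is check the HTTP status before parsing. A hedged sketch of a helper that could be called from the ‘end’ handler is shown below. parseJsonResponse is a hypothetical name, and treating every non-2xx status as an error is an assumption rather than documented Bloomreach behaviour.

```javascript
// Hypothetical helper: validate the status code, then parse the body.
// Throws an Error for non-2xx responses or a body that is not valid JSON.
function parseJsonResponse(statusCode, body) {
  if (statusCode < 200 || statusCode >= 300) {
    throw new Error("search request failed with status " + statusCode);
  }
  try {
    return JSON.parse(body);
  } catch (err) {
    throw new Error("search response was not valid JSON: " + err.message);
  }
}

// Example with a made-up body, for illustration only.
console.log(parseJsonResponse(200, '{"response":{"numFound":0,"docs":[]}}'));
// { response: { numFound: 0, docs: [] } }
```

Inside the ‘end’ handler this would be called as parseJsonResponse(resp.statusCode, data), wrapped in a try/catch that rejects the promise on failure.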

The full module is shown below:

const util = require("util"),
 https = require("https"),
 promise = require("promise"),
 settings = require("./settings.json");
 
let search = (thisContext) => {
 return new promise(function (resolve, reject) {
   try {
     console.log(
       "bloomreach: we are searching for: " + thisContext.searchText
     );

     // Pass reject as the second handler so a failed search is not lost.
     doSearch(thisContext).then((thisResult) => {
       console.log(JSON.stringify(thisContext));
       resolve(thisResult);
     }, reject);
   } catch (err) {
     console.log("search error:" + err);
     reject(err);
   }
 });
};

let doSearch = (thisContext) => {
 return new promise(function (resolve, reject) {
   try {
     let extraParams = "";

     for (let i = 0; i < thisContext.queryParameters.length; i++) {
       extraParams += "&" + encodeURI(thisContext.queryParameters[i]);
     }

     // encodeURIComponent also escapes "&", "=" and "#" in the search text.
     thisContext.searchText = encodeURIComponent(thisContext.searchText);

     let reqPath =
       "/api/v1/core/?account_id=" +
       settings.account_id +
       "&auth_key=" +
       settings.auth_key +
       "&domain_key=" +
       settings.domain_key +
       "&url=www.mysite.com&ref_url=www.mysite.com&request_type=search&rows=200&start=0&fl=pid%2Ctitle%2Csale_price%2Cthumb_image%2Curl%2Cdescription%2CRating&q=" +
       thisContext.searchText +
       "&search_type=keyword" +
       extraParams;

     let options = {
       hostname: "core.dxpapi.com",
       method: "GET",
       path: reqPath,
       port: 443,
     };

     const req = https.request(options, (resp) => {
       let data = "";
       // A chunk of data has been received.
       resp.on("data", (chunk) => {
         data += chunk;
       });

       // The whole response has been received.
       resp.on("end", () => {
         console.log(data);
         try {
           resolve(JSON.parse(data));
         } catch (err) {
           reject(err);
         }
       });
     });
     // Network-level failures (DNS, refused connection) surface here.
     req.on("error", reject);
     req.end();
   } catch (err) {
     console.log("bang: " + err);
     reject(err);
   }
 });
};

module.exports = {
 search,
};

Tying it all together

Ok - so now we have a duckling module which will talk to a running duckling server and we have a search module which will make calls out to the Bloomreach search API endpoints. Now we will tie them together with a server which responds to client requests.

First off, we will include Express and our Duckling and Search modules:

const duckling = require('./duckling'),
   searchModule = require('./searchModule'),
   express = require('express'),
   app = express()

Now let’s make it listen for GET requests on /search?q=<our query>, as follows:

app.get('/search', function(req, res) {
  console.log('query is:' + JSON.stringify(req.query))

  let thisContext = {
    thisInput: req.query.q,
    queryParameters: [],
    searchText: req.query.q
  }

  duckling.runDuckling(thisContext)
  .then((thisContext) => {
    return searchModule.search(thisContext)
  })
  .then((result) => {
    res.send({result});
    return
  })

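  .catch((err) => {
    console.log(`request error %s`,err)
    // Reply with an error so the client is not left waiting.
    res.status(500).send({error: 'search failed'})
  })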
});

Here we can see that, for GET requests against /search, we extract the query parameter q, create our context object and pass it through duckling and then the search module, before returning the result to the original requester via the response (res) object. Simple.
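
The same flow can also be written with async/await, which makes it easier to send an error back to the client. This is a sketch rather than the code from the repository: makeSearchHandler is a hypothetical factory, and the two functions it receives stand in for duckling.runDuckling and searchModule.search.

```javascript
// Hypothetical factory: returns an Express-style (req, res) handler that
// runs the pre-processor, then the search, and reports failures as a 500.
function makeSearchHandler(runDuckling, search) {
  return async function handleSearch(req, res) {
    const thisContext = {
      thisInput: req.query.q,
      queryParameters: [],
      searchText: req.query.q,
    };
    try {
      const processed = await runDuckling(thisContext);
      const result = await search(processed);
      res.send({ result });
    } catch (err) {
      console.log("request error: " + err);
      res.status(500).send({ error: "search failed" });
    }
  };
}
```

It would be wired up with app.get('/search', makeSearchHandler(duckling.runDuckling, searchModule.search)). Passing the two functions in also makes the handler easy to exercise with stubs, without a running Duckling server.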

The full index.js can be found below:

const duckling = require('./duckling'),
   searchModule = require('./searchModule'),
   express = require('express'),
   app = express(),
   port = 3000

app.use(express.urlencoded({extended: true})); // to support URL-encoded bodies

app.get('/search', function(req, res) {
 console.log('query is:' + JSON.stringify(req.query))

 let thisContext = {
   thisInput: req.query.q,
   queryParameters: [],
   searchText: req.query.q
 }

  duckling.runDuckling(thisContext)
 .then((thisContext) => {
   return searchModule.search(thisContext)
 })

 .then((result) => {
   res.send({result});
   return
 })

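 .catch((err) => {
   console.log(`request error %s`,err)
   // Reply with an error so the client is not left waiting.
   res.status(500).send({error: 'search failed'})
 })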
});

app.listen(port);

console.log('Listening on port ' + port + '...');

Results - natural language price extraction applied as a pre-facet to a query!

Ok - so what did this quick proof of concept yield for us? As you can see from the outputs below, extracting price ranges from the natural language query and using them to pre-facet the query works awesomely well. All of the results reflect the natural language intent of the unstructured query.

http://localhost:3000/search?q=Candles less than 10 pounds:

{
 "result": {
   "response": {
     "numFound": 179,
     "start": 0,
     "docs": [
       {
         "sale_price": 5.94,
         "description": "In a blue vintage-inspired glass apothecary jar, our hand-poured candle enlivens a room with the invigorating scent of marine, citrus, jasmine and musk.",
         "title": "Seaside Mist Filled Apothecary Jar Candle",
         "url": "https://pacifichome.bloomreach.com/products/71035",
         "pid": "71035",
         "thumb_image": "https://pacific-demo-data.bloomreach.cloud/home/images/71035_XXX_v1.tif"
       },
       {
         "sale_price": 2.54,
         "description": "Our hand-poured pillar candle in an ivory hue emanates a luxurious aromatic ambience. With its appealing bouquet of sandalwood and hints of rose, cardamom, and rich amber, this candle soothes the senses.",
         "title": "3\" x 3\" Indian Sandalwood Pillar Candle",
         "url": "https://pacifichome.bloomreach.com/products/15499",
         "pid": "15499",
         "thumb_image": "https://pacific-demo-data.bloomreach.cloud/home/images/15499_XXX_v1.tif"
       },
       {
         "sale_price": 3.23,
         "description": "Create a dramatic scene lit by flickering candlelight with our clean and classic white pillar candle. Versatile and fragrance-free, this wax pillar adds simple elegance to any space.",
         "title": "3x3 White Unscented Pillar Candle",
         "url": "https://pacifichome.bloomreach.com/products/32462",
         "pid": "32462",
         "thumb_image": "https://pacific-demo-data.bloomreach.cloud/home/images/32462_XXX_v1.tif"
       },
       {
         "sale_price": 3.34,
         "description": "Made in Germany, our set of two unscented taper candles in a dark green hue add festive illumination anywhere in the home.",
         "title": "Dark Green Taper Candles Set of 2",
         "url": "https://pacifichome.bloomreach.com/products/81564",
         "pid": "81564",
         "thumb_image": "https://pacific-demo-data.bloomreach.cloud/home/images/81564_XXX_v1.tif"
       },
...

http://localhost:3000/search?q=Candles between 10 and 13 dollars:

{
 "result": {
   "response": {
     "numFound": 44,
     "start": 0,
     "docs": [
       {
         "sale_price": 11.71,
         "description": "In an amber vintage-inspired glass apothecary jar, our hand-poured candle enlivens a room with the invigorating scent of sweet clementine fruit and honey.",
         "title": "Clementine And Honey Apothecary Filled Jar Candle",
         "url": "https://pacifichome.bloomreach.com/products/60622",
         "pid": "60622",
         "thumb_image": "https://pacific-demo-data.bloomreach.cloud/home/images/60622_XXX_v1.tif"
       },
       {
         "sale_price": 12.04,
         "description": "Presented in a pretty amber glass jar and topped with a wood lid, our filled jar candle instantly freshens the atmosphere. The hand-poured soy wax is scented with bergamot, cinnamon, white cedarwood and orange peel.",
         "title": "Cider and Clove Wooden Lid Filled Jar Candle",
         "url": "https://pacifichome.bloomreach.com/products/88147",
         "pid": "88147",
         "thumb_image": "https://pacific-demo-data.bloomreach.cloud/home/images/88147_XXX_v1.tif"
       },
       {
         "sale_price": 10.07,
         "description": "Let the feel-good scents of wellness float over you with our luxurious, spa-like fragranced jar candle, presented in a chic black glass vessel and topped with a cork lid. Available in four different scents blended with essential oils, our exclusive candle is hand-poured in Huntington Beach, California, and makes a beautiful gift. Choose from Meditate Lavender, Stress Relief Eucalyptus, Namaste Sweet Orange and Calming Lemongrass.",
         "title": "Wellness Essential Oil Filled Jar Candle with Cork Lid",
         "url": "https://pacifichome.bloomreach.com/products/89459",
         "pid": "89459",
         "thumb_image": "https://pacific-demo-data.bloomreach.cloud/home/images/89459_XXX_v1.tif"
       },
       {
         "sale_price": 12.57,
         "description": "With a volcanic-like pattern of earthy reds and pinks unique to each jar, our reactive glaze candle has an artisan-crafted feel that gives it a distinct sense of style. Made with fragranced soy wax, sought after for its renewable natural vegetable base, slower burn rate and even fragrance release, it is scented with woodsy notes along with highlights of citrus, jasmine, rose and lily.",
         "title": "Dusk Red Soy Wax Reactive Glaze Filled Jar Candle",
         "url": "https://pacifichome.bloomreach.com/products/88133",
         "pid": "88133",
         "thumb_image": "https://pacific-demo-data.bloomreach.cloud/home/images/88133_XXX_v1.tif"
       },
       {
         "sale_price": 10.44,
         "description": "Create a dramatic scene lit by flickering candlelight with our clean and classic ivory pillar candle. Versatile and fragrance-free, this wax pillar adds simple elegance to any space.",
         "title": "4x4 Ivory Unscented Pillar Candle",
         "url": "https://pacifichome.bloomreach.com/products/32472",
         "pid": "32472",
         "thumb_image": "https://pacific-demo-data.bloomreach.cloud/home/images/32472_XXX_v1.tif"
       },
       {
         "sale_price": 12.89,
         "description": "Our hand-poured Mediterranean Sea mottled votive candles capture the essence of fragrant neroli blossoms and musk. These calming candles gently infuse your favorite space with the scent of clean linens and a fresh ocean breeze.",
         "title": "Mediterranean Sea Mottled Votive Candles Set of 12",
         "url": "https://pacifichome.bloomreach.com/products/21867",
         "pid": "21867",
         "thumb_image": "https://pacific-demo-data.bloomreach.cloud/home/images/21867_XXX_v1.tif"
       },
...

http://localhost:3000/search?q=Candles over 15 euros:

{
 "result": {
   "response": {
     "numFound": 48,
     "start": 0,
     "docs": [
       {
         "sale_price": 15.36,
         "description": "Fill your room with the scent of sweet honey and savory, woodsy notes of oak, guaiac wood and sandalwood with our filled candle, contained in a chic, antique-black tin with a fabulous rainbow finish. The delightful fragrances of nature burst forth from the hand-poured wax for a wonderful sugar and spice aroma wherever you decide to place it.",
         "title": "Miel and Pepper Filled Antique Oil Tin Candle",
         "url": "https://pacifichome.bloomreach.com/products/88104",
         "pid": "88104",
         "thumb_image": "https://pacific-demo-data.bloomreach.cloud/home/images/88104_XXX_v1.tif"
       },
       {
         "sale_price": 17.59,
         "description": "Hand poured in small batches, our soy-blend candle is scented with Florida citrus, Georgia peach and pineapple with hints of apple blossom, wild daisy, violet leaves and white musk. It's available in a short glass container with a wood lid or a tall glass vessel, each decorated with a bouquet of flowers on the label.",
         "title": "Wildflower Sweet Aromas Filled Candle",
         "url": "https://pacifichome.bloomreach.com/products/85361",
         "pid": "85361",
         "thumb_image": "https://pacific-demo-data.bloomreach.cloud/home/images/85361_XXX_v1.tif"
       },
       {
         "sale_price": 16.11,
         "description": "Complete with an authentic-looking melted top, our flameless LED pillar candle is made with smooth, premium ivory wax. Available in two sizes, it has a convincing flickering light to create all the ambience of real candlelight and is perfect for homes with kids and pets.",
         "title": "Ivory Flameless Flickering LED Pillar Candle",
         "url": "https://pacifichome.bloomreach.com/products/88162",
         "pid": "88162",
         "thumb_image": "https://pacific-demo-data.bloomreach.cloud/home/images/88162_XXX_v1.tif"
       },
       {
         "sale_price": 18.42,
         "description": "With its perfectly pretty presentation, our filled candle in a purple glass jar is as much fun to give as it is to receive. The wax is scented with the flowery fragrance of peonies, with smooth, woodsy undertones, then topped with a thin gold lid for a luxe look.",
         "title": "Large Misty Cashmere Gold Lid Glass Filled Jar Candle",
         "url": "https://pacifichome.bloomreach.com/products/78076",
         "pid": "78076",
         "thumb_image": "https://pacific-demo-data.bloomreach.cloud/home/images/78076_XXX_v1.tif"
       },
...

Downloading the code and running it

Install and run Duckling from here:

https://github.com/facebook/duckling

Download this code from: 

https://github.com/paulBloomreach/natural-language-price-ranged-search-queries-on-bloomreach-intelligent-index-using-duckling 

Install and run it with:

npm install

node index.js

Browse to:

http://localhost:3000/search?q=painting%20over%2040%20dollars

http://localhost:3000/search?q=painting%20between%2010%20and%2050%20dollars

http://localhost:3000/search?q=<type whatever you want and don’t bother url encoding it>

Note: I’ve slimmed down the number of fields returned to make it easy to see the effect on the sale_price…