Extending SOLR to improve your users' experience

Improving the user experience is a relentless battle: you constantly have to keep pushing in order to give your users the best experience possible.

The user story

As a user, I want to see the most popular things around me when I am in a specific location.

When you travel and find yourself in a certain location, you want to see what is around you.

We started out a while back by simply sorting by distance and filtering out less popular places.

We realized this is not enough. When you travel, you are not really thinking in terms of absolute distances; you think in terms of rings, or clusters.

You basically say to yourself: I am willing to drive 3-5 minutes, so what is the best restaurant I can find in that range?
If you could drive for one more minute and find a better one, would you?

If you just go around the block and walk 5 more minutes to reach the best ice cream place you've ever been to, will you make that walk?
You certainly will!

Here's an illustration of it (the numbers represent the relevance percentage for you).

Distance Rings Illustration

When most users travel, they are willing to invest that extra time to find the most popular places, but only within an acceptable range.

Things to consider

Implementation

I wanted this to be a built-in function that SOLR would use natively.

This came down to one of two options:

  1. Get everything from SOLR and do the sorting in memory (which would violate one of the rules we set for ourselves coming into this).
  2. Extend SOLR with this function and configure the entire search cluster to use it.

Research

I came across the post "User Based Personalization Engine with Solr", which extends SOLR to sort results by a personal scoring system (using an API).

After reading the post (and the posts it links to), I had a very good plan in mind.

The SOLR distance function is the built-in geodist function. You pass it the field the location is indexed on, followed by the latitude and longitude, like so: geodist(location_ll,48.8567,2.3508) asc.
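For reference, geodist computes the great-circle distance between the indexed point and the lat, lng you pass in. The sketch below is a plain-Java haversine that shows that calculation. It assumes a spherical Earth with a mean radius of 6371 km and returns kilometers. Solr's exact constant and unit handling may differ slightly, and the class and method names are hypothetical.

```java
/** Hypothetical stand-in for the calculation behind geodist(): great-circle distance in km. */
final class GeoDistance {
    // Assumed mean Earth radius in kilometers (spherical model).
    static final double EARTH_RADIUS_KM = 6371.0;

    private GeoDistance() {}

    /** Haversine distance between two (lat, lng) points given in degrees. */
    static double haversineKm(double lat1, double lng1, double lat2, double lng2) {
        double dLat = Math.toRadians(lat2 - lat1);
        double dLng = Math.toRadians(lng2 - lng1);
        double a = Math.sin(dLat / 2) * Math.sin(dLat / 2)
                + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                * Math.sin(dLng / 2) * Math.sin(dLng / 2);
        // Clamp guards against tiny floating-point overshoot above 1 before asin.
        double c = 2 * Math.asin(Math.min(1.0, Math.sqrt(a)));
        return EARTH_RADIUS_KM * c;
    }
}
```

For example, `GeoDistance.haversineKm(48.8567, 2.3508, 51.5074, -0.1278)` (Paris to London) comes out at roughly 340-345 km.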

At this point, I decided to keep the same convention but sort by distanceScore instead, like so: distanceScore(location_ll,48.8567,2.3508) desc.
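To make the convention concrete, here is a sketch that builds the request parameters for such a query. It makes three assumptions:

- The secondary sort field `popularity` is a hypothetical name for the internal score.
- Asking for the values through `fl` aliases (`distance`, `distance_score`) is an assumed way to get both numbers back in each document.
- The helper class is illustrative only.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

/** Hypothetical helper that builds a Solr query string sorting by distanceScore. */
final class DistanceScoreQuery {
    private DistanceScoreQuery() {}

    /** Builds a function call such as distanceScore(location_ll,48.8567,2.3508). */
    static String call(String fn, String field, double lat, double lng) {
        return fn + "(" + field + "," + lat + "," + lng + ")";
    }

    static String queryString(String field, double lat, double lng, String tieBreakerField) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("q", "*:*");
        // Primary sort: ring score (higher = closer ring). Secondary: the internal score.
        params.put("sort", call("distanceScore", field, lat, lng) + " desc,"
                + tieBreakerField + " desc");
        // Aliased pseudo-fields (assumed) so each doc shows its raw distance and ring score.
        params.put("fl", "name,distance:" + call("geodist", field, lat, lng)
                + ",distance_score:" + call("distanceScore", field, lat, lng));
        StringJoiner qs = new StringJoiner("&");
        for (Map.Entry<String, String> e : params.entrySet()) {
            qs.add(e.getKey() + "=" + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8));
        }
        return qs.toString();
    }
}
```

Calling `DistanceScoreQuery.queryString("location_ll", 48.8567, 2.3508, "popularity")` produces a URL-encoded query string you could append to a core's `/select` URL.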

The Implementation

Instead of sorting by distance, I assign a score to each distance range. All the items in that range get the same score, so the secondary sort (our internal scoring system) becomes the tie-breaker.
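The ring idea boils down to a step function from distance to score. The sketch below shows one way to write it. The ring boundaries and scores are hypothetical values chosen for illustration, expressed in the same unit as the distance function. They are not the values used in production.

```java
/** Hypothetical step function mapping a distance to its ring score. */
final class DistanceRings {
    // Assumed ring upper bounds (same unit as the distance, ascending)...
    private static final double[] RING_LIMITS = {0.5, 1.0, 2.0, 5.0};
    // ...and the score shared by every document inside each ring.
    private static final int[] RING_SCORES = {1000, 800, 600, 400};
    // Assumed score for anything beyond the outermost ring.
    private static final int OUTSIDE_SCORE = 0;

    private DistanceRings() {}

    /** Every distance within the same ring maps to the same score. */
    static int score(double distance) {
        for (int i = 0; i < RING_LIMITS.length; i++) {
            if (distance <= RING_LIMITS[i]) {
                return RING_SCORES[i];
            }
        }
        return OUTSIDE_SCORE;
    }
}
```

With these assumed limits, every distance in the sample results (all below 0.22) lands in the first ring and scores 1000. Only the secondary sort decides their order.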

Here’s a basic illustration of it.

Distance Rings Illustration

Let's look at some example results.


{
  "docs": [
    {
      "distance": 0.20637469859399163,
      "distance_score": 1000,
      "name": "Brenda's French Soul Food"
    },
    {
      "distance": 0.08686129174919746,
      "distance_score": 1000,
      "name": "Chambers Eat + Drink"
    },
    {
      "distance": 0.1812205524350946,
      "distance_score": 1000,
      "name": "Lers Ros Thai"
    },
    {
      "distance": 0.06320259257621294,
      "distance_score": 1000,
      "name": "Saigon Sandwiches"
    },
    {
      "distance": 0.11542846457258274,
      "distance_score": 1000,
      "name": "Turtle Tower Restaurant"
    },
    {
      "distance": 0.2105972668549029,
      "distance_score": 1000,
      "name": "Olive"
    },
    {
      "distance": 0.21655948230840996,
      "distance_score": 1000,
      "name": "Philz Coffee - Golden Gate"
    },
    {
      "distance": 0.13191153597807037,
      "distance_score": 1000,
      "name": "Pagolac"
    },
    {
      "distance": 0.2152692626334937,
      "distance_score": 1000,
      "name": "Sai Jai Thai Restaurant"
    },
    {
      "distance": 0.21263741323062255,
      "distance_score": 1000,
      "name": "Zen Yai Thai"
    }
  ]
}

As you can see from this result set, the items in the same distance cluster receive the same distance_score and are then sorted by our internal scoring system.
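The ordering in that result set is a two-level sort: ring score first, then the internal score. Here is a small in-memory sketch of the same comparison logic, just to show the tie-breaking. The `popularity` field and the sample documents are hypothetical; in the real setup SOLR does this sort itself.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical illustration of sorting by ring score, then by the internal score. */
final class RingSort {
    /** Minimal stand-in for a search hit. */
    static final class Doc {
        final String name;
        final double distance;    // raw distance, kept for display but not used for ordering
        final int distanceScore;  // ring score, as returned by distanceScore(...)
        final double popularity;  // hypothetical internal score (the tie-breaker)

        Doc(String name, double distance, int distanceScore, double popularity) {
            this.name = name;
            this.distance = distance;
            this.distanceScore = distanceScore;
            this.popularity = popularity;
        }
    }

    // Primary: ring score descending. Secondary: internal score descending.
    static final Comparator<Doc> ORDER =
            Comparator.comparingInt((Doc d) -> d.distanceScore).reversed()
                    .thenComparing(Comparator.comparingDouble((Doc d) -> d.popularity).reversed());

    private RingSort() {}

    /** Returns document names in the order the two-level sort produces. */
    static List<String> sortedNames(List<Doc> docs) {
        List<Doc> sorted = new ArrayList<>(docs);
        sorted.sort(ORDER);
        List<String> names = new ArrayList<>();
        for (Doc d : sorted) {
            names.add(d.name);
        }
        return names;
    }
}
```

Within one ring, a slightly farther but more popular place wins. A very popular place in an outer ring still comes after everything in the inner ring.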

This gives the user a much better sort order, "Popular places around me", rather than one based purely on distance.

Code

After digging through the SOLR source code quite a bit, I found the two classes that do the distance calculation and return the result to the user.

I copied those two classes into a new Java project. Then, instead of returning the raw distance, I used the distance to assign the ring/cluster and returned that ring's score to the result set.
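Structurally, this is a wrapper: keep the per-document distance computation and change only what is returned. The sketch below shows that shape in plain Java. `IntToDoubleFunction` stands in for the original distance calculation, and the class name is hypothetical. It deliberately does not use Lucene's actual function-query classes, so it only illustrates the idea. A binary search over sorted ring limits replaces a linear scan, which matters little for four rings but keeps the lookup O(log n).

```java
import java.util.Arrays;
import java.util.function.IntToDoubleFunction;

/**
 * Hypothetical plain-Java analogue of a per-document function value that wraps an
 * existing distance computation and returns a ring score instead of the distance.
 */
final class RingScoreSource {
    private final IntToDoubleFunction distanceForDoc; // the original distance calculation
    private final double[] ringLimits;                // strictly increasing upper bounds
    private final int[] ringScores;                   // score for each ring
    private final int outsideScore;                   // score beyond the last ring

    RingScoreSource(IntToDoubleFunction distanceForDoc, double[] ringLimits,
                    int[] ringScores, int outsideScore) {
        if (ringLimits.length != ringScores.length) {
            throw new IllegalArgumentException("need one score per ring");
        }
        for (int i = 1; i < ringLimits.length; i++) {
            if (ringLimits[i] <= ringLimits[i - 1]) {
                throw new IllegalArgumentException("ring limits must be strictly increasing");
            }
        }
        this.distanceForDoc = distanceForDoc;
        this.ringLimits = ringLimits.clone();
        this.ringScores = ringScores.clone();
        this.outsideScore = outsideScore;
    }

    /** Same input as the distance function, but returns the ring score. */
    double value(int doc) {
        double distance = distanceForDoc.applyAsDouble(doc);
        int idx = Arrays.binarySearch(ringLimits, distance);
        // Exact hit: the distance equals a limit, which belongs to that (inner) ring.
        // Miss: binarySearch returns -(insertionPoint) - 1, and the insertion point is the
        // first ring whose limit is greater than the distance.
        int ring = idx >= 0 ? idx : -idx - 1;
        return ring < ringScores.length ? ringScores[ring] : outsideScore;
    }
}
```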

After doing that, you need to compile your JAR and make a few configuration changes in SOLR.

Adding the lib folder as another source for classes

  <lib dir="./lib" />

This line is usually commented out in solrconfig.xml. Uncommenting it makes {core-name}/lib a directory in which SOLR searches for custom JARs.

Adding a function that you can call

  <valueSourceParser name="distanceScore" 
                    class="com.gogobot.DistanceParser" />

This registers a new valueSourceParser that we can call as distanceScore; calls to it are parsed by com.gogobot.DistanceParser.
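The registered parser's job is to turn `distanceScore(field,lat,lng)` into something SOLR can evaluate. The real class works through SOLR's own parser API. The hypothetical stand-in below parses the raw call text by hand, only to spell out the argument contract the post's examples imply: a field name, then latitude, then longitude.

```java
/**
 * Hypothetical stand-in for the argument handling behind distanceScore(...).
 * It parses the raw call text by hand; it is not SOLR's parser API.
 */
final class DistanceScoreCall {
    private static final String PREFIX = "distanceScore(";

    final String field; // field the location is indexed on, e.g. location_ll
    final double lat;
    final double lng;

    private DistanceScoreCall(String field, double lat, double lng) {
        this.field = field;
        this.lat = lat;
        this.lng = lng;
    }

    static DistanceScoreCall parse(String expr) {
        String s = expr.trim();
        if (!s.startsWith(PREFIX) || !s.endsWith(")")) {
            throw new IllegalArgumentException("not a distanceScore call: " + expr);
        }
        String[] args = s.substring(PREFIX.length(), s.length() - 1).split(",");
        if (args.length != 3) {
            throw new IllegalArgumentException("expected distanceScore(field,lat,lng): " + expr);
        }
        String field = args[0].trim();
        // NumberFormatException is an IllegalArgumentException, so bad numbers surface the same way.
        double lat = Double.parseDouble(args[1].trim());
        double lng = Double.parseDouble(args[2].trim());
        if (field.isEmpty() || lat < -90 || lat > 90 || lng < -180 || lng > 180) {
            throw new IllegalArgumentException("bad field or coordinates: " + expr);
        }
        return new DistanceScoreCall(field, lat, lng);
    }
}
```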

Summing up

Open source

The code is open source here: gogobot/solr-distance-cluster.