Extending SOLR to improve your users' experience
08 Jun 2015

Improving the user experience is a relentless battle; you constantly have to keep pushing it in order to give your users the best experience possible.
The user story
As a user I want to see the most popular things around me when I am in a specific location.
When you travel, and you are in a certain location, you want to see things around you.
We started this out a while back with just sorting by distance and filtering out less popular places.
We realized this is not enough. When you travel, you are not really thinking in terms of absolute distances; you think in terms of rings or clusters.
You basically say to yourself: I am willing to drive 3-5 minutes; what is the best restaurant I can find in that range?
If you can drive for another minute and find a better one, would you do it?
If you could just go around the block, walk 5 more minutes, and end up in the best ice cream place you’ve ever been to, would you walk that?
You certainly will!
Here’s an illustration of it. (Numbers represent the relevance percentage for you).
When most users travel, they are willing to invest that extra time in order to find the most popular places, but it has to be in an acceptable range.
Things to consider
- Everything should be done in SOLR
- It should integrate seamlessly with our current ranking for places.
Implementation
I wanted this to be a built in function that SOLR will use natively.
This came down to one of two approaches:
- Get everything from SOLR and do the work in memory (which would violate one of the rules we considered coming into this).
- Extend SOLR with this function and configure the entire search cluster to use it.
Research
I encountered the post User Based Personalization Engine with Solr, which extends SOLR in order to sort results by a personal scoring system (using an API).
After reading the post (and the posts it links to):
- From the Field: Relevency Using SOLR for clickstreams
- Solr Custom Function Tutorial
- High Level Tutorial
I had a very good plan in mind.
The SOLR distance function is exposed through the geodist internal function; you pass it the field the location is indexed on, plus lat and lng, like so: geodist(location_ll,48.8567,2.3508) asc.
At this point I pretty much decided I was going to keep the same convention, but sort by distanceScore instead, like so: distanceScore(location_ll,48.8567,2.3508) desc.
The Implementation
Instead of sorting by the distance, I am going to assign a score for each distance range, so all the items in that range will have the same score, resulting in the secondary sort (our internal scoring system) being the tie breaker.
Here’s a basic illustration of it.
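In practice that means the sort clause looks something like distanceScore(location_ll,48.8567,2.3508) desc, score desc (with score used here as a stand-in for our internal ranking), so the ring score orders results first and popularity breaks the ties inside each ring.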
So, let's look at some example results.
{
"docs": [
{
"distance": 0.20637469859399163,
"distance_score": 1000,
"name": "Brenda's French Soul Food"
},
{
"distance": 0.08686129174919746,
"distance_score": 1000,
"name": "Chambers Eat + Drink"
},
{
"distance": 0.1812205524350946,
"distance_score": 1000,
"name": "Lers Ros Thai"
},
{
"distance": 0.06320259257621294,
"distance_score": 1000,
"name": "Saigon Sandwiches"
},
{
"distance": 0.11542846457258274,
"distance_score": 1000,
"name": "Turtle Tower Restaurant"
},
{
"distance": 0.2105972668549029,
"distance_score": 1000,
"name": "Olive"
},
{
"distance": 0.21655948230840996,
"distance_score": 1000,
"name": "Philz Coffee - Golden Gate"
},
{
"distance": 0.13191153597807037,
"distance_score": 1000,
"name": "Pagolac"
},
{
"distance": 0.2152692626334937,
"distance_score": 1000,
"name": "Sai Jai Thai Restaurant"
},
{
"distance": 0.21263741323062255,
"distance_score": 1000,
"name": "Zen Yai Thai"
}
]
}
As you can see from this result set, the items in the same distance cluster are assigned the same distance_score and are then sorted by our internal scoring system.
This gives the user a great sort, ordering by “Popular places around me”, not necessarily just by distance.
Code
After digging through the SOLR source code quite a bit, I found the two classes that do the distance calculation and return the result to the user.
- GeoDistValueSourceParser.java [Github]
- HaversineConstFunction.java [Github]
I copied those two classes into a new Java project; then, instead of just returning the distance, I checked which ring/cluster the distance falls into and returned that score in the result set.
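To make the idea concrete, here is a minimal sketch of what that wrapper can look like. This is an illustration against the Solr 5.x / Lucene 5 ValueSource API, not the actual gogobot code; the class name and ring boundaries are assumptions.

package com.gogobot;

import java.io.IOException;
import java.util.Map;

import org.apache.lucene.index.LeafReaderContext;
import org.apache.lucene.queries.function.FunctionValues;
import org.apache.lucene.queries.function.ValueSource;
import org.apache.lucene.queries.function.docvalues.DoubleDocValues;

/**
 * Wraps a per-document distance ValueSource (e.g. the copied
 * HaversineConstFunction) and returns a ring score instead of the raw
 * distance, so documents in the same ring tie and fall through to the
 * secondary sort.
 */
public class DistanceScoreFunction extends ValueSource {

  private final ValueSource distance;

  public DistanceScoreFunction(ValueSource distance) {
    this.distance = distance;
  }

  @Override
  public FunctionValues getValues(Map context, LeafReaderContext readerContext) throws IOException {
    final FunctionValues dist = distance.getValues(context, readerContext);
    return new DoubleDocValues(this) {
      @Override
      public double doubleVal(int doc) {
        return ringScore(dist.doubleVal(doc));
      }
    };
  }

  // Illustrative ring boundaries (km); every document inside the same ring
  // gets the same score, so the internal ranking becomes the tie breaker.
  private static double ringScore(double km) {
    if (km < 0.25) return 1000;
    if (km < 1.0)  return 750;
    if (km < 5.0)  return 500;
    return 250;
  }

  @Override
  public String description() {
    return "distanceScore(" + distance.description() + ")";
  }

  @Override
  public boolean equals(Object o) {
    return o instanceof DistanceScoreFunction
        && distance.equals(((DistanceScoreFunction) o).distance);
  }

  @Override
  public int hashCode() {
    return distance.hashCode();
  }
}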
After doing that, you need to compile your JAR and make a few configuration changes in SOLR.
Adding the lib folder as another source for classes
<lib dir="./lib" />
In solrconfig.xml, this line is usually commented out; uncommenting it means that {core-name}/lib is now a directory in which SOLR will search for custom JARs.
Adding a function that you can call
<valueSourceParser name="distanceScore"
class="com.gogobot.DistanceParser" />
This will add a new valueSourceParser that we can call as distanceScore, which will be parsed by com.gogobot.DistanceParser.
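For completeness, here is a hedged sketch of what a class registered under that name could look like (again, an assumption, not the actual gogobot implementation). To keep the sketch self-contained it simply wraps a nested distance function, so it would be called as distanceScore(geodist(location_ll,48.8567,2.3508)); the real class parses the geodist-style arguments itself, which is what the copied GeoDistValueSourceParser code is for.

package com.gogobot;

import org.apache.lucene.queries.function.ValueSource;
import org.apache.solr.search.FunctionQParser;
import org.apache.solr.search.SyntaxError;
import org.apache.solr.search.ValueSourceParser;

/**
 * Illustrative parser: takes an inner distance ValueSource (e.g. a nested
 * geodist(...) call) and wraps it with the ring-scoring function above.
 */
public class DistanceParser extends ValueSourceParser {

  @Override
  public ValueSource parse(FunctionQParser fp) throws SyntaxError {
    ValueSource distance = fp.parseValueSource();
    return new DistanceScoreFunction(distance);
  }
}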
Summing up
- When you get a feature like this as an engineer, you need to leave your comfort zone. Even if your first thought is to do it in memory later, you need to look at the bigger picture, think of the entire feature, and implement it in the most appropriate language.
- I had real fun digging through the SOLR source code; it is very different from digging through the Rails source code or any other big Ruby project out there.
- Finding documentation around adding a valueSourceParser wasn't trivial; it seems it should be easier.
- I don't know whether this would be easier with ElasticSearch, but it seems the documentation is better and the community more vibrant.
Open source
The code is open source here: gogobot/solr-distance-cluster.