Connecting Redis to Solr for boosting documents
Posted by Kelvin on 07 Jun 2012 at 02:06 am | Tagged as: Lucene / Solr / Elasticsearch / Nutch
There are a number of cases in Solr where it's preferable to retrieve boosting data from an external datastore rather than contort Solr with multiple queries, joins, etc.
Here's a trivial example:
Jobs are stored as documents in Solr. Users of the application can rank a job from 1 to 10, and each job should be boosted by the current user's rank, if one exists.
Trying to model this entirely in Solr would be fairly inefficient, especially with large numbers of jobs and/or users: each time a user ranks a job, the change has to be committed and a new searcher opened before that rank is available for searching.
A much more efficient approach is to store the rank data in a NoSQL store like Redis, retrieve the rank at query time, and use it to boost the documents accordingly.
This can be accomplished using a custom FunctionQuery. I've blogged about how to create custom function queries in Solr before, so this is simply an application of that technique.
Here's the code:
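```java
// Imports assume the Solr 3.x API used throughout this post.
import org.apache.lucene.queryParser.ParseException;
import org.apache.solr.search.FunctionQParser;
import org.apache.solr.search.ValueSourceParser;
import org.apache.solr.search.function.ValueSource;

public class RedisValueSourceParser extends ValueSourceParser {
    @Override
    public ValueSource parse(FunctionQParser fp) throws ParseException {
        // Arguments are read in order: redis(idField, redisKey, redisValue)
        String idField = fp.parseArg();    // field holding each document's id
        String redisKey = fp.parseArg();   // key of the Redis hash, e.g. "influence"
        String redisValue = fp.parseArg(); // field within that hash, e.g. "1001"
        return new RedisValueSource(idField, redisKey, redisValue);
    }
}
```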
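This FunctionQuery accepts 3 arguments, in this order:
1. the field to use as an id field (its value is what gets looked up in Redis)
2. redisKey: the key of the Redis hash
3. redisValue: the field within that hash, whose value is a JSON object mapping ids to boost values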
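To make the data layout concrete, here's a minimal sketch of the write side, assuming the layout that the RedisValueSource lookup in this post reads: a Redis hash (here the key `influence`) whose field is a user id and whose value is a JSON object mapping job ids to that user's rank. To keep it runnable without a Redis server, an in-memory map stands in for the hash; with a real client you would write the same JSON string with HSET. `RankStore` and its methods are hypothetical names, and the `ranks` map stands in for wherever the application keeps its authoritative rank data.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical write side: records a user's rank for a job and rewrites the
// per-user JSON object that the query-time lookup fetches with HGET.
class RankStore {
    // In-memory stand-in for Redis hashes: key -> (field -> value).
    private final Map<String, Map<String, String>> redis = new HashMap<>();
    // Stand-in for the application's own store of ranks: user -> (job -> rank).
    private final Map<String, Map<String, Integer>> ranks = new HashMap<>();
    private final String hashKey;

    RankStore(String hashKey) {
        this.hashKey = hashKey;
    }

    // Same effect as: HSET <hashKey> <userId> <json of all of the user's ranks>
    void rank(String userId, String jobId, int rank) {
        if (rank < 1 || rank > 10) {
            throw new IllegalArgumentException("rank must be 1-10: " + rank);
        }
        Map<String, Integer> userRanks = ranks.computeIfAbsent(userId, k -> new TreeMap<>());
        userRanks.put(jobId, rank);
        redis.computeIfAbsent(hashKey, k -> new HashMap<>()).put(userId, toJson(userRanks));
        // Nothing is written to Solr, so no commit or searcher reopen is needed.
    }

    // Same semantics as HGET: null when the key or field does not exist.
    String hget(String key, String field) {
        Map<String, String> hash = redis.get(key);
        return hash == null ? null : hash.get(field);
    }

    // Assumes job ids need no JSON escaping (e.g. plain alphanumeric ids).
    private static String toJson(Map<String, Integer> m) {
        StringBuilder sb = new StringBuilder("{");
        for (Map.Entry<String, Integer> e : m.entrySet()) {
            if (sb.length() > 1) {
                sb.append(',');
            }
            sb.append('"').append(e.getKey()).append("\":").append(e.getValue());
        }
        return sb.append('}').toString();
    }
}
```

The JSON keys must match the values of the id field passed as the first argument, since that is what the lookup compares against.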
Here's what the salient part of RedisValueSource looks like:
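```java
// Imports assume Solr 3.x, Jedis and json-simple, matching the calls below.
import java.io.IOException;
import java.util.Map;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.search.FieldCache;
import org.apache.solr.search.function.DocValues;
import org.json.simple.JSONObject;
import org.json.simple.JSONValue;
import redis.clients.jedis.Jedis;

@Override
public DocValues getValues(Map context, IndexReader reader) throws IOException {
    // Maps each internal doc number to the value of its id field.
    final String[] lookup = FieldCache.DEFAULT.getStrings(reader, idField);

    // One HGET per call; the hash field holds a JSON object of id -> value.
    // A new connection is opened each time (a connection pool would avoid this).
    final JSONObject obj;
    final Jedis jedis = new Jedis("localhost");
    try {
        String json = jedis.hget(redisKey, redisValue);
        Object parsed = (json != null) ? JSONValue.parse(json) : null;
        // Missing or malformed data falls back to an empty object (all docs score 0).
        obj = (parsed instanceof JSONObject) ? (JSONObject) parsed : new JSONObject();
    } finally {
        jedis.disconnect(); // release the connection even if the lookup fails
    }

    return new DocValues() {
        @Override
        public float floatVal(int doc) {
            Object v = obj.get(lookup[doc]);
            if (v == null) {
                return 0f; // no value stored for this document
            }
            try {
                return Float.parseFloat(v.toString());
            } catch (NumberFormatException e) {
                return 0f; // value present but not numeric
            }
        }

        @Override
        public int intVal(int doc) {
            // Truncates, so a stored value like "4.5" yields 4 instead of failing to 0.
            return (int) floatVal(doc);
        }

        @Override
        public String strVal(int doc) {
            Object v = obj.get(lookup[doc]);
            return v != null ? v.toString() : null;
        }

        @Override
        public String toString(int doc) {
            return strVal(doc);
        }
    };
}
```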
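The per-document fallback behavior can be checked without Solr or Redis. In this sketch an array stands in for the FieldCache lookup (internal doc number to id field value) and a plain map stands in for the parsed JSON object; as in the code above, a missing id and a non-numeric value both score 0. `DocLookup` is a hypothetical name, not part of Solr.

```java
import java.util.Map;

// Hypothetical stand-in for the DocValues returned by RedisValueSource.
class DocLookup {
    private final String[] lookup;          // doc number -> id field value
    private final Map<String, Object> obj;  // id -> value parsed from the Redis JSON

    DocLookup(String[] lookup, Map<String, Object> obj) {
        this.lookup = lookup;
        this.obj = obj;
    }

    float floatVal(int doc) {
        Object v = obj.get(lookup[doc]);
        if (v == null) {
            return 0f; // no value stored for this document
        }
        try {
            return Float.parseFloat(v.toString());
        } catch (NumberFormatException e) {
            return 0f; // value present but not numeric
        }
    }
}
```

json-simple parses JSON integers as `Long`, which is why the lookup goes through `toString()` rather than casting.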
Once the parser is registered in solrconfig.xml under the name redis (with a valueSourceParser element), you can use the following Solr query to boost documents by the Redis value:
http://localhost:8983/solr/select?defType=edismax&q=cat:electronics&bf=redis(id,influence,1001)&debugQuery=on
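In practice the user id inside bf changes with every request, so the URL is usually assembled in application code. Here's a minimal JDK-only sketch, assuming the same Solr URL, id field and hash key as the query above; `RedisBoostQuery` and `buildQueryUrl` are hypothetical names. Encoding each parameter with URLEncoder keeps the function's parentheses and commas intact and stays safe when the query text contains spaces or other reserved characters.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

class RedisBoostQuery {
    // Hypothetical helper: builds an edismax select URL boosted by the given user's ranks.
    static String buildQueryUrl(String baseUrl, String q, String userId) {
        String bf = "redis(id,influence," + userId + ")";
        return baseUrl + "/select?defType=edismax"
            + "&q=" + enc(q)
            + "&bf=" + enc(bf);
    }

    private static String enc(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always supported by the JDK
        }
    }
}
```

Append `&debugQuery=on` during development to get the explain output.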
The explain output looks like this:
```
3.4664698 = (MATCH) sum of:
  1.070082 = (MATCH) weight(cat:electronics in 2), product of:
    0.80067647 = queryWeight(cat:electronics), product of:
      1.3364723 = idf(docFreq=14, maxDocs=21)
      0.59909695 = queryNorm
    1.3364723 = (MATCH) fieldWeight(cat:electronics in 2), product of:
      1.0 = tf(termFreq(cat:electronics)=1)
      1.3364723 = idf(docFreq=14, maxDocs=21)
      1.0 = fieldNorm(field=cat, doc=2)
  2.3963878 = (MATCH) FunctionQuery(redis(id,influence,1001)), product of:
    4.0 = 4.0
    1.0 = boost
    0.59909695 = queryNorm
```
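Reading the explain output: the bf function's contribution is added to the text-match score. The function clause is the value returned for this document (4.0, read from Redis) times the boost times the same queryNorm applied to the text clause, and the text clause is queryWeight times fieldWeight. Recomputing the numbers from the output:

```latex
\begin{aligned}
\text{queryWeight} &= \text{idf} \times \text{queryNorm} = 1.3364723 \times 0.59909695 \approx 0.80067647\\
\text{fieldWeight} &= \text{tf} \times \text{idf} \times \text{fieldNorm} = 1.0 \times 1.3364723 \times 1.0 = 1.3364723\\
w_{\text{cat:electronics}} &= 0.80067647 \times 1.3364723 \approx 1.070082\\
w_{\text{redis}} &= 4.0 \times 1.0 \times 0.59909695 = 2.3963878\\
\text{score} &= 1.070082 + 2.3963878 = 3.4664698
\end{aligned}
```

For this document, the Redis value contributes more than twice as much to the final score as the text match does.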