Idea: 2-stage recovery of corrupt Solr/Lucene indexes
Posted by Kelvin on 09 Sep 2009 at 09:43 pm | Tagged as: programming, Lucene / Solr / Elasticsearch / Nutch
I was recently onsite with a client who happened to have a corrupt Solr/Lucene index. The CheckIndex tool (Lucene 2.4+) diagnosed the problem and offered the option of fixing it.
Except… fixing the index in this case meant dropping the corrupt segment, which also happened to be the one containing over 90% of the documents.
Because Solr has the concept of a doc uid (which Lucene doesn't have), I wrote a tool for them that dumps the uids in the corrupted segment to a text file. After recovering the index, they were able to reindex the docs that had been lost with that segment.
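To make the second stage a bit more concrete, here is a rough sketch of how the dumped uid file might be consumed afterwards. This is not the tool I wrote for the client, and it doesn't touch Lucene or Solr at all. It assumes the dump holds one uid per line; the class name `LostUidBatches`, the file name `lost-uids.txt` and the batch size of 500 are all made up for illustration. Fetching each doc from the original data source and posting it back to Solr is left as a placeholder.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical helper: reads a uid dump and groups the uids into batches for reindexing. */
final class LostUidBatches {

    /** Trims each line, drops blank lines and duplicates, and keeps first-seen order. */
    static List<String> cleanUids(List<String> lines) {
        Set<String> seen = new LinkedHashSet<>();
        for (String line : lines) {
            String uid = line.trim();
            if (!uid.isEmpty()) {
                seen.add(uid);
            }
        }
        return new ArrayList<>(seen);
    }

    /** Splits the uids into consecutive batches of at most batchSize entries. */
    static List<List<String>> toBatches(List<String> uids, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<String>> batches = new ArrayList<>();
        for (int i = 0; i < uids.size(); i += batchSize) {
            int end = Math.min(i + batchSize, uids.size());
            batches.add(new ArrayList<>(uids.subList(i, end)));
        }
        return batches;
    }

    public static void main(String[] args) throws IOException {
        // Assumed file name; pass the real path as the first argument.
        Path dump = Paths.get(args.length > 0 ? args[0] : "lost-uids.txt");
        List<String> uids = cleanUids(Files.readAllLines(dump, StandardCharsets.UTF_8));
        for (List<String> batch : toBatches(uids, 500)) {
            // Placeholder: look up these docs in the source system and re-add them to Solr.
            System.out.println("Would reindex " + batch.size() + " docs: " + batch);
        }
    }
}
```

Deduplicating before batching means a uid that shows up twice in the dump is only reindexed once, and batching keeps each round trip to the source system a manageable size.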