Supermind Search Consulting Blog 
Solr - Elasticsearch - Big Data

Non-blocking/NIO HTTP requests in Java with Jetty's HttpClient

Posted by Kelvin on 05 Mar 2012 | Tagged as: programming, crawling

Jetty 6/7 contains an HttpClient class that makes it uber-easy to issue non-blocking HTTP requests in Java. Here is a code snippet to get you started. Initialize the HttpClient object:
    HttpClient client = new HttpClient();
    client.setConnectorType(HttpClient.CONNECTOR_SELECT_CHANNEL);
    client.setMaxConnectionsPerAddress(200); // max 200 concurrent connections to every address
    client.setTimeout(30000); // 30 seconds timeout; if no server reply, the […]

Using contextual hints to improve Solr's autocomplete suggester

Posted by Kelvin on 03 Mar 2012 | Tagged as: Lucene / Solr / Elasticsearch / Nutch

Context-less multi-term autocomplete is difficult. Given the term "di", we can look at our index and rank terms starting with "di" by frequency and return the n most frequent terms. Solr's TSTLookup and FSTLookup do this very well. However, given the term "walt di", we can no longer do what we did above for each […]
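The single-term case described above can be sketched in a few lines of Java. This is only an illustration of "rank terms sharing a prefix by frequency", not how Solr's TSTLookup or FSTLookup are implemented; the class name `PrefixSuggester`, its methods, and the sample frequencies are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper (not part of Solr): ranks indexed terms sharing a prefix by frequency.
class PrefixSuggester {
    // Sorted map from term to its frequency in an assumed index.
    private final TreeMap<String, Integer> termFreqs = new TreeMap<>();

    // Registers a term, summing frequencies if the term was already added.
    void add(String term, int freq) {
        termFreqs.merge(term, freq, Integer::sum);
    }

    // Returns up to n terms starting with the prefix, most frequent first.
    List<String> suggest(String prefix, int n) {
        // Keys in [prefix, prefix + '\uffff') all start with the prefix.
        // Assumption: no indexed term contains the character '\uffff'.
        Map<String, Integer> matches =
                termFreqs.subMap(prefix, true, prefix + Character.MAX_VALUE, false);
        List<Map.Entry<String, Integer>> ranked = new ArrayList<>(matches.entrySet());
        // Sort by descending frequency; break ties alphabetically.
        ranked.sort((a, b) -> {
            int byFreq = Integer.compare(b.getValue(), a.getValue());
            return byFreq != 0 ? byFreq : a.getKey().compareTo(b.getKey());
        });
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, Integer> e : ranked) {
            if (out.size() == n) break;
            out.add(e.getKey());
        }
        return out;
    }
}
```

With made-up frequencies such as "digital" = 80, "disney" = 50 and "dinner" = 20, calling `suggest("di", 2)` would return "digital" and then "disney". Nothing in this sketch knows about the preceding word "walt", which is exactly the limitation the post goes on to discuss.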

Solr autocomplete with document suggestions

Posted by Kelvin on 03 Mar 2012 | Tagged as: Lucene / Solr / Elasticsearch / Nutch

Solr 3.5 comes with a nice autocomplete/typeahead component that is based on the SolrSpellCheckComponent. You provide it a query and a field, and the Suggester returns a list of suggestions based on the query. For example:
    <?xml version="1.0" encoding="UTF-8"?>
    <response>
      <lst name="spellcheck">
        <lst name="suggestions">
          <lst name="ac">
            <int name="numFound">2</int>
            <int name="startOffset">0</int>
            <int name="endOffset">2</int>
            <arr name="suggestion">
              <str>acquire</str> […]
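On the client side, a response of this shape can be reduced to a plain list of strings with the XML tools in the JDK. The sketch below assumes the suggestions always sit in `<str>` elements under an `<arr name="suggestion">` element, as in the excerpt; the class name `SuggestionParser` is hypothetical, and any XML you feed it for testing is your own sample, since the excerpt above is truncated.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

// Hypothetical helper: extracts suggestion strings from a Suggester-style XML response.
class SuggestionParser {
    static List<String> parse(String xml) {
        try {
            // Parse the response body into a DOM tree.
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            // Select every <str> directly under an <arr name="suggestion">.
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate("//arr[@name='suggestion']/str", doc, XPathConstants.NODESET);
            List<String> out = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                out.add(nodes.item(i).getTextContent());
            }
            return out;
        } catch (Exception e) {
            // Wrap checked parser/XPath exceptions so callers need no throws clause.
            throw new RuntimeException("Could not parse suggester response", e);
        }
    }
}
```

The XPath expression ignores the name of the enclosing `<lst>` (here "ac"), so suggestions for every query term in the response end up in one flat list. If you need them grouped per term, you would iterate over the `<lst>` children of "suggestions" instead.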

Book review of Apache Solr 3 Enterprise Search Server

Posted by Kelvin on 28 Feb 2012 | Tagged as: Lucene / Solr / Elasticsearch / Nutch, programming

Apache Solr 3 Enterprise Search Server, published by Packt Publishing, is the only Solr book available at the moment. It's a fairly comprehensive book and discusses many new Solr 3 features. Considering the breakneck pace of Solr development and the rate at which new features get introduced, you have to hand it to the authors […]

Apache Solr book review coming soon..

Posted by Kelvin on 27 Feb 2012 | Tagged as: Lucene / Solr / Elasticsearch / Nutch

Just received my review copy of the only Apache Solr book on the market: http://www.packtpub.com/apache-solr-3-enterprise-search-server/book My book review will follow shortly.

Batch convert svg to png in Ubuntu

Posted by Kelvin on 19 Oct 2011 | Tagged as: Ubuntu

Install rsvg-convert:
    sudo apt-get install librsvg2-bin
Convert every .svg file in the current directory to .png:
    for i in *.svg; do rsvg-convert -a "$i" -o "${i%.svg}.png"; done
To rasterize the svg at 300dpi, shrinking dimensions by 50%:
    for i in *.svg; do rsvg-convert -z 0.5 -d 300 -p 300 -a "$i" -o "${i%.svg}.png"; done

Mount a .dmg file in Ubuntu

Posted by Kelvin on 11 Oct 2011 | Tagged as: Ubuntu

    sudo apt-get install dmg2img
    dmg2img /path/to/image.dmg
    sudo modprobe hfsplus
    sudo mount -t hfsplus -o loop image.img /mnt
The .dmg archive is now mounted at /mnt. You can browse it either via command-line or via Nautilus.
Courtesy of http://iremedy.net/blog/2010/11/how-to-mount-a-dmg-file-in-ubuntu-linux/

Download KhanAcademy videos with a PHP crawler

Posted by Kelvin on 08 Oct 2011 | Tagged as: programming, PHP

At the moment (October 2011), there's no simple way to download all the videos in a playlist from KhanAcademy.org. This simple PHP crawler script changes that. 🙂 It downloads the videos (from archive.org) to a subfolder, numbering and naming the videos with their respective titles (not the gibberish titles that archive.org has assigned […]

Painless CRUD in PHP via AjaxCrud

Posted by Kelvin on 08 Oct 2011 | Tagged as: programming, PHP

I recently discovered an Ajax CRUD library which makes CRUD operations positively painless: AjaxCRUD.
Its features include:
– displays the list in an inline-editable table
– generates a create form
– handles all operations (add, edit, delete) via ajax
– supports 1:many relations
– only 1 class to include!!
I highly recommend you try it out! Here is […]

What's new in Solr 3.4.0

Posted by Kelvin on 06 Oct 2011 | Tagged as: Lucene / Solr / Elasticsearch / Nutch

If you are already using Apache Solr 3.1, 3.2 or 3.3, it's strongly recommended that you upgrade to 3.4.0, which fixes a bug (LUCENE-3418) that could corrupt the index after an OS or computer crash or a power loss. Solr 3.4.0 release highlights include bug fixes and improvements from Apache Lucene 3.4.0, including a major bug (LUCENE-3418) […]
