Supermind Search Consulting Blog 
Solr - Elasticsearch - Big Data

Two simple optimizations to the DP algorithm for calculating Levenshtein edit distance

Posted by Kelvin on 01 Jul 2009 | Tagged as: programming

Levenshtein (edit) distance is most often calculated using a dynamic programming (DP) algorithm. The algorithm goes like this:
1. given 2 strings, s and t
2. instantiate d, an m x n matrix where m = length of s + 1 and n = length of t + 1
3. for each char in s
4. […]
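The steps above can be sketched in Python. The excerpt is cut off after step 3, so the cell-fill rule below is the standard textbook recurrence rather than anything confirmed by the post, and it includes neither of the two optimizations the title refers to. The function name `levenshtein` and the variable `cost` are my own choices for illustration.

```python
def levenshtein(s: str, t: str) -> int:
    """Plain DP edit distance (no optimizations), assuming unit costs."""
    # d is an m x n matrix, m = len(s) + 1 and n = len(t) + 1
    m, n = len(s) + 1, len(t) + 1
    d = [[0] * n for _ in range(m)]

    # Turning a prefix into the empty string costs one deletion per char,
    # and building a prefix from the empty string costs one insertion per char.
    for i in range(m):
        d[i][0] = i
    for j in range(n):
        d[0][j] = j

    # For each char in s, compare against each char in t
    # (assumed standard recurrence; the excerpt elides this part).
    for i in range(1, m):
        for j in range(1, n):
            cost = 0 if s[i - 1] == t[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution or match
            )
    return d[m - 1][n - 1]
```

Calling `levenshtein("kitten", "sitting")` returns 3. The full matrix costs O(m x n) time and memory, which is the baseline any optimization would be measured against.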

Trie-based approximate autocomplete implementation with support for ranks and synonyms

Posted by Kelvin on 01 Jul 2009 | Tagged as: programming

The problem of auto-completing user queries is a well-explored one. For example:
Type less, find more: fast autocompletion search with a succinct index
http://stevedaskam.wordpress.com/2009/06/07/putting-autocomplete-data-structure-to-the-test/
http://suggesttree.sourceforge.net/
http://sujitpal.blogspot.com/2007/02/three-autocomplete-implementations.html
However, little has been written about supporting synonyms and approximate matching for the purpose of autocompletion. The approach for autocompletion I'll be discussing in this article supports the following […]
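For readers unfamiliar with the data structure named in the title, here is a minimal sketch of trie-based prefix completion with ranks. It is not the implementation from the post: the excerpt is truncated before the feature list, so synonyms and approximate matching are left out, and the names `TrieNode`, `Trie`, `insert`, `complete` and the `rank` field are hypothetical.

```python
class TrieNode:
    def __init__(self):
        self.children = {}  # char -> TrieNode
        self.rank = None    # set only on nodes where a term ends


class Trie:
    """Illustrative prefix trie; higher rank means listed earlier."""

    def __init__(self):
        self.root = TrieNode()

    def insert(self, term, rank=0):
        node = self.root
        for ch in term:
            node = node.children.setdefault(ch, TrieNode())
        node.rank = rank

    def complete(self, prefix, limit=10):
        # Walk down to the node that represents the prefix.
        node = self.root
        for ch in prefix:
            node = node.children.get(ch)
            if node is None:
                return []

        # Collect every term stored below that node.
        results = []
        stack = [(node, prefix)]
        while stack:
            cur, text = stack.pop()
            if cur.rank is not None:
                results.append((text, cur.rank))
            for ch, child in cur.children.items():
                stack.append((child, text + ch))

        # Highest rank first, ties broken alphabetically.
        results.sort(key=lambda pair: (-pair[1], pair[0]))
        return [text for text, _ in results[:limit]]
```

A short usage example:

```python
trie = Trie()
trie.insert("solr", rank=5)
trie.insert("solaris", rank=2)
trie.insert("search", rank=9)
print(trie.complete("sol"))  # ['solr', 'solaris']
```

Collecting the whole subtree and sorting afterwards keeps the sketch short; a production version would more likely keep the top-ranked completions cached at each node.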

GNU Screen: Working with the Scrollback Buffer — Samsarin

Posted by Kelvin on 02 Sep 2008 | Tagged as: blogmark

http://www.samsarin.com/blog/2007/03/11/gnu-screen-working-with-the-scrollback-buffer/

100% height iframes

Posted by Kelvin on 30 Aug 2008 | Tagged as: programming

http://brondsema.net/blog/index.php/2007/06/06/100_height_iframe was the solution that worked for me after trying out several others.

What are alternatives to Google AdSense?

Posted by Kelvin on 27 Jun 2008 | Tagged as: blogmark

http://www.tech-faq.com/google-adsense-alternatives.shtml

Robert Capra Notes on Solr Update with PHP

Posted by Kelvin on 27 Jun 2008 | Tagged as: blogmark, PHP

http://www.ils.unc.edu/~rcapra/solr-update-php.php

Mohawke's Best of the Best Free and Open Source Software Collection: Mac OS X and Windows software Collection

Posted by Kelvin on 27 Jun 2008 | Tagged as: blogmark

http://www.digitaldarknet.net/thelist/

19 bullet points about the difference between enterprise and web search | Text Technologies

Posted by Kelvin on 17 Jun 2008 | Tagged as: blogmark

http://www.texttechnologies.com/2008/01/14/enterprise-search-versus-web-search/

Jeff's Search Engine Caffè: Java Open source Text Mining and Information Extraction tools

Posted by Kelvin on 13 Jun 2008 | Tagged as: blogmark

http://www.searchenginecaffe.com/2007/03/java-open-source-text-mining-and.html

Using Hadoop IPC/RPC for distributed applications

Posted by Kelvin on 02 Jun 2008 | Tagged as: programming, Lucene / Solr / Elasticsearch / Nutch

Hadoop is growing to be a pretty large framework – release 0.17.0 has 483 classes! Previously, I'd written about Hadoop SequenceFile. SequenceFile is part of the org.apache.hadoop.io package; the other notable classes in that package are ArrayFile and MapFile, which are persistent array and dictionary data structures respectively.

About Hadoop IPC

Here, I'm going […]
