Kelvin Tan - Solr/Elasticsearch Consultant

Solr - Elasticsearch - Big Data

Thoughts about Nutch

Posted by Kelvin on 10 Mar 2005 at 10:56 pm | Tagged as: work, Lucene / Solr / Elasticsearch / Nutch

I've been working on Nutch lately for a client, and it's good fun feeling my way around such an ambitious project. It's still rather immature: the code is stable and there are no major bugs, but the API isn't yet developer-friendly, in that it's difficult to extend many classes without patching Nutch directly.

It's interesting to see Doug Cutting put Lucene through its paces in Nutch. It gives an indication of how Lucene can be made to do some interesting stuff. I think Nutch is the best available case study for how to power-use Lucene, including things like distributed indexing and searching.

I would love to see

the crawling part of Nutch extracted into a separate lib (I made a request for this on the mailing list, but got no response)
easier-to-use console apps for manipulating the webdb
…TBD


2 Responses to “Thoughts about Nutch”

Alex G on 09 Apr 2005 at 12:02 am

I've been playing with Lucene in order to get a fast, completely client-side query engine to run on any computer from a CD-ROM. Java, of course, just doesn't cut it: without installing Jetty, or relying on poorly supported Java-JavaScript communication, it is anywhere from inelegant to impossible.

I am curious whether anyone has attempted to read a Lucene-generated index file on the client side in JavaScript, or alternatively to convert an index into JavaScript hashtables or XML.

Otherwise, I'm going to do just that. 😉

Kelvin on 02 Jul 2005 at 5:48 am

Alex, let me know if you're interested in working on this together. Something like this has been on my back burner for a while! 🙂

Supermind Search Consulting Blog Solr - Elasticsearch - Big Data