Thoughts about Nutch
Posted by Kelvin on 10 Mar 2005 at 10:56 pm | Tagged as: Lucene / Solr / Elasticsearch / Nutch, work
I've been working on Nutch lately for a client, and it's good fun feeling my way around such an ambitious project. It's still rather immature. The code is stable and there are no major bugs, but the API isn't yet developer-friendly: it's difficult to extend many classes without patching Nutch directly.
It's interesting to see Doug Cutting put Lucene through its paces in Nutch. It gives an indication of how far Lucene can be pushed. I think Nutch is the best available case study for how to power-use Lucene and do things like distributed indexing and searching.
I would love to see:
- the crawling part of Nutch extracted into a separate lib (I made a request for this on the mailing list, but got no response)
- easier-to-use console apps for manipulating the webdb
- …TBD
2 Responses to “Thoughts about Nutch”
I've been playing with Lucene in order to get a fast, completely client-side query engine to run on any computer from a CD-ROM. Java, of course, just doesn't cut it: without installing Jetty or relying on poorly supported Java-JavaScript communication, it ranges from inelegant to impossible.
I am curious whether anyone has attempted to read a Lucene-generated index file on the client side in JavaScript, or alternatively to convert an index into JavaScript hashtables or XML.
Otherwise, I'm going to do just that. 😉
Alex, let me know if you're interested in working on this together. Something like this has been on my back burner for a while! 🙂