OC and Nutch MapReduce
Posted by Kelvin on 15 Sep 2005 at 03:52 pm | Tagged as: work, programming, Lucene / Solr / Elasticsearch / Nutch, crawling
(or what's next for OC)…
I've received a couple of emails asking about the future of OC: whether it will be incorporated into the Nutch codebase, how it relates to the upcoming merge of MapReduce into trunk, etc.
My thoughts are:
- When MapReduce is merged into trunk, I'll make the appropriate changes to OC to support MapReduce.
- This MapReduce-compatible OC will be offered to the Nutch codebase. As of today, I've removed most uses of JDK 1.5 features; the other thing that still needs to go is the dependency on the Spring Framework.
- I _might_ keep a more experimental version of OC that uses Spring and can run on a more standalone basis. The advantages (more autonomy over what I can and can't do, plus Spring support) will have to be weighed against the disadvantage of duplicated code.
Edit, 27 Sep 2005:
I'm browsing the MapReduce sources now, and I've found that the Fetcher's complexity has increased rather significantly. I'm leaning towards maintaining a simple non-MapReduce fetcher for folks who don't need MapReduce's scalability.