Hello World for MapReduce
Posted by Kelvin on 28 Sep 2005 at 02:50 pm | Tagged as: work, Lucene / Solr / Elasticsearch / Nutch, crawling
Here's a Hello World tutorial as part of my attempts to grok MapReduce.
HelloWorld.java:
import org.apache.nutch.mapred.JobClient;
import org.apache.nutch.mapred.JobConf;
import org.apache.nutch.util.NutchConf;
import java.io.File;

public class HelloWorld {
    public static void main(String[] args) throws Exception {
        // The input directory is the only required argument.
        if (args.length < 1) {
            System.out.println("HelloWorld <input dir>");
            System.exit(-1);
        }
        // Build the job on top of the default Nutch configuration.
        NutchConf defaults = NutchConf.get();
        JobConf job = new JobConf(defaults);
        job.setInputDir(new File(args[0]));
        // Send the job's output to the console instead of to files.
        job.setOutputFormat(ConsoleOutputFormat.class);
        JobClient.runJob(job);
    }
}
and ConsoleOutputFormat.java (for printing to System.out):
import org.apache.nutch.fs.NutchFileSystem;
import org.apache.nutch.io.Writable;
import org.apache.nutch.io.WritableComparable;
import org.apache.nutch.mapred.JobConf;
import org.apache.nutch.mapred.OutputFormat;
import org.apache.nutch.mapred.RecordWriter;
import org.apache.nutch.mapred.Reporter;
import java.io.IOException;

public class ConsoleOutputFormat implements OutputFormat {
    public RecordWriter getRecordWriter(NutchFileSystem fs, JobConf job, String name) throws IOException {
        return new RecordWriter() {
            // Print only the value of each record; the key is ignored.
            public void write(WritableComparable key, Writable value) {
                System.out.println(value);
            }

            // Nothing to release, since System.out stays open.
            public void close(Reporter reporter) {
            }
        };
    }
}
Now create a new directory somewhere and, inside it, a new file (say, foo.txt). Open foo.txt in your text editor and enter:
Hello World
Important: There MUST be a newline at the end of the file, following the words "Hello World". Also, for the purposes of this tutorial, the directory must be newly created and must contain only this one file.
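If you'd rather script that setup than do it by hand, here's a minimal sketch in plain Java (no Nutch classes involved). The class name CreateInput, the createInput helper and the "mapreduce" directory prefix are illustrative names of my own choosing, not anything from Nutch.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

class CreateInput {
    // Creates a brand-new directory under `parent` that contains only foo.txt,
    // and returns the path of that file. (Hypothetical helper for this tutorial.)
    static Path createInput(Path parent) throws IOException {
        Path dir = Files.createTempDirectory(parent, "mapreduce");
        Path file = dir.resolve("foo.txt");
        // The trailing "\n" is the newline the tutorial insists on.
        Files.write(file, "Hello World\n".getBytes(StandardCharsets.UTF_8));
        return file;
    }

    public static void main(String[] args) throws IOException {
        Path parent = Paths.get(System.getProperty("java.io.tmpdir"));
        Path file = createInput(parent);
        // This is the directory to hand to HelloWorld as its argument.
        System.out.println(file.getParent());
    }
}
```

Running it prints the path of the freshly created directory, which is what HelloWorld expects as its single argument.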
Now run the HelloWorld application, passing it the directory where foo.txt resides, for example:
java HelloWorld /tmp/mapreduce/
Note: It's best to run the application directly from your IDE, so you won't have to worry about adding the necessary libs to the classpath.
After some output from Nutch about parsing the config files, you should see the text Hello World.
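Why does the line come back unchanged? HelloWorld never sets a mapper or a reducer, so whatever the job does by default is what runs. Here's a rough model of that flow in plain Java, built on two assumptions that this post does not confirm: that the defaults simply pass each record through untouched, and that the input step only emits lines terminated by a newline (one possible reason for the trailing-newline rule). IdentityFlowSketch, readRecords and run are made-up names, not Nutch APIs.

```java
import java.util.ArrayList;
import java.util.List;

class IdentityFlowSketch {
    // Stand-in for the input step. ASSUMPTION: only newline-terminated
    // lines become records; anything after the last newline is dropped.
    static List<String> readRecords(String contents) {
        List<String> records = new ArrayList<>();
        int start = 0;
        int newline;
        while ((newline = contents.indexOf('\n', start)) >= 0) {
            records.add(contents.substring(start, newline));
            start = newline + 1;
        }
        return records;
    }

    // Stand-in for the map and reduce steps. ASSUMPTION: both pass each
    // record through unchanged.
    static List<String> run(String contents) {
        List<String> output = new ArrayList<>();
        for (String record : readRecords(contents)) {
            String mapped = record;   // pass-through "map"
            String reduced = mapped;  // pass-through "reduce"
            output.add(reduced);
        }
        return output;
    }

    public static void main(String[] args) {
        // Mirrors ConsoleOutputFormat: print each value on its own line.
        for (String value : run("Hello World\n")) {
            System.out.println(value); // prints "Hello World"
        }
    }
}
```

Under those assumptions, the same input without the trailing newline produces no records at all, which would fit the warning above about ending the file with a newline.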
Congratulations! You've run your first MapReduce program!