MapReduce Tutorial: Checkpoint

You now know all of the basic operations of the Hadoop MapReduce platform. Try the following exercise to check your understanding of the MapReduce programming concepts.

Exercise: Given the code for WordCount in listings 2 and 3, modify it to produce an inverted index of its inputs. An inverted index maps each word to the list of documents that contain it. Thus, if the word "cat" appears in documents A and B, but not C, then the line:

cat A, B

should appear in the output. If the word "baseball" appears in documents B and C, then the line:

baseball B, C

should appear in the output as well.
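
To make the target output concrete, here is a minimal in-memory sketch of an inverted index in plain Java. It does not use Hadoop and is not the solution to the exercise; it only illustrates the word-to-documents mapping described above. The class name InvertedIndexSketch, the methods build and formatLine, and the sample document contents are all hypothetical, chosen for illustration.

```java
import java.util.Map;
import java.util.SortedSet;
import java.util.StringTokenizer;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical illustration only: builds an inverted index in memory.
public class InvertedIndexSketch {

  // Maps each word to the sorted set of document names that contain it.
  // Input: document name -> document text (whitespace-separated words).
  public static Map<String, SortedSet<String>> build(Map<String, String> docs) {
    Map<String, SortedSet<String>> index = new TreeMap<>();
    for (Map.Entry<String, String> doc : docs.entrySet()) {
      StringTokenizer tokens = new StringTokenizer(doc.getValue());
      while (tokens.hasMoreTokens()) {
        String word = tokens.nextToken();
        // A set ensures each document is listed once per word.
        index.computeIfAbsent(word, w -> new TreeSet<>()).add(doc.getKey());
      }
    }
    return index;
  }

  // Formats one index entry in the style used above, e.g. "cat A, B".
  public static String formatLine(String word, SortedSet<String> docNames) {
    return word + " " + String.join(", ", docNames);
  }

  public static void main(String[] args) {
    // Assumed sample inputs matching the "cat" and "baseball" example.
    Map<String, String> docs = new TreeMap<>();
    docs.put("A", "cat dog");
    docs.put("B", "cat baseball");
    docs.put("C", "baseball bat");
    for (Map.Entry<String, SortedSet<String>> e : build(docs).entrySet()) {
      System.out.println(formatLine(e.getKey(), e.getValue()));
    }
  }
}
```

In the MapReduce version, the grouping that the TreeMap performs here is done by the framework between the map and reduce phases, so the exercise amounts to deciding what each phase should emit.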

If you get stuck, read the section on troubleshooting below. The working solution is provided at the end of this module.

Hint: The default InputFormat provides the Mapper with (key, value) pairs in which the key is the byte offset of the line within the file and the value is one line of text. To get the filename of the current input, use the following code inside map():

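// Requires: import org.apache.hadoop.mapred.FileSplit;
// "reporter" is the Reporter argument passed to map(); the cast assumes file-based input.
FileSplit fileSplit = (FileSplit) reporter.getInputSplit();
String fileName = fileSplit.getPath().getName();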
