WordCount is a trivial problem. What is the next best problem to tackle in MapReduce? Something in which I can create my own Readers, Writables, etc.?

I kept wondering about this for some time, and then settled on a search engine. After all, many of the core Big Data concepts came out of the search giant Google, through the papers it published on MapReduce, BigTable, etc.

The first step was to implement an inverted index. To make index lookups faster, I decided to use HBase, the BigTable implementation for Hadoop. So the inverted index creator reads the files on disk and creates the inverted index as an HBase table. (There is a small bug in inserting data into the HBase table, but I have also written the inverted index data to a file, and that works. I should check the HBase part sometime later.)
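To make the idea concrete, here is a minimal sketch of building an inverted index in map/reduce style. It is plain Python rather than the Hadoop and HBase code described above, it keeps everything in memory, and the names (`map_phase`, `reduce_phase`, `build_inverted_index`, the sample file names) are made up for illustration.

```python
import re
from collections import defaultdict


def map_phase(doc_id, text):
    """Emit one (term, doc_id) pair per word occurrence in the document."""
    for term in re.findall(r"[a-z0-9]+", text.lower()):
        yield term, doc_id


def reduce_phase(term, doc_ids):
    """Collapse the doc ids seen for a term into a sorted, de-duplicated list."""
    return term, sorted(set(doc_ids))


def build_inverted_index(documents):
    """Run the map phase, group by term (the shuffle step), then reduce."""
    grouped = defaultdict(list)  # stands in for Hadoop's shuffle/sort
    for doc_id, text in documents.items():
        for term, emitted_id in map_phase(doc_id, text):
            grouped[term].append(emitted_id)
    return dict(reduce_phase(term, ids) for term, ids in grouped.items())


# Hypothetical input: file name -> file contents.
docs = {
    "doc1.txt": "Hadoop runs MapReduce jobs",
    "doc2.txt": "HBase stores data on Hadoop",
}
index = build_inverted_index(docs)
print(index["hadoop"])  # ['doc1.txt', 'doc2.txt']
```

In a real job the grouping is done by the framework between the mapper and the reducer, and the reducer output would go to an HBase table or a file instead of a dictionary.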

Now that the inverted index is ready, the next step is to look it up. I haven't done that yet! Find the code in my Git repo.
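The lookup is not implemented yet, but one possible shape for it is to fetch the document list for each query term and intersect them. The sketch below assumes the index is a plain in-memory dictionary from term to document ids; `search` and `sample_index` are hypothetical names, and a real version would read the rows from HBase instead.

```python
def search(index, query):
    """Return the documents that contain every term in the query (AND search)."""
    terms = query.lower().split()
    if not terms:
        return []
    # A term missing from the index contributes an empty set, so no document matches.
    postings = [set(index.get(term, ())) for term in terms]
    return sorted(set.intersection(*postings))


# Hypothetical index: term -> list of documents containing it.
sample_index = {
    "hadoop": ["doc1.txt", "doc2.txt"],
    "hbase": ["doc2.txt"],
    "mapreduce": ["doc1.txt"],
}
print(search(sample_index, "hadoop hbase"))  # ['doc2.txt']
```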

A good read on how to create a search engine.