| Hi, I picked up Hadoop a couple of months back and am looking for different ways to use it so that I can apply what I have learned so far (MapReduce, Hive, Pig, etc.). Since Hadoop is really used in environments where the data to be queried is large, I started looking around for that kind of data and came across the Wikipedia data (available for download). Now I am trying to list the questions that I could ask of this data. What questions would you want answered from the Wikipedia data? This will help me write some useful MapReduce code, Hive queries, or Pig scripts to improve my skills. I just feel that learning by doing is the best form of learning. Thanks. |