Hacker News
Ask HN: Learning by doing: Hadoop
3 points by phenomenon 4853 days ago
Hi,

I picked up Hadoop a couple of months back and am looking for different ways to use it, so that I can apply what I have learned so far (MapReduce, Hive, Pig, etc.).

Since Hadoop is mostly used in environments where the data to be queried is large, I started looking around for that kind of data. I came across the Wikipedia data (available for download).

Now I am trying to list the questions that I could ask of this data.

What questions would you want answered from the Wikipedia data?

This will help me write some useful MapReduce code, Hive queries or Pig scripts to improve my skills.

I just feel that learning by doing is the best form of learning.

Thanks.

1 comments

You could replicate http://en.wikipedia.org/wiki/Most_common_words_in_English using Wikipedia as your corpus.
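
For the word-frequency idea, here is a minimal sketch of the map and reduce steps in Python, roughly the shape you would split into a mapper and a reducer script for Hadoop Streaming. The function names (`map_words`, `reduce_counts`, `top_words`) and the tokenizing regex are my own assumptions, and the `sorted()` call only stands in for the shuffle/sort that Hadoop does between the two phases. It also assumes you have already extracted plain text from the Wikipedia dump.

```python
import re
from itertools import groupby

# Assumed tokenizer: lowercase ASCII words, optionally with one apostrophe part.
WORD_RE = re.compile(r"[a-z]+(?:'[a-z]+)?")

def map_words(lines):
    """Map step: emit a (word, 1) pair for every word in every line."""
    for line in lines:
        for word in WORD_RE.findall(line.lower()):
            yield word, 1

def reduce_counts(sorted_pairs):
    """Reduce step: sum the counts per word.

    Expects pairs sorted by word, which is how a reducer receives its input.
    """
    for word, group in groupby(sorted_pairs, key=lambda pair: pair[0]):
        yield word, sum(count for _, count in group)

def top_words(lines, n=10):
    """Run map, a local stand-in for the shuffle/sort, and reduce; return the n most common words."""
    counts = reduce_counts(sorted(map_words(lines)))
    # Highest count first; ties broken alphabetically.
    return sorted(counts, key=lambda pair: (-pair[1], pair[0]))[:n]

if __name__ == "__main__":
    sample = ["the cat and the hat", "The end"]
    for word, count in top_words(sample, n=3):
        print(word, count)
```

On a real cluster the final "top n" selection would probably be a second, small job (or a Hive ORDER BY ... LIMIT) over the reducer output.
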
More stuff to find:
1. The most referenced Wikipedia articles.
2. The most referenced websites in Wikipedia.
3. The degrees of separation from Kevin Bacon (that's en.wikipedia.org/wiki/Kevin_Bacon) for a given Wikipedia article.
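
The degrees-of-separation question is a shortest-path search over the article link graph. Below is a small in-memory sketch of the logic using breadth-first search. It assumes you have already extracted a mapping from each article title to the titles it links to, and that "separation" means following outgoing links from the given article. The function name and the toy graph are hypothetical, not real Wikipedia link data. On Hadoop the same idea is usually run as an iterative job, one MapReduce pass per BFS level, rather than in one process.

```python
from collections import deque

def degrees_of_separation(links, start, target):
    """Return the number of link hops from start to target, or None if unreachable.

    links: dict mapping an article title to an iterable of titles it links to.
    """
    if start == target:
        return 0
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        page, dist = queue.popleft()
        for neighbor in links.get(page, ()):
            if neighbor == target:
                return dist + 1
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, dist + 1))
    return None

if __name__ == "__main__":
    # Hypothetical toy graph, for illustration only.
    toy_links = {
        "Page_A": ["Page_B", "Page_C"],
        "Page_B": ["Kevin_Bacon"],
        "Page_C": [],
    }
    print(degrees_of_separation(toy_links, "Page_A", "Kevin_Bacon"))
```
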

#1 is interesting. #2 is valuable for SEO. #3 makes a good post on HN and will get you hired somewhere.

Thanks, that does sound like something I can start with.