Hacker News new | ask | show | jobs
by sitkack 4597 days ago
This is the `grep/awk` use case. The nice thing about streaming mr interface to hadoop (calling external programs) is that you can literally take your grep/awk workflow and move it to the cluster. Retaining line oriented records is a huge step in having a portable data processing workflow.