

by bmcfeeley 2170 days ago

I'm no expert, but my understanding is that Pig is a combination of

- a language for specifying data transformations, and

- an engine that compiles programs written in that language into MapReduce jobs to run on a Hadoop cluster

It was designed to easily map some common functional and SQL idioms (e.g. filter, group by with aggregation functions) onto parallel execution for processing huge amounts of data.
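To make the "idioms to MapReduce" idea a bit more concrete, here is a rough single-machine sketch in Python of how a filter followed by a group-by with a SUM could be split into map, shuffle, and reduce steps. This is not Pig's actual compiler output or API; the record layout, the function names, and the `min_bytes` threshold are all made up for illustration.

```python
from collections import defaultdict

# Hypothetical input: (user, bytes) records. On a real cluster these would
# be split across many machines; here they are just a list.
RECORDS = [("alice", 120), ("bob", 30), ("alice", 80), ("carol", 5), ("bob", 70)]


def map_phase(records, min_bytes):
    """FILTER plus key extraction: emit (key, value) pairs that pass the predicate."""
    for user, nbytes in records:
        if nbytes >= min_bytes:
            yield user, nbytes


def shuffle(pairs):
    """GROUP BY: collect every value under its key, as the framework's shuffle would."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Aggregation: apply SUM to each group's values."""
    return {key: sum(values) for key, values in groups.items()}


def total_bytes_per_user(records, min_bytes=10):
    """Chain the three steps the way a single MapReduce job would."""
    return reduce_phase(shuffle(map_phase(records, min_bytes)))


if __name__ == "__main__":
    print(total_bytes_per_user(RECORDS))
```

The point is that the filter only needs to look at one record at a time, so it can run in parallel in the map step, while the aggregation only needs the values for one key at a time, so it can run in parallel in the reduce step.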

Impala is another big data project: an engine for planning and executing SQL queries on data stored in a Hadoop cluster.

ZooKeeper is... black magic??