by bmcfeeley
2170 days ago
I'm no expert, but my understanding is that Pig is a combination of:
- a language for specifying data transformations, and
- an engine that compiles programs written in that language into MapReduce jobs to execute on a Hadoop cluster.

It was designed to easily map some common functional and SQL idioms (e.g. filter, group by with aggregation functions) to parallel execution for processing huge amounts of data. Impala is another big data project: an engine for planning and executing SQL on data stored in a Hadoop cluster. Zookeeper is... black magic??
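
To make the "filter, then group by with an aggregation" idea a bit more concrete, here's a rough single-machine Python sketch of how that kind of query can break down into map, shuffle, and reduce steps. It's only an illustration of the general MapReduce shape, not what Pig actually generates; the records, the 10-byte threshold, and all the function names are made up.

```python
from collections import defaultdict

# Hypothetical input: (user, bytes_sent) records.
records = [("alice", 120), ("bob", 5), ("alice", 30), ("carol", 200), ("bob", 50)]

def map_phase(record):
    """Apply the filter, then emit (key, value) pairs for the group-by."""
    user, nbytes = record
    if nbytes >= 10:  # the "filter" step (threshold is an assumption)
        yield user, nbytes

def shuffle(pairs):
    """Collect all values for the same key, as the framework would between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Apply the aggregation function (here: SUM) to one group."""
    return key, sum(values)

def run_job(records):
    # On a real cluster the map and reduce calls run in parallel on many machines;
    # here they run sequentially just to show the data flow.
    pairs = [pair for rec in records for pair in map_phase(rec)]
    groups = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in groups.items())

result = run_job(records)
print(result)
```

The point is that each map call only looks at one record and each reduce call only looks at one group, which is what lets the work be spread across a cluster.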