by mattj
4381 days ago
I've gone through a similar transition (Hive to Redshift) in a very large-scale data environment. Raw Hadoop / Cascading is still very useful for more complicated workflows, but Redshift is so vastly superior to Hive it's not even funny. I thought I would miss adding my own UDFs, but this hasn't been an issue at all. I'm under the impression Presto is a similar improvement, but I haven't spent any time with it. One huge advantage of Redshift over Hive: you can connect with plain old Postgres libraries, so you can build Redshift results into your admin interfaces, one-off scripts, and anywhere else you're fine trading a few seconds of latency for extra data.
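To make the "plain old Postgres libraries" point concrete, here is a rough sketch of what a one-off script might look like in Python with psycopg2. The environment variable names, the `events` table, and the helper names are all made up for illustration. The only Redshift-specific detail is the default port, 5439.

```python
import os


def redshift_dsn(host, dbname, user, password, port=5439):
    """Build a libpq-style connection string (5439 is Redshift's default port)."""
    return (
        f"host={host} port={port} dbname={dbname} "
        f"user={user} password={password}"
    )


def fetch_rows(conn, sql, params=None):
    """Run a query on any DB-API 2.0 connection and return all rows."""
    cur = conn.cursor()
    try:
        cur.execute(sql, params or ())
        return cur.fetchall()
    finally:
        cur.close()


def example():
    """Hypothetical one-off script; not called on import."""
    # psycopg2 is an ordinary Postgres driver, imported here so the helpers
    # above stay usable without it installed.
    import psycopg2

    # Hypothetical environment variable names.
    dsn = redshift_dsn(
        host=os.environ["REDSHIFT_HOST"],
        dbname=os.environ["REDSHIFT_DB"],
        user=os.environ["REDSHIFT_USER"],
        password=os.environ["REDSHIFT_PASSWORD"],
    )
    conn = psycopg2.connect(dsn)
    try:
        # `events` is a made-up table name.
        rows = fetch_rows(
            conn,
            "SELECT event_type, COUNT(*) FROM events "
            "WHERE created_at > %s GROUP BY event_type",
            ("2014-01-01",),
        )
        for event_type, count in rows:
            print(event_type, count)
    finally:
        conn.close()
```

Since `fetch_rows` only relies on the standard DB-API cursor interface, the same helper could sit behind an admin page or a cron job, as long as a few seconds of query latency is acceptable there.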