Hive is not particularly fast in and of itself; it just has horizontal scaling and a SQL-ish front-end. Looking at AWS Redshift's homepage[1] (emphasis added):

> Amazon Redshift delivers fast query and I/O performance for virtually any size dataset by using columnar storage technology and parallelizing and distributing queries across multiple nodes.

Column-store databases[2] can be screamingly fast for analytics operations compared to row-oriented RDBMSs or other DB types (à la assorted NoSQL). See Kdb[3] or MonetDB[4] for examples of specific implementations. I'd fully expect a competent column store designed for horizontal scaling to obliterate Hive for a wide range of problems.

The usual big-data caveat: you need to pay attention to the fit of your tools against your problem and your data. I don't expect Redshift to be any different.

Still, it's pretty exciting to see a new analysis DB tech cropping up like this. And doubly interesting to see this coming from Amazon.

[1] https://aws.amazon.com/redshift/
[2] https://en.wikipedia.org/wiki/Column-oriented_DBMS
[3a] http://kx.com/kdb-plus.php
[3b] https://en.wikipedia.org/wiki/K_%28programming_language%29#K...
[4] http://www.monetdb.org/Home
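To make the column-store point a bit more concrete, here is a toy sketch of the two layouts. It is only an illustration under my own assumptions: the field names and the two helper functions are hypothetical, and it says nothing about how Redshift, Kdb, or MonetDB are actually implemented. The idea is simply that an aggregate over one field only has to read that field's array in a columnar layout, while a row layout walks every full record.

```python
# Toy illustration only: real column stores add compression, vectorized
# execution, and on-disk layouts that this sketch does not model.

# Row-oriented layout: each record keeps all of its fields together.
rows = [
    {"user_id": 1, "region": "us", "amount": 10.0},
    {"user_id": 2, "region": "eu", "amount": 25.5},
    {"user_id": 3, "region": "us", "amount": 4.5},
]

# Column-oriented layout: each field is stored as its own contiguous list.
columns = {
    "user_id": [1, 2, 3],
    "region": ["us", "eu", "us"],
    "amount": [10.0, 25.5, 4.5],
}


def total_amount_rows(rows):
    """Sum one field by visiting every full record."""
    return sum(record["amount"] for record in rows)


def total_amount_columns(columns):
    """Sum one field by reading only that field's list."""
    return sum(columns["amount"])


if __name__ == "__main__":
    print(total_amount_rows(rows))
    print(total_amount_columns(columns))
```

Both functions return the same total; the difference is how much data each one has to touch, which is what starts to matter once the table is wide and far larger than memory.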
There is a lot of new DB tech, and Redshift doesn't seem particularly competitive at the moment unless you only need to use it a portion of the time, which is where Amazon excels.