by jaytaylor
4866 days ago
I haven't tried Redshift before, but coming from an MR/Hadoop/Hive background, this seems to me like quite a sensational claim. I'd be very keen to hear others' thoughts on how widely these kinds of gains would apply to big-data processing. As Carl Sagan said: "Extraordinary claims require extraordinary evidence" http://en.wikipedia.org/wiki/Carl_Sagan
> Amazon Redshift delivers fast query and I/O performance for virtually any size dataset by using columnar storage technology and parallelizing and distributing queries across multiple nodes. [1]
Column-store databases[2] can be screamingly fast for analytics operations compared to traditional row-oriented RDBMSs or other DB types (à la assorted NoSQL). See Kdb[3] or MonetDB[4] for examples of specific implementations. I'd fully expect a competent column store designed for horizontal scaling to obliterate Hive for a wide range of problems.
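To make the row-vs-column point concrete, here's a toy sketch in plain Python. The data, field names, and helper functions are all made up for illustration, and this says nothing about how Redshift, Kdb, or MonetDB are actually implemented. It only shows the layout difference: an aggregate over one field has to walk every whole record in a row layout, but only one contiguous list in a column layout.

```python
# Hypothetical toy table; the field names are invented for this example.
rows = [
    {"user_id": 1, "country": "US", "amount": 10.0},
    {"user_id": 2, "country": "DE", "amount": 20.0},
    {"user_id": 3, "country": "US", "amount": 5.5},
]


def to_columns(rows):
    """Pivot a list of row dicts into one list per column."""
    columns = {}
    for row in rows:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)
    return columns


def sum_row_store(rows, field):
    # Row layout: every whole record is visited just to read one field.
    return sum(row[field] for row in rows)


def sum_column_store(columns, field):
    # Column layout: only the single requested column is scanned.
    return sum(columns[field])


columns = to_columns(rows)
total_rows = sum_row_store(rows, "amount")
total_cols = sum_column_store(columns, "amount")
```

Both functions return the same total; the difference is how much data each one has to touch. In a real system the column layout also tends to compress better and reads far fewer bytes from disk when a query only needs a few columns out of many, which is where the big analytics wins usually come from.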
The usual big-data caveat applies: you need to pay attention to the fit of your tools against your problem and your data. I don't expect Redshift to be any different. Still, it's pretty exciting to see a new analysis DB tech cropping up like this, and doubly interesting to see it coming from Amazon.
[1] https://aws.amazon.com/redshift/
[2] https://en.wikipedia.org/wiki/Column-oriented_DBMS
[3a] http://kx.com/kdb-plus.php
[3b] https://en.wikipedia.org/wiki/K_%28programming_language%29#K...
[4] http://www.monetdb.org/Home