by jaytaylor 4866 days ago
I haven't tried Redshift before, but coming from a MapReduce/Hadoop/Hive background, this seems to me like quite a sensational claim. I'd be very keen to hear others' thoughts on how widely these kinds of gains would apply to big-data processing.

As Carl Sagan said:

"Extraordinary claims require extraordinary evidence"

http://en.wikipedia.org/wiki/Carl_Sagan

2 comments

Hive is not particularly fast in and of itself; it just has horizontal scaling and a SQL-ish front-end. Looking at the AWS Redshift homepage[1] (emphasis added):

> Amazon Redshift delivers fast query and I/O performance for virtually any size dataset by using columnar storage technology and parallelizing and distributing queries across multiple nodes.

Column-store databases[2] can be screamingly fast for analytics operations compared to a row-oriented RDBMS or other DB types (à la assorted NoSQL). See Kdb[3] or MonetDB[4] for examples of specific implementations. I'd fully expect a competent column store designed for horizontal scaling to obliterate Hive for a wide range of problems.
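
To make the layout difference concrete, here is a minimal sketch in plain Python. The toy table, the column names, and the helper functions are all hypothetical and have nothing to do with how Redshift, Kdb, or MonetDB are actually implemented. It only shows the general idea: an aggregate over one field has to walk every whole row in a row layout, but only one contiguous array in a column layout.

```python
from array import array

# Hypothetical toy table of (order_id, region, amount) rows.
# Illustrative data only, not taken from any real system.
ROWS = [
    (1, "us", 10.0),
    (2, "eu", 20.5),
    (3, "us", 5.25),
    (4, "ap", 4.25),
]


def to_columns(rows):
    """Pivot row tuples into one sequence per column."""
    order_ids, regions, amounts = zip(*rows)
    return {
        "order_id": array("q", order_ids),  # packed 64-bit integers
        "region": list(regions),
        "amount": array("d", amounts),      # packed doubles
    }


def sum_amount_row_store(rows):
    """Row layout: every row is visited even though only one field is needed."""
    return sum(row[2] for row in rows)


def sum_amount_column_store(columns):
    """Column layout: only the 'amount' array is scanned."""
    return sum(columns["amount"])


COLUMNS = to_columns(ROWS)
```

Both functions return the same total; the difference is how much data each one has to touch. On a wide table with many columns, that gap is what a real column store exploits, usually together with compression and vectorized execution, which this sketch does not attempt.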

The usual big-data caveat: you need to pay attention to the fit of your tools against your problem and your data. I don't expect Redshift to be any different. Still, it's pretty exciting to see a new analysis DB tech cropping up like this. And doubly interesting to see this coming from Amazon.

[1] https://aws.amazon.com/redshift/

[2] https://en.wikipedia.org/wiki/Column-oriented_DBMS

[3a] http://kx.com/kdb-plus.php

[3b] https://en.wikipedia.org/wiki/K_%28programming_language%29#K...

[4] http://www.monetdb.org/Home

SAP HANA has a column store, and a row store, and does OLAP (Analytics) and OLTP.

There is a lot of new DB tech. Redshift doesn't seem particularly competitive at the moment unless you only need it part of the time, which is where Amazon excels.

Given the legendary performance issues of Hadoop I am not really surprised.

Hadoop is heavily horizontally scalable, but that's about it.