| HN Mirror

Y	Hacker News new \| ask \| show \| jobs

by flavor8 4602 days ago

I'd be interested in hearing from anybody using RedShift in production. Currently shopping around for a database to store 1.5 billion interesting things, and RedShift's at the top of my list.

Specifically - how many (small/large) nodes are you using, how big is your data, how many simultaneous users does your setup support, and what kind of performance do you see?

Thanks in advance.

4 comments

vosper 4602 days ago

Not in production yet, but my company's in the process of moving from Infobright to Redshift, mostly because Infobright, while extremely fast, is a single-server solution and we think that within a year we'll have outgrown its capacity.

Our data is 3 denormalized tables with the same ~1000 column schema, the largest of which contains 7B rows. Most of our queries are simple aggregations down a few columns, with simple constraints like date range + customer ID.

So far we're really impressed with Redshift's load performance and ease of getting up-and-running, but we're still an order of magnitude away from Infobright's query performance. Next steps are playing with distribution and sort keys, as well as trying different cluster configurations; to be fair to Redshift we've not run it on more than a 6 XL cluster yet.

link

onemoreaccount 4602 days ago

We use Redshift in production to power a customer-facing app, but don't have web apps hit it directly (that really couldn't work with the concurrent query limits). Workers query RS and cache results in another database and the web app hits that database. We handle writes in the low 10s of billions per day. Our data format is very simple and we get great compression, so we currently fit everything on two clusters — one of 3 XLs and one of 6 XLs.

Performance is great — except when it's not. Simple aggregations on tables that are < 1B rows and two table joins on tables that are < 100M rows are blazingly fast (maybe 1-30 seconds depending on the query). Larger tables than those and it can start to crawl.

link

dude_abides 4602 days ago

We're using it as a replacement for our enterprise data warehouse. Total size so far ~1 TB compressed, daily delta ~10s of GB compressed. We're using a two XL node cluster. The best part is that you can increase the cluster size without downtime (except you can't switch from XL to 8XL this way). Right now we have a single digit number of simultaneous users (analysts). Did not have to tweak the Workload Management configurations too much so far.

I'd be interested to hear from others with bigger workload than ours. For example, someone from Netflix (They switched their data warehouse from Vertica to Redshift this year)

link

sloth0000 4602 days ago

I have not used Redshift in production but it's been getting pretty good reviews thus far. I did some testing using pretty small dataset which you can read more about here: http://bicortex.com/amazon-redshift-review-data-warehouse-in... I don't think it's the answer to all data storage/warehousing needs but it does look promising and it should only get better.

link

rurounijones 4602 days ago

Opened your page with interest but the tiny white text on black background is eye-searing.

Even with manual zooming it is still a bit harsh due to the colours (Maybe just me though.).

link