Agree with your tip about larger RA3 node sizes. An advisor at AWS clued me in to it during a resizing POC, and since we switched one big cluster to ra3.16xlarge, leader node CPU utilization has definitely dropped for our peak Monday morning workloads.

(Yes, even with a very large ra3.4xlarge cluster and substantial Redshift Concurrency Scaling, our leader node was nudging above 50% CPU in the busiest hour.)

I'm still not clear whether switching from SERIALIZABLE to SNAPSHOT_ISOLATION actually helped or hurt our huge workload. Anecdotally, we seem to have traded deadlock-killed queries for queries that hold locks longer, which may have increased our overall expenses.

One minor thing I've anecdotally noticed that seems to have changed at some point, perhaps around the time of our transition to ra3.16xlarge (not sure): in the old days, Redshift Concurrency Scaling seemed to kick in the minute a single query was queued. Nowadays it seems to kick in about 5 minutes later.

I'd prefer it if AWS added a way to activate concurrency scaling only when a queue has X queries queued, or when queries have been queued for Y minutes, or something along those lines.
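
To make that concrete, here is a minimal Python sketch of the kind of rule I mean. To be clear, this is purely hypothetical: `ScalingPolicy` and `should_activate_scaling` are names I made up for illustration, not a Redshift setting or API, and I'm assuming you'd trigger on either threshold rather than requiring both.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class ScalingPolicy:
    """Hypothetical thresholds; not an actual Redshift setting."""
    min_queued_queries: int        # the "X": queue depth that triggers scaling
    min_queue_wait_seconds: float  # the "Y", expressed in seconds


def should_activate_scaling(queue_wait_seconds: Sequence[float],
                            policy: ScalingPolicy) -> bool:
    """Return True when either threshold is met.

    queue_wait_seconds holds one entry per currently queued query:
    how long that query has been waiting, in seconds.
    """
    # Threshold X: enough queries are waiting at once.
    if len(queue_wait_seconds) >= policy.min_queued_queries:
        return True
    # Threshold Y: at least one query has waited long enough.
    longest_wait = max(queue_wait_seconds, default=0.0)
    return longest_wait >= policy.min_queue_wait_seconds
```

With something like `ScalingPolicy(min_queued_queries=3, min_queue_wait_seconds=300)`, a single query queued for ten seconds would not spin up extra capacity, but three queued queries, or one query stuck for five minutes, would.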