We actually haven't been running up against any limits here. One thing to keep in mind is that Postgres remote-fetch operations aren't tuple-at-a-time, so this shouldn't be a bottleneck for our multi-node operations.
Have you done any analysis of your per-core scan rates for simple aggregations like sum/count + group by with a reasonably large cardinality key? Or has anyone published a benchmark you trust on queries of that variety?
An example would be TPC-H Q1, which is a little weak on group-by cardinality (it only produces four groups), but is good for testing raw aggregation performance.
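
To make the question concrete, here is a rough sketch of the kind of measurement I mean: a single-threaded sum/count + group by over synthetic rows, reported as rows per second. The function names (`make_rows`, `aggregate`, `scan_rate`) and the row and group counts are made up for illustration, and pure Python will be far slower than any real engine, so it only shows the shape of the test, not a number to compare against.

```python
import random
import time
from collections import defaultdict


def make_rows(n, n_groups, seed=0):
    """Build n synthetic (group_key, value) rows; n_groups sets key cardinality."""
    rng = random.Random(seed)
    return [(rng.randrange(n_groups), rng.random() * 100.0) for _ in range(n)]


def aggregate(rows):
    """Equivalent of SELECT key, SUM(value), COUNT(*) ... GROUP BY key."""
    acc = defaultdict(lambda: [0.0, 0])
    for key, value in rows:
        slot = acc[key]
        slot[0] += value  # running sum for this key
        slot[1] += 1      # running count for this key
    return {k: (s, c) for k, (s, c) in acc.items()}


def scan_rate(rows):
    """Return (rows scanned per second on one core, aggregation result)."""
    start = time.perf_counter()
    result = aggregate(rows)
    # Guard against a zero interval on very small inputs.
    elapsed = max(time.perf_counter() - start, 1e-9)
    return len(rows) / elapsed, result


if __name__ == "__main__":
    # Hypothetical sizes: 1M rows, up to 100k distinct keys.
    rows = make_rows(1_000_000, 100_000)
    rate, result = scan_rate(rows)
    print(f"{rate:,.0f} rows/s on one core, {len(result)} groups")
```

Varying `n_groups` from a handful up to something close to the row count is what separates the raw scan cost from the hash-table cost, which is the part Q1 doesn't really exercise.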