by jandrewrogers 398 days ago

> As recently shown, the median scan in Amazon Redshift and Snowflake reads a doable 100 MB of data, and the 99.9-percentile reads less than 300 GB. So the singularity might be closer than we think.

There is some circular reasoning embedded here. I've seen many, many cases of people finding ways to cut up their workloads into small chunks because the performance and efficiency of these platforms are far from optimal if you actually try to run your workload at its native scale. To some extent, these "small reads" reflect the inadequacy of the platform, not the desire of a user to run a particular workload.

A better interpretation may be that the existing distributed architectures for data analytics don't scale well except for relatively trivial workloads. There has been an awareness of this for over a decade but a dearth of platform architectures that address it.