
by fnordpiglet 1397 days ago

It is a terrible article. I’ve been on the engineering side of these big data platforms, including Snowflake in its early days, ParAccel (Redshift’s code ancestor), Redshift, and others you probably use but don’t realize are actually hyperscale database engines. The author missed the mark consistently. I chortled when he discussed the Redshift WLM, which I helped design a very long time ago, and it’s absolute garbage. Snowflake’s entire point is that you can decouple the storage and the database from the warehouse query engine to provide total isolation from noisy neighbors. If you’re encountering noisy neighbors, you’re using the product entirely wrong.

And you’re right. The motivation Snowflake has to improve is survival. It’s not like their architecture is impossible to replicate. Redshift is doing a total reorganization and rewrite of the product to compete more directly with Snowflake (Redshift AQUA, etc.).

They also seem to completely discount the value of SaaS: outsourcing database and storage operations to Snowflake, whose only focus is operating the database product. Running your own clusters is an exercise that seems smart in the first few months; then, like a puppy when it grows up, you’re stuck with a dog. If you love dogs and train them well, then great. But the fact is most people are terrible dog owners, and the same is true for MPP clusters. Being able to focus exclusively on query management operations is really ideal. Highly stateful distributed products are a PITA.

He also rants about Snowflake not telling him the hardware. Snowflake runs in EC2, GCP, Azure. You can literally guess the hardware types: there just aren’t that many saddle point instance types for that sort of workload. Discussing SSD vs HDD is also an obvious sign of ignorance. Snowflake’s basic premise is that it does very wide, highly concurrent S3 GETs and scans of the data, using a FoundationDB metadata catalog to help prune. Being in AWS, it’s implausible they use HDD, and realistically they could elide SSDs (I do not remember if they use local disks for caching, but it’s stateless regardless).
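To illustrate the pruning idea mentioned above, here is a minimal sketch, assuming a catalog that stores a min/max value per stored file for one column. The `PartitionMeta` class, the `prune` function, and the file names are hypothetical and are not Snowflake's actual catalog format or API; the point is only that metadata lets the engine skip object-store reads that cannot match the filter.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PartitionMeta:
    """Hypothetical catalog entry: one stored file plus min/max of one column."""
    path: str
    min_value: int
    max_value: int


def prune(catalog, lo, hi):
    """Keep only partitions whose [min, max] range overlaps the filter [lo, hi]."""
    return [p for p in catalog if p.max_value >= lo and p.min_value <= hi]


# Illustrative catalog; in a real system this would live in the metadata store.
catalog = [
    PartitionMeta("part-0", 0, 99),
    PartitionMeta("part-1", 100, 199),
    PartitionMeta("part-2", 200, 299),
]

# A filter like "WHERE col BETWEEN 150 AND 220" only needs two of the three files.
to_fetch = prune(catalog, 150, 220)
print([p.path for p in to_fetch])  # ['part-1', 'part-2']
```

Only the surviving partitions would then be fetched and scanned, which is why the metadata lookup matters more to this design than the type of local disk.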

The unit costing being hardware agnostic is totally normal too - they don’t have to expose to you the details of their costing because they normalize it to a standard fictional unit.
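As a rough illustration of what normalizing cost to a fictional unit can look like, here is a minimal sketch. The rate table, the size labels, and the price per unit are all made-up values for the example, not published pricing; the customer-facing bill depends only on the unit rate and the run time, never on the underlying instance type.

```python
# Hypothetical units consumed per hour for each warehouse size (illustrative only).
UNITS_PER_HOUR = {"XS": 1, "S": 2, "M": 4, "L": 8}


def units_used(size: str, seconds: int) -> float:
    """Units consumed by running a warehouse of the given size for `seconds`."""
    return UNITS_PER_HOUR[size] * seconds / 3600


def cost(size: str, seconds: int, price_per_unit: float) -> float:
    """Convert consumed units to money at an assumed price per unit."""
    return units_used(size, seconds) * price_per_unit


# A medium warehouse running for 30 minutes at an assumed price of 3.0 per unit.
print(units_used("M", 1800))   # 2.0
print(cost("M", 1800, 3.0))    # 6.0
```

The provider is free to change the hardware behind each size as long as the unit rate stays the same, which is the sense in which the costing is hardware agnostic.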

1 comments

bennyelv 1397 days ago

I'm a Snowflake customer and I've felt/am feeling all of the pain that this article talks about. There might be some hand-waving over technical complexity that you don't like, given your detailed understanding of how the thing is built, but the article is fundamentally right in its message.

The thing it's most right about is the power imbalance and the innovator's dilemma. More than once we've found that query performance was too poor or the cost too high, complained about it, and Snowflake have "made a configuration change" (undisclosed) that has brought the cost down.

fnordpiglet 1397 days ago

Don’t you have the same issue with any product built around a query optimizer? If I’m using Redshift and hit a bad execution plan that I can’t get around by tweaking the query, I’m SOL, and Redshift engineers aren’t going to make a configuration change to help me.

This is why products like DynamoDB were created: cost-based optimizers are imperfect and unpredictable, and once you’ve stepped over some limit or threshold, performance changes wildly. The reason can be your query, or that the data has changed, or that there’s a noisy neighbor consuming a resource you depend on for your query. If you need highly predictable times you can reason about, you won’t get them from any RDB solution.
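The threshold effect described above can be shown with a toy cost model. This is a sketch under stated assumptions, not how any real optimizer is implemented: the two plan names, the cost constants, and `choose_plan` are invented for illustration. The optimizer picks whichever plan it estimates to be cheaper, so a small change in the estimated row count can flip the plan outright.

```python
# Assumed relative costs: a random read is far more expensive than a sequential one.
SEQ_IO_COST = 1      # cost per row read in a full scan
RANDOM_IO_COST = 50  # cost per row fetched through an index lookup


def plan_costs(rows_matched: int, table_rows: int) -> dict:
    """Estimated cost of each candidate plan under the toy model."""
    return {
        "index_lookup": rows_matched * RANDOM_IO_COST,
        "full_scan": table_rows * SEQ_IO_COST,
    }


def choose_plan(rows_matched: int, table_rows: int) -> str:
    """Pick the plan with the lowest estimated cost."""
    costs = plan_costs(rows_matched, table_rows)
    return min(costs, key=costs.get)


table_rows = 1_000_000
# The crossover sits at 20,000 matching rows; estimates on either side differ.
print(choose_plan(19_000, table_rows))  # index_lookup
print(choose_plan(21_000, table_rows))  # full_scan
```

Nothing about the query text changed between the two calls; only the estimate moved, which is the kind of cliff that makes run times hard to reason about.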

Given that, what about Snowflake feels different? That the details are obscured from you, so you don’t understand why things are happening? Is the lack of ability to deeply introspect making you uncomfortable? My experience has been that the ability to introspect rarely leads to any change in outcome; instead it leads me to identify that the query optimizer has done something stupid I cannot do anything about, but at least I can point to the specific resource being exhausted by it.