by stassajin 1395 days ago
"In my experience, companies are typically evaluating spend on other platforms and after some testing, moving additional workloads there to displace cost elsewhere"

Fair point, some of that net revenue increase is because of consolidation of workloads, although the majority of the cost is likely still driven by consumers expanding usage beyond what they expected. As I mention in my article, the second part of the increase in costs has to do with data governance, and my argument is that Snowflake doesn't make governance easy. Why can't they stand up an IAM-like service with a nice UI and dashboards? Why can't they make integrations with PagerDuty, Slack, and email work out of the box? Why can't I specify team-based budgets instead of having to do it on a per-warehouse, per-team basis? Why do I have to build custom tooling on top to make governance work?

I can unequivocally say that at a certain scale you need to move on and that Snowflake and many of the SaaS providers are too expensive even at medium scale companies. This article describes this paradox better than I could: https://a16z.com/2021/05/27/cost-of-cloud-paradox-market-cap...

Moreover, Snowflake's enterprise pricing model scales even worse. Why do companies often have to pay twice the price per credit relative to the standard edition? Shouldn't guarantees on security or support come with a fixed cost? Shouldn't enterprise offer economies of scale in pricing?
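
To make the arithmetic behind that complaint concrete, here is a rough sketch comparing the two pricing shapes. Every number in it is made up for illustration (the price per credit, the flat fee, the credit volumes); none of them are real Snowflake prices. The only input taken from the discussion is the roughly 2x per-credit multiplier.

```python
# Hypothetical numbers throughout; none of these are real Snowflake prices.
STANDARD_PRICE_PER_CREDIT = 2.0   # assumed USD per credit on the standard tier
ENTERPRISE_MULTIPLIER = 2.0       # the "two times" figure from the comment above
FIXED_ENTERPRISE_FEE = 100_000.0  # assumed flat annual fee for security/support


def premium_per_credit_model(credits: float) -> float:
    """Extra annual spend when enterprise is priced as a per-credit multiplier."""
    return credits * STANDARD_PRICE_PER_CREDIT * (ENTERPRISE_MULTIPLIER - 1)


def premium_fixed_fee_model(credits: float) -> float:
    """Extra annual spend when enterprise features cost a flat fee (usage is ignored)."""
    return FIXED_ENTERPRISE_FEE


# Print the premium under both models for a few assumed annual credit volumes.
for credits in (10_000, 100_000, 1_000_000):
    print(credits, premium_per_credit_model(credits), premium_fixed_fee_model(credits))
```

Under these assumed numbers the per-credit premium overtakes the flat fee at 50,000 credits a year and keeps growing linearly with usage, which is the opposite of an economy of scale.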

I also wish folks would read my article from end to end, because my conclusion in the article is that you don't really have a choice but to use an enterprise solution when your scale is small. If I had to start my own company and had only 2 data engineers, you betcha I would use Snowflake and Databricks.

--- btw, it really surprises me that nobody has commented on the workload manager. Am I the only one seeing that as an issue? I have enough exposure to compare it with Redshift and I can say that Snowflake's workload manager is just very bad at optimizing throughput.


I read your link. My immediate reaction:

1) I think Andreessen Horowitz has probably oversimplified the issue based on the Dropbox outlier. It's easy to say you can build your own datacenter to manage stuff but the costs in people are really hard to offset, especially with the security posture and number of 9s of availability that most companies need. Not only that, but throw in disaster recovery, so now you've doubled the costs (two data centers). Etc. Plus hardware ages rapidly--you want to pay for the "floor sweeps" (as Teradata used to call them) every few years?

Beyond the complexity, some of these companies simply could not exist without the Cloud. Take Snowflake. How big of a data center would they need? How many servers? How much disk? How do they know if Dropbox wants to load 1 GB, 1 TB, or 1 PB of data? Answer: they don't. This type of model only works if you can leverage the essentially unlimited scale of the cloud providers. I don't miss the days of loads failing due to being out of disk space and having to scramble around trying to find things to delete.

2) Regarding pricing policy, Snowflake makes it very clear which features are included with which edition: https://docs.snowflake.com/en/user-guide/intro-editions.html

Your link also says Snowflake spent 44% of their revenue in 2021 on cloud. If that is true, perhaps Snowflake loses money at standard edition, and presumably there is a larger internal cost to supporting some of the higher end features like Private Link that Snowflake needs to recapture. Regardless, as a Snowflake customer, I can determine what features I need and decide if the price they are charging is worth it or if I should look elsewhere. I can say from experience that some of these features and even paper certifications aren't easy to obtain and can be very expensive to maintain.

I will tell a story that will age me. Decades ago I used to work for a company that needed a "business continuity" plan. We had to show that we could continue to function if our data center was destroyed by natural disaster. We paid a company in another region that had essentially a copy of all of our hardware, and once a year we'd send our backups there and bring up all of the systems to prove we could. As you might imagine, this service was insanely expensive.

Flash forward to now. Snowflake has a feature called failover/failback with connection redirect. With a few commands, you can replicate your entire database elsewhere, you can incrementally keep the remote target up-to-date, and you can test it as often as you like with connections failing over generally in under 1 minute. If your company needs something like this, how much would that cost to build yourself? Maintain? Test? Clearly there must be customers who did that evaluation and decided that Snowflake's approach is way cheaper. If you disagree, don't use that level of service, or build it yourself. You say SaaS providers are "too expensive" and that even "medium scale" companies can do better themselves, but that isn't my experience.

3) As discussed in the previous comment, no doubt Snowflake can make improvements. However, what I see from my limited view as a (probably much smaller) customer, Snowflake is doing that. In fact, two of those improvements you call out are already in private preview and were discussed at their recent conference. If my company was briefed on these features post Summit, I'd be highly surprised if yours wasn't.

Thanks for sharing your perspective. It's always useful to get a more experienced viewpoint. I agree that managing hardware is not something that should be taken lightly, and that only companies at scale can and should do that: Uber, Facebook, Dropbox. I'm not pushing for managing your own data centers, but I'm overall more hopeful that as open source gets better and more data engineers learn the craft, it will be cheaper to run things yourself once your Snowflake bill is in the millions per month.