Hacker News
by douglasfshearer 2605 days ago
Slack has a 5-year, $50M/year minimum commitment with AWS.

From the document:

> In April 2018, the Company executed an amendment to its existing agreement with Amazon Web Services (“AWS”). The amended agreement was effective as of May 1, 2018 and continues through July 31, 2023. The Company has minimum annual commitments of $50.0 million each year of the agreement term for a total minimum commitment of $250.0 million. As of January 31, 2019, the Company had a remaining minimum payment obligation of $212.5 million to AWS through July 31, 2023.
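A quick sanity check on the figures in that passage, as a small Python sketch. It assumes the commitment is drawn down in equal monthly amounts from the May 1, 2018 effective date, which the excerpt does not state; the variable names are mine.

```python
# Figures quoted in the filing excerpt above, in $ millions.
ANNUAL_MINIMUM_USD_M = 50.0
TOTAL_MINIMUM_USD_M = 250.0

# Assumption (not stated in the excerpt): the obligation is consumed
# in equal monthly amounts starting from the effective date.
MONTHS_ELAPSED = 9  # May 2018 through January 2019, inclusive

monthly_minimum = ANNUAL_MINIMUM_USD_M / 12
remaining = TOTAL_MINIMUM_USD_M - MONTHS_ELAPSED * monthly_minimum

print(f"Implied remaining obligation: ${remaining:.1f}M")
```

Under that assumption the result lines up with the $212.5 million remaining obligation the filing reports.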


Perhaps I'm just not aware how things work in companies of Slack's size, but... what do you think $4M/mo is spent on, for what essentially amounts to a chat app?

I realise there are a lot of extras in Slack (attachments cost S3 storage, video calls take bandwidth, webhooks take some processing), but as of January 2019, they had 10M daily active users. $50M/365 gives us $137K per day. $137K per day just to serve 10M active users? That's nearly $14 per day (over $400 per month!), for just 1000 users using a simple chat app.

This is not including any staff, development, nothing. Literally just hosting costs, which should already be deeply discounted given the amounts and agreements involved. That seems... excessive?

Edit: It's worth noting that $50M is the -minimum- commitment they have to AWS. One can assume the actual bill is higher.
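For anyone who wants to check the arithmetic, here is a rough sketch in Python. The dollar and user figures are the ones quoted above; the 30-day month and the function names are assumptions for illustration only.

```python
ANNUAL_COMMITMENT_USD = 50_000_000   # minimum AWS commitment per year
DAILY_ACTIVE_USERS = 10_000_000      # as of January 2019
DAYS_PER_YEAR = 365
DAYS_PER_MONTH = 30                  # assumption: a flat 30-day month


def cost_per_day(annual_usd: float) -> float:
    """Spread an annual bill evenly across the year."""
    return annual_usd / DAYS_PER_YEAR


def cost_per_1000_users_per_day(annual_usd: float, users: int) -> float:
    """Daily hosting cost attributed to each block of 1000 active users."""
    return cost_per_day(annual_usd) / users * 1000


daily = cost_per_day(ANNUAL_COMMITMENT_USD)
per_1000_daily = cost_per_1000_users_per_day(ANNUAL_COMMITMENT_USD, DAILY_ACTIVE_USERS)
per_1000_monthly = per_1000_daily * DAYS_PER_MONTH

print(f"Per day:                 ${daily:,.0f}")
print(f"Per 1000 users, daily:   ${per_1000_daily:.2f}")
print(f"Per 1000 users, monthly: ${per_1000_monthly:.2f}")
```

Since $50M is only the minimum commitment, these numbers are a floor on the real bill, not an estimate of it.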

$4.2M/month on AWS is a hefty bill but it's not unheard of. Slack is, I suspect, paying for all the marked-up extras such as various expensive regulatory compliance add-ons, at-rest encryption, multi-AZ and multi-region backups and mirrors, and whatnot. There's also probably a huge amount of stuff they use on their side to run their own tooling and analytics on their clients; costs that won't necessarily rise per user.

The part that "essentially amounts to a chat app" is probably one of the least expensive portions.

It's still high, don't get me wrong, but not shockingly high considering how Slack is used and trusted by so many companies.

The thing I find curious is that businesses of this scale are still using cloud hosting. Is it cost-effective or otherwise better in some way to outsource your infrastructure instead of hiring an in-house IT team to manage your own hardware and connectivity at this level?
Generally, cloud is still better for many. Managing datacenters at scale is hard. It also takes time to build up capabilities in house, while the cost of delay is usually far greater than the cost efficiencies wrung out of infrastructure. Cloud is often a euphemism for “supported hosted software that happens to come with hardware”. Not that different from Dreamhost managing PHP for you, just richer and higher scale. Why build a cheaper internal capability over 6 months when I can have a slightly more expensive service NOW that I don’t have to worry about? This is why we have 3rd party transportation companies, telecoms, power plants, etc.

Netflix still uses cloud hosting for most things. Some, like Dropbox, have found a way to DIY. On the other hand, GitLab tried to move to in-house Kubernetes on bare metal, and reversed that position.

Thanks for the insight.

> Why build a cheaper internal capability over 6 months when I can have a slightly more expensive service NOW that I don’t have to worry about?

If it really is just slightly more expensive, that seems like it would be a good investment for many businesses. I was just curious because this isn't a field I've been working in directly for a while.

Last time I looked, but that was several years ago, there was a sweet spot for a lot of the cloud infrastructure services but at both the lower and the higher end the pricing didn't seem to make much sense in most cases. On that higher end, you could have bought the equipment outright, hired a substantial team of good people to manage it, and established your own presence in serious data centres with good connectivity, and still been considerably better off.

I wonder what has driven the change in cost/benefit since that time. Maybe it's just that cloud hosting is better understood and has better tooling, and those in turn make the market more competitive now?

In general, AWS/GCP/Azure hosting rates are only slightly marked up as compared to DigitalOcean, Vultr, etc. especially with reserved instances.

These rates compare favourably to rolling your own DC (rack or more).

It’s the bandwidth costs that are inflated by 500-1000%, which is where all the margins come from, and it creates a lock-in effect, as getting your data out is expensive.

It comes down to cost/benefit of delay on actions with their own window of opportunity and rates of return. If I delay taking actions because I’m waiting for IT services, that’s a real opportunity cost that should be weighed against the higher unit cost of cloud services. It does me little good if I have all this cost-effective hardware and software managed internally but it still takes a week, a month, or more for a developer to get an extra 5 TB and 100 CPU cores, or to get a firewall rule opened, or to get a new subnet created, or a new DNS zone.

Procuring gear, hiring a team, contracting connectivity, testing, integrating, scaling, etc, takes months. It also presumes you’ll attract, hire and be able to fund management that understands modern processes and can get things done in a timely, quality fashion. Even the best in the business are 50/50 at getting this right, so there will be growing pains. Whereas a top 5 cloud provider almost always has world class practices and processes behind their services and are a credit card transaction away. Much less capital commitment, much less time commitment.

Put another way, why prematurely optimize when you don’t necessarily know what you need long term? Startups or even new products at large companies need to focus on product/market fit and responsiveness. Their processes and structures should be more like a tent city with gradually paved cowpaths than a planned city.

In the case of a venture funded startup, time is more valuable than capital. In the case of a large enterprise, it depends - sometimes time is more valuable, sometimes operating cost needs to be squeezed. Cloud of all forms (private, public) has become very lucrative in enterprise because of the slow pace and intransigence of IT teams that were assembled in an era where technical and software services could suck and take years to solidify. These days software needs to suck a lot less, and quickly - customers are demanding it. Cloud is not mainly about where you do your computing, it’s about how you do it: on demand, fungible resources, granular billing, API-driven access. I’m sure I can get the costs down if I own all the gear and have a flexibly contracted network, but I still need to ensure I have the automation, processes, and practices that meet the business need for velocity. I can’t risk hiring a team that might put up a ticketing system and manage every request by Excel spreadsheet if they don’t know better.

Building good software means providing developers with infrastructure and tools they need to act quickly with safety, and most importantly, giving them the ability to change their minds without a major cost/capital hit. Cloud (or, as I say above, on demand, fungible infrastructure and rented software) is a major path (but not the only) to get there.

Netflix uses AWS for their site and development, but the most expensive part (streaming) is still happening from their own Open Connect boxes that they provide to ISPs [1].

[1] https://openconnect.netflix.com/en/

My company recently dropped our in-house data center and moved almost entirely to AWS. We had a few reasons. We just don't have the multiple data centers to guarantee uptime. Our product isn't data security, so our limited staff can't keep the data as secure as AWS, whose entire business is built around security. And we just got to a point where it was cheaper to host on AWS than to maintain our own data center. I hope that offers a little insight.
I don't think moving things to AWS makes things more secure by default. It is actually easier to create services in AWS or GCP (never used Azure) that are publicly open than to implement proper security. I remember specifically RDS defaulting to a public IP unless you set up private subnets; same with GCP SQL (although they blocked all access by default, it is once again easier to unblock it for everyone than to use, e.g., their proxy). GCP VMs automatically get a public address unless you explicitly disable it; I don't remember EC2, but I think it was similar. So the argument of not needing someone who knows about security is not a good one. You need that person as much (if not more) with public cloud.
$0.01/AU/day isn't crazy. I'd actually argue it's low, compared to some of the other enterprise or social media bills we've seen.

You can't just say "I could serve 1000 users on a $240/year VPS, they should be able to serve their 10M users on around $2M/year!" Things don't scale like that. The complex dynamics of the world can't be represented by linear extrapolation. Moreover, there's a balance that needs to be struck between the complex process of "spending less on cloud" and the complex process of "developing the product". Everything costs money.
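To put numbers on the gap between that naive extrapolation and the actual commitment, here is a small Python sketch. The $240/year VPS, the 10M users and the $50M/year minimum come from the thread; the function name is hypothetical, and this is exactly the linear model the comment argues against, shown only to make the comparison concrete.

```python
VPS_COST_PER_YEAR_USD = 240          # hypothetical VPS from the comment
USERS_PER_VPS = 1_000
SLACK_DAU = 10_000_000
AWS_MINIMUM_PER_YEAR_USD = 50_000_000


def naive_linear_cost(users: int) -> float:
    """Linear extrapolation: assumes cost grows in direct proportion to users."""
    return users / USERS_PER_VPS * VPS_COST_PER_YEAR_USD


naive = naive_linear_cost(SLACK_DAU)
per_active_user_per_day = AWS_MINIMUM_PER_YEAR_USD / SLACK_DAU / 365
ratio = AWS_MINIMUM_PER_YEAR_USD / naive

print(f"Naive linear estimate:   ${naive:,.0f}/year")
print(f"Actual minimum per AU:   ${per_active_user_per_day:.4f}/day")
print(f"Actual / naive estimate: {ratio:.1f}x")
```

The ratio is the part the linear model cannot explain on its own: redundancy, search, compliance and everything else that a single VPS never has to provide.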

Scale, redundancy/backups and security.

The 10M daily active users are spread across every continent, in different time zones. All with the expectation of near real time delivery of messages, push notifications/emails and file uploads. The expectation that everything is immediately searchable and that you can search across messages and files throughout the entire history of your Slack usage. The expectation that there is an audit log of every message, whether it's been deleted or not and that data is never lost (due to HR/legal needs). And the expectation that everything is secure.

A Slack team is a self-contained unit, however, if I understand correctly. It -should- be easily horizontally scalable (please correct me if I'm wrong). Each team could have its own database, its own app servers running on whatever region(s) was/were needed. So it's not like they would have some mammoth central database that requires strong scale engineering. Furthermore, you know in advance how big each team is because they all pay you for X users, so you can allocate resources to them appropriately.
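As a purely illustrative sketch of that idea (nothing here reflects how Slack is actually built), teams could be packed onto database shards by paid seat count using first-fit decreasing. The `Shard` class, the capacity figure and the team names are all made up for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Shard:
    capacity: int                                   # hypothetical: max paid seats per shard
    teams: dict = field(default_factory=dict)       # team name -> paid seats

    @property
    def used(self) -> int:
        return sum(self.teams.values())


def assign_teams(team_seats: dict, shard_capacity: int) -> list:
    """First-fit decreasing: place the largest teams first and
    open a new shard only when no existing shard has room."""
    shards = []
    for team, seats in sorted(team_seats.items(), key=lambda kv: kv[1], reverse=True):
        if seats > shard_capacity:
            raise ValueError(f"{team} needs {seats} seats, more than one shard holds")
        for shard in shards:
            if shard.used + seats <= shard.capacity:
                shard.teams[team] = seats
                break
        else:
            new_shard = Shard(capacity=shard_capacity)
            new_shard.teams[team] = seats
            shards.append(new_shard)
    return shards


# Made-up teams and seat counts.
teams = {"acme": 600, "globex": 400, "initech": 300, "hooli": 250, "umbrella": 50}
shards = assign_teams(teams, shard_capacity=1000)
for i, shard in enumerate(shards):
    print(i, shard.used, sorted(shard.teams))
```

Note the `ValueError` branch: a team too big for any single shard is precisely the case where per-team isolation stops being easy.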

Lyft's AWS bill (from their S-1) is much higher, but their application has very different scaling constraints from something like Slack; it's not as easily horizontally scalable. Even though their bill is high, I suppose it can be hand-waved away as "oh scaling's expensive".

And a lot of the redundancy/security comes built into AWS services. S3 has redundancy built in, there are Multi-AZ RDS instances with easy support for at-rest encryption, and there's container orchestration these days for easily handling app server redundancy and worker servers. So a company starting out, like Slack, just a few years ago, would have access to all of that without much additional overhead.

I'm seriously fascinated by what it is that makes it so expensive. I suppose the real explanation might just be that there's no incentive to optimise for costs. It's like Slack's own app: A native app -could- be built that is super efficient and light, but there's no incentive to optimise for that.

Small slack teams are easily horizontally scalable; for a small team, the web server, the app server, and the db could probably run on a single EC2 instance, and AWS offers some rather large instance sizes.

Let's start there, though. 70k stand-alone (paid!) Slack teams means 70k stand-alone systems. How do you operate, well, all of them, simultaneously? With one mammoth central database, there's one database to upgrade; if it goes down, there's one database to fix. With 70k small databases there are 70,000 problems! With 70,000 systems, how do your engineers deploy code, and how many times per day can they do it (it had better be well into the double digits)? How do you roll them back? What do you do if an upgrade goes wrong? With 70k different apps, one small problem quickly becomes 70k small problems, which is harder to manage than 1. Some things can be (and I'm sure are) scaled horizontally, but the isolation that grants you does not come for free.

And then, what about past that? Looking at the customers listed on Slack.com, they serve some larger enterprises, who are going to need the "expensive" level of scaling. No database is going to be able to scale to that level without a team to manage it (no matter the technology), so then you need a queue as well as a db, plus a team to manage each of those, and then how do you do searching/indexing? You also can't ever take a single database node offline, so then it's a database cluster, with hot spares, and also large enterprises operate globally, so then their Slack team system needs to run multi-region hot as well, and then and then and then? I've got Slack open all the time on both my (work) phone and my (work) laptop, as do the majority of my coworkers, which means their webservers have heavier requirements compared to Lyft, which I use for a few minutes whenever I take a ride.

Slack usage will hit a lull outside of business hours, so you'd want it to scale resources that serve that - I'll bet a non-insignificant portion of the $4M/month probably goes to resources that are only used during the business day - so in some sense, Slack is paying AWS a premium to not pay them for unneeded resources at 3:30 AM.

Slack has optimized their app for development cost (much to my laptop's sadness), so it doesn't seem that far-fetched that Slack has also done some optimization of server-side costs. Future money isn't worth as much as money today, and this fact is reflected in AWS RI offerings.

>so in some sense, Slack is paying AWS a premium to not pay them for unneeded resources at 3:30 AM.

If you're running a server for 1/3 of the day (8 hours), you're probably better off using a reserved instance (60% discount with a 3-year reservation) than trying to optimize around on-demand instances (66% discount with perfect allocation, ignoring the engineering cost). The economics are even worse if you consider imperfect allocation, or consider self-hosting (probably cheaper if you're as big as Slack).
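A rough way to see this trade-off is a minimal Python sketch with the on-demand hourly price normalized to 1. The 60% reservation discount and the 8-hour day come from the comment; the 25% over-provisioning factor is a made-up figure standing in for imperfect allocation, and the function names are hypothetical.

```python
HOURS_PER_DAY = 24


def reserved_daily_cost(discount: float) -> float:
    """A reservation is paid for all 24 hours, used or not.
    Result is in on-demand instance-hours per day."""
    return HOURS_PER_DAY * (1.0 - discount)


def on_demand_daily_cost(hours_needed: float, overprovision: float = 1.0) -> float:
    """On-demand is paid only while running; `overprovision` > 1 models
    instances left up longer (or in greater number) than strictly needed."""
    return hours_needed * overprovision


def break_even_hours(discount: float) -> float:
    """Daily usage above which the reservation becomes the cheaper option."""
    return HOURS_PER_DAY * (1.0 - discount)


reserved = reserved_daily_cost(0.60)
perfect = on_demand_daily_cost(8)
sloppy = on_demand_daily_cost(8, overprovision=1.25)  # hypothetical 25% waste

print(f"Reserved:               {reserved:.1f} instance-hours/day")
print(f"On-demand (perfect):    {perfect:.1f} instance-hours/day")
print(f"On-demand (25% waste):  {sloppy:.1f} instance-hours/day")
print(f"Break-even usage:       {break_even_hours(0.60):.1f} hours/day")
```

With perfect allocation on-demand still wins narrowly, but a modest amount of waste is enough to flip the result, before counting any engineering time spent on autoscaling.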

You say "chat app" like that means it's easy.

How is running a chat app any easier than, say, running Facebook?

My pet peeve with HN. People here boil everything down to a trivial engineering problem. That Slack is "just a chat app" is grossly underselling all the other parts of the business - sales, marketing, design, product - that makes it tick.
The app itself is non-trivial, too. The scale is astounding.
You are comparing to Facebook. Obviously Facebook has been trying to get into this market as well with their corporate offerings (which they've been surprisingly quiet about lately, suggesting that effort went nowhere). Maybe more appropriate for comparison would be their whatsapp team given that it is notoriously quite small given their enormous user base.

Slack is indeed just a chat app. There are many like it. Most of which look and feel very similar at this point. I administer a slack setup for our company and it's fine but it's nothing special. However, for what it does and what it cost, I'm not in a mood to replace it with something else. The hassle would cost us more and we'd not save a lot of money or gain any functionality that we need or indeed solve a problem we have.

We switched to Slack a few years ago from hipchat which at the time was very similar in scope, feature set, and cost. The reason we switched was that we wanted to get rid of bitbucket and some other Atlassian stuff (in favor of Gitlab, and later Github). I've also used stuff like IRC and even NNTP in the past, neither of which is appropriate for non techie teams. Lately, I've been considering switching to keybase which has a nice and easy to set up team component (I've actually set this up already). I'd probably go with that for new teams though it is still a bit rough in some respects.

Slack has awesome brand recognition but ultimately it doesn't have that many unique selling points beyond that. They've clearly grown by converting investor cash into customer acquisition. It's a common pattern with VC funded SAAS companies: compensate for a lack of unique selling points or technical edge with stupendous amounts of marketing and sales. If you think their hosting is expensive, their marketing and sales are likely way more expensive. It never was a proper tech company where things like algorithms, their awesome infrastructure, or patented stuff are the key things. It always was just another chat app done well.

Their hosting cost is quite high and suggests that they tend to throw money at problems instead of engineering talent. That's both fine and common for VC funded startups but it also suggests they will go through some lengthy rounds to re-architect internally and optimize their cost structure in the next few years after they IPO when shareholders are going to be obsessing about shareholder value.

From a technology point of view, they should indeed be able to run at a fraction of the cost but right now that's not a priority for them as they are very well funded and have a need to grow as fast as they can. Cutting cost through lengthy and complicated re-engineering projects is probably very low on their todo list and would be likely to just slow them down.

They are actually surprisingly middle of the road in terms of what they do. They do it well but when you look at their feature set there's nothing really that remarkable or unique. Their UI is alright but generic electron/react (?) which is notoriously not that fast but gets the job done. There are a lot of electron based chat apps out there and whatever your point of view on those is, slack is nothing special in that sense. Sure their UX is awesome and they clearly have some design hipsters running the show and obsessing over things like color schemes, logos, smileys, etc. But in the end it's just a generic chat app. E.g. Telegram, Signal, Facebook, Skype, Whatsapp, FB Messenger, (and Google's many attempts to compete with the Cartesian product of those) etc. each have very similar client side architectures and feature sets as well.

Server-side they probably use Elasticsearch (which I'm well familiar with) and they seem to have a lot of centralized infrastructure and plumbing. From having used it, their search engine isn't actually that sophisticated or impressive. Clearly search ranking is not a huge attention area for them. Obviously a complicating factor is that they are running their stuff in multiple data centers across the globe. Adding to their complexity is enterprise needs for backups, auditing, security, compliance, etc.

Given their age and hipness, they probably bought into micro-services in a big way. That just means they run a lot of stuff that they scale by throwing more hardware at it. Over-provisioning is a great way to hide any performance issues. If you have dozens of micro-services running in multiple data centers, things add up quickly. Add hosted databases, search engines, queues, analytics, monitoring, devops, etc. to the mix and you are looking at some hefty hosting bills at the scale they are running it. Also many of their bigger customers probably insist on dedicated setups for them. When you grow rapidly, a lot of that stuff is just a side effect of Conway's law where you end up with a lot of moving parts because you have a lot of different teams.

> You say "chat app" like that means it's easy.

You're right, it takes a mind-boggling amount of wasted effort and negligence to turn a simple thing like corporate chat into something as bloated and broken as Slack.

Making something simple and easy is harder than just letting entropy destroy your product.

Every team needs a dev/test environment. You know how you've always wanted slick metrics collection to make your job easier? They have that, at scale. Fraud detection and bot throttling? Running at scale with machine learning. It all adds up fast.
That’s nothing for a company making that much revenue, they have to keep all the websocket connections open for notifications, index all the messages for search, store files, host video chats / phone calls, etc... it’s not trivial.
There is a lot of behind-the-scenes processing for search, analytics, security, enterprise enablement, internal tooling, etc. It may not seem like a lot, but the processing/storage needs are mammoth.
Much of this goes into geo redundancy too. These 10M people are not in one place. So, you need redundant, highly available deployments in many regions. This does add lots of infrastructure duplication but is also super speedy for end users. This adds to the cost big time! For example, I was working with a gaming company and they had 10+ regions around the world, all using this type of setup, just to keep latency to an absolute minimum. I'm sure Slack is doing the same.
There's a big difference, though; they're not all connected. Each team is its own separate entity. A team with 10 people might pay Slack $100/mo, and all be in the UK. Those people and that Slack team's database don't need to interact with anyone else in the system. There should be no big scaling constraints here, unlike your gaming company, where everyone needs to be connected from anywhere in the world, at the lowest possible latency.
My company is not particularly big but just in my team of a dozen or so we have members in Boston, Australia, Phoenix, and Seattle, and regularly deal with those in London, Tokyo, etc. It's not that unusual; in fact it's part of the reason Slack (and Hangouts) are so important.
Hm, I may lack the proper knowledge. Why do I need redundant infrastructure? Why can’t I have instances in the cheapest region (300 ms delay is not going to kill anyone in a chat app), and if an instance fails, bring up a new instance, and if the region fails, bring up instances in another region. I don’t see why there should be redundant, idle instances running. Maybe duplicate the database / make it highly available.

I also don’t understand how duplicated infrastructure makes it super speedy for end users when they are from around the world. Yes, they could connect to regional instances, but then the regional instances must synchronize with the other regional instances on the other side of the planet, which gains nothing.

> which gains nothing.

It gains tens or hundreds of milliseconds, especially for channels/chats between people in the same region. You may not feel a "chat app" requires this level of performance, but the improvement is there.

Some of this infrastructure duplication ought to go down at that scale, as there are likely a good number of users at any given location.
Couldn't this be a method of taking money out of the company? I.e., Slack as a company pays $50M/year and Amazon pays back 20% of this as a "reward"?

At such scale it should be easier to have its own infrastructure rather than pay Amazon's overpriced bills.

Theoretically any payment can be. That’s why we have corporate auditors, money laundering and financial investigators, etc.

I doubt Jeff Bezos is running some slush fund laundering scheme.

I'm sure a big amount of this is data costs, especially as they store more analytics, natural language processing, and ML training on the data.
Knowing how famously inefficient the Slack application is with end user resources, I wouldn’t be surprised if their backend has been poorly optimized.

I serve more than 1000 users on a $3.50/mo VPS with a not-particularly-efficient runtime just fine.

Cool, 10000 more of those boxes and you can rebuild slack "in a weekend for $35k/month".

Don't mind the lawsuits when you lose your customer data and forgot to pay for backups for "cost optimization" reasons.

Don't mind the complaints when your uptime is barely one-nine.

Don't mind the customer cancellations when your 1000-user VPS doesn't actually scale to 20k concurrents.

And don't mind the HN snark when you release your Slack competitor without voice/video chat support, webhooks, or apps, and with zero debugging capabilities when things go wrong, etc, etc.

I don't even like Slack and I think their bill is too high but please don't be delusional, it's a tired HN trope.

I think you're being overly negative. I think you can do a lot with a little.

Does a chat app need to be fully searchable? I don't think so. Does it need backups? I don't think so. These are features users could opt-in for if it was that important to them. Could be something as simple as sending encrypted nightly diffs to an email address.

The whole world doesn't need to be over engineered to have a useful product.

> Does a chat app need to be fully searchable? I don't think so. Does it need backups? I don't think so. These are features users could opt-in for if it was that important to them.

They do, that's why people pay for it.

> Does a chat app need to be fully searchable? I don't think so.

Having had a meeting this afternoon where the critical information required for me to pick up a project was only available in Slack messages, yeah, I think it does.

(Personally I would be roasting the two people involved for not doing that discussion over email or at least not writing the damn things down afterwards but hey ho, I'm not in charge.)

It's not an unfair point, given how big MVP culture is here.

Unfortunately, "sending encrypted nightly diffs to an email address" comes off very much the same way as the infamous dropbox comment: https://news.ycombinator.com/item?id=9224

Having a search feature and doing useful backups isn't over-engineering; they're features that users are opting in to pay for, to the tune of $400 million/yr.

>Does a chat app need to be fully searchable?

This is the main reason people pay for Slack instead of sticking with the free tier, so I'm guessing that for a lot of people, it does.

> Does it need backups? I don't think so.

When all the communications of you and your coworkers suddenly aren't there the next morning, you might think so.

> Does a chat app need to be fully searchable?

Absolutely, yes. There is a ton of tribal knowledge contained in chat history.

Scaling and cost is not linear, my friend.
That's still only half as much as Lyft's $100M/year: https://news.ycombinator.com/item?id=19282624
And Snap spends $400mm a year with Google Cloud, not to mention another $50mm on AWS.
Lyft is arguably solving a much harder problem than essentially IRC.
Slack is way more than IRC. Literally dozens of nontrivial features more.
You could also say "Slack is arguably solving a much harder problem than essentially a taxi company"

Both are solving really tough problems involving a lot of users and a lot of data.

Slack to IRC is what a library is to a bulletin board.
How is that possible, since even Instagram spent around 10k per month for a few million users? Am I missing something?