by ihid 656 days ago
We're paying ~$1k/month for all our webservers (which is a dozen ECS instances). They're handling about 3,000 requests per second (but that does sometimes massively spike to tens of thousands if not more).

We're paying ~$1k/month for all our tooling servers (so the 150 different test runners, representers, analyzers that are used to check people's code. There's >=1 of those running every second). Bear in mind, we're running students' code in over 70 languages here. Each is a Docker container (often many GB large) - so we pay for HDD too.

The biggest actual cost is the database at $2k/month. We have about 600 queries per second, and around 10MB per second (spiking to 47MB per second) of read throughput. It's an autoscaling database, but AWS determines that it's at the level it needs to be, and if I turn that down, performance suffers (I've tried).

Beyond that, all the other individual services are ~$300/m, so quite small amounts, but for things we rely on (e.g. caching servers, a shared filesystem amongst all those servers, and other things).

$1.2k on tax is also fun.

6 comments

Thank you so much for sharing your costs! I was very curious to see what the real expenses for AWS services look like, as I've always assumed AWS is massively overpriced.

From the numbers you've provided, it seems like your total cost is around $5.5k, so I assume the remaining $2k is attributed to traffic.
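
As a rough check of that arithmetic, the line items quoted in the post can be summed directly. This is only a sketch: it assumes the "~$300/m" figure for other services is counted once (which is what makes the sum land on $5.5k), and the $7.5k overall total is inferred from this comment rather than stated in the post.

```python
# Monthly line items quoted in the post, in USD.
line_items = {
    "webservers": 1000,
    "tooling_servers": 1000,
    "database": 2000,
    "other_services": 300,  # assumption: the ~$300/m figure counted once
    "tax": 1200,
}

itemized_total = sum(line_items.values())

# Assumption: the overall bill is the itemized total plus the ~$2k
# remainder that this comment attributes to traffic.
assumed_remainder = 2000
implied_total = itemized_total + assumed_remainder

print(f"Itemized:      ${itemized_total:,}")
print(f"Implied total: ${implied_total:,}")
```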

Everything looks quite reasonable, except for the database and traffic costs. I've run MySQL servers handling 150k reads/sec and 50k updates/sec with no issues, even on very cheap machines (around €30 per month). Years ago, we were serving over 100 million pages (of heavy content) per month, and we didn’t even bother looking at traffic statistics because, here in Germany, it’s hard to hit the traffic limits that most hosting providers impose.

That being said, AWS is less expensive than I initially thought. At the same time, I’m confident you could reduce your hosting bill by up to $2k without even leaving AWS by setting up your own database server. Moving away from AWS entirely might be challenging, as managing a fleet of about 30 servers would likely take one or two days of work per week (I'm managing a dozen mostly idling servers and I work one day per month on them). When your hosting bill reaches $30k, I'm very sure it would be cheaper to hire someone (hint, hint ;-)) who moves everything to dedicated servers and manages them.

I was curious, so I just checked on one of our customers (I work at a small MSP in the UK) by way of comparison, and we have one chugging along happily at more than double those numbers on a $288 Linode dedicated CPU instance. And we're only on that size for ease of disk space handling as the database is several hundred GB. CPU is basically at zero, it's the disk IO that actually gets you on some of these busier databases (from my experience).
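
To put that comparison in numbers, here is a back-of-envelope sketch. It assumes "more than double those numbers" means at least 1,200 queries per second, and it compares only the monthly price per sustained query per second, ignoring managed-service features, backups, and staff time.

```python
def cost_per_qps(monthly_cost_usd: float, queries_per_second: float) -> float:
    """Monthly cost divided by sustained queries per second."""
    return monthly_cost_usd / queries_per_second

# Figures from the thread: $2k/month at ~600 QPS on the managed database.
managed = cost_per_qps(2000, 600)

# Assumption: "more than double" is taken as exactly 2x (1,200 QPS),
# so this is an upper bound on the Linode's cost per QPS.
linode = cost_per_qps(288, 2 * 600)

print(f"Managed DB: ${managed:.2f} per QPS per month")
print(f"Linode:     ${linode:.2f} per QPS per month")
print(f"Ratio:      {managed / linode:.1f}x")
```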

RDS is extremely expensive. All managed databases are.

That said, it's a trade-off of convenience and being in the AWS bubble, and weighing up the pros/cons of separating out services. Data Transfer is another thing to consider too of course. Sticking your database elsewhere might cost more in egress traffic communicating with it from your other AWS infrastructure. If you're all in on other AWS services, sometimes the RDS price is just worth it when it comes to the total price. Sounds like this might be the case for your setup.

I hope you do manage to work things out. The service you have is great.

PS - Side note on RDS sizing. You might already know, but sometimes it's worth increasing the storage size on gp3 type storage above 400GB (if you haven't already), as you get a 12,000 IOPS baseline and 500MiB/s throughput[0] when you have that much storage. That's 4 times the baseline performance below 400GB, but you only pay for the additional storage cost. It can make a difference if you're IOPS constrained or trying to deal with bursty traffic but otherwise want to use the smallest instance size possible to save costs.

[0]https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/CHAP_...
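
The sizing rule in the PS can be written as a small lookup. This is a sketch of the numbers as described in this comment, not an AWS API: the function name is hypothetical, the sub-400GB figures (3,000 IOPS and 125MiB/s) are derived from the "4 times" statement, and the threshold is assumed to be inclusive at 400GB. Thresholds can differ by database engine, so check the linked documentation before relying on them.

```python
def gp3_baseline(storage_gb: int) -> tuple[int, int]:
    """Return (baseline IOPS, baseline throughput in MiB/s) for a volume size.

    Hypothetical helper based on the figures in the comment above.
    """
    # Assumption: the higher baseline applies at 400GB and above.
    if storage_gb >= 400:
        return 12_000, 500
    # Assumption: one quarter of the higher baseline, per the "4 times" remark.
    return 3_000, 125

for size in (200, 399, 400, 800):
    iops, mibps = gp3_baseline(size)
    print(f"{size}GB -> {iops} IOPS, {mibps} MiB/s")
```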

A part of me is tempted to say 'maybe some cost reduction in cloud bill is possible' but for the scale you operate at and cost you already have, I feel like refactoring the revenue model is the greater strategic 'bang for your buck'.

I don't want to armchair ops your decisions, but maybe do consider moving to some dedicated Hetzner servers, at least for the runners. You can probably reduce that cost tenfold relatively simply.

Move everything to Hetzner (or the like) at double the capacity that AWS has on average and you still spend about 10% of what you spend on AWS.

That is how stupidly costly AWS is, and 99% of people using AWS are wasting money like there is no tomorrow.
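
The "double the capacity for about 10%" figure is simple arithmetic: if a provider charges some fraction of the AWS price per unit of capacity, buying twice the capacity doubles that fraction. A minimal sketch, where the function name and the per-unit price ratios (5% and 10%, the range quoted elsewhere in this thread) are assumptions rather than measured prices:

```python
def relative_spend(price_ratio: float, capacity_multiplier: float) -> float:
    """Spend as a fraction of the AWS bill for the same workload."""
    return price_ratio * capacity_multiplier

# Assumption: per-unit price at 5% of AWS, with 2x the capacity.
low = relative_spend(0.05, 2)
# Assumption: per-unit price at 10% of AWS, with 2x the capacity.
high = relative_spend(0.10, 2)

print(f"Low estimate:  {low:.0%} of the AWS bill")
print(f"High estimate: {high:.0%} of the AWS bill")
```

So the 10% claim matches the low end of that range. At the high end, double the capacity would cost about a fifth of the AWS bill.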

Problem is Hetzner has such stringent user verification it's tough to get hosting on there.

You don't even need to use Hetzner, I'm actually quite surprised they are spending 2k on DB. For the env isolation i understand they need ec2 but...

it shouldn't cost anywhere near that amount, for instance I have 3 million users on a $40/month DigitalOcean VPS with just Django + Postgres and I have far more reads and writes.

for env isolation you could spin up a $5/month VPS and shut it down when idle

Sure, Hetzner is just one example. Many providers sell at about 5 to 10% of AWS prices.

Eh? It's not "tough" to get hosting there... It's super easy if you use legit details and live in a country they are allowed to do business with.

> 99% of people using AWS are wasting money like there is no tomorrow

Why do you think that is? Are that many people that dumb or are there other reasons they use managed cloud?

AWS will charge for network traffic, both in and out. Would need to calculate what is cheaper.

The runners take some code and return text, right? That shouldn't be too much traffic, hopefully.

Not to be that guy, but I'm pretty sure that these volumes (talking about ECS and database here) could be handled quite well by a few dedicated $100/mo servers, provided the code isn't hugely unoptimized. So I'm sure you could save (very conservatively) half your budget when going on-prem. That said, it probably won't help you much overall, I would imagine.

I've mentioned this before on HN, but we had a single postgres machine primary read/write server that did 4k QPS 24/7. It had DDR ram as storage on a PCIe card, of course, but this was before "SSD" was a thing. It was for a site that hosted portfolios of images, for both people in the images and people who took the images, and such. The front end data (the images and text) was, iirc, 3TB. Sometimes we'd need a server in a new location, so a locked metal briefcase was carried from the DC where the front-end data lived, to our offices, where one of our "IT" people would then carry it on to the new location and offload it to those servers in that location.

Anyhow that database server was probably ~$35,000 all in. That's 5 months of your current AWS spend. One of the things i did during that time was take a 2 generation newer server, a $35,000 1U Dell with 512GB of ram, and mirrored the postgres database into tmpfs and enabled replication, then we set that machine as primary. The new machine didn't break a sweat. So much so that one of the things me and the (really very awesome and nice; Hi, Chuck, if you're out there!) DBA did was set postgres to use no more than 640KB of memory, then ran the entire site, with 4k QPS, on that postgres instance with 640KB of memory (not counting the 280GB of tmpfs storage, of course!), just to prove it would work. It did - although some of the bookkeeping queries (not sure what they're called) were taking a very long time, and would have had to be refactored to use less temporary memory, and such.

anyhow my point is, there are people out there that can do things cheaper, or faster, or more efficiently than whatever you got goin on right now. Your statistics on "per second" usage and the like don't sound too demanding. If you could squirrel away $500/month for a few months, and you ask around for someone that can rack metal and has peering, there are people (including me) who could get you co-lo in <16U[0] with redundancy, where your only monthly infra charges would be the co-lo fees.

[0] Old, extremely beefy, but large servers are generally 4U, but dirt cheap for what you get. Ex: 80 thread, 512GB RAM, 8 SAS bay HP server, $800 shipped. And i bought those 6 years ago. However: 5950x, 128GB RAM, 24 SATA port can be had for <$2000 (i'm guessing based on what i paid a few years ago), and that's roughly equivalent in power (kernel compile takes 3 seconds longer on the 5950x but it uses 1/4th the power at the wall). The reason i tagged on <16U is because at most you're gunna need 4x4U, two "front end" and two "back end" machines, with duties split and everything redundant in the rack. I haven't looked in a while to see what's available on ebay as far as more density, but for sure 16U or less!

The issue becomes: how do you find someone who knows how to do all that, that is willing to work for next to nothing because they believe in the ngo/nfp? Maybe there's a tech forum that people like that read, who knows.

good luck, and thank you for doing things to help other people. I hope it all works out in the end.

email in profile.

RAM as storage for DB? So data loss on reboot? Sounds like very specific use case.
https://en.wikipedia.org/wiki/Fusion-io

maybe i misspoke - at one point there was battery backed DDR ram they used in PCIe, but by the time i came around they were using Fusion IO PCIe devices, which i guess were NAND flash, not DDR. or, alternatively, that is how it was explained during onboarding - "it's like DDR on a PCIe card, so the iops are 1000x that of SAS 10k drives"

unless you're talking about our experiment of tmpfs - then yeah, the use case was "genewitch heard bill gates say 640k should be enough for anyone; here's a super beefy machine to test that theory; theory tested." We didn't run the site live on that machine for more than 10 minutes or so, we switched it back to the fusion-io backed server immediately. It was a proof of concept about one of the things we could do with these new servers - read replicas with the DB in tmpfs for extreme speed and no IO blocking.