by andrewstuart 1295 days ago
I found Graviton to be a mixed bag. It was certainly extremely fast on the very high-end instances. I tested it successfully with a Rust-based message queue system I was writing, and it got some ridiculously fast number like 8 million messages a second, from memory, on the fastest possible Graviton instance (this was about 18 months ago).

I did try to switch some of my database servers to it a couple of years ago, and after random hangs I gave up and went back to Intel. I tried again further down the track and got the same thing: random hangs. I assume this sort of thing comes with a new architecture, but I'd be hesitant to move any production infrastructure to it without extensive long-term testing.

In the case of Graviton-based GPU instances, I found that the GPU-enabled software I wanted to use didn't work.

If you are comparing performance, I'd suggest buying a fast AMD machine, running it locally, and comparing the results - local servers tend to be much faster and cheaper than cloud. And if your application uses GPUs, then if you possibly can, it's very much in your interests to run local servers.

7 comments

Arm has a much looser memory model than x86 [1 for a comparison]. It's possible that the random hangs are due to a race condition in PG that doesn't show up in x86 because memory visibility doesn't require as much synchronization.

1: https://www.nickwilcox.com/blog/arm_vs_x86_memory_model/
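
A minimal Rust sketch of the kind of handoff where this matters, assuming a simple flag-and-data pattern between two threads. The names `publish_and_read`, `data`, and `ready` are hypothetical and have nothing to do with PostgreSQL's actual code. The writer stores the data and then sets a flag with `Ordering::Release`; the reader spins on the flag with `Ordering::Acquire`. If both flag operations were downgraded to `Relaxed`, the code would no longer be correct: a weakly ordered CPU such as Arm is allowed to let the reader see the flag before the data, while x86's stronger hardware ordering tends to mask that mistake.

```rust
use std::sync::atomic::{AtomicBool, AtomicU64, Ordering};
use std::sync::Arc;
use std::thread;

/// Publishes `value` on a writer thread and reads it on the calling thread.
/// Returns the value the reader observed after seeing the `ready` flag.
/// (Hypothetical helper for illustration only.)
fn publish_and_read(value: u64) -> u64 {
    let data = Arc::new(AtomicU64::new(0));
    let ready = Arc::new(AtomicBool::new(false));

    let (d, r) = (Arc::clone(&data), Arc::clone(&ready));
    let writer = thread::spawn(move || {
        d.store(value, Ordering::Relaxed);
        // Release: the store to `data` above becomes visible to any thread
        // that observes `ready == true` through an Acquire load.
        r.store(true, Ordering::Release);
    });

    // Acquire pairs with the Release store in the writer.
    while !ready.load(Ordering::Acquire) {
        std::hint::spin_loop();
    }
    let seen = data.load(Ordering::Relaxed);
    writer.join().unwrap();
    seen
}

fn main() {
    println!("reader saw {}", publish_and_read(42));
}
```

Whether anything like this is behind the hangs described above is pure speculation; the sketch only shows why code that appears fine on x86 can misbehave on Arm.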

There are huge differences between machine generations. We found that for our workload Graviton3 (c7g) is the best, followed by AMD (m6a), followed by Intel (m6i), with Graviton2 (m6g) somewhat lagging. We can't use Graviton3, however, because of memory limitations, so we're using AMD. The difference from the old machine types (m5) is staggering: the m6a has basically twice the performance of the m5 while being cheaper.

However, I've seen a lot of benchmarks telling a different story, so it is important to actually measure your workloads.
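
One way to measure is to run the same workload on each instance type and compare throughput. Below is a rough sketch of such a harness, assuming a workload that can be expressed as a closure. `time_workload` and `ops_per_second` are hypothetical helper names, and the vector sum in `main` is only a placeholder for a real request path.

```rust
use std::time::{Duration, Instant};

/// Runs `workload` `iterations` times and returns the total elapsed time.
fn time_workload<F: FnMut()>(iterations: u32, mut workload: F) -> Duration {
    let start = Instant::now();
    for _ in 0..iterations {
        workload();
    }
    start.elapsed()
}

/// Converts an iteration count and elapsed time into operations per second.
fn ops_per_second(iterations: u32, elapsed: Duration) -> f64 {
    iterations as f64 / elapsed.as_secs_f64()
}

fn main() {
    // Placeholder workload: sum a vector. Swap in your own code here.
    let data: Vec<u64> = (0..100_000).collect();
    let mut sink = 0u64;
    let elapsed = time_workload(1_000, || {
        // black_box keeps the optimizer from deleting the work.
        sink = sink.wrapping_add(std::hint::black_box(&data).iter().sum::<u64>());
    });
    println!("{:.0} ops/s (checksum {})", ops_per_second(1_000, elapsed), sink);
}
```

A single timing like this is noisy; repeating the run several times on each instance type and comparing medians gives a fairer picture.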

I'd argue just find a different cloud provider.

GCP, Azure, Supabase, Cloudflare etc if you want managed services.

If you want a mix of managed services and raw compute, look more at Fly.io, Linode, Digital Ocean perhaps?

I have found the case for AWS being the "cheapest", or even "reasonable", on cost to get slimmer every year.

Steer clear of Digital Ocean.

They've had senior staff on HN justifying security lapses that commenters were describing as a "clownshoes operation".

Cloudflare doesn’t let you host Docker containers or offer managed Postgres, do they?
It's all about how they may fit in your stack. Fly.io most definitely does. As far as I'm aware, Cloudflare is looking at supporting Docker.

I just listed managed services (I imagined not all of them would fit).

I’ve been enjoying them here and there but I’ve also found that for some of my workloads a high clock Intel node is required. Even the Epyc nodes couldn’t keep up. I don’t completely know why, never dug too far into it.
I'm curious about that Rust-based message queue system
What do you want to know? It was a prototype. I was trying to learn Rust (didn't succeed), but I did manage to hack together a message queue that used HTTP for client interaction.

I'd previously written a SQL database message queue in Python which worked with Postgres, MySQL and SQL Server. This worked well but it was not fast enough for my liking. My goal was to build the fastest and simplest message queue server that exists, with zero configuration (I hate configuration).

I used Rust with Actix and I tried two strategies - one strategy was to use the plain old file system as a storage back end, with each message in a single file. This was so fast that it easily maxed out the capability of the disk way before the CPU capabilities were maxed out. The advantage of using plain old file system as a storage back end is it requires no configuration at all. So I moved on to a RAM only strategy in which the message queue was entirely ephemeral, leaving the responsibility for message persistence/storage/reliability to the client. This was the configuration that got about 8 million messages a second.
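
To make the RAM-only idea concrete, here is a minimal sketch under stated assumptions. It is not the prototype's code: it leaves out Actix and the HTTP layer entirely, and `EphemeralQueue`, `push`, and `pop` are hypothetical names. It only shows the core of a queue that keeps everything in memory and leaves persistence to the client.

```rust
use std::collections::VecDeque;
use std::sync::Mutex;

/// Hypothetical RAM-only queue: nothing is written to disk, so a restart
/// loses every message and the client owns durability and retries.
struct EphemeralQueue {
    messages: Mutex<VecDeque<Vec<u8>>>,
}

impl EphemeralQueue {
    fn new() -> Self {
        Self { messages: Mutex::new(VecDeque::new()) }
    }

    /// Appends a message and returns the queue depth after the push.
    fn push(&self, body: Vec<u8>) -> usize {
        let mut q = self.messages.lock().unwrap();
        q.push_back(body);
        q.len()
    }

    /// Removes and returns the oldest message, or `None` if the queue is empty.
    fn pop(&self) -> Option<Vec<u8>> {
        self.messages.lock().unwrap().pop_front()
    }
}

fn main() {
    let q = EphemeralQueue::new();
    q.push(b"first".to_vec());
    q.push(b"second".to_vec());
    println!("{:?}", q.pop());
}
```

A single `Mutex` is the simplest choice here and would likely become a bottleneck well before millions of messages a second; it stands in for whatever structure the real prototype used.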

As far as I could tell my prototype left almost all message queue servers in the dust. This is because message queue servers seem to almost all integrate "reliable" message storage - that makes the entire solution much, much more complex and slow. My thinking was to separate the concerns of storage/reliability/delivery and focus my message queue only on message delivery, and push status information back to the client, which could then decide what to do about storage and retries.

In the end I gave up because I didn't see the point: it wasn't going to make me any money, I was finding Rust frustratingly hard to learn, and I had other things to do.

It seems very diplomatic of you to say you found Rust hard to learn, rather than that it was hard to make Rust do what you wanted. You seem very clear on what you wanted to do.
I managed to build it without really grasping Rust by hacking around and looking at examples of how other stuff worked that I wanted to do, and by avoiding doing things in Rust that I didn't understand - stuff as basic as function calls.

The resulting code worked but was garbage and at the end of the day Rust had not clicked for me and being fluent in it still felt a distant goal.

I love the idea of Rust but I don't like the implementation.

I'm hoping in time there will be a new language created that has the memory and thread safety of Rust but is 50 times simpler.

If you don't mind, what were the things that were hard to deal with? You mention "function calls", but I'd like to understand what that actually means in practice. This kind of negative feedback is useful to improve Rust for any other newcomers, even if we have already soured you for good.
I could not even succeed in doing the simplest thing like putting code into a function and executing the function and getting a result back. I can't recall why.
> If you don't mind, what were the things that were hard to deal with?

The language syntax itself. It looks like the language developers wanted Rust to stand out and purposefully went out of their way to make it extremely difficult to learn. This is one area where Go shines: you can learn the syntax in half a day.

fluvio.io ?
> local servers tend to be much faster and cheaper than cloud.

Of course, running a server in your house is not going to achieve five or even three 9's of reliability, and even colocating a single rack in a single location might be more expensive than putting that infra in AWS (depending on how data-heavy your use case is, given AWS' exorbitant data transfer costs).

You can hit three nines even if you're down for about 1.4 minutes every day, or ten minutes a week. It's really not as hard to hit as it sounds. For a compute-heavy process that isn't end-user facing (e.g. batch processing) it's perfectly viable.

https://uptime.is/

Also, most cloud providers don't guarantee five nines anyway. The GCE SLA is 99.5% on a single instance and 99.99% on a region.

https://cloud.google.com/compute/sla
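
The arithmetic behind these figures is easy to check. A small sketch, with `downtime_budget_minutes` as a hypothetical helper name, that turns an availability percentage into allowed downtime over a period:

```rust
/// Allowed downtime in minutes for a given availability (as a percent)
/// over a period of `period_minutes`.
fn downtime_budget_minutes(availability_percent: f64, period_minutes: f64) -> f64 {
    (1.0 - availability_percent / 100.0) * period_minutes
}

fn main() {
    let day = 24.0 * 60.0; // minutes in a day
    let week = 7.0 * day; // minutes in a week
    for pct in [99.5, 99.9, 99.99, 99.999] {
        println!(
            "{}%: {:.2} min/day, {:.2} min/week",
            pct,
            downtime_budget_minutes(pct, day),
            downtime_budget_minutes(pct, week)
        );
    }
}
```

Three nines works out to roughly 1.44 minutes per day or 10.08 minutes per week, while 99.5% allows about 7.2 minutes per day.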

Which database are you using?
pg
Was this a while ago or was it a recent experience? I'm asking because I'm planning on using a serverless instance of PG and was interested in trying the ARM64 version.
About 18 months ago.

Try it - it might work fine for you.

We've been running RDS PostgreSQL on r6g instances for the past few months with no issues.