Hacker News
by t-writescode 1985 days ago
I've written a high-availability service with Akka.NET and RabbitMQ and I remember when I was working with that infrastructure, my biggest question around Akka Cluster was "why would I use this when I already have a message queue infrastructure?"

Maybe real Akka is better than Akka.NET when it comes to Akka Cluster?

Akka Cluster works in-memory, RabbitMQ doesn't.

Say you want to have multiple actors (one per user / customer or whatever), you receive HTTP requests, and you want exactly that actor to handle them (to guarantee consistency). You can't really do this with RabbitMQ.

I mean, you can make the machine that receives the request push it to the queue and keep the HTTP connection alive, have the machine that is responsible for the user read it from a queue and then somehow tell the first machine how to respond to the HTTP request... but then you have pretty much re-implemented Akka Cluster in a worse way.

Persistent queues and Akka Cluster solve different use cases.
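
To make the "exactly this actor handles them" idea concrete, here is a minimal single-process sketch of routing by entity id. It is not Akka's API: `NUM_SHARDS`, `NODES`, `UserActor`, `Node` and `route` are all hypothetical names, and the static shard-to-node mapping is an assumption made for illustration (a real cluster would rebalance shards as nodes join and leave).

```python
import hashlib

NUM_SHARDS = 16                          # assumed value, for illustration only
NODES = ["node-a", "node-b", "node-c"]   # hypothetical cluster members


def shard_for(entity_id: str) -> int:
    """Map an entity id to a shard with a stable hash (not Python's salted hash())."""
    digest = hashlib.sha256(entity_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % NUM_SHARDS


def node_for(entity_id: str) -> str:
    """Pick the node that owns the entity's shard (static allocation, an assumption)."""
    return NODES[shard_for(entity_id) % len(NODES)]


class UserActor:
    """Holds one user's state in memory; it only ever lives on the owning node."""

    def __init__(self, user_id: str):
        self.user_id = user_id
        self.request_count = 0

    def handle(self, request: str) -> str:
        self.request_count += 1
        return f"{self.user_id} handled {request!r} (#{self.request_count})"


class Node:
    """A stand-in for one machine in the cluster."""

    def __init__(self, name: str):
        self.name = name
        self.actors = {}

    def deliver(self, user_id: str, request: str) -> str:
        # Create the actor lazily on first message, then reuse it.
        if user_id not in self.actors:
            self.actors[user_id] = UserActor(user_id)
        return self.actors[user_id].handle(request)


cluster = {name: Node(name) for name in NODES}


def route(user_id: str, request: str) -> str:
    """Whichever node receives the HTTP request forwards it to the owning node."""
    owner = cluster[node_for(user_id)]
    return owner.deliver(user_id, request)
```

Every call to `route("alice", ...)` ends up at the same `UserActor` instance, no matter which machine accepted the HTTP request, which is the consistency property described above.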

At that point, you're still operating on a single machine, though, and you don't need Akka Cluster for that.
I don't understand what you mean.

In my scenario, multiple machines are used and necessary for the same service (otherwise there is no point in using Akka Cluster).

Example: online games. You have rooms/games being created on the fly and destroyed an hour later. There are many of these rooms and they are distributed over multiple machines. You can't really use RabbitMQ here, it's not performant enough.

And even if you did, you would pretty much reimplement what Akka Cluster does for you: instance synchronization, different strategies for handling split-brain scenarios, dead letter handling, direct message forwarding, persistence (if needed) and so on.

If you want to host a game state object in memory (vs. serialized and saved to SSD) because that object sees so many reads and writes in a very short window that the I/O and CPU (serialization/deserialization) cost is too high and the added latency is a blocker, then hosting a full-blown object in memory within your runtime makes sense. For the tradeoff math to favor this design, given today's SSD costs vs. memory costs, the number of reads per game object per second must be high and must be sustained for a good number of seconds.

But I wonder if you will suffer from random GC pauses and an inability to carefully isolate different behaviors into different resource clusters, resulting in an uncontrolled blast radius, etc.

If you are anyway doing persistence (because you care to not lose game progress), and whenever a cluster node dies you need to resurrect game state from persistence, I wonder if you will get the game state restored within a bounded latency.

If this happens frequently enough (to affect, say, 5% of your users, enough to kill your game experience), are the benefits of the latency gain from in-memory object reads wiped out?

I mean, you are right that Akka Cluster is JVM based and hence can bring the problems you mentioned. But then again, most high frequency trading also runs on the JVM, so it can often be worked around.

> If this happens frequently enough (to affect, say, 5% of your users, enough to kill your game experience), are the benefits of the latency gain from in-memory object reads wiped out?

For this specific use case, I don't think there is really an alternative, except for a hand-crafted system (or not scaling at all, e.g. everyone hosting and managing their own server).

> But then again, most high frequency trading also runs on the JVM, so it can often be worked around.

The JVM is an awesome piece of technology. With careful programming you can build anything from robotic control systems to high-frequency trading systems on it.

But I've seen a lot of Java code running in production that suffers from latency jitter and needs continuous profiling and optimization by a small group of performance engineers, while the majority of application engineers keep adding to the GC load.

> For this specific use case, I don't think there is really an alternative, except for a hand-crafted system

Yes, but I think the hand-crafted system doesn't need to be very complex. It can be quite simple, easy to understand, and easy to adapt to your needs as your scale and complexity grow.

You're not just talking online games, you're talking enormous-world MMOs, and that's such a far cry from the areas that I have worked with and thought about how to work with in any detail that I can't usefully add anything.

If you're trying to manage the concurrent state of tens of thousands of players in a game, all server-side, and you want a specific actor to handle each single player and you don't want a globally persistent state, then I suppose this makes sense.

I've never worked with anything of that scale though.

> You're not just talking online games, you're talking enormous-world MMOs

I am talking about bigger scale here of course. But not necessarily what you describe.

Take games like League of Legends or Counter-Strike as examples. Having one game per actor seems like a sensible design to me.

But yeah, I think that traditional techniques still get you very far. I heard that Slack was running just on (multiple) Postgres instances for the longest time.

Interesting! Not trying to divert from Akka, which is a wonderful piece of engineering, but your comment reminded me of Microsoft’s take on the Actor model with Project Orleans, which was used for the backend side of the Halo / Xbox MMO [0].

I think the GA version of Project Orleans is now called Service Fabric, although I never had the pleasure of trying it.

[0] https://youtu.be/I91ZU8tEJkU

Orleans and Service Fabric are different. Orleans has been running in production for some time now and is actively developed on GitHub: https://github.com/dotnet/orleans. Teams inside Microsoft run it on top of Service Fabric (and Kubernetes, etc.). More details in this talk: https://youtu.be/KhgYlvGLv9c

Service Fabric has something called Reliable Actors which are heavily inspired by Orleans.

Source: I'm the project lead for Orleans

Queues overlap with the messaging aspects of actors but not the supervision aspect of actors.
For that we have cluster management like Kubernetes.
An actor can fail for reasons other than infrastructure issues, e.g. unhandled exceptions.

In fact, unhandled exceptions are encouraged (i.e. the "let it crash" approach to fault tolerance).
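
As a rough illustration of "let it crash", here is a minimal sketch in which the actor does no defensive error handling and a supervisor simply replaces it with a fresh instance when its handler raises. `CounterActor` and `Supervisor` are hypothetical names, not Akka classes, and "restart with fresh state" is just one assumed supervision strategy.

```python
class CounterActor:
    """A toy actor whose handler raises on bad input instead of defending against it."""

    def __init__(self):
        self.total = 0

    def receive(self, message):
        self.total += int(message)  # raises ValueError on non-numeric input
        return self.total


class Supervisor:
    """Restarts the child with fresh state whenever its handler raises."""

    def __init__(self, actor_factory):
        self.actor_factory = actor_factory
        self.child = actor_factory()
        self.restarts = 0

    def tell(self, message):
        try:
            return self.child.receive(message)
        except Exception:
            # The failure is handled here, outside the actor's own logic.
            self.child = self.actor_factory()
            self.restarts += 1
            return None


supervisor = Supervisor(CounterActor)
results = [supervisor.tell(m) for m in ["1", "2", "oops", "5"]]
print(results, supervisor.restarts)  # [1, 3, None, 5] 1
```

The message `"oops"` crashes the child, the supervisor swaps in a new one, and the next message is processed against clean state. A queue by itself gives you redelivery but not this kind of failure policy.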

Sure, but that’s almost entirely local within a process, where regular Akka processes would work.

So, in the system I worked with, individual applications were Akka powered and cross-process/cross-VM communication was done through queues.

With Cluster + Sharding you can have zero(ish) downtime though when you scale horizontally. Messages sent to Sharded actors are buffered if their nodes ever go down and things will just resume as normal.
You can have that with a queueing system, too. As long as you don't ack a message before you're done processing it, any message that was still being processed when a consumer crashed will be redelivered to the next available consumer.
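
A minimal in-memory sketch of that ack-after-processing behavior follows. `AckQueue` and `run` are hypothetical stand-ins, not a real RabbitMQ client API, and the sketch assumes a single consumer at a time. It only models the rule that delivered-but-unacked messages go back on the queue when the consumer dies.

```python
from collections import deque


class AckQueue:
    """Toy broker: a message stays 'unacked' until the consumer confirms it."""

    def __init__(self, messages):
        self.ready = deque(messages)
        self.unacked = {}   # delivery tag -> message
        self.next_tag = 0

    def get(self):
        message = self.ready.popleft()
        self.next_tag += 1
        self.unacked[self.next_tag] = message
        return self.next_tag, message

    def ack(self, tag):
        del self.unacked[tag]

    def consumer_died(self):
        # Put delivered-but-unacked messages back at the front, in original order.
        for tag in sorted(self.unacked, reverse=True):
            self.ready.appendleft(self.unacked.pop(tag))


def run(queue, crash_on=None):
    """Consume until the queue is empty or the simulated consumer crashes."""
    processed = []
    try:
        while queue.ready:
            tag, message = queue.get()
            if message == crash_on:
                raise RuntimeError("consumer crashed mid-processing")
            processed.append(message)
            queue.ack(tag)  # ack only after the work is done
    except RuntimeError:
        queue.consumer_died()
    return processed


queue = AckQueue(["a", "b", "c"])
first = run(queue, crash_on="b")   # first consumer dies while handling "b"
second = run(queue)                # next consumer picks up where it left off
print(first, second)               # ['a'] ['b', 'c']
```

Note that this gives at-least-once delivery: if the crash had happened after the work but before the ack, `"b"` would have been processed twice.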
What if you have a message that is specifically for another actor that may or may not be on the same server?
That doesn't exist in the models I've designed. Microservices still exist and they each do their requisite task, but different work is done by different actors in the actor cluster. Actors are still a fantastic way of handling concurrency, since they can each be treated as single-threaded tiny programs that only think about themselves, and so I've used them in that way.

The actor just won't exist elsewhere, but another microservice that happens to use actors might exist. It would be sent a web request, a message on a queue, or similar.
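
The "single-threaded tiny programs" view of actors can be sketched with a mailbox and one worker thread per actor. This is an assumption-laden toy, not Akka or Akka.NET code: `Actor`, `tell` and `stop` are hypothetical names, and real actor runtimes schedule many actors over a shared thread pool rather than one thread each.

```python
import queue
import threading


class Actor:
    """Processes its mailbox one message at a time on a single thread."""

    def __init__(self):
        self.mailbox = queue.Queue()
        self.balance = 0
        self._thread = threading.Thread(target=self._run)
        self._thread.start()

    def _run(self):
        while True:
            message = self.mailbox.get()
            if message is None:      # sentinel: stop the actor
                break
            # No lock needed: only this thread ever touches self.balance.
            self.balance += message

    def tell(self, amount):
        """Fire-and-forget send; safe to call from any thread."""
        self.mailbox.put(amount)

    def stop(self):
        self.mailbox.put(None)
        self._thread.join()


def send_many(actor, count):
    for _ in range(count):
        actor.tell(1)


actor = Actor()
senders = [threading.Thread(target=send_many, args=(actor, 1000)) for _ in range(4)]
for sender in senders:
    sender.start()
for sender in senders:
    sender.join()
actor.stop()
print(actor.balance)  # 4000
```

Four threads send concurrently, yet the actor's state is updated without any locking in the actor's own code, because the mailbox serializes all access.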