by 10000truths 509 days ago
The key to managing this complexity is to avoid mixing transport-level state with application-level state. The same approach for scaling HTTP requests also works for scaling WebSocket connections:

* Read, write and track all application-level state in a persistent data store.

* Identify sessions with a session token so that application-level sessions can span multiple WebSocket connections.
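
A minimal sketch of those two points, assuming an in-memory SQLite database as a stand-in for the persistent store and a made-up JSON message shape (`session`, `op`, `by`). The function names are illustrative, not from any particular framework:

```python
import json
import sqlite3


def open_store(path=":memory:"):
    # Stand-in for a real persistent store (assumption: any durable DB would do).
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS sessions "
        "(token TEXT PRIMARY KEY, state TEXT NOT NULL)"
    )
    return db


def load_state(db, token):
    row = db.execute(
        "SELECT state FROM sessions WHERE token = ?", (token,)
    ).fetchone()
    # Unknown tokens start with a fresh (hypothetical) state.
    return json.loads(row[0]) if row else {"counter": 0}


def save_state(db, token, state):
    db.execute(
        "INSERT OR REPLACE INTO sessions (token, state) VALUES (?, ?)",
        (token, json.dumps(state)),
    )
    db.commit()


def handle_message(db, raw_message):
    """Handle one request. Nothing is remembered on the connection itself."""
    msg = json.loads(raw_message)
    token = msg["session"]
    state = load_state(db, token)
    if msg["op"] == "increment":
        state["counter"] += msg.get("by", 1)
    save_state(db, token, state)
    return json.dumps({"session": token, "counter": state["counter"]})


db = open_store()
# First WebSocket connection sends a request...
reply_a = handle_message(db, json.dumps({"session": "tok-1", "op": "increment"}))
# ...the socket drops, and a second connection continues the same session.
reply_b = handle_message(
    db, json.dumps({"session": "tok-1", "op": "increment", "by": 2})
)
```

Because `handle_message` only needs the store and the message, it does not matter which connection (or which server process sharing the store) receives the second request.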

It's a lot easier to do this if your application-level protocol consists of a single discrete request and response (a la RPC). But you can also handle unidirectional/bidirectional streaming, as long as the stream states are tracked in your data store and on the client side.
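
For the streaming case, one way to sketch it (assuming a simple sequence-number scheme; `StreamStore`, `Client` and `last_seen` are hypothetical names, and the dict stands in for a real data store):

```python
from dataclasses import dataclass, field


@dataclass
class StreamStore:
    """Stand-in for a persistent store: stream id -> ordered list of events."""

    events: dict = field(default_factory=dict)

    def append(self, stream_id, payload):
        log = self.events.setdefault(stream_id, [])
        log.append(payload)
        return len(log)  # 1-based sequence number of the new event

    def read_after(self, stream_id, last_seen):
        log = self.events.get(stream_id, [])
        return [(seq, p) for seq, p in enumerate(log, start=1) if seq > last_seen]


class Client:
    """Client-side half of the stream state: the last sequence number seen."""

    def __init__(self):
        self.last_seen = 0
        self.received = []

    def deliver(self, batch):
        for seq, payload in batch:
            if seq <= self.last_seen:
                continue  # already seen, e.g. re-sent after a reconnect
            self.received.append(payload)
            self.last_seen = seq


store = StreamStore()
for payload in ["a", "b", "c"]:
    store.append("feed", payload)

client = Client()
# First connection: only two events arrive before the socket drops.
client.deliver(store.read_after("feed", client.last_seen)[:2])
# Second connection: the client reports last_seen and the server resumes there.
client.deliver(store.read_after("feed", client.last_seen))
```

Neither side keeps anything on the socket, so a reconnect is just another `read_after` call with the client's last sequence number.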


Functional core, imperative shell makes testing and fast iteration a lot easier. It’s best if your business logic knows very little about transport mechanisms.

I think part of the problem is that early systems wanted to eagerly process requests while they were still coming in. But in a system getting 100s of requests per second you get better concurrency if you wait for entire payloads before you waste cache lines on attempting to make forward progress on incomplete data. That means you can divorce the concept of a payload entirely from how you acquired it.
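
A rough sketch of what that can look like, assuming a transport that hands over chunks with a "final" flag. `apply_order`, `PayloadBuffer` and the order format are made up for illustration:

```python
import json


def apply_order(inventory, order):
    """Functional core: pure, no I/O, knows nothing about sockets."""
    remaining = inventory.get(order["item"], 0) - order["qty"]
    if remaining < 0:
        return inventory, {"ok": False, "error": "insufficient stock"}
    return {**inventory, order["item"]: remaining}, {"ok": True, "remaining": remaining}


class PayloadBuffer:
    """Imperative shell helper: collect chunks, release only complete payloads."""

    def __init__(self):
        self._chunks = []

    def feed(self, chunk, final):
        self._chunks.append(chunk)
        if not final:
            return None  # nothing to process yet
        payload = b"".join(self._chunks)
        self._chunks.clear()
        return payload


def on_chunk(buffer, inventory, chunk, final):
    """Shell: buffer until the payload is whole, then call the core once."""
    payload = buffer.feed(chunk, final)
    if payload is None:
        return inventory, None
    return apply_order(inventory, json.loads(payload))


buffer = PayloadBuffer()
inventory = {"widget": 5}
inventory, partial_reply = on_chunk(buffer, inventory, b'{"item": "wid', final=False)
inventory, reply = on_chunk(buffer, inventory, b'get", "qty": 2}', final=True)
```

The core can be tested by calling `apply_order` with plain dicts, and the same core works whether the payload arrived over HTTP, a WebSocket, or a queue.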

> system getting 100s of requests per second you get better concurrency if you wait for entire payloads before you waste cache lines

At what point should one scale up & switch to chips with embedded DRAMs ("L4 cache")?

I haven’t been tracking price competitiveness on those. What cloud providers offer them?

But you don’t get credit for having three tasks halfway finished instead of one task done and two in flight. Any failover will have to start over with no forward progress having been made.

ETA: while the chip generation used for EC2 m7i instances can have L4 cache, I can’t find a straight answer about whether they do or not.

What I can say is that for most of the services I benchmarked at my last gig, m7i came out to be as expensive per request as the m6’s on our workload (AMD’s was more expensive). So if it has L4, it ain’t helping. Especially at those price points.

When you've profiled the code running in production and identified memory bottlenecks that cannot be solved by algorithmic or data-structure optimizations.

Another thread going on right now[1] advocates very similar things in order to reduce complexity when dealing with distributed systems.

Then again, the frontend and backend are a distributed system, so it's not that weird that one comes to similar conclusions.

[1]: https://news.ycombinator.com/item?id=42813049 Every System is a Log: Avoiding coordination in distributed applications