by jwr 507 days ago
My SaaS has been using WebSockets for the last 9 years. I plan to stop using them and move to very simple HTTP-based polling.

I found that scalability isn't a problem (it rarely is these days). The real problem is crappy network equipment all over the world that will sometimes break websockets in strange and mysterious ways. I guess not all network equipment vendors test with long-lived HTTP websocket connections with plenty of data going over them.

At a certain scale, this results in support requests, and frustratingly, I can't do anything about the problems my customers encounter.

The other problems are smaller but still annoying. For example, it isn't easy to compress content transmitted through websockets.
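For anyone wondering what "very simple HTTP-based polling" might look like on the client side, here is a rough sketch, not the poster's actual code. The names `poll`, `fetch`, `handle`, the cursor scheme, and the backoff numbers are all assumptions made up for illustration; `fetch` stands in for whatever HTTP call returns new events plus a cursor to resume from.

```python
import time


def poll(fetch, handle, interval=2.0, max_interval=30.0, max_polls=None, sleep=time.sleep):
    """Poll repeatedly, passing each new event to handle().

    fetch(cursor) is a hypothetical callable that performs one HTTP request
    and returns (events, new_cursor). It is expected to raise OSError on
    network failure (urllib.error.URLError is a subclass of OSError).
    """
    cursor = None      # None means "start from the beginning"
    delay = interval
    polls = 0
    while max_polls is None or polls < max_polls:
        polls += 1
        try:
            events, cursor = fetch(cursor)
        except OSError:
            # Keep the old cursor and back off exponentially, up to a cap.
            delay = min(delay * 2, max_interval)
        else:
            for event in events:
                handle(event)
            delay = interval  # a successful request resets the backoff
        sleep(delay)
    return cursor
```

Because every poll is an ordinary short-lived request, a flaky middlebox only costs one retry, and standard HTTP response compression applies without extra work. The price is latency of up to one polling interval.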

6 comments

I always recommend looking at Server-Sent Events [0] and EventSource [1]. It's a standardization of old-style long-polling that maps very well to the HTTP paradigm and is built into the web standard.

It's so much easier to reason about than websockets, and a naive server side implementation is very simple.

A caveat is to only use them with HTTP/2 and/or client-side logic that keeps only one connection open to the server, because of browser limits on simultaneous requests to the same origin.

[0] https://developer.mozilla.org/en-US/docs/Web/API/Server-sent...
[1] https://developer.mozilla.org/en-US/docs/Web/API/EventSource
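To give a feel for how small a naive server side can be, here is a minimal sketch using only the Python standard library. The helper `format_sse` and the handler `EventsHandler` are hypothetical names, and the three "tick" messages are placeholder data; a real server would stream its own events and handle client disconnects.

```python
import time
from http.server import BaseHTTPRequestHandler
from typing import Optional


def format_sse(data: str, event: Optional[str] = None, event_id: Optional[str] = None) -> str:
    """Encode one message in the text/event-stream wire format."""
    lines = []
    if event_id is not None:
        lines.append(f"id: {event_id}")
    if event is not None:
        lines.append(f"event: {event}")
    # Each line of the payload gets its own "data:" field.
    for chunk in data.split("\n"):
        lines.append(f"data: {chunk}")
    # A blank line terminates the event.
    return "\n".join(lines) + "\n\n"


class EventsHandler(BaseHTTPRequestHandler):
    """Naive handler: streams three placeholder events, then closes."""

    def do_GET(self):
        self.send_response(200)
        self.send_header("Content-Type", "text/event-stream")
        self.send_header("Cache-Control", "no-cache")
        self.end_headers()
        for n in range(3):
            self.wfile.write(format_sse(f"tick {n}", event_id=str(n)).encode("utf-8"))
            self.wfile.flush()
            time.sleep(1)
```

The handler can be served with `http.server.HTTPServer(("127.0.0.1", 8000), EventsHandler).serve_forever()` (the port is arbitrary). In the browser, `new EventSource(url)` consumes the stream and reconnects on its own when the connection drops.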

The last project I worked on went in the same direction.

Everything works great in local/qa/test, and then once we move to production we inevitably have customers with super weird network security arrangements. Users in branch offices on WiFi hardware installed in 2007. That kind of thing.

When you are building software for other businesses to use, you need to keep it simple or the customer will make your life absolutely miserable.

This is likely a misconfiguration or a bug on your end. Our products use WebSockets extensively for both business logic and media delivery. We have significant traffic, including from 3rd world countries with extremely poor networks. When the server and the browser are set up properly, the reliability of the WebSocket protocol and the software stack is basically no different from raw TCP. When we have issues, the bugs are always on our end, due to 1) ingress software (firewalls, WAPs, reverse proxies, TLS termination), 2) HTTP server protocol parsing and processing of the WebSocket data stream, or 3) web/HTTP framework issues. We never have any issues due to networks on the users' end, apart from the quality of the connection itself. Seen from the equipment side, the WebSocket connection is an opaque stream of data, no different from FPS gameplay or a livestream. The equipment can break it, but then the underlying HTTP breaks as well, giving very clear errors in browsers. Reconnection and keepalive for WebSockets in browsers are very robust which you can actually prove by tests...

What are the typical payload sizes in your WebSocket messages? Could you share the median and p99 values?

I've also discovered similar networking issues in my own application while traveling. For example, in Vietnam right now, I was facing recurring issues like long connection establishment times and loss of responsiveness mid-operation. I thought I was losing my mind. I even configured Caddy to not use HTTP/3 (QUIC), since some networks don't like UDP.

I moved some chunkier messages in my app to HTTP requests, and it has become much more stable (though still iffy at times).

I transmit a lot over websockets: large messages and large amounts of data. I don't think it makes sense to move bigger messages to HTTP requests while keeping the websockets. I heard that advice, but if I am to do that, I'd rather go all the way and stop using websockets altogether.

This is surprising to me, as I would expect network equipment to just see a TCP connection, given that both HTTP and WebSockets are application-layer protocols and that long-lived TCP connections are quite ubiquitous (databases, streaming services, SSH, etc).

Found this same issue trying to scale Streamlit. It's just not a good idea.