I could be wrong, but I wouldn't expect autoscaling or load balancing to affect the websocket connection itself. There may be another part of the infrastructure that's preventing the connection, though. Can you share more about the setup? It's strange that the connection succeeds sometimes but not others. This could be related to the configuration of an intermediate network layer. E.g., if Nginx is used, you may have to look into the settings needed to make websockets work well. Take a look at these pages:
Hi, we're not using Nginx. We're using a Docker image, and everything is configured through Cloudflare (I'm not fully familiar with the details of that setup).
The problem here, though, is that websockets are stateful: because of the load balancer, each connection is established with one particular instance out of several. A new webhook response, on the other hand, arrives as a normal POST request that carries no information about which instance holds the websocket connection. The load balancer may therefore route it to an instance where the connection was not established, and our backend cannot process the request on that instance.
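A minimal Python sketch of the failure mode described above, assuming a round-robin balancer with no session affinity. `Instance`, `RoundRobinBalancer` and the `pay-42` id are hypothetical; the real balancer in front of these instances may use a different routing policy.

```python
from itertools import cycle


class Instance:
    """One backend replica; it only knows about sockets opened on itself."""

    def __init__(self, name):
        self.name = name
        self.sockets = {}  # payment_id -> messages pushed over that socket

    def open_socket(self, payment_id):
        self.sockets[payment_id] = []

    def handle_webhook(self, payment_id, status):
        # Succeeds only if this replica happens to hold the socket.
        if payment_id not in self.sockets:
            return False
        self.sockets[payment_id].append(status)
        return True


class RoundRobinBalancer:
    """Hypothetical load balancer with no session affinity."""

    def __init__(self, instances):
        self._next = cycle(instances)

    def route(self):
        return next(self._next)


instances = [Instance("a"), Instance("b"), Instance("c")]
lb = RoundRobinBalancer(instances)

lb.route().open_socket("pay-42")                          # websocket lands on "a"
delivered = lb.route().handle_webhook("pay-42", "PAID")   # webhook lands on "b"
print(delivered)  # False
```

If routing is effectively random across N instances, a webhook only reaches the instance holding the socket roughly 1 in N times, which would fit the "works sometimes but not others" symptom.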
One thing to consider would be Kafka. When a webhook comes in, publish a message to your topic that includes some reference to the user, such as the payment id or user id. Each server consumes from the topic, and if it finds a message for which it holds an active websocket (or long-polling) connection, it pushes that message to the user. This page shows how to have all consumers consume all messages: https://stackoverflow.com/questions/23136500/how-kafka-broad.... Spring Boot has really convenient integration with Kafka, which makes setup pretty straightforward. The AWS Kafka service is also quite easy to set up. Having every server consume every message doesn't scale ideally, since more payments and webhooks will result in more Kafka messages sent to each server, but partitioning would probably be too tricky with a variable number of servers due to autoscaling.
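A minimal in-memory Python sketch of the broadcast idea, assuming each server tracks its own read position on the topic. In real Kafka that corresponds to giving every server instance its own consumer group id, since each consumer group receives every message. `Topic`, `Server` and the message fields are hypothetical, and no Kafka client is used here.

```python
from dataclasses import dataclass, field


@dataclass
class Topic:
    """In-memory stand-in for a Kafka topic: an append-only log."""
    log: list = field(default_factory=list)

    def publish(self, message):
        self.log.append(message)


@dataclass
class Server:
    name: str
    topic: Topic
    # Own offset per server, like one consumer group per server,
    # so every server sees every message (broadcast).
    offset: int = 0
    sockets: dict = field(default_factory=dict)  # payment_id -> pushed messages

    def poll(self):
        new = self.topic.log[self.offset:]
        self.offset = len(self.topic.log)
        for msg in new:
            # Push only if this server holds the user's websocket; others ignore it.
            if msg["payment_id"] in self.sockets:
                self.sockets[msg["payment_id"]].append(msg["status"])


topic = Topic()
servers = [Server("a", topic), Server("b", topic)]
servers[0].sockets["pay-42"] = []  # websocket was opened on server "a"

# The webhook hits server "b", which only publishes to the topic.
topic.publish({"payment_id": "pay-42", "status": "PAID"})
for s in servers:
    s.poll()
print(servers[0].sockets)  # {'pay-42': ['PAID']}
```

The trade-off shows up in `poll`: every server reads every message and drops the ones it has no socket for, so per-server load grows with total webhook volume.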
Another approach could be to save the association between server and session in a database. When a webhook comes in and the current server doesn't hold the target session, look up which server does and make a request to an internal endpoint on that server to send the message over.
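A minimal Python sketch of the lookup-and-forward approach, assuming a dict stands in for the shared database table and a direct method call stands in for the internal HTTP endpoint. `SessionRegistry`, `internal_send` and the `/internal/push` path are hypothetical names.

```python
class SessionRegistry:
    """Stand-in for a shared database table mapping session id -> server name."""

    def __init__(self):
        self._owner = {}

    def register(self, session_id, server_name):
        self._owner[session_id] = server_name

    def lookup(self, session_id):
        return self._owner.get(session_id)


class Server:
    def __init__(self, name, registry, cluster):
        self.name = name
        self.registry = registry
        self.cluster = cluster  # name -> Server; stands in for internal HTTP calls
        self.sockets = {}       # session id -> messages pushed over that socket

    def open_socket(self, session_id):
        self.sockets[session_id] = []
        self.registry.register(session_id, self.name)

    def internal_send(self, session_id, message):
        # Hypothetical internal endpoint (e.g. POST /internal/push) on the owner.
        self.sockets[session_id].append(message)

    def handle_webhook(self, session_id, message):
        """Deliver locally if possible, otherwise forward to the owning server."""
        if session_id in self.sockets:
            self.internal_send(session_id, message)
            return self.name
        owner = self.registry.lookup(session_id)
        if owner is None:
            return None  # no live session registered anywhere
        self.cluster[owner].internal_send(session_id, message)
        return owner


registry = SessionRegistry()
cluster = {}
for name in ("a", "b"):
    cluster[name] = Server(name, registry, cluster)

cluster["a"].open_socket("pay-42")                          # socket lives on "a"
handled_by = cluster["b"].handle_webhook("pay-42", "PAID")  # webhook hits "b"
print(handled_by, cluster["a"].sockets)  # a {'pay-42': ['PAID']}
```

A real version would also need to delete the row when a socket closes, and to cope with stale rows left behind by instances that autoscaling has terminated.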
You could also look into Redis for this. Have the server handling the websocket subscribe to changes on a key associated with the user's payment. When a webhook is received, just update that key in Redis.
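A minimal in-memory Python sketch of the Redis idea, assuming keyspace notifications are what "subscribe to key changes" refers to. `FakeRedis`, the `payment:pay-42` key name and the callback wiring are hypothetical stand-ins. With a real client, the server would subscribe to the key's keyspace channel, and `notify-keyspace-events` would have to be enabled on the Redis server.

```python
from collections import defaultdict


class FakeRedis:
    """In-memory stand-in for Redis SET/GET plus keyspace notifications.

    Real Redis (with notify-keyspace-events enabled) publishes the event name,
    e.g. "set", on channel "__keyspace@0__:<key>"; here subscribers are callbacks.
    """

    def __init__(self):
        self.data = {}
        self.subscribers = defaultdict(list)  # key -> callbacks

    def subscribe_key(self, key, callback):
        self.subscribers[key].append(callback)

    def get(self, key):
        return self.data.get(key)

    def set(self, key, value):
        self.data[key] = value
        # Like a keyspace notification, the payload is the event name, not the value.
        for callback in self.subscribers[key]:
            callback("set")


redis = FakeRedis()
pushed = []  # what the websocket-owning server sent to its client
key = "payment:pay-42"  # hypothetical key naming scheme

# Server holding the websocket: on notification, read the key and push it.
redis.subscribe_key(key, lambda event: pushed.append(redis.get(key)))

# Whichever server receives the webhook just writes the key.
redis.set(key, "PAID")
print(pushed)  # ['PAID']
```

Because a keyspace notification only says that the key changed, the subscriber reads the value afterwards. Notifications are also fire-and-forget, so a server that is briefly disconnected from Redis would miss the update.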