by dtwwtd 4572 days ago
Are the servers you're building serving concurrent clients? An exception could take out multiple in-flight requests.

(not against your idea, just curious how you handle it)


One advantage of forking servers: kill and restart the parent, and you don't lose in-flight connections.

That said, I do this as well; even the best-behaved daemon can get... funky... after a few months. Planned outages for a daemon restart are OK in my experience, particularly if you can fail over to other nodes as part of a rolling restart.

Of course, this refers to planned restarts, though forking servers help with unplanned exceptions as well.
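
To make the unplanned case concrete, here is a minimal sketch of process-per-request isolation, assuming a Unix-like OS where `os.fork` is available. The names `handle` and `serve_all_forked` are hypothetical, and a real forking server would accept sockets rather than take a list of strings.

```python
import os

def handle(request: str) -> None:
    """Hypothetical request handler; raises on a 'bad' request."""
    if request == "bad":
        raise ValueError("unhandled error in handler")

def serve_all_forked(requests):
    """Fork one child per request and report each child's exit code."""
    children = {}
    for request in requests:
        pid = os.fork()
        if pid == 0:
            # Child process: an exception here takes down only this child.
            try:
                handle(request)
                os._exit(0)
            except Exception:
                os._exit(1)
        # Parent process: remember which child owns which request.
        children[pid] = request

    results = {}
    for pid, request in children.items():
        _, status = os.waitpid(pid, 0)
        results[request] = os.WEXITSTATUS(status)
    return results
```

The child that hits the exception exits non-zero; its siblings finish normally and the parent keeps running, so only the one failing request is lost.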

Don't you find performance suffers? AIUI this approach means you can only handle as many concurrent requests as you have processes, and the OS scheduler has less information to work with than if you were using threads.
Not typically: using forking daemons does not mean that you can't also use threads. The ideal model probably uses both: processes that can be pinned to a processor, each still using a thread per request. It's nice to have a library to abstract the implementation details away for you, but not necessary.
Right, but doesn't the error handling approach you describe mean allowing a whole process to fail whenever an error condition occurs, which would cause any requests that were being handled by other threads of that process to fail even though they were perfectly valid?
Can you really tell me that, after an exception, you know the other threads are in a well-defined state, let alone 'perfectly valid'?

Look at something like ZeroMQ that is being rewritten specifically to avoid the non-determinism inherent in throwing an exception.

Once you're using threads, it's pretty much anyone's guess what state the system is in at any point; add exceptions and it just gets worse.

An exception in one thread shouldn't affect others - why would it?

I agree that unstructured use of threading primitives leaves you with an unpredictable system, but it's possible to build safer, higher-level abstractions and use those.
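
As a rough illustration of the kind of higher-level abstraction I mean, here is a sketch using Python's standard `concurrent.futures`. The handler and the `serve_all_threads` name are made up for the example. Note that it only shows the exception being contained in its own future; it does nothing to protect shared mutable state, which is the other half of the argument.

```python
from concurrent.futures import ThreadPoolExecutor

def handle(request: str) -> str:
    """Hypothetical request handler; raises on a 'bad' request."""
    if request == "bad":
        raise ValueError("bad request")
    return request.upper()

def serve_all_threads(requests):
    """Run each request on a pool thread and collect per-request outcomes."""
    outcomes = []
    with ThreadPoolExecutor(max_workers=4) as pool:
        futures = [pool.submit(handle, r) for r in requests]
        for request, future in zip(requests, futures):
            # exception() blocks until the task finishes and returns the
            # raised exception (or None) instead of re-raising it here.
            error = future.exception()
            outcomes.append((request, "error" if error else future.result()))
    return outcomes
```

The failing task's exception is captured in its own future, and the other tasks return their results as usual.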

Why do you need to handle requests concurrently when something like the disruptor pattern can handle 6 million/sec on a single core?
Disruptor would be even worse for this - you'd lose all the perfectly valid messages in all the ringbuffers.
It's more a question of what the semantics of failed requests are than of concurrent clients.

The key is to segment your system so processing is orthogonal to the implementation details of the networking protocol on a given system. Just because a particular OS drops a process's TCP/IP connections when that process exits does not mean that client connections have to be dropped every time your process crashes.

In the case of a webserver, you can use something like mongrel2 / nginx that maps physical connections to backend processes, so that a process restart doesn't mean a dropped connection or a failed request.
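
A toy sketch of the idea, not of how mongrel2 or nginx is actually implemented: the front end owns the client's request and hands it to whichever backend is currently up. `BackendDown`, `dispatch`, `restarting`, and `healthy` are all hypothetical names, and the backends are plain callables standing in for worker processes.

```python
class BackendDown(Exception):
    """Raised by a hypothetical backend that is mid-restart."""

def dispatch(request, backends):
    """Try each backend in turn; fail only if none can take the request."""
    last_error = None
    for backend in backends:
        try:
            return backend(request)
        except BackendDown as exc:
            # This backend is unavailable; the client never sees this.
            last_error = exc
    raise RuntimeError("no backend available") from last_error

def restarting(request):
    """Stand-in for a backend process that is being restarted."""
    raise BackendDown("backend is restarting")

def healthy(request):
    """Stand-in for a backend process that is up."""
    return f"handled {request}"
```

With `dispatch("GET /", [restarting, healthy])` the request still gets an answer even though the first backend is down, which is the whole point of keeping the client connection out of the worker process.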

Forcing your machines to reboot early and often makes you think about and deal with these problems rather than simply delaying them until one of your nodes dies and takes out a bunch of client connections anyway.