by Lukasa 3603 days ago
I think that's a tempting conclusion to draw, but it's not quite right. The standard interfaces of Go reduce the pain, but the design principle ("Don't do I/O in parser or state machines") remains solid.

The reason for this is basically that I/O and protocol logic are separate concerns, and whenever they start influencing each other too much they impose costs on each other.

The best example is testing. If your protocol code calls Go's `Reader`/`Writer` interface methods directly, then for each of those calls you need to test not only all possible reads/writes (a protocol concern) but also all possible I/O failures (timeouts, closed connections, weird kernel problems) in order to cover the complete failure space.

However, if your code doesn't have reads/writes mixed in with protocol logic, your testing scenarios are much easier. Bytes just come in and go out. Reading/writing problems aren't an issue.
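As a rough illustration of "bytes just come in and go out", here is a minimal sketch of a no-I/O parser for a made-up newline-delimited protocol. The `LineParser` type and its `Feed` method are hypothetical names invented for this example, not part of any real library.

```go
package main

import (
	"bytes"
	"fmt"
)

// LineParser is a hypothetical no-I/O parser for a newline-delimited protocol.
// It never touches a socket: callers hand it bytes and get complete messages back.
type LineParser struct {
	buf []byte // bytes received so far that do not yet form a complete line
}

// Feed accepts whatever bytes arrived and returns every line completed by them.
func (p *LineParser) Feed(data []byte) []string {
	p.buf = append(p.buf, data...)
	var lines []string
	for {
		i := bytes.IndexByte(p.buf, '\n')
		if i < 0 {
			return lines // keep the partial line for the next Feed call
		}
		lines = append(lines, string(p.buf[:i]))
		p.buf = p.buf[i+1:]
	}
}

func main() {
	var p LineParser
	fmt.Println(p.Feed([]byte("PING\nPO"))) // [PING]
	fmt.Println(p.Feed([]byte("NG\n")))     // [PONG]
}
```

A test for this only has to vary the byte sequences and how they are chunked. There is no read call that could time out or fail.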

This is just basic separation of concerns stuff, but it really does help, even in languages with "blessed" I/O mechanisms.

1 comments

I'm not sure I follow.

Why not test with something that implements the reader or writer interface, but shuffles bytes around in memory? That should alleviate the testing explosion.

And whatever interface you do have must be putting bytes in or getting bytes out of the protocol layers. Why not call that interface reader or writer?

I'm not seeing the distinction.

Sorry, let me be clearer.

The reason that just having an in-memory Reader or Writer doesn't solve the problem is that the failure modes don't match up. An in-memory reader/writer has basically no failure modes beyond ENOMEM. That's why in the no-I/O implementation, this is exactly what we use: write to an in-memory buffer.

Real I/O, on the other hand, has many failure modes. For example, consider timeouts. If your parser does I/O, you need to test timeouts at every location that your parser does I/O. You need to confirm it handles those timeouts appropriately. And you need to decide what "appropriately" means here: do you retry? Do you abort? Do you attempt to unwind that state transition?

All of these are expansions of your state space. This means your protocol parser has to handle this combinatorial explosion of possible outcomes: at every point you have a Read/Write you need to be ready and prepared to handle all possible error conditions that can come out of that.

If your parser does no I/O, though, and only writes to buffers, this problem does not exist. That allows you to have two totally isolated sections of code: one part manipulates bytes in memory (the parser), and another bit is responsible for getting those bytes to and from the network. Each can be tested separately. If we need `n` tests for the no-I/O parser, and `m` tests for the I/O layer without the parser, then the separated design needs roughly `n + m` tests, while the combined code requires `n * m` tests to achieve equivalent logical coverage of the possibility space.

Small, isolated components are good.

Oh. So it is less about I/O vs no-I/O and more about push parsing vs pull parsing.

Because testing with a reader and writer interface lets you test errors too, but now you are talking about error recovery strategies (pull has to pass through or have smarts, push can know nothing).

I agree in many cases push has the advantages being discussed. I just wouldn't have called it no-I/O since that doesn't really have the right connotation.