
Hacker News

by DougBTX 2807 days ago

> each server/client update dance has to be mostly atomic

There’s a nice write-up showing how this is a risk, but not an absolute restriction, when using messages with required fields: https://martin.kleppmann.com/2012/12/05/schema-evolution-in-...

The basic idea is to acknowledge that any system with forward and backward compatibility will have a translation layer; the only question is how that layer is defined and implemented.

If all fields are optional, all readers need to handle any field being missing; in other words, all readers must be able to process empty messages. For example, a user update message might be missing a user id, and the reader will have to handle that. A couple of options come to mind: do nothing if there is no user id, or return an invalid-message error. The key point is that this is a translation layer that can no-op or error before the message reaches the service's business logic.
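A minimal sketch of such a translation layer, assuming messages arrive as plain dicts with every field optional. The names `UserUpdate`, `translate_user_update`, `InvalidMessage` and the `strict` flag are hypothetical, chosen only to show the two options (no-op or error) applied before business logic runs:

```python
from dataclasses import dataclass
from typing import Any, Mapping, Optional


class InvalidMessage(Exception):
    """Raised when an incoming message cannot be translated for the service."""


@dataclass(frozen=True)
class UserUpdate:
    # Internal type used by business logic: user_id is guaranteed present.
    user_id: str
    display_name: Optional[str] = None


def translate_user_update(raw: Mapping[str, Any], strict: bool = True) -> Optional[UserUpdate]:
    """Translate a wire message (all fields optional) into a UserUpdate.

    If user_id is missing (or empty), either raise InvalidMessage (strict=True)
    or return None so the message is silently dropped (strict=False).
    """
    user_id = raw.get("user_id")
    if not user_id:
        if strict:
            raise InvalidMessage("user update is missing user_id")
        return None  # no-op: nothing reaches business logic
    return UserUpdate(user_id=user_id, display_name=raw.get("display_name"))


def handle_user_update(update: UserUpdate) -> str:
    # Business logic can assume user_id is present; no per-field checks here.
    return f"updated {update.user_id}"
```

Either way, `handle_user_update` never sees an empty message; the decision about what "missing user id" means lives in one place.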

Another thought is that message schemas needn’t be defined with version ids. Trying to define a strict ordering between message versions is hard, as you say, especially when handling non-linear updates, e.g. rollbacks, or readers and writers skipping versions.

Instead, let’s define message schema compatibility. The user message processor could declare that it will only process messages with user ids (which, practically speaking, will be the case regardless of the message definition format). A message without a user id can then be rejected by common message-parsing code, without per-service, per-field translation code.
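One way to picture this, as a sketch under assumed names: each processor declares the fields it requires, and a single shared `dispatch` function rejects messages that lack them. `Processor`, `dispatch` and `MessageRejected` are hypothetical and not taken from any particular library:

```python
from dataclasses import dataclass
from typing import Any, Callable, FrozenSet, Mapping


class MessageRejected(Exception):
    """Raised by the shared parser when a message lacks fields a processor requires."""


@dataclass(frozen=True)
class Processor:
    # Hypothetical processor description: a name, the fields it requires,
    # and a handler that may assume those fields are present.
    name: str
    required: FrozenSet[str]
    handle: Callable[[Mapping[str, Any]], Any]


def dispatch(raw: Mapping[str, Any], processor: Processor) -> Any:
    """Common parsing code: one generic required-field check for every service.

    Extra fields the processor doesn't know about are simply passed through.
    """
    missing = sorted(f for f in processor.required if f not in raw)
    if missing:
        raise MessageRejected(f"{processor.name}: missing {missing}")
    return processor.handle(raw)


user_processor = Processor(
    name="user",
    required=frozenset({"user_id"}),
    handle=lambda m: f"user {m['user_id']}",
)
```

Usage: `dispatch({"user_id": "u1", "extra": 1}, user_processor)` returns `"user u1"`, while `dispatch({}, user_processor)` raises `MessageRejected`. The per-service code only states *what* it needs; the rejection logic is shared.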

With a clear set of compatibility rules, it is even possible to write sensible, reusable schema compatibility checks, e.g.: https://avro.apache.org/docs/1.7.7/api/java/org/apache/avro/...
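To illustrate what a reusable check might look like, here is a simplified sketch loosely modelled on Avro's schema resolution rules: a writer field unknown to the reader is ignored, and a reader field the writer doesn't send must have a default. It is not Avro's API; `Field`, `reader_can_read` and `compatible_both_ways` are hypothetical, schemas are assumed to be flat name-to-field dicts, and type promotions are deliberately left out:

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Field:
    type_name: str
    has_default: bool = False


Schema = Dict[str, Field]


def reader_can_read(reader: Schema, writer: Schema) -> List[str]:
    """Return the problems found; an empty list means the reader can read the writer's data."""
    problems: List[str] = []
    for name, rfield in reader.items():
        wfield = writer.get(name)
        if wfield is None:
            # Reader expects a field the writer never sends: only OK with a default.
            if not rfield.has_default:
                problems.append(f"{name}: required by reader but not written")
        elif wfield.type_name != rfield.type_name:
            # Simplification: no int->long style promotions in this sketch.
            problems.append(f"{name}: type {wfield.type_name} != {rfield.type_name}")
    # Writer fields absent from the reader are ignored, so they are not checked.
    return problems


def compatible_both_ways(old: Schema, new: Schema) -> bool:
    """Backward (new reader, old writer) and forward (old reader, new writer) compatible."""
    return not reader_can_read(new, old) and not reader_can_read(old, new)


v1: Schema = {"user_id": Field("string")}
v2: Schema = {"user_id": Field("string"), "display_name": Field("string", has_default=True)}
v3: Schema = {"display_name": Field("string", has_default=True)}  # drops user_id
```

Here `compatible_both_ways(v1, v2)` is `True`, while `reader_can_read(v1, v3)` reports that `user_id` is required but not written: an old user processor could not safely consume v3 messages, and that is detectable without any version ids.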