by notsureaboutpg 2135 days ago
Most commenters are focused on the optimizations made, but I actually think the custom routing and verification mechanism is the interesting bit.

That kind of tool could be handy in lots of scenarios (comparing the same service written in two different languages or with different dependencies, etc).

But how does their verifier mechanism deal with changes in the production database between responses? The request is the same, but if the legacy service responds first and the new service responds after, couldn't the data in the database change in between, so that the responses fail verification when they otherwise would have passed? How do they maneuver around that issue?

Great write-up by the way! I really liked it :)

1 comment

Differing inputs causing verification failures is indeed an issue. In addition to data access races, replication latency also causes this. The legacy service always reads from the primary MySQL instances per shard, but the new service always reads from replicas for scalability and geo distribution.

One slightly helpful mitigation we have in place relies on a data versioning system meant for cache invalidation. The version is incremented after data changes (with debouncing). To reduce false negatives, we throw out verification requests where the two systems saw different data versions. It's far from perfect, but it's been effective enough.
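
To make the idea concrete, here is a minimal sketch of that kind of version-gated comparison. It is not the actual implementation: the `Response` type, the `data_version` field, and the `verify` function are all hypothetical names, and it assumes each service can report the data version it observed alongside its response.

```python
from dataclasses import dataclass
from enum import Enum


class Outcome(Enum):
    MATCH = "match"
    MISMATCH = "mismatch"
    SKIPPED = "skipped"  # the two services saw different inputs


@dataclass(frozen=True)
class Response:
    body: dict
    # Hypothetical field: the data version the service observed when it
    # read from the database (primary or replica).
    data_version: int


def verify(legacy: Response, new: Response) -> Outcome:
    """Compare two responses, but only if both saw the same data version."""
    if legacy.data_version != new.data_version:
        # Differing versions mean differing inputs, so a body mismatch
        # would say nothing about the new service's correctness.
        return Outcome.SKIPPED
    return Outcome.MATCH if legacy.body == new.body else Outcome.MISMATCH


if __name__ == "__main__":
    primary_read = Response({"title": "new"}, data_version=8)
    lagging_replica_read = Response({"title": "old"}, data_version=7)
    print(verify(primary_read, lagging_replica_read))
```

Because the version bump is debounced, two reads with the same version can presumably still see different data, and a sketch like this would report that as a mismatch. That fits the "far from perfect" caveat above.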