by Lx1oG-AWb6h_ZG0 2410 days ago
Just curious, what was the rationale for randomizing the spanId at each hop? (As opposed to a more structured format that could let you track the request tree without relying on another field like timestamp)
2 comments

Existing tracing systems (Dapper, Zipkin, Dynatrace, Stackdriver, etc.) already randomize with each hop, and there was a desire to be consistent with the models that they already used. It's also more straightforward to implement.
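For anyone who wants to see what "randomize with each hop" looks like in practice, here's a rough sketch. It assumes the W3C `traceparent` layout (version, 32-hex trace id, 16-hex span id, flags); the function names and the dict shape are made up for illustration and don't come from any particular tracer.

```python
import secrets


def new_span_id() -> str:
    # 8 random bytes rendered as 16 hex characters
    return secrets.token_hex(8)


def start_trace() -> dict:
    # Hypothetical context shape: the root span has no parent
    return {
        "trace_id": secrets.token_hex(16),  # 16 random bytes, 32 hex characters
        "span_id": new_span_id(),
        "parent_id": None,
    }


def next_hop(incoming: dict) -> dict:
    # The trace id is carried through unchanged; the span id is freshly
    # randomized, and the caller's span id is remembered as the parent.
    return {
        "trace_id": incoming["trace_id"],
        "span_id": new_span_id(),
        "parent_id": incoming["span_id"],
    }


def to_traceparent(ctx: dict) -> str:
    # version "00" and flags "01" (sampled) are fixed here for simplicity
    return f"00-{ctx['trace_id']}-{ctx['span_id']}-01"


root = start_trace()
child = next_hop(root)
print(to_traceparent(root))
print(to_traceparent(child))
```

Each hop only needs a random number generator and the incoming header, which is a big part of why this is simple to implement. The tree structure comes from the recorded parent id, not from anything encoded in the span id itself.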

There's a discussion about "correlation context" inside this W3C group, which maps to what you're describing. It'd be worth reaching out to Sergey (one of the other co-chairs) if you want to find out more.

Timestamps don't work well as a correlation tool across distributed systems: clocks tend not to be accurate enough to order application retries in particular, or fan-out requests. You really want parent/child or follows-from relationships to collect and represent the graph correctly.

Source: Working on distributed tracing at Twilio and Stitch Fix
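To make the parent/child point above concrete, here's a small sketch that rebuilds the request tree purely from span ids. The span records, field names, and timestamps are invented for illustration; the timestamps are deliberately skewed to show that the reconstruction never looks at them.

```python
from collections import defaultdict


def build_tree(spans):
    # Group span ids under their parent; timestamps are never consulted
    children = defaultdict(list)
    roots = []
    for span in spans:
        if span["parent_id"] is None:
            roots.append(span["span_id"])
        else:
            children[span["parent_id"]].append(span["span_id"])
    return roots, dict(children)


def render(span_ids, children, depth=0):
    # Indent each span under its parent to produce a readable outline
    lines = []
    for span_id in span_ids:
        lines.append("  " * depth + span_id)
        lines.extend(render(children.get(span_id, []), children, depth + 1))
    return lines


# Hypothetical spans: "a" calls a service, retries it, and also fans out.
# The child clocks are skewed so that children appear to start before "a".
spans = [
    {"span_id": "a", "parent_id": None, "ts": 1000},
    {"span_id": "b", "parent_id": "a", "ts": 990},   # first attempt
    {"span_id": "c", "parent_id": "a", "ts": 985},   # retry of the same call
    {"span_id": "d", "parent_id": "a", "ts": 1002},  # fan-out request
    {"span_id": "e", "parent_id": "d", "ts": 998},
]

roots, children = build_tree(spans)
print("\n".join(render(roots, children)))
```

Sorting these spans by `ts` would put the retry ahead of both the first attempt and the parent, while the parent ids still give the right graph.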