by georgewfraser 2075 days ago
It seems to me that under this scheme, if I make a single erroneous identify call, I will irreversibly merge two users. This is a surprising approach. Given that identify calls may occasionally be wrong, I would expect that

  identify(anonymous_id, new_canonical_id)
would map anonymous_id => new_canonical_id, but would leave the rest of the set find(anonymous_id) alone.
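To make the difference concrete, here is a rough Python sketch of the two behaviors. It is not the article's implementation: `IdentityMap`, `identify_merge`, and `identify_remap` are hypothetical names, and it uses a flat id => canonical id dict instead of a real union-find tree.

```python
class IdentityMap:
    """Hypothetical flat map from any id to its canonical id."""

    def __init__(self):
        self.canonical = {}  # id -> canonical id

    def find(self, x):
        # Ids that were never identified are their own canonical id.
        return self.canonical.get(x, x)

    def identify_merge(self, anonymous_id, new_canonical_id):
        """Merge semantics: the whole set containing anonymous_id moves."""
        old = self.find(anonymous_id)
        target = self.find(new_canonical_id)
        for key, value in list(self.canonical.items()):
            if value == old:
                self.canonical[key] = target
        self.canonical[old] = target
        self.canonical[anonymous_id] = target

    def identify_remap(self, anonymous_id, new_canonical_id):
        """Remap semantics: only anonymous_id itself moves."""
        self.canonical[anonymous_id] = new_canonical_id


# Two devices belong to userA; then one bad call points a1 at userB.
merged, remapped = IdentityMap(), IdentityMap()
for m, identify in ((merged, merged.identify_merge),
                    (remapped, remapped.identify_remap)):
    identify("a1", "userA")
    identify("a2", "userA")
    identify("a1", "userB")  # the erroneous call

print(merged.find("a2"))    # userB: a2 was dragged along with a1
print(remapped.find("a2"))  # userA: only a1 moved
```

With merge semantics, the one bad call also moves a2 and userA under userB. With remap semantics, the damage is limited to a1.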
1 comments

Yeah, compared to all of the other data they're logging per user, separately preserving the parent id and the canonical id in the tree would cost little and would let them fix canonicalization errors later.

Then there's a write throughput vs read latency trade-off for reading statistics aggregated by canonical ID, but my guess is that trade-off can be made in a way they're happy with in exchange for the ability to undo mistakes.
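A minimal sketch of what I mean, in Python, assuming the raw identify calls are kept as parent edges and the canonical ids are derived from them. `IdentityLog`, `retract`, and `canonical_ids` are hypothetical names, not anything from the article.

```python
class IdentityLog:
    """Hypothetical store: raw identify edges kept, canonical ids derived."""

    def __init__(self):
        self.edges = []  # (anonymous_id, canonical_id), one per identify call

    def identify(self, anonymous_id, canonical_id):
        self.edges.append((anonymous_id, canonical_id))

    def retract(self, anonymous_id, canonical_id):
        # Undo one mistaken identify call by dropping its edge.
        self.edges.remove((anonymous_id, canonical_id))

    def canonical_ids(self):
        """Rebuild id -> canonical id from the surviving edges (union-find)."""
        parent = {}

        def find(x):
            parent.setdefault(x, x)
            while parent[x] != x:
                parent[x] = parent[parent[x]]  # path halving
                x = parent[x]
            return x

        for anonymous_id, canonical_id in self.edges:
            root_a, root_c = find(anonymous_id), find(canonical_id)
            if root_a != root_c:
                parent[root_a] = root_c
        return {x: find(x) for x in list(parent)}


log = IdentityLog()
log.identify("a1", "userA")
log.identify("a2", "userA")
log.identify("a1", "userB")          # the mistake: merges userA into userB
print(log.canonical_ids()["a2"])     # userB
log.retract("a1", "userB")           # fix it later
print(log.canonical_ids()["a2"])     # userA
```

Here the canonical ids are recomputed on every read, which is the slow-read end of the trade-off. Caching the derived map and rebuilding it on each write or retraction would shift the cost to the write side instead.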