Hacker News
by AstralStorm 2340 days ago
You always need some sort of data normalization scheme, and one that makes sense for the task you're running.

(This includes things such as Unicode normalization and looking at other fields to determine whether two records are the same thing.)

And you get to handle duplicates too.
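
To make that concrete, here is a rough Python sketch of the idea, not a definitive scheme. The record layout (dicts with "name" and "email" fields), the helper names normalize_key and dedupe, and the choice of NFKC plus case folding are all assumptions for illustration; the right normalization depends on the task.

```python
import unicodedata


def normalize_key(text: str) -> str:
    """Build a comparison key: NFKC form, case-folded, whitespace collapsed.

    Assumption: NFKC plus casefold is acceptable for this data. Some tasks
    need NFC instead, or must preserve case.
    """
    folded = unicodedata.normalize("NFKC", text).casefold()
    # split() with no arguments drops leading/trailing whitespace and
    # treats any run of whitespace as a single separator.
    return " ".join(folded.split())


def dedupe(records: list[dict]) -> list[dict]:
    """Keep the first record seen for each normalized (name, email) pair.

    The "name" and "email" fields are hypothetical; using a second field
    stands in for "looking at other fields" to decide identity.
    """
    seen = set()
    unique = []
    for record in records:
        key = (normalize_key(record["name"]), normalize_key(record["email"]))
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique
```

With this, "Caf\u00e9 Luna" and "Cafe\u0301  LUNA" (precomposed vs. combining accent, different case and spacing) collapse to the same key, so two records differing only in those ways count as duplicates, while a record with a different email is kept.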

That is just the start. The problem gets even more interesting in a real sharded scenario, because eventual consistency is hard.