by caust1c
411 days ago
How does the deduplication itself work? The blog didn't have many details. I'm curious because scalable deduplication is no small feat in any system. You have to worry about network latency if your deduplication mechanism is not on localhost, about how data in the source streams is partitioned/sharded, and about handling failed writes to the destination, all of which cripple throughput. I helped maintain the Segmentio deduplication pipeline, so I tend to be somewhat skeptical of dedupe systems that are light on details.

https://www.glassflow.dev/blog/Part-5-How-GlassFlow-will-sol...

https://segment.com/blog/exactly-once-delivery/
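
To make those three concerns concrete, here is a minimal single-process sketch. It is not taken from either linked post, and every name in it (`partition_for`, `WindowedDeduper`, `capacity`, the `write` callback) is hypothetical. It assumes each message carries a unique ID. Hashing the ID to pick a partition keeps the "seen" lookup local to one worker, and the ID is recorded only after the destination write succeeds, so a failed write can be retried.

```python
import hashlib
from collections import OrderedDict


def partition_for(message_id: str, num_partitions: int) -> int:
    """Route every copy of a message ID to the same partition (hypothetical helper)."""
    digest = hashlib.sha256(message_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions


class WindowedDeduper:
    """Remembers the most recent `capacity` message IDs seen on one partition."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self._seen = OrderedDict()  # message_id -> None, oldest first

    def process(self, message_id: str, payload: str, write) -> bool:
        """Return True if the payload was written, False if it was a duplicate."""
        if message_id in self._seen:
            return False
        write(payload)  # may raise; in that case the ID is not recorded
        self._seen[message_id] = None
        if len(self._seen) > self.capacity:
            self._seen.popitem(last=False)  # evict the oldest ID
        return True
```

The trade-off in this sketch is the bounded window: a duplicate that arrives after its ID has been evicted is written again, and a crash between the write and the bookkeeping has the same effect on retry. Closing those gaps needs durable state that is updated together with the destination write, which this example does not attempt.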