by remram 1412 days ago
The problem is that pathological cases include things as simple as a repeating pattern (or a repeating byte). Another issue is deliberate attacks: if a Dolt user can craft datasets for which single-row changes translate to a duplication of the entire tree (and dataset), this becomes an obvious DoS vector for hosted Dolt platforms.
1 comments

From what I've seen, the likelihood of triggering a pathological case with real-world, non-malicious data is low enough to be ignored, provided that the rolling hash function is well crafted. I do agree that crafting malicious data to break deduplication in Dolt should be relatively easy, but I do not see how this could lead to DoS on, e.g., a hosted Dolt platform. If I understand correctly, your proposed attack would only affect the rate of deduplication and, by extension, the disk space used, and I would expect a hosted Dolt platform to have strict disk-space limits or use storage-based billing.
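
For anyone who wants to poke at the repeating-byte case, here is a rough sketch of a toy content-defined chunker. To be clear, this is not Dolt's chunker: the window size, mask, size limits, and polynomial hash are all made-up values for illustration. It only shows why a constant input is degenerate: once the window is full, the rolling hash never changes, so the chunker either cuts at every minimum-size position or never cuts until the maximum size, and ends up behaving like fixed-size chunking.

```python
import random

# All constants below are illustrative assumptions, not Dolt's parameters.
WINDOW = 16            # bytes covered by the rolling hash
MASK = (1 << 6) - 1    # cut when the low 6 bits are zero (~1 in 64 positions)
MIN_SIZE = 16          # never cut before the window is full
MAX_SIZE = 256         # force a cut if no boundary was found
BASE = 257
MOD = (1 << 61) - 1
TOP = pow(BASE, WINDOW - 1, MOD)  # weight of the oldest byte in the window


def chunk_lengths(data):
    """Split data with a polynomial rolling hash and return the chunk lengths."""
    lengths = []
    start = 0  # index where the current chunk begins
    h = 0      # hash of the last (up to) WINDOW bytes of the current chunk
    for i, b in enumerate(data):
        size = i - start + 1
        if size > WINDOW:
            # Drop the byte that slides out of the window.
            h = (h - data[i - WINDOW] * TOP) % MOD
        h = (h * BASE + b) % MOD
        if (size >= MIN_SIZE and (h & MASK) == 0) or size >= MAX_SIZE:
            lengths.append(size)
            start = i + 1
            h = 0
    if start < len(data):
        lengths.append(len(data) - start)  # trailing partial chunk
    return lengths


rng = random.Random(0)
random_data = bytes(rng.getrandbits(8) for _ in range(4096))
constant_data = b"a" * 4096

print("random:  ", sorted(set(chunk_lengths(random_data))))
print("constant:", sorted(set(chunk_lengths(constant_data))))
```

With random input the chunk lengths vary; with the constant input every chunk has the same length, which is either `MIN_SIZE` or `MAX_SIZE` depending on whether the constant window happens to hash to a boundary.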