Hacker News
by pbnjay 1461 days ago
Pretty cool! I did a similar project for flat files, but used Bloom filters to generate an “index” of row contents to test against later. I feel like a similar idea could work for identifying divergent rows within your segments more quickly and with less repeated work.
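
For what it's worth, the Bloom filter idea could look roughly like this. It is a minimal Python sketch, not code from either project: `BloomFilter`, `row_key`, `build_index` and `candidate_divergent_rows` are hypothetical names, and the filter size (8192 bits, 4 hash positions) and the plain `str()` serialization of rows are assumptions. You build the filter from one side's rows and test the other side's rows against it. A Bloom filter has no false negatives, so any row it rejects is guaranteed to be missing or different on the indexed side.

```python
import hashlib


class BloomFilter:
    """Fixed-size Bloom filter: no false negatives, small false-positive rate."""

    def __init__(self, num_bits: int = 8192, num_hashes: int = 4) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item: str):
        # Derive k bit positions from one SHA-256 digest (double hashing).
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # odd step
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item: str) -> bool:
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item))


def row_key(row: tuple) -> str:
    # Hypothetical serialization: join column values with the ASCII unit separator.
    return "\x1f".join("" if v is None else str(v) for v in row)


def build_index(rows) -> BloomFilter:
    index = BloomFilter()
    for row in rows:
        index.add(row_key(row))
    return index


def candidate_divergent_rows(index: BloomFilter, other_rows) -> list:
    # Every row returned is definitely absent from the indexed side;
    # a divergent row can still slip through on a false positive.
    return [row for row in other_rows if not index.might_contain(row_key(row))]


left = [(1, "alice", 10.0), (2, "bob", 20.0)]
right = [(1, "alice", 10.0), (2, "bob", 21.5), (3, "carol", 5.0)]
print(candidate_divergent_rows(build_index(left), right))
```

The trade-off is the usual Bloom filter one: a false positive can hide a divergent row, so this narrows the search rather than proving two segments equal. It also relies on both sides rendering values identically under `str()`, which runs straight into the cross-database formatting issues discussed in the rest of this thread.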

Making that work across databases could be a huge pain, though. I had some success in Postgres, but bitfields in the other DBs were painful.


> Making that work across databases could be a huge pain though

That was indeed the main challenge. Each DB has a different syntax, a different set of features, different formats for timestamps and floats, a different maximum precision, and so on. I'd say most of our work on data-diff went into making sure the behavior of the different DBs aligned with each other.
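
To make the alignment problem concrete: before rows from two databases can be hashed and compared, every value has to be rendered the same way on both sides. Here is a minimal Python sketch of that idea, not how data-diff actually implements it. `normalize_value` and `row_checksum` are hypothetical helpers, and the shared precision (milliseconds for timestamps, 6 decimal places for floats and decimals) and the rule that naive timestamps are UTC are assumptions.

```python
import hashlib
from datetime import datetime, timedelta, timezone
from decimal import Decimal, ROUND_HALF_EVEN

# Hypothetical shared precision: the lowest common denominator of the DBs compared.
TIMESTAMP_FRACTION_DIGITS = 3  # keep milliseconds, truncate the rest
DECIMAL_PLACES = 6             # round floats/decimals to 6 places


def normalize_value(value) -> str:
    """Render a column value as a canonical string, independent of the source DB."""
    if value is None:
        return ""
    if isinstance(value, datetime):
        # Assumption: naive timestamps are already in UTC.
        if value.tzinfo is None:
            value = value.replace(tzinfo=timezone.utc)
        text = value.astimezone(timezone.utc).strftime("%Y-%m-%d %H:%M:%S.%f")
        # %f always yields 6 digits; drop the extras (truncation, not rounding).
        return text[: len(text) - (6 - TIMESTAMP_FRACTION_DIGITS)]
    if isinstance(value, (float, Decimal)):
        # repr() gives the shortest string that round-trips the float.
        # Assumes finite values (quantize raises on infinity).
        exact = Decimal(repr(value)) if isinstance(value, float) else value
        quantum = Decimal(1).scaleb(-DECIMAL_PLACES)
        return format(exact.quantize(quantum, rounding=ROUND_HALF_EVEN), "f")
    return str(value)


def row_checksum(row) -> str:
    canonical = "\x1f".join(normalize_value(v) for v in row)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# The same logical row as two different databases might return it.
postgres_row = (42, datetime(2022, 1, 1, 14, 0, 0, 123456, tzinfo=timezone(timedelta(hours=2))), 0.1 + 0.2)
other_row = (42, datetime(2022, 1, 1, 12, 0, 0, 123000), Decimal("0.3"))
print(row_checksum(postgres_row) == row_checksum(other_row))
```

Truncating rather than rounding timestamps is itself a choice that has to match across databases; picking differently on each side is exactly the kind of mismatch described above.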