by pbnjay
1461 days ago
Pretty cool! I did a similar project for flat files, but used Bloom filters to generate an “index” of row contents to test against later. I feel like a similar idea could work for identifying divergent rows within your segments more quickly and with less repeated work. Making that work across databases could be a huge pain though: I had some success in Postgres, but bit fields in the other DBs were painful.
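
For anyone curious what the Bloom filter "index" idea might look like, here is a minimal sketch in Python. It is not the code from the flat-file project mentioned above, and the names (`RowBloom`, `build_index`, `divergent_rows`), the bit-array size, and the hash count are all hypothetical choices for illustration. The idea: index every row of one file once, then test rows from the other file against the index. A row the filter rejects is definitely not in the indexed file; a row it accepts is only probably there, so some changed rows can slip through as false positives.

```python
import hashlib


class RowBloom:
    """Tiny Bloom filter over row contents (illustrative, not tuned)."""

    def __init__(self, num_bits=1 << 16, num_hashes=4):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits // 8 + 1)

    def _positions(self, row):
        # Serialize the row with a separator that is unlikely to occur in data.
        data = "\x1f".join(str(v) for v in row).encode("utf-8")
        digest = hashlib.sha256(data).digest()
        # Double hashing: derive all bit positions from two 64-bit values.
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, row):
        for p in self._positions(row):
            self.bits[p >> 3] |= 1 << (p & 7)

    def might_contain(self, row):
        # False means "definitely not indexed"; True means "probably indexed".
        return all(self.bits[p >> 3] & (1 << (p & 7)) for p in self._positions(row))


def build_index(rows, **kwargs):
    """Build the filter once from the reference rows."""
    bloom = RowBloom(**kwargs)
    for row in rows:
        bloom.add(row)
    return bloom


def divergent_rows(index, rows):
    """Return rows that are definitely absent from the indexed data."""
    return [row for row in rows if not index.might_contain(row)]
```

Usage would be something like `divergent_rows(build_index(old_rows), new_rows)`. The filter has to be sized to the row count to keep the false-positive rate low, which this sketch does not attempt.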
That was indeed the main challenge. Each DB has a different syntax, different set of features, different format for timestamps and floats, different max precision, and so on. I'd say most of our work on data-diff went to making sure the behavior of the different DBs aligned with each other.
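
To make the alignment problem concrete, here is a rough Python sketch of the general idea: convert every value to one canonical string before checksumming, so that the same logical value hashes identically no matter which database returned it. This is only an illustration and not how data-diff itself does it. The function names, the fixed precisions, the NULL marker, and the choice to treat naive timestamps as UTC and to truncate rather than round are all assumptions made for the example.

```python
import hashlib
from datetime import datetime, timezone
from decimal import Decimal, ROUND_HALF_UP


def normalize(value, float_digits=6, ts_digits=3):
    """Render a value as a canonical string (assumed conventions, see above)."""
    if value is None:
        return "\\N"  # hypothetical marker so NULL differs from an empty string
    if isinstance(value, datetime):
        if value.tzinfo is None:
            value = value.replace(tzinfo=timezone.utc)  # assumption: naive means UTC
        value = value.astimezone(timezone.utc)
        # Truncate fractional seconds to the lowest precision all sides support.
        frac = f"{value.microsecond:06d}"[:ts_digits]
        text = value.strftime("%Y-%m-%d %H:%M:%S")
        return f"{text}.{frac}" if ts_digits else text
    if isinstance(value, (float, Decimal)):
        # Fixed number of decimal places, regardless of the source type.
        quantum = Decimal(1).scaleb(-float_digits)
        return str(Decimal(str(value)).quantize(quantum, rounding=ROUND_HALF_UP))
    return str(value)


def row_checksum(row):
    """MD5 over the normalized, separator-joined column values."""
    joined = "|".join(normalize(v) for v in row)
    return hashlib.md5(joined.encode("utf-8")).hexdigest()
```

With these conventions, a row read as `(1, 1.5, <naive timestamp with microseconds>)` from one database and as `(1, Decimal("1.50"), <UTC timestamp with milliseconds>)` from another would produce the same checksum, as long as the timestamps agree to the millisecond.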