by lihan 2139 days ago
How does it work behind the scenes? Does it simply sample a portion of the data and then diff it? What if I need 100% accuracy?

If the datasets being diffed live in the same physical database, we generate SQL, execute it in that database, then analyze and render the results.
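To illustrate the same-database case, here is a minimal sketch of what "generate SQL and execute it in the database" might look like, assuming both tables share a single primary key column. It uses Python's built-in sqlite3 as a stand-in database; build_diff_sql and the orders_old/orders_new tables are hypothetical and not the tool's actual generated SQL. The heavy lifting (matching keys, comparing values) happens inside the database, and only three counts come back.

```python
import sqlite3


def build_diff_sql(table_a: str, table_b: str, key: str, columns: list[str]) -> str:
    """Hypothetical generator: count rows only in A, only in B, and changed rows.

    Table and column names are interpolated directly, so they must come from a
    trusted source (e.g. the database's own catalog), never from user input.
    """
    # SQLite's `IS NOT` is a null-safe inequality, so NULL vs NULL counts as equal.
    changed = " OR ".join(f"a.{c} IS NOT b.{c}" for c in columns if c != key) or "0"
    return f"""
    SELECT
      (SELECT COUNT(*) FROM {table_a} a WHERE NOT EXISTS
         (SELECT 1 FROM {table_b} b WHERE b.{key} = a.{key})) AS only_in_a,
      (SELECT COUNT(*) FROM {table_b} b WHERE NOT EXISTS
         (SELECT 1 FROM {table_a} a WHERE a.{key} = b.{key})) AS only_in_b,
      (SELECT COUNT(*) FROM {table_a} a JOIN {table_b} b ON a.{key} = b.{key}
         WHERE {changed}) AS changed
    """


conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders_old (id INTEGER PRIMARY KEY, amount REAL, status TEXT);
CREATE TABLE orders_new (id INTEGER PRIMARY KEY, amount REAL, status TEXT);
INSERT INTO orders_old VALUES (1, 10.0, 'paid'), (2, 20.0, 'paid'), (3, 30.0, 'open');
INSERT INTO orders_new VALUES (1, 10.0, 'paid'), (2, 25.0, 'paid'), (4, 40.0, 'open');
""")

sql = build_diff_sql("orders_old", "orders_new", "id", ["id", "amount", "status"])
only_a, only_b, changed = conn.execute(sql).fetchone()
print(only_a, only_b, changed)  # 1 1 1
```

Here id 3 exists only in the old table, id 4 only in the new one, and id 2 has a different amount. A real engine would likely also fetch the differing rows themselves for rendering, not just counts.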

If the datasets live in physically different databases (e.g. PostgreSQL <> Snowflake, or two distinct MySQL servers), we pull data from both sources into our engine, diff it there, and show the results.
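For the cross-database case, a rough sketch of "pull data into the engine and diff there" could look like the following. Two separate in-memory sqlite3 connections stand in for two physically different databases; fetch_keyed and diff_keyed are hypothetical helpers, and the sketch assumes the first selected column is the primary key.

```python
import sqlite3


def fetch_keyed(conn: sqlite3.Connection, query: str) -> dict:
    """Pull rows into memory, keyed by the first column (assumed primary key)."""
    return {row[0]: row[1:] for row in conn.execute(query)}


def diff_keyed(a: dict, b: dict) -> dict:
    """Compare two keyed row sets in the engine rather than in either database."""
    only_a = sorted(a.keys() - b.keys())
    only_b = sorted(b.keys() - a.keys())
    changed = sorted(k for k in a.keys() & b.keys() if a[k] != b[k])
    return {"only_in_a": only_a, "only_in_b": only_b, "changed": changed}


# Two separate connections stand in for, e.g., PostgreSQL and Snowflake.
src = sqlite3.connect(":memory:")
dst = sqlite3.connect(":memory:")
src.executescript(
    "CREATE TABLE users (id INTEGER, email TEXT);"
    "INSERT INTO users VALUES (1, 'a@x.io'), (2, 'b@x.io'), (3, 'c@x.io');"
)
dst.executescript(
    "CREATE TABLE users (id INTEGER, email TEXT);"
    "INSERT INTO users VALUES (1, 'a@x.io'), (2, 'B@x.io'), (5, 'e@x.io');"
)

result = diff_keyed(
    fetch_keyed(src, "SELECT id, email FROM users"),
    fetch_keyed(dst, "SELECT id, email FROM users"),
)
print(result)  # {'only_in_a': [3], 'only_in_b': [5], 'changed': [2]}
```

This loads everything into memory for simplicity. Across real vendors you would presumably also need to normalize types (e.g. decimals, timestamps, collation) before comparing, and stream or chunk rows rather than hold them all at once.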

Sampling is optional. It helps keep compute costs low for datasets with millions, billions, or trillions of rows, and you can leave it off when you need a full, row-by-row comparison.
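On the accuracy question, a small sketch shows the trade-off. It assumes sampling is done by hashing the primary key so that both sides select the same keys (one common approach, not necessarily what this tool does); in_sample, sample_keyed and changed_keys are hypothetical helpers. With rate 1.0 every row is compared, so a single changed row is always found; with a 1% sample it is caught only if its key happens to land in the sample.

```python
import hashlib


def in_sample(key, rate: float) -> bool:
    """Deterministically decide whether a primary key falls inside the sample.

    Hashing the key (rather than calling random.random()) makes both sides of
    the diff pick exactly the same keys, so sampled rows still pair up.
    """
    digest = hashlib.sha256(str(key).encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big")  # pseudo-uniform in [0, 2**64)
    return bucket < int(rate * 2**64)


def sample_keyed(rows: dict, rate: float) -> dict:
    """Keep only the rows whose key is sampled; rate >= 1.0 keeps every row."""
    if rate >= 1.0:
        return dict(rows)
    return {k: v for k, v in rows.items() if in_sample(k, rate)}


def changed_keys(a: dict, b: dict) -> list:
    """Keys present on both sides whose values differ."""
    return sorted(k for k in a.keys() & b.keys() if a[k] != b[k])


source = {i: (f"user{i}@example.com",) for i in range(10_000)}
target = dict(source)
target[42] = ("someone.else@example.com",)  # a single changed row

# Full comparison: every row is checked, so the one change is always found.
print(changed_keys(sample_keyed(source, 1.0), sample_keyed(target, 1.0)))  # [42]

# 1% sample: far less work, but key 42 is only caught if it lands in the sample.
sa, sb = sample_keyed(source, 0.01), sample_keyed(target, 0.01)
print(len(sa), changed_keys(sa, sb))
```

Hashing the key instead of sampling randomly on each side matters: independent random samples would mostly pick different keys, and the diff would report spurious "missing" rows.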