|
|
|
|
|
by snidane
1464 days ago
|
|
What benefit does it give me over running table diff query in SQL? select * exclude (date_uploaded) from dev_table
except
select * exclude (date uploaded) from prod_table
or select *
from (select * exclude (date_uploaded) from dev_table) dev
natural full join
(select * exclude (date_uploaded) from prod_table) prod
where dev.date_uploaded is null
or prod.date_uploaded is null
The only issue with the above is that EXCLUDE/EXCEPT is missing from standard SQL and even from market leaders like Snowflake, making this a massive pain in the ass. Second, natural joining in presence of null fields is going to produce a mess instead of something useful. Again - analytics db providers would rather boast about adhering to an ancient standard from the 1970s than listening to users and actually making SQL work after all those decades of pain.Without the stupid default behavior of SQL, this wouldn't be a problem. I'm curious if Data Diff solves this or some other use case. |
|
- data-diff can compare tables across different databases. Your query is limited to one database.
- For very big tables, your 'select' will time-out. data-diff splits the diff into small segments, so we side-step this issue.
- data-diff supports running in threaded mode, which means it can finish a LOT faster. (especially for cloud databases.)