by cnmjbm 2011 days ago
The Snowflake documentation notes: "Snowflake supports defining and maintaining constraints, but does not enforce them." So Snowflake users can get inaccurate analytic results from accidentally duplicated data. How can a product that can produce inaccurate results be a killer? Or do analytics users not care about accuracy?
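To make the concern concrete, here is a minimal sketch of how a declared but unenforced primary key lets an accidentally reloaded row double-count in an aggregate. Plain Python lists stand in for a warehouse table; the `orders` rows, the `order_id` key and the helper names are hypothetical.

```python
# Hypothetical orders table: order_id is *declared* as the primary key,
# but, as with an unenforced constraint, nothing rejects a duplicate load.
orders = [
    {"order_id": 1, "amount": 100},
    {"order_id": 2, "amount": 250},
]

def load(table, rows):
    """Append rows without checking the primary key (no enforcement)."""
    table.extend(rows)

# An ingestion job is accidentally re-run for order 2.
load(orders, [{"order_id": 2, "amount": 250}])

def total_revenue(table):
    """Sum of amounts, like SELECT SUM(amount) FROM orders."""
    return sum(row["amount"] for row in table)

def has_duplicate_keys(table, key):
    """True if any value of `key` appears more than once."""
    keys = [row[key] for row in table]
    return len(keys) != len(set(keys))

print(total_revenue(orders))                   # 600, not the correct 350
print(has_duplicate_keys(orders, "order_id"))  # True
```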

That behavior is common among analytic databases. Maintaining constraints on large distributed datasets is expensive, and referential integrity checks are not meaningful in denormalized fact tables anyway. Redshift [1] and ClickHouse [2] work this way as well. If duplicates are an issue, you can remove them at query time, for example by choosing the sort order carefully and keeping only one row per key.

[1] https://docs.aws.amazon.com/redshift/latest/dg/t_Defining_co...

[2] https://clickhouse.tech/docs/en/engines/table-engines/merget...
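A rough sketch of the "sort, then keep one row per key" approach described above, in plain Python. The `loaded_at` column and the rule "keep the most recently loaded row" are assumptions for illustration; in SQL the same idea is usually written as `ROW_NUMBER() OVER (PARTITION BY key ORDER BY loaded_at DESC) = 1`, which Snowflake can filter directly with `QUALIFY`.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical rows: order 2 was loaded twice.
rows = [
    {"order_id": 1, "amount": 100, "loaded_at": 1},
    {"order_id": 2, "amount": 250, "loaded_at": 2},
    {"order_id": 2, "amount": 250, "loaded_at": 3},  # accidental reload
]

def dedupe_latest(table, key, order_by):
    """Keep one row per `key`: the one with the largest numeric `order_by`.

    Mirrors ROW_NUMBER() OVER (PARTITION BY key ORDER BY order_by DESC) = 1.
    """
    # Sort by key, newest first within each key (negation assumes numbers).
    ordered = sorted(table, key=lambda r: (r[key], -r[order_by]))
    # groupby needs rows with equal keys to be adjacent, which the sort
    # guarantees; the first row of each group is the newest one.
    return [next(group) for _, group in groupby(ordered, key=itemgetter(key))]

clean = dedupe_latest(rows, "order_id", "loaded_at")
print(sum(r["amount"] for r in clean))  # 350
print([r["loaded_at"] for r in clean])  # [1, 3]
```

The sort order decides *which* duplicate survives, so it has to encode a real business rule (latest load, highest version, etc.); otherwise the result is correct in count but arbitrary in content.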

Yeah, I knew that, but removing duplicates afterwards seems like "a big chunk of work" and never a satisfactory solution, e.g.:

[1] https://community.snowflake.com/s/question/0D50Z00007Ft37mSA...

[2] https://stackoverflow.com/questions/35372889/a-simpler-more-...