by hodgesrm 2011 days ago
That behavior is common to a number of analytic databases. It's expensive to maintain constraints in large distributed datasets, and referential integrity checks are not meaningful in denormalized fact tables anyway. Redshift [1] and ClickHouse [2] work this way as well. If duplicates or similar problems are an issue, you can remove them, for example by choosing query sort orders carefully.

[1] https://docs.aws.amazon.com/redshift/latest/dg/t_Defining_co...

[2] https://clickhouse.tech/docs/en/engines/table-engines/merget...
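As a rough illustration of the sort-order idea, here is a minimal sketch in plain Python rather than any particular database's SQL. It assumes each row carries a business key and a load timestamp; the names `id`, `loaded_at`, `amount`, and `dedupe_latest` are hypothetical and not taken from Redshift or ClickHouse. Sorting by (key, timestamp) puts duplicates next to each other with the newest last, so keeping the last row of each run removes them. This is the same idea as the common SQL pattern of filtering on `ROW_NUMBER() OVER (PARTITION BY id ORDER BY loaded_at DESC) = 1`.

```python
from itertools import groupby
from operator import itemgetter


def dedupe_latest(rows, key="id", version="loaded_at"):
    """Keep one row per key: the one with the highest version value."""
    # Sort by (key, version) so duplicates are adjacent and the newest is last.
    ordered = sorted(rows, key=itemgetter(key, version))
    # For each run of rows sharing a key, keep only the final (newest) row.
    return [list(group)[-1] for _, group in groupby(ordered, key=itemgetter(key))]


# Hypothetical sample data: id 1 was loaded twice.
rows = [
    {"id": 1, "loaded_at": 10, "amount": 5},
    {"id": 2, "loaded_at": 11, "amount": 7},
    {"id": 1, "loaded_at": 12, "amount": 6},  # later duplicate of id 1
]
deduped = dedupe_latest(rows)
```

Whether this is cheap enough in practice depends on how the table is sorted and distributed, which is the trade-off the reply below is getting at.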

1 comment

Yeah, I knew that, but removing them afterwards seems like "a big chunk of work" and never a satisfactory solution, e.g.:

[1] https://community.snowflake.com/s/question/0D50Z00007Ft37mSA...

[2] https://stackoverflow.com/questions/35372889/a-simpler-more-...