Hacker News
by supercoco9 943 days ago
Hi. Sorry if my query offended you.

I basically ran exactly what ClickHouse recommends in their deduplication guide: https://clickhouse.com/docs/en/guides/developer/deduplicatio....

Of course, you can also materialize with aggregations, just use a GROUP BY, or even force an OPTIMIZE of the table. But my point is that you don't really get exactly-once guarantees: whoever queries that table needs to be aware that a `SELECT * FROM tb` might contain duplicates and adapt their queries accordingly.
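To make the point concrete, here is a toy Python model of how a ReplacingMergeTree table behaves. It is not ClickHouse code, and the class and method names are hypothetical. It assumes a single partition, that each insert creates its own part, and that merges keep the highest version per sorting key, with the later insert winning a version tie. A plain read sees duplicates until a merge runs. Only the FINAL-style read, or a forced OPTIMIZE ... FINAL, resolves them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Row:
    key: int      # stands in for the table's ORDER BY (sorting) key
    value: str
    version: int  # stands in for the optional version column


class ToyReplacingMergeTree:
    """Hypothetical in-memory model of ReplacingMergeTree; not ClickHouse code."""

    def __init__(self) -> None:
        self.parts: list[list[Row]] = []

    def insert(self, rows: list[Row]) -> None:
        # Each INSERT lands in its own part; nothing is deduplicated yet.
        self.parts.append(list(rows))

    def select_all(self) -> list[Row]:
        # Like a plain SELECT *: reads every part as-is, duplicates included.
        return [row for part in self.parts for row in part]

    def select_final(self) -> list[Row]:
        # Like SELECT ... FINAL: apply the replacing logic at read time.
        # Keeps the highest version per key; on a version tie the row
        # inserted later wins (parts are scanned in insertion order).
        latest: dict[int, Row] = {}
        for row in self.select_all():
            current = latest.get(row.key)
            if current is None or row.version >= current.version:
                latest[row.key] = row
        return sorted(latest.values(), key=lambda r: r.key)

    def optimize_final(self) -> None:
        # Like OPTIMIZE TABLE ... FINAL: force a merge into one deduplicated part.
        self.parts = [self.select_final()]


table = ToyReplacingMergeTree()
table.insert([Row(1, "a", 1), Row(2, "b", 1)])
table.insert([Row(1, "a-updated", 2)])  # an "update" arrives as a new row

print(len(table.select_all()))  # 3: key 1 shows up twice until a merge
print([r.value for r in table.select_final()])  # ['a-updated', 'b']
table.optimize_final()
print(len(table.select_all()))  # 2 once the forced merge has run
```

In the real engine, merges run in the background at an unspecified time, so the window in which `select_all` returns duplicates has no fixed length.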


I believe there is no one working with CH and ReplacingMergeTree who doesn't know that they have to use FINAL or GROUP BY to get deduplicated data. It's mentioned on the table engine page and all over their knowledge base.
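A minimal sketch of the GROUP BY route, written as a plain Python model rather than ClickHouse code. It groups by the sorting key and keeps the value with the highest version, in the spirit of `argMax(value, version)`, so each key yields one row whether or not merges have run. The `(key, value, version)` tuple layout and the function name are assumptions made for illustration.

```python
# Hypothetical unmerged rows as (key, value, version) tuples, standing in for
# what a plain SELECT * on a ReplacingMergeTree table might return.
Rows = list[tuple[int, str, int]]


def latest_per_key(rows: Rows) -> dict[int, str]:
    # Python analogue of:
    #   SELECT key, argMax(value, version) FROM tb GROUP BY key
    # Ties on version keep the first row seen here; which row ClickHouse's
    # argMax returns on a tie should not be relied on.
    best: dict[int, tuple[str, int]] = {}
    for key, value, version in rows:
        if key not in best or version > best[key][1]:
            best[key] = (value, version)
    return {key: value for key, (value, _) in sorted(best.items())}


unmerged: Rows = [(1, "a", 1), (2, "b", 1), (1, "a-updated", 2)]
print(latest_per_key(unmerged))  # {1: 'a-updated', 2: 'b'}
```

The trade-off is that every query has to carry this grouping, which is why FINAL is often the simpler choice.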

Also, I haven't seen anyone recommend against ReplacingMergeTree recently. That might have been the case a few years ago, but the performance of FINAL has improved, and it's now faster than the alternatives. People obviously suggest plain MergeTree tables where possible, but if there's no alternative, ReplacingMergeTree is the way to go.