Hacker News
by paol 588 days ago
There seems to be a normalization problem with the data: right on the first page of the default query there's a duplicate entry.
2 comments

Good spot, will deduplicate in the next iteration.

However, titles are often repeated because of region/language variations.

Since you're denormalizing to a single table, I think the correct way to handle this would be to aggregate all the titles into the title column.
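To make that concrete, here is a rough sketch of what aggregating regional titles into one column could look like. The row shape, the `title_id` key, the `aggregate_titles` helper, and the " | " separator are all assumptions for illustration; the thread doesn't show the actual schema.

```python
from collections import defaultdict

# Hypothetical rows of (title_id, regional title); not the real dataset.
rows = [
    ("tt001", "Up"),
    ("tt001", "Oben"),
    ("tt001", "Up"),
    ("tt002", "Coco"),
]

def aggregate_titles(rows, sep=" | "):
    """Collapse all title variants of one entry into a single string."""
    grouped = defaultdict(list)
    for title_id, title in rows:
        # Keep each variant once, in first-seen order.
        if title not in grouped[title_id]:
            grouped[title_id].append(title)
    return {title_id: sep.join(titles) for title_id, titles in grouped.items()}

aggregated = aggregate_titles(rows)
```

That leaves one row per entry, and a search on the title column still matches any of the regional variants.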

Then again, "Untitled Pixar Animation Project" is basically garbage data, and that's a harder problem to solve...

Deduped all rows with a simple .unique() call in polars before saving.