by wskish 1300 days ago
I noticed a lot of duplicates on the HN Summary bot (https://github.com/jiggy-ai/hn_summary), so I was wondering if we needed an embedding similarity search to filter them. I checked the database of recent stories and found 194 instances of duplicates with the exact same Story Title or Story URL in the last few days that the bot has been running.

These were all story items that made it into the /topstories Hacker News API endpoint:

https://gist.github.com/wskish/c8c6dbcb1c036882f3eb11b0660c0...
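For anyone who wants to run a similar check, here is a rough sketch of an exact-match duplicate finder. It is not the query used to produce the gist above. It assumes each story is a dict with the `id`, `title`, and `url` fields that Hacker News API items carry, and `find_exact_duplicates` is a made-up name for illustration.

```python
from collections import defaultdict


def find_exact_duplicates(stories):
    """Group story ids that share an exact title or an exact URL.

    `stories` is assumed to be an iterable of dicts with "id", "title",
    and optionally "url" keys (hypothetical input shape).
    """
    by_title = defaultdict(list)
    by_url = defaultdict(list)
    for story in stories:
        # Skip missing or empty fields so they are not grouped together.
        if story.get("title"):
            by_title[story["title"]].append(story["id"])
        if story.get("url"):
            by_url[story["url"]].append(story["id"])
    # Keep only the values that appear on more than one story.
    return {
        "title": {t: ids for t, ids in by_title.items() if len(ids) > 1},
        "url": {u: ids for u, ids in by_url.items() if len(ids) > 1},
    }


# Made-up sample data, only to show the shape of the result.
sample = [
    {"id": 1, "title": "Show HN: Foo", "url": "https://example.com/foo"},
    {"id": 2, "title": "Show HN: Foo", "url": "https://example.com/foo?x=1"},
    {"id": 3, "title": "Foo released", "url": "https://example.com/foo"},
    {"id": 4, "title": "Ask HN: Bar?"},
]
duplicates = find_exact_duplicates(sample)
```

Exact matching like this only catches identical strings, so a URL that differs by a query parameter counts as a different story. That gap is where something like embedding similarity might still help.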