| HN Mirror


by masterphai 249 days ago

Interesting project - it’s rare to see news-flow tracking done in real time at this scale. One thing you may want to stress-test is how stable the clustering remains when stories evolve semantically over a few hours. Embeddings tend to drift as outlets rewrite or localize a piece, and HNSW can sometimes over-merge when the centroid shifts.

A trick that helped in a similar system I built was doing a second-pass “temporal coherence” check: if two articles are close in embedding space but far apart in publish time or share no common entities, keep them in adjacent clusters rather than forcing a merge. It reduced false positives significantly.
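A rough sketch of what such a second-pass check could look like, for illustration only. The `Article` fields, the `should_merge` name, and the 0.85 similarity and 12-hour thresholds are all assumptions made for this example, not details of the system described above.

```python
import math
from dataclasses import dataclass, field
from datetime import datetime, timedelta


@dataclass
class Article:
    # Hypothetical record shape; a real pipeline would carry more fields.
    embedding: list[float]
    published: datetime
    entities: set[str] = field(default_factory=set)


def cosine_similarity(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def should_merge(
    a: Article,
    b: Article,
    min_similarity: float = 0.85,            # assumed threshold
    max_gap: timedelta = timedelta(hours=12),  # assumed threshold
) -> bool:
    """Return True only if the pair is close in embedding space AND coherent."""
    if cosine_similarity(a.embedding, b.embedding) < min_similarity:
        return False
    far_apart_in_time = abs(a.published - b.published) > max_gap
    no_shared_entities = not (a.entities & b.entities)
    # Either signal alone vetoes the merge; the pair stays in adjacent clusters.
    return not (far_apart_in_time or no_shared_entities)
```

The check only ever vetoes a merge that the embedding pass proposed, so it can reduce false positives but cannot create new merges on its own.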

Also curious how you handle deduping syndicated content - AP/Reuters can dominate the embedding space unless you weight publisher identity or canonical URLs.
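One possible way to collapse syndicated copies before clustering, sketched under assumptions: each article is a dict with a `url` key and an optional `canonical_url` key (both hypothetical field names), and the first copy seen is the one kept. Weighting by publisher identity would need a separate step.

```python
from urllib.parse import urlsplit


def normalize_url(url: str) -> str:
    """Reduce a URL to host + path so tracking parameters don't defeat matching."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    path = parts.path.rstrip("/") or "/"
    return f"{host}{path}"


def dedupe_syndicated(articles: list[dict]) -> list[dict]:
    """Keep the first article seen for each canonical URL (or own URL if none)."""
    seen: dict[str, dict] = {}
    for article in articles:
        key = normalize_url(article.get("canonical_url") or article["url"])
        seen.setdefault(key, article)  # later copies of the same story are dropped
    return list(seen.values())
```

Running this before the embedding pass means a wire story republished by many outlets contributes one point to the embedding space instead of dozens.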

Overall, really nice work. The propagation timeline is especially useful.

2 comments

supriyo-biswas 244 days ago

Thanks for your comment. Unfortunately, it seems that your comments are primarily LLM-generated (for people looking for evidence, the first comments of this user should provide enough, although they’re getting better by fine-tuning the prompt). As HN is primarily a place for humans, please do not do this here. Thanks.


nextaccountic 244 days ago

this specific comment shows no sign of LLM authorship

maybe the author uses LLMs in some comments and not others. that is, it's not a bot, just someone manually using LLM tools sometimes


yieldcrv 244 days ago

How can I bait this bot?


alchemist1e9 244 days ago

The style of the account’s comments and “about” definitely gives off LLM vibes, but it’s not a particularly active account, so I don’t think it’s a true bot. It’s also possible the account owner just runs their own comments through an LLM before posting them. I do that for most business emails I send these days, but they still reflect my own thoughts and details.


wcallahan 239 days ago

Bad bot.

‘masterphai’ is evidence of how effective a good LLM and a better prompt can now be at evading detection of AI authorship… but there’s no way this author’s comments are written by a sane human.

From the comment history, it appears to have tricked quite a few humans to date. Interesting!
