by anvaka 545 days ago
Jaccard similarity is not particularly good for "celebrity" projects.

They are similar because they are popular, not because there is a semantic relationship.

It's the same problem I faced with the map of reddit (https://anvaka.github.io/map-of-reddit/) - all popular subreddits are just "similar" to each other.

Still works great for smaller, non-celebrity projects :D
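
To make the effect concrete, here is a toy sketch in Python. Everything in it is made up for illustration: the 3500 users, the modulo rules that decide who stars what, and the project names. It assumes similarity is computed as Jaccard over sets of stargazers, |A ∩ B| / |A ∪ B|. The two "celebrity" projects are starred independently of each other by a bit more than half of all users, so they share no topic at all, only popularity.

```python
def jaccard(a: set, b: set) -> float:
    """Jaccard similarity |A & B| / |A | B|; defined as 0.0 for two empty sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Hypothetical user base; ids are just integers.
users = range(3500)

# Two unrelated "celebrity" projects. The modulo rules are independent of each
# other (5 and 7 are coprime), so any overlap is due to popularity alone.
celebrity_a = {u for u in users if u % 5 < 3}   # about 60% of users
celebrity_b = {u for u in users if u % 7 < 4}   # about 57% of users

# Small projects: each is starred by roughly 2% of users.
niche_c = {u for u in users if u % 50 == 0}
niche_d = {u for u in niche_c if u >= 500}      # same community as niche_c
niche_e = {u for u in users if u % 49 == 0}     # unrelated community

celebrity_score = jaccard(celebrity_a, celebrity_b)
related_niche_score = jaccard(niche_c, niche_d)
unrelated_niche_score = jaccard(niche_c, niche_e)

print(f"unrelated celebrity projects: {celebrity_score:.3f}")
print(f"related niche projects:       {related_niche_score:.3f}")
print(f"unrelated niche projects:     {unrelated_niche_score:.3f}")
```

The unrelated celebrity pair ends up with a large score purely because both sets cover most of the user base, while the unrelated niche pair stays close to zero and the related niche pair scores high. That is why the score is only informative for the smaller projects.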