by syx 2552 days ago
This was more of a kludge than a serious project, but I'm glad I wasn't the only one to find it useful. I will definitely continue working on it.

>Is it possible to avoid duplicates?

I thought about this as well in the past but never actually found a solution. Maybe someone here knows if there's research or an algorithm to uniquely identify the URL.

Edit: typos

2 comments

Some pages will link to a canonical URL in the head: https://en.wikipedia.org/wiki/Canonical_link_element
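
For what it's worth, here is a rough sketch of how the canonical link could be read out of a page with only the Python standard library. The names `CanonicalLinkParser` and `find_canonical_url` are made up for illustration, and it assumes you already have the page's HTML as a string:

```python
from html.parser import HTMLParser
from typing import Optional


class CanonicalLinkParser(HTMLParser):
    """Remembers the href of the first <link rel="canonical"> it sees."""

    def __init__(self) -> None:
        super().__init__()
        self.canonical: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        # Only <link> elements matter, and only the first canonical one.
        if tag != "link" or self.canonical is not None:
            return
        attr_map = dict(attrs)
        # rel can hold several space-separated values; compare case-insensitively.
        rel_values = (attr_map.get("rel") or "").lower().split()
        if "canonical" in rel_values and attr_map.get("href"):
            self.canonical = attr_map["href"]


def find_canonical_url(html: str) -> Optional[str]:
    """Return the canonical URL declared in the HTML, or None if there is none."""
    parser = CanonicalLinkParser()
    parser.feed(html)
    return parser.canonical
```

Not every page declares one, so the `None` case needs some other fallback.
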
Stripping out the query string probably works for 80% (maybe even more) of sites. Maybe do that by default, with an option to search for the whole URL as a fallback?
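
A minimal sketch of that idea in Python, using only `urllib.parse` from the standard library. The function names `strip_query` and `search_key` and the `whole_url` flag are hypothetical, not taken from the project:

```python
from urllib.parse import urlsplit


def strip_query(url: str) -> str:
    """Return the URL with its query string removed; everything else is kept."""
    return urlsplit(url)._replace(query="").geturl()


def search_key(url: str, whole_url: bool = False) -> str:
    """Pick the string to search for when checking for duplicates.

    By default the query string is dropped; pass whole_url=True to fall
    back to matching the URL exactly as submitted.
    """
    return url if whole_url else strip_query(url)
```

This will wrongly merge pages on sites that keep the page identity in the query string (for example `?id=123`), which is where the whole-URL option would come in.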