by steve1820 2099 days ago
Vaporware is better than going nowhere! (Get it...noware...haha).

Congrats on getting started.

I agree with Obsidian - I think most people forget how much maintenance time goes into building a lifelong Knowledge Management System.

I like your idea - document similarity is a well-known area in ML.

Feel free to take my Chrome Extension and reuse the parts that track key paragraphs in an article (using a user's click/hover/attention behaviour), then use those paragraphs as the corpus for your ML similarity models.

Intuitively, it makes more sense to run document similarity on key points/paragraphs than on the whole web page.
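
As a rough sketch of what paragraph-level similarity could look like: the snippet below assumes plain TF-IDF vectors compared with cosine similarity, which is just one common choice and not necessarily what either project uses. The names `tokenize`, `tfidfVectors`, and `cosine` are made up for illustration.

```typescript
// Sketch only: TF-IDF + cosine similarity is an assumed technique,
// and all function names here are hypothetical.
type Vector = Record<string, number>;

// Lowercase the text and split it into alphanumeric tokens.
function tokenize(text: string): string[] {
  return text.toLowerCase().match(/[a-z0-9]+/g) || [];
}

// Build one TF-IDF vector per paragraph.
function tfidfVectors(paragraphs: string[]): Vector[] {
  const tokenized = paragraphs.map(tokenize);
  const n = paragraphs.length;

  // Document frequency: how many paragraphs contain each term.
  const df: Vector = Object.create(null);
  for (const tokens of tokenized) {
    const seen: Record<string, boolean> = Object.create(null);
    for (const term of tokens) {
      if (!seen[term]) {
        seen[term] = true;
        df[term] = (df[term] || 0) + 1;
      }
    }
  }

  return tokenized.map((tokens) => {
    // Raw term counts for this paragraph.
    const counts: Vector = Object.create(null);
    for (const term of tokens) {
      counts[term] = (counts[term] || 0) + 1;
    }
    // Weight each count by a smoothed inverse document frequency.
    const vec: Vector = Object.create(null);
    for (const term of Object.keys(counts)) {
      const idf = Math.log((1 + n) / (1 + df[term])) + 1;
      vec[term] = counts[term] * idf;
    }
    return vec;
  });
}

// Cosine similarity between two sparse vectors; 0 if either is empty.
function cosine(a: Vector, b: Vector): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (const term of Object.keys(a)) {
    dot += a[term] * (b[term] || 0);
    normA += a[term] * a[term];
  }
  for (const term of Object.keys(b)) {
    normB += b[term] * b[term];
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / Math.sqrt(normA * normB);
}
```

Feeding in only the tracked key paragraphs keeps the vocabulary small and keeps navigation or footer text out of the vectors, which is the point of working below the page level.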

If you want the whole web page though, there's code in the Chrome Extension that uses Mozilla's readability lib (https://github.com/mozilla/readability) to clean up the web content.

1 comment

Thanks for the tip on the readability library. I don't have much experience with webdev, so my extension was just saving a local copy of whatever was returned every time the browser made a request. I should be able to cut down on storage space if I can use the readability library to skip saving things like trackers and images.
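
Separately from Readability, a cheap first pass could drop non-document responses before anything is saved. This is only a sketch under assumptions: the `CapturedResponse` shape, the `SAVED_TYPES` list, and `shouldSave` are hypothetical, and it filters on the `Content-Type` header rather than on page content.

```typescript
// Hypothetical shape for a response the extension has captured.
interface CapturedResponse {
  url: string;
  contentType: string; // raw Content-Type header, e.g. "text/html; charset=utf-8"
}

// Assumed allow-list: only keep HTML documents.
const SAVED_TYPES = ["text/html", "application/xhtml+xml"];

// Return true only for responses whose MIME type is on the allow-list.
function shouldSave(response: CapturedResponse): boolean {
  // Drop parameters such as "; charset=utf-8" before comparing.
  const mime = response.contentType.split(";")[0].trim().toLowerCase();
  return SAVED_TYPES.indexOf(mime) !== -1;
}
```

Images, scripts, and tracking pixels fail this check and are never written to disk; the HTML that passes could then go through Readability to strip the rest.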