Hacker News

by dx87 2098 days ago

I'm really glad you put a lot of focus into automatically tracking the things the user is likely interested in. I tried using Obsidian, but it felt like I was spending more effort remembering to save information and create proper back-links than I was actually retaining any information.

I've recently started working on something similar as an excuse to learn machine learning, but it's still mostly vaporware apart from the Firefox extension I wrote. I think that by saving some basic metadata (when a page was viewed, which browser was used to view it) and using ML to judge how similar the contents of one page are to another's, it should be able to automatically create links between related information. Ideally, it would also handle information outside the browser. For example, if a log file is saved and a web page with similar contents is viewed later, it should be able to detect that the web page is probably a reference for the log file.
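
A minimal sketch of the linking idea, assuming plain TF-IDF cosine similarity as a stand-in for the ML model (no training involved) and a hypothetical `SavedItem` record holding the metadata mentioned above. The `threshold` of 0.3 is an arbitrary illustrative value, and all names here are made up for the example:

```python
import math
import re
from collections import Counter
from dataclasses import dataclass
from datetime import datetime


@dataclass
class SavedItem:
    # Hypothetical record: the metadata described above plus the raw text.
    item_id: str
    viewed_at: datetime
    source: str  # e.g. "firefox", or "file" for a saved log file
    text: str


def tokenize(text: str) -> list[str]:
    # Lowercase alphanumeric tokens; a real pipeline would normalize more.
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(items: list[SavedItem]) -> dict[str, dict[str, float]]:
    docs = {it.item_id: Counter(tokenize(it.text)) for it in items}
    n = len(docs)
    # Document frequency: how many items contain each term.
    df = Counter(term for counts in docs.values() for term in counts)
    vectors = {}
    for item_id, counts in docs.items():
        total = sum(counts.values()) or 1
        # Term frequency times a smoothed inverse document frequency.
        vectors[item_id] = {
            term: (c / total) * (math.log((1 + n) / (1 + df[term])) + 1)
            for term, c in counts.items()
        }
    return vectors


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def suggest_links(items: list[SavedItem], threshold: float = 0.3):
    """Return (earlier_id, later_id, score) for every pair above the threshold."""
    vectors = tfidf_vectors(items)
    links = []
    for i, a in enumerate(items):
        for b in items[i + 1:]:
            score = cosine(vectors[a.item_id], vectors[b.item_id])
            if score >= threshold:
                # Order by view time: the later item may be a reference for the earlier one.
                first, second = sorted((a, b), key=lambda it: it.viewed_at)
                links.append((first.item_id, second.item_id, score))
    return links


if __name__ == "__main__":
    items = [
        SavedItem("log-1", datetime(2020, 1, 1, 9, 0), "file",
                  "ERROR connection refused postgres port 5432"),
        SavedItem("page-1", datetime(2020, 1, 1, 9, 5), "firefox",
                  "How to fix postgres connection refused error on port 5432"),
        SavedItem("page-2", datetime(2020, 1, 1, 12, 0), "firefox",
                  "Banana bread recipe with walnuts"),
    ]
    for link in suggest_links(items):
        print(link)
```

Ordering each pair by `viewed_at` is what would let a tool guess that the later page is a reference for the earlier log file; a learned embedding model could replace `tfidf_vectors` without changing the rest.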

Like I said, it's mostly vaporware, but I think that products like these are going to be the future of collaboration tools.


steve1820 2098 days ago

Vaporware is better than going nowhere! (Get it...noware...haha).

Congrats on getting started.

I agree about Obsidian. I think most people underestimate the maintenance time it takes to build a lifelong knowledge management system.

I like your idea - document similarity is a well known area in ML.

Feel free to take my Chrome extension and reuse the parts where it tracks key paragraphs in an article (using the user's click/hover/attention behaviour), then use those paragraphs as the corpus for your ML similarity models.
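
A rough sketch of turning attention signals into key paragraphs. This is not the extension's actual code; the `AttentionEvent` shape, the event kinds, and the `WEIGHTS` values are all hypothetical, and hover time is simply counted in seconds:

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class AttentionEvent:
    # Hypothetical event shape: which paragraph (by index) the user interacted
    # with, the kind of interaction, and for hovers, how long it lasted.
    paragraph: int
    kind: str  # "hover", "click", or "select"
    seconds: float = 0.0


# Hypothetical weights for discrete interactions; these would need tuning.
WEIGHTS = {"click": 2.0, "select": 3.0}


def key_paragraphs(paragraphs: list[str], events: list[AttentionEvent],
                   top_k: int = 2) -> list[str]:
    scores: dict[int, float] = defaultdict(float)
    for ev in events:
        if not 0 <= ev.paragraph < len(paragraphs):
            continue  # ignore events for paragraphs that weren't captured
        if ev.kind == "hover":
            scores[ev.paragraph] += ev.seconds  # dwell time counts directly
        else:
            scores[ev.paragraph] += WEIGHTS.get(ev.kind, 0.0)
    # Pick the top_k highest-scoring paragraphs...
    top = sorted(scores, key=lambda i: scores[i], reverse=True)[:top_k]
    # ...but return them in document order so the text still reads naturally.
    return [paragraphs[i] for i in sorted(top)]
```

The returned paragraphs are what would then be fed to a similarity model instead of the full page text.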

Intuitively, it makes more sense to run document similarity on key points/paragraphs than on the whole web page.
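
One way to compare two pages by their key paragraphs rather than their full text, sketched with plain bag-of-words cosine similarity (an assumption; any paragraph-level similarity model could be swapped in). Each key paragraph of page A is matched to its most similar key paragraph of page B, and the best scores are averaged:

```python
import math
import re
from collections import Counter


def bag_of_words(text: str) -> Counter:
    # Lowercase alphanumeric token counts.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    # Counter returns 0 for missing terms without inserting them.
    dot = sum(count * b[term] for term, count in a.items())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def page_similarity(key_paras_a: list[str], key_paras_b: list[str]) -> float:
    """Average, over A's key paragraphs, of the best match among B's key paragraphs."""
    if not key_paras_a or not key_paras_b:
        return 0.0
    bags_b = [bag_of_words(p) for p in key_paras_b]
    best = [max(cosine(bag_of_words(p), q) for q in bags_b) for p in key_paras_a]
    return sum(best) / len(best)
```

The score is asymmetric: a short page whose key points all appear in a longer page scores higher in that direction than the reverse, so averaging both directions is an option if a symmetric link weight is wanted.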

If you want the whole web page, though, there's code in the Chrome extension that uses Mozilla's Readability library (https://github.com/mozilla/readability) to strip the clutter and extract the main content.


dx87 2098 days ago

Thanks for the tip on the Readability library. I don't have much experience with web development, so my extension was just saving a local copy of whatever was returned every time the browser made a request. I should be able to cut down on storage space if I can use the Readability library to skip saving things like trackers and images.
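
Readability itself is a JavaScript library, so the sketch below is only a crude Python stand-in for the storage-saving part, not Readability's algorithm (which tries to identify the main article content). It uses the standard library's `html.parser` to keep visible text and drop the contents of `script`, `style`, `iframe` and similar tags; images disappear simply because only text is kept. The `SKIP` set is an assumption:

```python
from html.parser import HTMLParser


class TextOnlyExtractor(HTMLParser):
    # Hypothetical choice of tags whose contents are never stored
    # (scripts, trackers, embeds). Void tags like <img> must NOT go here,
    # since they have no end tag to decrement skip_depth.
    SKIP = {"script", "style", "noscript", "iframe", "svg", "template"}

    def __init__(self):
        super().__init__()
        self.skip_depth = 0
        self.chunks: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        # Keep non-empty text only when not inside a skipped element.
        if not self.skip_depth and data.strip():
            self.chunks.append(data.strip())


def strip_page(html: str) -> str:
    parser = TextOnlyExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.chunks)
```

For real pages, running Readability inside the extension (where the DOM is already available) would likely give much cleaner output than this.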


gradys 2098 days ago

I'm super interested in this area, with my own vaporware attempt at building it (fully abandoned, unlike yours).

I'm an ML engineer focused on NLP applications. Contact info in my profile if you ever want to chat, e.g. about different approaches for estimating document similarity.
