Show HN: Librarian - Semantic Bookmark Search Using Transformers (github.com)
95 points by kashyapa95 872 days ago
Search for your bookmarks by content!

@ashwinlokkur and I built this Chrome extension that scrapes your bookmarks' content and does semantic search using transformer embeddings.

Free and private since it's all in-browser. No LLM API calls ;)

9 comments

I also built an extension like this half a year ago [0]. The first iteration was a local-first, natural-language full-text search over the browser history [1]. The second iteration focused on bookmarks [2].

None of these could spark enough interest to get feedback on what users want. I am sharing this experience so that you may study my attempt, if you want.

[0] https://getpinbot.com/

[1] https://www.youtube.com/watch?v=GYwJu5Kv-rA

[2] https://www.youtube.com/watch?v=PQh1qhvxZzc

I love it. I wrote an RFS[0] based on a similar idea. The big value prop in my mind isn't the single-user experience but rather automated trust signals on content quality, based on access by members of your network graph and their proximity.

Hopefully you, the author, and others continue to work on this idea.

[0] https://www.nothingeasyaboutthis.com/replacing-google-search...

Thank you! Will take a look at these. We mostly made this to solve something for ourselves and as a learning exercise, but we're hoping it resonates with others too.
This is great but I want just simple full text search on all the history. Not title and url search, but full text. If it has semantic embeds on top, all the better. I am losing too many of the things I find.

Wondering why browsers neglected bookmarks and search history so much. They never progressed in the last 2 decades. Storage is cheap, computers are fast and multi-core, yet we live with the mentality of paucity and don't save our digital crumbs.
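
For what it's worth, the plain full-text part asked for above can be sketched as a small inverted index over saved page text. This is only a minimal sketch under assumptions: the `HistoryPage` shape, the `tokenize` helper, the `FullTextIndex` class, and the AND-matching of query terms are all made up for illustration, and it doesn't touch any browser history API.

```typescript
// Hypothetical shape for a saved page; not taken from any browser API.
interface HistoryPage {
  url: string;
  text: string;
}

// Lowercase and split on anything that is not a letter or digit.
function tokenize(text: string): string[] {
  return text.toLowerCase().split(/[^a-z0-9]+/).filter((t) => t.length > 0);
}

class FullTextIndex {
  // term -> set of URLs whose text contains that term
  private postings = new Map<string, Set<string>>();

  add(page: HistoryPage): void {
    const tokens = tokenize(page.text);
    for (let i = 0; i < tokens.length; i++) {
      let urls = this.postings.get(tokens[i]);
      if (urls === undefined) {
        urls = new Set<string>();
        this.postings.set(tokens[i], urls);
      }
      urls.add(page.url);
    }
  }

  // Returns URLs whose text contains every query term (AND semantics).
  search(query: string): string[] {
    const terms = tokenize(query);
    if (terms.length === 0) return [];
    const first = this.postings.get(terms[0]);
    if (first === undefined) return [];
    let result = Array.from(first);
    for (let i = 1; i < terms.length; i++) {
      const urls = this.postings.get(terms[i]);
      if (urls === undefined) return [];
      result = result.filter((u) => urls.has(u));
    }
    return result;
  }
}

// Usage with two made-up pages.
const historyIndex = new FullTextIndex();
historyIndex.add({ url: "https://example.com/a", text: "Vector search in the browser with WASM" });
historyIndex.add({ url: "https://example.com/b", text: "Browser bookmarks are neglected" });

const bothPages = historyIndex.search("browser");
const onlyWasm = historyIndex.search("browser wasm");
const noMatch = historyIndex.search("firefox");
```

This only matches exact terms found in the page body, which is the "not title and url search, but full text" part; semantic embeddings would be a separate layer on top.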

Thank you! Yeah, history is something we've been asked about multiple times now. I'm sure this could be extended easily once we solve a few things (scraping pages faster and being smarter about what text we embed, parsing out irrelevant stuff). Will keep you posted.
If you use Safari, there's https://andadinosaur.com/launch-history-book for saving the content of your browsing history.
Web search generates profit, local search doesn't.
Because search killed bookmarks. Have you requested this feature in any browser? Maybe nobody asked for it.
Bookmark search is for rookies. I'd pay for a search tool over my 877 open tabs.
raindrop.io or devonthink
Curious how this works. I've experimented with in-browser vector search using victor[1] with mixed results. Hadn't heard of this orama lib before checking out your project.

[1]: https://github.com/not-pizza/victor

So the extension scrapes all your bookmarks' content, selects key parts of it (we have a naive heuristic for now), embeds them using a Sentence Transformer model, and indexes them in the browser's local storage with Orama's vector DB. When you search, we embed the query and run a vector search against the index to find the most semantically similar bookmarks. It's all in-browser, so no data goes to any API.
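
To make the embed, index, and search steps above concrete, here is a rough sketch. It is not the extension's code and does not use Orama's API or a real transformer: `embed` is a toy stand-in driven by a hand-written `CONCEPTS` table, the index is a plain array, and the search is brute-force cosine similarity. All names (`embed`, `buildIndex`, `search`, `IndexedBookmark`) and the example URLs are assumptions for illustration.

```typescript
// ASSUMPTION: toy stand-in for a sentence-embedding model. A real setup would
// run a transformer here; this table just maps related words to the same axis.
const CONCEPTS = new Map<string, number>([
  ["car", 0], ["automobile", 0], ["vehicle", 0],
  ["bread", 1], ["sourdough", 1], ["bake", 1],
  ["rust", 2], ["wasm", 2], ["webassembly", 2],
]);
const DIM = 3;

function embed(text: string): number[] {
  const vec = new Array<number>(DIM).fill(0);
  const words = text.toLowerCase().split(/[^a-z0-9]+/);
  for (let i = 0; i < words.length; i++) {
    const dim = CONCEPTS.get(words[i]);
    if (dim !== undefined) vec[dim] += 1;
  }
  return vec;
}

// Cosine similarity; returns 0 when either vector is all zeros.
function cosine(a: number[], b: number[]): number {
  let dot = 0;
  let na = 0;
  let nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return na === 0 || nb === 0 ? 0 : dot / Math.sqrt(na * nb);
}

interface IndexedBookmark {
  url: string;
  vector: number[];
}

// "Indexing": embed each page's text once and keep the vectors around.
function buildIndex(pages: { url: string; text: string }[]): IndexedBookmark[] {
  return pages.map((p) => ({ url: p.url, vector: embed(p.text) }));
}

// "Searching": embed the query and rank every bookmark by similarity.
function search(index: IndexedBookmark[], query: string, topK: number): { url: string; score: number }[] {
  const q = embed(query);
  return index
    .map((b) => ({ url: b.url, score: cosine(q, b.vector) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, topK);
}

const bookmarkIndex = buildIndex([
  { url: "https://example.com/sourdough", text: "How to bake sourdough bread at home" },
  { url: "https://example.com/buying-guide", text: "Buying a used automobile: what to check on any vehicle" },
  { url: "https://example.com/rust-wasm", text: "Compiling Rust to WebAssembly" },
]);

const topHits = search(bookmarkIndex, "car", 1);
```

In this toy setup the query "car" ranks the buying-guide page first even though the word "car" never appears in its text, which is the point of searching by embedding rather than by exact term. A real vector DB would replace the brute-force loop with its own index.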

I haven't tried victor. Is it just for the Node.js runtime, or does it run at the edge as well? Orama's been pretty good, at least semantically. I haven't done any speed benchmarking, so I'm not sure if it's as fast as, say, HNSW.

I would assume it runs on the edge, since it runs in the browser via WASM. It's also implemented in Rust, which provides some flexibility. It will likely depend on the specific edge runtime though. The runtime would need something resembling a file system.

The thing I like about Victor is that it uses OPFS for storage when running in the browser, meaning it doesn't keep everything in memory. I looked at Orama a bit and from my reading I think they keep everything in memory, although I didn't dig too deep and would like to be wrong about that.

This is similar to Voy [1], which runs entirely in memory.

Unfortunately, for my own usage running in memory feels like a non-starter since the data size is unbounded.

[1]: https://github.com/tantaraio/voy

Thanks for releasing your work, this is useful and interesting.
Why is this necessary? I can just export my bookmarks, use ChatGPT to create a Python script to download all of their contents, put it all in a big text file, and then Cmd+F what I am looking for.
It's so you don't need an exact term, you can search for a synonym or something "similar" in concept.

So it's more like a search engine than a Ctrl+F in a file for an exact string of tokens.

Sounds really interesting, and I'd also love a Firefox version :)
Will try to make one soon!
Yes. Sounds useful.
Finally. Bookmark search is in massive need of fixing
How do I install this?
You'll have to do this for now: https://developer.chrome.com/docs/extensions/get-started/tut...

We were gonna get it published on the web store after some initial feedback.

Had to move manifest.json from public dir to src dir and load src dir in chrome.

EDIT: And the popup doesn't do anything at all. Can see no js included in popup.html.

My bad, forgot to mention you need to run npm build. Added instructions here: https://github.com/oto-labs/librarian?tab=readme-ov-file#set...
Firefox version?