
Show HN: Librarian - Semantic Bookmark Search Using Transformers (github.com)

95 points by kashyapa95 872 days ago

Search for your bookmarks by content!

@ashwinlokkur and I built this Chrome extension that scrapes your bookmarks' content and does semantic search using transformer embeddings.

Free and private since it's all in-browser. No LLM API calls ;)

9 comments

klavinski 872 days ago

I also built such an extension half a year ago [0]. The first iteration was a local-first natural-language full-text search for the browser history [1]. The second iteration was focusing on bookmarks [2].

None of these could spark enough interest to get feedback on what users want. I am sharing this experience so that you may study my attempt, if you want.

[0] https://getpinbot.com/

[1] https://www.youtube.com/watch?v=GYwJu5Kv-rA

[2] https://www.youtube.com/watch?v=PQh1qhvxZzc

bberenberg 871 days ago

I love it. I wrote an RFS[0] based on a similar idea. The big value prop in my mind isn’t the single user experience but rather automated trust signals on content quality based on access by members of your network graph and their proximity.

Hopefully you, the author, and others continue to work on this idea.

[0] https://www.nothingeasyaboutthis.com/replacing-google-search...
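The network-proximity trust signal described above could be prototyped roughly as follows. This is a hypothetical TypeScript sketch, not code from Librarian or the linked RFS. It assumes you already know who in your network accessed a URL and who is connected to whom. It computes hop distances with a breadth-first search and lets each visitor contribute decay^distance, where decay = 0.5 is an arbitrary choice. `Graph`, `hopDistances`, and `trustScore` are illustrative names.

```typescript
// Hypothetical: adjacency list mapping each user to their direct connections.
type Graph = Map<string, string[]>;

// Breadth-first search: hop distance from `start` to every reachable user.
function hopDistances(graph: Graph, start: string): Map<string, number> {
  const dist = new Map<string, number>([[start, 0]]);
  const queue: string[] = [start];
  for (let i = 0; i < queue.length; i++) {
    const user = queue[i];
    for (const next of graph.get(user) ?? []) {
      if (!dist.has(next)) {
        dist.set(next, dist.get(user)! + 1);
        queue.push(next);
      }
    }
  }
  return dist;
}

// Each distinct visitor contributes decay^distance. Unreachable visitors
// contribute nothing, and your own visit is ignored.
function trustScore(graph: Graph, me: string, visitors: string[], decay = 0.5): number {
  const dist = hopDistances(graph, me);
  let score = 0;
  for (const v of Array.from(new Set(visitors))) {
    const d = dist.get(v);
    if (d !== undefined && v !== me) score += Math.pow(decay, d);
  }
  return score;
}
```

Suppose you are connected to alice and alice is connected to bob. A URL visited by both of them scores 0.5 + 0.25 = 0.75, so closer contacts count for more.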

kashyapa95 871 days ago

Thank you! Will take a look at these. We mostly made this to solve a problem for ourselves and as a learning exercise, but we're hoping it resonates with others too.

visarga 872 days ago

This is great, but I just want simple full-text search over all my history. Not title and URL search, but full text. If it has semantic embeddings on top, all the better. I am losing too many of the things I find.

Wondering why browsers have neglected bookmarks and search history so much. They haven't progressed in the last two decades. Storage is cheap, computers are fast and multi-core, yet we live with a mentality of paucity and don't save our digital crumbs.
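The plain full-text search asked for here is classically built on an inverted index, which maps each token to the set of pages containing it. The following is a minimal in-memory TypeScript sketch. It assumes page text has already been extracted, and it uses a simple ASCII-only tokenizer. `InvertedIndex` is a hypothetical name, and this is not how Librarian works (Librarian uses embeddings).

```typescript
// Minimal in-memory inverted index: token -> set of page ids.
class InvertedIndex {
  private postings = new Map<string, Set<string>>();

  // Lowercase and split on anything that isn't an ASCII letter or digit.
  private static tokenize(text: string): string[] {
    return text.toLowerCase().split(/[^a-z0-9]+/).filter((t) => t.length > 0);
  }

  // Record every token of `text` as occurring in `pageId`.
  add(pageId: string, text: string): void {
    for (const token of InvertedIndex.tokenize(text)) {
      let ids = this.postings.get(token);
      if (!ids) {
        ids = new Set<string>();
        this.postings.set(token, ids);
      }
      ids.add(pageId);
    }
  }

  // AND query: sorted ids of pages containing every query token.
  search(query: string): string[] {
    const tokens = InvertedIndex.tokenize(query);
    if (tokens.length === 0) return [];
    let result: string[] | null = null;
    for (const token of tokens) {
      const ids = this.postings.get(token);
      if (!ids) return []; // a token that appears nowhere empties the result
      result = result === null ? Array.from(ids) : result.filter((id) => ids.has(id));
    }
    return (result ?? []).sort();
  }
}
```

Intersecting posting sets gives AND semantics. Relevance ranking (for example BM25) and phrase queries are left out of this sketch.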

kashyapa95 871 days ago

Thank you! Yeah, history is something we've been asked about multiple times now. I'm sure this could be extended easily once we solve a few things (scraping pages faster and being smarter about what text we embed, parsing out irrelevant stuff). Will keep you posted.
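One naive way to be "smarter about what text we embed" is sketched below. It is an assumption for illustration, not Librarian's actual heuristic. The function drops very short lines (often menus, buttons, or footers) and exact repeats, then splits the remaining words into fixed-size chunks so each embedding covers a bounded amount of text. `selectChunks`, `minWords`, and `maxWords` are hypothetical names with arbitrary defaults.

```typescript
// Hypothetical heuristic: keep prose-like lines, then pack their words
// into chunks of at most `maxWords` words each for embedding.
function selectChunks(pageText: string, minWords = 6, maxWords = 120): string[] {
  const seen = new Set<string>();
  const kept: string[] = [];
  for (const raw of pageText.split("\n")) {
    // Normalize whitespace so near-identical lines compare equal.
    const line = raw.trim().replace(/\s+/g, " ");
    const words = line.split(" ").filter((w) => w.length > 0);
    // Short lines are likely navigation; repeated lines are likely boilerplate.
    if (words.length < minWords || seen.has(line)) continue;
    seen.add(line);
    for (const w of words) kept.push(w);
  }
  const chunks: string[] = [];
  for (let i = 0; i < kept.length; i += maxWords) {
    chunks.push(kept.slice(i, i + maxWords).join(" "));
  }
  return chunks;
}
```

Chunks here may cross line boundaries. A more careful version would split on sentence or paragraph boundaries so that each chunk stays coherent.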

dorian-graph 871 days ago

If you use Safari, there's https://andadinosaur.com/launch-history-book for content saving, for your browsing history.

dandanua 872 days ago

Web search generates profit, local search doesn't.

esafak 872 days ago

Because search killed bookmarks. Have you requested this feature in any browser? Maybe nobody asked for it.

panarky 872 days ago

Bookmark search is for rookies. I'd pay for a search tool over my 877 open tabs.

esafak 870 days ago

https://addons.mozilla.org/en-US/firefox/addon/falcon_extens... via https://news.ycombinator.com/item?id=30696451

BOOSTERHIDROGEN 872 days ago

raindrop.io or DEVONthink

iansinnott 872 days ago

Curious how this works. I've experimented with in-browser vector search using victor[1] with mixed results. Hadn't heard of this orama lib before checking out your project.

[1]: https://github.com/not-pizza/victor

kashyapa95 871 days ago

So the extension scrapes all your bookmarks' content, selects key parts of it (we have a naive heuristic for now), embeds them with a Sentence Transformer model, and indexes them in the browser's local storage with Orama's vector DB. When you search, we embed the query and run a vector search against the index to get the semantically most similar ones. It's all in-browser, so no data goes to any API.
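The query path described above comes down to nearest-neighbor search over embeddings. Below is a minimal exact (brute-force) version in TypeScript that uses cosine similarity. It assumes the embeddings have already been computed by some sentence-embedding model and that all of them share one dimension. It is not Orama's API, and `Doc`, `cosine`, and `topK` are hypothetical names.

```typescript
interface Doc {
  url: string;
  embedding: number[];
}

// Cosine similarity: dot product divided by the product of vector norms.
function cosine(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("dimension mismatch");
  let dot = 0;
  let na = 0;
  let nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  // Treat a zero vector as dissimilar to everything.
  return na === 0 || nb === 0 ? 0 : dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Exact top-k: score every document against the query, sort descending.
function topK(query: number[], docs: Doc[], k: number): { url: string; score: number }[] {
  return docs
    .map((d) => ({ url: d.url, score: cosine(query, d.embedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}
```

Consider toy 2-dimensional vectors (real sentence embeddings have hundreds of dimensions). Here `topK([1, 0.1], [{ url: "a", embedding: [1, 0] }, { url: "b", embedding: [0, 1] }, { url: "c", embedding: [1, 1] }], 2)` ranks "a" first and "c" second. A linear scan costs O(N·d) per query for N documents of dimension d. Approximate indexes such as HNSW give up a little recall in exchange for sublinear query time.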

Didn't try victor. Is that just for the Node.js runtime, or does it run at the edge as well? Orama's been pretty good, at least semantically. Haven't done any speed benchmarking, so I'm not sure if it's as fast as, say, HNSW.

iansinnott 870 days ago

I would assume it runs on the edge, since it runs in the browser via WASM. It's also implemented in Rust, which provides some flexibility. It will likely depend on the specific edge runtime though. The runtime would need something resembling a file system.

The thing I like about Victor is that it uses OPFS for storage when running in the browser, meaning it doesn't keep everything in memory. I looked at Orama a bit and from my reading I think they keep everything in memory, although I didn't dig too deep and would like to be wrong about that.

This is similar to Voy [1], which runs entirely in memory.

Unfortunately, for my own usage running in memory feels like a non-starter since the data size is unbounded.

[1]: https://github.com/tantaraio/voy
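On the in-memory concern, one way to keep vectors on disk is a flat binary layout with fixed-size records. Any vector's byte offset can then be computed directly, and only the needed slice has to be read. This is a hypothetical layout (a header holding count and dimension, followed by little-endian float32 values), not Victor's or Orama's format. `packVectors` and `readVector` are illustrative names. In a browser, the resulting buffer could be written to an OPFS file obtained via `navigator.storage.getDirectory()`. That step is left out here so the sketch runs anywhere.

```typescript
// Layout: [count: uint32][dim: uint32][count * dim float32 values], little-endian.
function packVectors(vectors: number[][]): ArrayBuffer {
  const count = vectors.length;
  const dim = count > 0 ? vectors[0].length : 0;
  const buf = new ArrayBuffer(8 + count * dim * 4);
  const view = new DataView(buf);
  view.setUint32(0, count, true);
  view.setUint32(4, dim, true);
  let offset = 8;
  for (const v of vectors) {
    if (v.length !== dim) throw new Error("all vectors must share one dimension");
    for (const x of v) {
      view.setFloat32(offset, x, true);
      offset += 4;
    }
  }
  return buf;
}

// Read vector i by computing its byte offset; with a file, the same
// arithmetic tells you which byte range to load.
function readVector(buf: ArrayBuffer, i: number): number[] {
  const view = new DataView(buf);
  const count = view.getUint32(0, true);
  const dim = view.getUint32(4, true);
  if (i < 0 || i >= count) throw new RangeError("index out of range");
  const start = 8 + i * dim * 4;
  const out: number[] = [];
  for (let j = 0; j < dim; j++) out.push(view.getFloat32(start + j * 4, true));
  return out;
}
```

Storing float32 instead of JavaScript's 64-bit numbers halves the size, at the cost of some precision.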

deepnet 872 days ago

Thanks for releasing your work, this is useful and interesting.

mccruz 871 days ago

Why is this necessary? I can just export my bookmarks, use ChatGPT to create a Python script to download all of their contents, put it all in one big text file, and then Cmd+F for what I'm looking for.

llanowarelves 871 days ago

It's so you don't need an exact term, you can search for a synonym or something "similar" in concept.

So it's more like a search engine than just a Ctrl+F in a file for a string of tokens.

mshekow 872 days ago

Sounds really interesting, and I'd also love a Firefox version :)

kashyapa95 871 days ago

Will try to make one soon!

ioshaan 872 days ago

Yes. Sounds useful.

markocalvocruz 871 days ago

Finally. Bookmark search is in massive need of fixing.

smusamashah 871 days ago

How do I install this?

kashyapa95 871 days ago

You'll have to do this for now: https://developer.chrome.com/docs/extensions/get-started/tut...

We were going to publish it on the Chrome Web Store after some initial feedback.

smusamashah 871 days ago

Had to move manifest.json from public dir to src dir and load src dir in chrome.

EDIT: And the popup doesn't do anything at all. I can't see any JS included in popup.html.

kashyapa95 871 days ago

My bad, I forgot to mention that you need to run npm run build. Added instructions here: https://github.com/oto-labs/librarian?tab=readme-ov-file#set...

Sabinus 872 days ago

Firefox version?