by kashyapa95 871 days ago
So the extension scrapes the content of all your bookmarks, selects key parts of it (we use a naive heuristic for now), embeds them with a Sentence Transformer model, and indexes them in the browser's local storage with Orama's vector DB. When you search, we embed the query and run a vector search against the index to return the most semantically similar bookmarks. Everything runs in the browser, so no data is sent to any API.
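Here is a minimal TypeScript sketch of the indexing and query flow described above. It is for illustration only and is not the extension's code or Orama's API. `toyEmbed` is a hypothetical stand-in for a Sentence Transformer model (a hashed bag of words that only captures word overlap, not meaning). `selectKeyParts` is a hypothetical version of the "naive heuristic" that keeps the first few paragraphs. The index is a plain in-memory array searched exhaustively by cosine similarity.

```typescript
// Sketch only: all names are hypothetical, not the extension's or Orama's actual code.

// Any function that maps text to a fixed-length vector.
type Embedder = (text: string) => number[];

interface IndexedChunk {
  url: string;
  text: string;
  vector: number[]; // unit-length embedding of `text`
}

// Hypothetical stand-in for a Sentence Transformer model: a hashed bag of words.
// It only captures word overlap, not meaning.
function toyEmbed(text: string, dims = 64): number[] {
  const v = new Array<number>(dims).fill(0);
  for (const word of text.toLowerCase().match(/[a-z0-9]+/g) ?? []) {
    let h = 0;
    for (let i = 0; i < word.length; i++) h = (h * 31 + word.charCodeAt(i)) >>> 0;
    v[h % dims] += 1;
  }
  return v;
}

// Hypothetical "naive heuristic": keep the first few non-empty paragraphs.
function selectKeyParts(pageText: string, maxParts = 3): string[] {
  return pageText
    .split(/\n\s*\n/)
    .map((p) => p.trim())
    .filter((p) => p.length > 0)
    .slice(0, maxParts);
}

// Scale to unit length so that a dot product equals cosine similarity.
function normalize(v: number[]): number[] {
  const norm = Math.sqrt(v.reduce((s, x) => s + x * x, 0));
  return norm === 0 ? v.slice() : v.map((x) => x / norm);
}

function dot(a: number[], b: number[]): number {
  let s = 0;
  for (let i = 0; i < a.length; i++) s += a[i] * b[i];
  return s;
}

class BookmarkIndex {
  private readonly embed: Embedder;
  private readonly chunks: IndexedChunk[] = [];

  constructor(embed: Embedder) {
    this.embed = embed;
  }

  // Embed each selected part of the page and store it with its bookmark URL.
  addBookmark(url: string, pageText: string): void {
    for (const text of selectKeyParts(pageText)) {
      this.chunks.push({ url, text, vector: normalize(this.embed(text)) });
    }
  }

  // Exhaustive (flat) search: score every chunk, keep each URL's best score, return the top k.
  query(q: string, k = 5): { url: string; score: number }[] {
    const qv = normalize(this.embed(q));
    const best = new Map<string, number>();
    for (const c of this.chunks) {
      const s = dot(qv, c.vector);
      if (s > (best.get(c.url) ?? -Infinity)) best.set(c.url, s);
    }
    return Array.from(best, ([url, score]) => ({ url, score }))
      .sort((a, b) => b.score - a.score)
      .slice(0, k);
  }
}
```

To use it, construct `new BookmarkIndex(toyEmbed)`, call `addBookmark(url, pageText)` once per bookmark, then call `query("some text", 5)`. The exhaustive scan costs one dot product per stored chunk per query. Approximate indexes such as HNSW exist to avoid that full scan on large collections.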

Didn't try Victor. Is it just for the Node.js runtime, or does it run at the edge as well? Orama's been pretty good, at least in terms of semantic relevance. Haven't done any speed benchmarking, so I'm not sure whether it's as fast as, say, HNSW.


I would assume it runs on the edge, since it runs in the browser via WASM. It's also implemented in Rust, which provides some flexibility. It will likely depend on the specific edge runtime, though: the runtime would need something resembling a file system.

The thing I like about Victor is that it uses OPFS for storage when running in the browser, meaning it doesn't keep everything in memory. I looked at Orama a bit, and from my reading I think it keeps everything in memory, although I didn't dig too deep and would like to be wrong about that.
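For contrast with keeping the whole index in memory, here is a minimal sketch of reading vectors on demand from a file. It assumes a hypothetical layout of fixed-size float32 records stored back to back. That layout is not Victor's actual on-disk format, which I haven't checked. In the browser, the `Blob` would be the `File` returned by `FileSystemFileHandle.getFile()`, with the handle obtained through `navigator.storage.getDirectory()` and `getFileHandle()`. Browsers generally read only the sliced byte range, though actual memory behavior is up to the engine. `recordByteRange`, `readVectorAt`, and `scanVectors` are illustrative names.

```typescript
// Hypothetical layout: vector i is `dim` float32 values (4 bytes each) stored back to back,
// so it occupies bytes [i * dim * 4, (i + 1) * dim * 4) of the file.
const BYTES_PER_FLOAT = Float32Array.BYTES_PER_ELEMENT; // 4

function recordByteRange(i: number, dim: number): [number, number] {
  const size = dim * BYTES_PER_FLOAT;
  return [i * size, (i + 1) * size];
}

// Read a single vector without loading the rest of the file.
async function readVectorAt(file: Blob, i: number, dim: number): Promise<Float32Array> {
  const [start, end] = recordByteRange(i, dim);
  const buf = await file.slice(start, end).arrayBuffer();
  if (buf.byteLength !== end - start) {
    throw new RangeError(`record ${i} is out of bounds`);
  }
  // Float32Array uses the platform's byte order, which is fine when the same device wrote the file.
  return new Float32Array(buf);
}

// Visit every vector in batches: each read is at most batchSize * dim floats,
// rather than the size of the whole index.
async function* scanVectors(
  file: Blob,
  dim: number,
  batchSize = 1024,
): AsyncGenerator<[number, Float32Array]> {
  const recordBytes = dim * BYTES_PER_FLOAT;
  const count = Math.floor(file.size / recordBytes);
  for (let first = 0; first < count; first += batchSize) {
    const last = Math.min(first + batchSize, count);
    const buf = await file.slice(first * recordBytes, last * recordBytes).arrayBuffer();
    const floats = new Float32Array(buf);
    for (let j = 0; j < last - first; j++) {
      yield [first + j, floats.subarray(j * dim, (j + 1) * dim)];
    }
  }
}
```

A flat search over this layout still touches every vector. However, `scanVectors` streams through the file in bounded batches instead of requiring the whole index to be resident at once.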

That would make it similar to Voy [1], which runs entirely in memory.

Unfortunately, for my own use case, running in memory feels like a non-starter, since the data size is unbounded.
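To put rough numbers on why unbounded data rules out a fully in-memory index: with float32 storage, the raw vectors alone take N · d · 4 bytes for N chunks of dimension d. The figures below are an illustration only. They assume a 384-dimensional model such as all-MiniLM-L6-v2 (neither project's model is stated in this thread) and one million indexed chunks:

```latex
M_{\text{raw}} = N \cdot d \cdot 4\ \text{bytes}
  = 10^{6} \cdot 384 \cdot 4\ \text{bytes}
  = 1.536 \times 10^{9}\ \text{bytes} \approx 1.5\ \text{GB}
```

This is a lower bound. It ignores stored text, metadata, and any graph links an approximate index such as HNSW keeps, and it grows linearly with every bookmark added.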

[1]: https://github.com/tantaraio/voy