by cyrusthegreat 1733 days ago
Hi everyone!

Over the years, I've found myself building hacky solutions to serve and manage my embeddings. I’m excited to share Embeddinghub, an open-source vector database for ML embeddings. It is built with four goals in mind:

Store embeddings durably and with high availability

Allow for approximate nearest neighbor operations

Enable other operations like partitioning, sub-indices, and averaging

Manage versioning, access control, and rollbacks painlessly
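To make the second and third goals concrete, here is a minimal pure-Python sketch of the operations involved. It is not Embeddinghub's API: the function names and toy vectors are made up for illustration, and the search is exact (brute force), which is the result an approximate nearest neighbor index tries to reproduce faster.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def nearest_neighbors(embeddings, query, k):
    """Exact top-k keys by cosine similarity (hypothetical helper).

    An ANN index trades a little accuracy for not scanning every vector.
    """
    ranked = sorted(
        embeddings,
        key=lambda name: cosine_similarity(embeddings[name], query),
        reverse=True,
    )
    return ranked[:k]


def average(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(component) / n for component in zip(*vectors)]


# Toy 2-dimensional embeddings, invented for this example.
embeddings = {
    "apple": [1.0, 0.0],
    "pear": [0.9, 0.1],
    "car": [0.0, 1.0],
}

# Averaging two item embeddings gives a centroid that can itself be queried.
fruit_centroid = average([embeddings["apple"], embeddings["pear"]])
top_two = nearest_neighbors(embeddings, fruit_centroid, k=2)
```

Here `top_two` contains the two fruit keys, and `car` is excluded because its vector points in a nearly orthogonal direction.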

It's still in the early stages, and before we commit more dev time to it, we want to get your feedback. Let us know what you think and what you'd like to see!

Repo: https://github.com/featureform/embeddinghub

Docs: https://docs.featureform.com/

What's an Embedding? The Definitive Guide to Embeddings: https://www.featureform.com/post/the-definitive-guide-to-emb...

3 comments

In the "Definitive Guide to Embeddings", in the figure "An illustration of One Hot Encoding", the "One Hot Encoding" table doesn't make any sense whatsoever. Am I wrong?
no you're right ahahah wth are these
You are both right. I just realized this and would be embarrassed if I wasn’t laughing so hard. I gave an original drawing to our designer with the correct values and we didn’t inspect their final image. We’ll get this fixed, thanks for pointing this out and sorry for the confusion :)
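For anyone following along without the figure: a well-formed one-hot table has one column per category and exactly one 1 in each row. A minimal sketch, using made-up colour values rather than the values from the guide's illustration:

```python
def one_hot_encode(values):
    """Return (categories, rows) where each row has a single 1.

    Categories are sorted so the column order is deterministic.
    """
    categories = sorted(set(values))
    column = {category: i for i, category in enumerate(categories)}
    rows = []
    for value in values:
        row = [0] * len(categories)
        row[column[value]] = 1  # mark only the column for this value
        rows.append(row)
    return categories, rows


# Hypothetical input, chosen only for illustration.
categories, rows = one_hot_encode(["red", "green", "blue", "green"])
# categories == ["blue", "green", "red"]
# rows == [[0, 0, 1], [0, 1, 0], [1, 0, 0], [0, 1, 0]]
```

A quick sanity check for any such table is that every row sums to 1.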
Holy shit, this looks amazing!

I see you've got examples for NLP use cases in your docs. Can't wait to read them. Embeddings are a constant source of complexity when I'm trying to move certain operations to Lambda; this looks like it would speed up the initializations big time.

We're so glad to hear that! We'd love your feedback as we keep building. Please join our community on Slack: https://join.slack.com/t/featureform-community/shared_invite...
Curious how your solution is different from, or better than, nmslib, which I've tried in the past?
We actually use HNSWLIB by NMSLIB on the backend. NMSLIB solves the approximate nearest neighbor problem, not the storage problem. It’s not a database, it’s an index. We handle everything needed to turn their index into a full-fledged database with a data science workflow around it (versioning, monitoring, etc.).
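To illustrate the index-versus-database distinction, here is a rough sketch of what wrapping an index with versioning and rollback could look like. It is an assumption-laden toy, not how Embeddinghub is implemented: `BruteForceIndex` stands in for an HNSW index, and `VersionedEmbeddingStore` and its methods are hypothetical names.

```python
import copy


class BruteForceIndex:
    """Stand-in for an ANN index: it only answers nearest-neighbor queries."""

    def __init__(self, vectors):
        self.vectors = vectors

    def nearest(self, query):
        def squared_distance(key):
            return sum((a - b) ** 2 for a, b in zip(self.vectors[key], query))

        return min(self.vectors, key=squared_distance)


class VersionedEmbeddingStore:
    """Hypothetical wrapper that adds snapshots and rollback around an index."""

    def __init__(self):
        self.versions = [{}]  # version 0 is the empty store
        self.index = BruteForceIndex(self.versions[-1])

    def commit(self, updates):
        """Apply updates as a new snapshot and return its version number."""
        snapshot = copy.deepcopy(self.versions[-1])
        snapshot.update(updates)
        self.versions.append(snapshot)
        self.index = BruteForceIndex(snapshot)  # rebuild the index per version
        return len(self.versions) - 1

    def rollback(self, version):
        """Discard every snapshot after `version` and rebuild the index."""
        self.versions = self.versions[: version + 1]
        self.index = BruteForceIndex(self.versions[-1])


store = VersionedEmbeddingStore()
v1 = store.commit({"cat": [1.0, 0.0], "dog": [0.0, 1.0]})
v2 = store.commit({"cat": [0.0, 0.9]})  # a bad update moves "cat" next to "dog"
after_bad_update = store.index.nearest([0.0, 0.8])
store.rollback(v1)
after_rollback = store.index.nearest([0.0, 0.8])
```

The index alone has no notion of history; the surrounding store is what makes the bad update reversible, so the same query returns `cat` before the rollback and `dog` after it.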
That's great. I've been very impressed by the performance of nmslib in my scenarios. I'll definitely check out Embeddinghub - thanks for sharing!