by cyrusthegreat 1780 days ago

Hi everyone!

Over the years, I've found myself building hacky solutions to serve and manage my embeddings. I’m excited to share Embeddinghub, an open-source vector database for ML embeddings. It is built with four goals in mind:

Store embeddings durably and with high availability

Allow for approximate nearest neighbor operations

Enable other operations like partitioning, sub-indices, and averaging

Manage versioning, access control, and rollbacks painlessly
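
To make the second and third goals concrete, here is a minimal sketch of a nearest neighbor lookup and an averaging operation over embeddings. It is not Embeddinghub's API: the function names and the toy 3-dimensional vectors are made up for illustration, and the search is exact brute force, which an approximate index would trade for speed on large collections.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def nearest_neighbors(embeddings, query, k=2):
    """Exact (brute-force) search; an ANN index approximates this ranking."""
    ranked = sorted(
        embeddings,
        key=lambda key: cosine_similarity(embeddings[key], query),
        reverse=True,
    )
    return ranked[:k]


def average(embeddings, keys):
    """Element-wise mean of the embeddings stored under the given keys."""
    vectors = [embeddings[key] for key in keys]
    return [sum(column) / len(vectors) for column in zip(*vectors)]


# Hypothetical toy embeddings, chosen by hand for the example.
embeddings = {
    "cat": [0.9, 0.1, 0.0],
    "dog": [0.8, 0.2, 0.1],
    "car": [0.0, 0.1, 0.9],
}

neighbors = nearest_neighbors(embeddings, embeddings["cat"], k=2)  # ['cat', 'dog']
pet_vector = average(embeddings, ["cat", "dog"])
```

Brute force compares the query against every stored vector, so it scales linearly with the collection size. That cost is what approximate methods are meant to avoid.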

It's still in the early stages, and before we commit more dev time to it, we want to get your feedback. Let us know what you think and what you'd like to see!

Repo: https://github.com/featureform/embeddinghub

Docs: https://docs.featureform.com/

What's an Embedding? The Definitive Guide to Embeddings: https://www.featureform.com/post/the-definitive-guide-to-emb...

3 comments

ypcx 1780 days ago

In the "Definitive Guide to Embeddings", in the figure "An illustration of One Hot Encoding", the "One Hot Encoding" table doesn't make any sense whatsoever. Am I wrong?

make3 1780 days ago

no you're right ahahah wth are these

cyrusthegreat 1779 days ago

You are both right. I just realized this and would be embarrassed if I wasn’t laughing so hard. I gave an original drawing to our designer with the correct values and we didn’t inspect their final image. We’ll get this fixed, thanks for pointing this out and sorry for the confusion :)
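
For reference while the figure is being fixed, here is a minimal sketch of what a correct one-hot table looks like: one column per distinct category and exactly one 1 in each row. The function name and the color values are made up for illustration and are not taken from the guide.

```python
def one_hot_encode(values):
    """Return the sorted category list and one row per input value.

    Each row has a 1 in the column of its category and 0 elsewhere.
    """
    categories = sorted(set(values))
    column_of = {category: i for i, category in enumerate(categories)}
    rows = []
    for value in values:
        row = [0] * len(categories)
        row[column_of[value]] = 1
        rows.append(row)
    return categories, rows


# Hypothetical input values.
categories, rows = one_hot_encode(["red", "green", "blue", "green"])

# categories == ['blue', 'green', 'red']
# rows == [[0, 0, 1], [0, 1, 0], [1, 0, 0], [0, 1, 0]]
```

A quick sanity check for any one-hot table is that every row sums to 1.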

JPKab 1780 days ago

Holy shit, this looks amazing!

I see you've got examples for NLP use cases in your docs. Can't wait to read them. Embeddings are a constant source of complexity when I'm trying to move certain operations to Lambda; this looks like it would speed up initialization big time.

cyrusthegreat 1780 days ago

We're so glad to hear that! We'd love your feedback as we keep building. Please join our community on Slack: https://join.slack.com/t/featureform-community/shared_invite...

localhost 1780 days ago

Curious how your solution is different from / better than nmslib, which I've tried in the past?

cyrusthegreat 1780 days ago

We actually use HNSWLIB by NMSLIB on the backend. NMSLIB solves the approximate nearest neighbor problem, not the storage problem. It’s not a database, it’s an index. We handle everything needed to turn their index into a full-fledged database with a data science workflow around it (versioning, monitoring, etc.).
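
A rough sketch of the index-versus-database split described above, under loud assumptions: `BruteForceIndex` and `VersionedStore` are hypothetical names, the index does exact search as a stand-in for HNSW, and none of this reflects how Embeddinghub is actually implemented. The point is only that the index answers "what is nearest?", while the layer around it keeps snapshots and handles rollbacks.

```python
import copy
import math


class BruteForceIndex:
    """Stand-in for an ANN index such as HNSW: it only answers queries."""

    def __init__(self, vectors):
        self.vectors = vectors  # key -> vector

    def query(self, vector, k=1):
        # Rank keys by Euclidean distance to the query vector.
        ranked = sorted(
            self.vectors, key=lambda key: math.dist(self.vectors[key], vector)
        )
        return ranked[:k]


class VersionedStore:
    """Hypothetical database layer: named snapshots plus rollback."""

    def __init__(self):
        self.versions = {}  # version name -> {key: vector}
        self.current = None

    def commit(self, name, vectors):
        # Deep-copy so later changes by the caller cannot alter a snapshot.
        self.versions[name] = copy.deepcopy(vectors)
        self.current = name

    def rollback(self, name):
        if name not in self.versions:
            raise KeyError(name)
        self.current = name

    def nearest(self, vector, k=1):
        # Rebuilt on every call for simplicity; a real system would cache it.
        index = BruteForceIndex(self.versions[self.current])
        return index.query(vector, k)


store = VersionedStore()
store.commit("v1", {"a": [0.0, 0.0], "b": [1.0, 1.0]})
store.commit("v2", {"a": [5.0, 5.0], "b": [1.0, 1.0]})

latest = store.nearest([0.0, 0.0])    # searches v2
store.rollback("v1")
restored = store.nearest([0.0, 0.0])  # searches v1 again
```

Swapping the stand-in for a real approximate index would change only `BruteForceIndex`; the versioning logic would stay the same.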

localhost 1779 days ago

That's great. I've been very impressed by the performance of nmslib in my scenarios. I'll definitely check out eh - thanks for sharing!
