Hacker News
by refulgentis 797 days ago
I know vector DBs and embeddings, so I'm afraid the problem is that I'm just awful at communicating. To wit, and much to my consternation, I have to write and maintain code for both image and text embeddings on 6 platforms.

I think we're getting to the heart of my confusion, and I can only assume it's because of different use cases and expectations around privacy.

Let's say I'm the CEO of Mousetrap Inc., and I have this .txt file: our top secret plan for a better mousetrap.

I want genAI to pick out the parts about the new metal alloy.

I upload the file to B2BAI LLC, which splits it into chunks (a List<String>); then we give those chunks to the model and get back one embedding per chunk (a List<List<Float>>).
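The upload step above can be sketched as two small functions. This is a minimal sketch under stated assumptions, not B2BAI's actual pipeline: `chunk_text`, `toy_embed`, and `embed_all` are hypothetical names, the chunking is a simple overlapping word window, and `toy_embed` is a hashed bag-of-words stand-in for a real embedding model, used only so the example runs without one.

```python
import hashlib
import math


def chunk_text(text: str, words_per_chunk: int = 50, overlap: int = 10) -> list[str]:
    """Split text into overlapping word windows: the List<String>."""
    if overlap >= words_per_chunk:
        raise ValueError("overlap must be smaller than words_per_chunk")
    words = text.split()
    step = words_per_chunk - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + words_per_chunk]))
        # Stop once this window reaches the end of the text.
        if start + words_per_chunk >= len(words):
            break
    return chunks


def toy_embed(chunk: str, dim: int = 64) -> list[float]:
    """Hypothetical stand-in for a real embedding model.

    Hashes each word into one of `dim` buckets, then L2-normalizes the counts.
    """
    vec = [0.0] * dim
    for word in chunk.lower().split():
        bucket = int(hashlib.sha256(word.encode()).hexdigest(), 16) % dim
        vec[bucket] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec


def embed_all(chunks: list[str]) -> list[list[float]]:
    """One vector per chunk: the List<List<Float>>."""
    return [toy_embed(c) for c in chunks]
```

Whatever model actually does the embedding, the shapes are the same: `chunk_text(plan)` gives the List<String>, `embed_all(...)` gives the List<List<Float>>, and both are what the provider ends up holding.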

Vector DBs store the List<String> and the List<List<Float>> for retrieval.

I, the top secret mouse-trap inventor, do not want my plan stored on any 3rd party computer.

But the app I use puts it in an a16z-approved Vector DB™.

The vector DB provider now has the embeddings (List<List<Float>>) and the chunks (List<String>), which violates my desire not to have my top secret mousetrap plan stored at rest anywhere.


This is silly.

Big companies who are extremely protective of their secrets use the cloud. Even the US government isn't afraid to store classified information in AWS, and they're not joking around with secrecy.

Unless you're acting specifically against American interests, I can't imagine a situation in which a cloud company would actually steal your secrets.

If anything, I'd be afraid of a vector DB vendor getting hacked, but I don't think that most non-tech companies who want to use vector embeddings for their documents can provide better security themselves.

You're right: my threat model, like yours, is the vector DB provider getting hacked.

It's not silly, because it takes one SWE week, max, from start to finish, to just do it in memory locally. You don't need the Vector DB™.
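Doing it in memory locally can be about this small, assuming the corpus is modest (say, one company's documents) so that a brute-force scan over every vector is fast enough. This is a minimal sketch: `InMemoryVectorStore`, `add`, and `search` are hypothetical names, not any vector DB's API, and the vectors are assumed to come from an embedding model you run on your own machine.

```python
import math
from dataclasses import dataclass, field


def _cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


@dataclass
class InMemoryVectorStore:
    """Hypothetical store: chunks and vectors live only in this process's memory."""
    chunks: list[str] = field(default_factory=list)           # the List<String>
    vectors: list[list[float]] = field(default_factory=list)  # the List<List<Float>>

    def add(self, chunk: str, vector: list[float]) -> None:
        # Reject vectors whose dimension differs from what is already stored.
        if self.vectors and len(vector) != len(self.vectors[0]):
            raise ValueError("all vectors must have the same dimension")
        self.chunks.append(chunk)
        self.vectors.append(vector)

    def search(self, query: list[float], k: int = 3) -> list[tuple[str, float]]:
        """Brute force: score every stored vector, return the top k (chunk, score)."""
        scored = [(c, _cosine(query, v)) for c, v in zip(self.chunks, self.vectors)]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        return scored[:k]
```

Query cost grows linearly with the number of chunks; if that ever matters, an approximate nearest-neighbor index can replace the linear scan, and that can also run locally.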