Hacker News
by bvan 1157 days ago
Great, but I would have to run this in >my< private cloud. No way any business is going to upload its docs into a third-party cloud, no matter what the small print says.

You've gotta do the Slack-style growth strategy: give users a free tier and market directly to end users. Let your users ignore their own company policy for their own convenience. Eventually they will become dependent enough on it that their organizations will be forced to accept it.
I get what you’re saying, but the reality is that many industries just can’t do this. I have strict data residency and sovereignty requirements; violating them can bring criminal charges. It’s a non-starter for lots of industries.
My statement was a bit tongue-in-cheek. This strategy works better than it should. More industries should be like yours.

I suspect we'll see data leaks through misguided trust in AI models at some point in the near future, and it'll end up being a mess to clean up.

Here is a fully self-hostable solution that connects to PDFs in your Google Drive folder: https://github.com/ai-sidekick/sidekick

It uses Weaviate, so even the vector store can be self-hosted.

Does Sidekick use OpenAI, or are local models supported?
There are plenty of good open-source options: https://github.com/marqo-ai/marqo/blob/mainline/examples/GPT... The LLM choice is a bit harder, but the composability of it all lets you easily choose alternatives.
I also will not upload a proprietary document to this service. But my organization and many others do upload proprietary documents to third-party clouds (e.g., Azure, Google).
You might not dump your internal documentation or confidential files into it, but I can see something like this being very useful if you can chuck a product's user manual into it and ask common-sense questions about the product. So many parts these days come with a multi-hundred-page, questionably written manual that technically does contain all the required information but buries it in waffle.
Or for legal contracts, though no one is going to go there with a commercial product unless they can somehow indemnify themselves against erroneous answers.
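To make the composability point concrete, here is a minimal sketch, assuming a document-QA pipeline in which the embedder, the vector store, and the LLM are separate, swappable pieces. Everything in it (DocQA, InMemoryVectorStore, toy_embed, toy_local_llm, the sample manual text) is hypothetical and for illustration only; it is not the API of Sidekick, Marqo, or Weaviate. The bag-of-words embedder and the stub "LLM" stand in for a real embedding model and a real local or hosted model.

```python
import math
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

# Any text -> vector function can act as the embedder.
Embedder = Callable[[str], List[float]]
# Any prompt -> answer function can act as the LLM (local model or hosted API).
Generator = Callable[[str], str]


def cosine(a: List[float], b: List[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


@dataclass
class InMemoryVectorStore:
    """Toy stand-in for a self-hosted vector store."""
    items: List[Tuple[List[float], str]] = field(default_factory=list)

    def add(self, vector: List[float], text: str) -> None:
        self.items.append((vector, text))

    def search(self, query: List[float], k: int = 2) -> List[str]:
        # Brute-force nearest neighbours by cosine similarity.
        ranked = sorted(self.items, key=lambda item: cosine(query, item[0]), reverse=True)
        return [text for _, text in ranked[:k]]


@dataclass
class DocQA:
    embed: Embedder
    generate: Generator
    store: InMemoryVectorStore = field(default_factory=InMemoryVectorStore)

    def ingest(self, chunks: List[str]) -> None:
        for chunk in chunks:
            self.store.add(self.embed(chunk), chunk)

    def ask(self, question: str, k: int = 2) -> str:
        # Retrieve the k closest chunks, then hand them to the LLM as context.
        context = self.store.search(self.embed(question), k)
        prompt = "Answer using only this context:\n" + "\n".join(context) + "\n\nQuestion: " + question
        return self.generate(prompt)


# Hypothetical toy components; replace with a real embedding model and LLM.
VOCAB = ["battery", "charge", "warranty", "years", "reset", "button"]


def toy_embed(text: str) -> List[float]:
    # Counts of a few fixed vocabulary words; a real embedder returns dense vectors.
    words = text.lower().replace("?", " ").replace(".", " ").split()
    return [float(words.count(w)) for w in VOCAB]


def toy_local_llm(prompt: str) -> str:
    # Stub "model": returns the top-ranked context line instead of generating text.
    return prompt.splitlines()[1]


qa = DocQA(embed=toy_embed, generate=toy_local_llm)
qa.ingest([
    "The battery takes two hours to charge.",
    "The warranty lasts two years.",
    "Hold the reset button for ten seconds.",
])
print(qa.ask("How long does the battery take to charge?"))
# prints: The battery takes two hours to charge.
```

Swapping the generate function for a call to a hosted API, or the in-memory store for a self-hosted vector database, changes only one constructor argument; which pieces you swap is what decides which data leaves your network.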
But can you trust ChatGPT's explanations of a legal text?
No!
There are various degrees of self-hosting: for good outputs, you need OpenAI's APIs at least to generate the answers. There are alternatives, but they are not as good.

If you are interested in this, feel free to reach out to me and I can help you set it up.

This seems to be based on vault-ai, which has instructions for self-hosting:

http://github.com/pashpashpash/vault-ai

Vault still uses Pinecone as a third-party service, so your embeddings do get sent there.
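The "degrees of self-hosting" point and the Pinecone caveat can be summarized with a deliberately simplified data-flow model. This is a sketch under assumptions: the Deployment type, the SELF marker, and the data categories are hypothetical, and it assumes the embedding model sees raw chunk text and questions, the vector store sees document and query embeddings, and the LLM sees the question plus retrieved chunks. Real products may send more (for example, chunk text stored as metadata next to the vectors), so treat it as a checklist rather than an audit.

```python
from dataclasses import dataclass
from typing import Set

SELF = "self"  # marks a component running inside your own network


@dataclass(frozen=True)
class Deployment:
    """Hypothetical description of where each component of a doc-QA stack runs."""
    embedder: str
    vector_store: str
    llm: str


def data_sent_to_third_parties(d: Deployment) -> Set[str]:
    """Return the (simplified) kinds of data that leave your network."""
    sent: Set[str] = set()
    if d.embedder != SELF:
        # A hosted embedding model has to see the raw text it embeds.
        sent |= {"document chunks", "questions"}
    if d.vector_store != SELF:
        # A hosted vector store receives every stored vector and every query vector.
        sent |= {"document embeddings", "query embeddings"}
    if d.llm != SELF:
        # A hosted LLM receives the prompt: the question plus retrieved chunks.
        sent |= {"questions", "retrieved chunks"}
    return sent


fully_local = Deployment(embedder=SELF, vector_store=SELF, llm=SELF)
hosted_llm_only = Deployment(embedder=SELF, vector_store=SELF, llm="openai")
hosted_vectors = Deployment(embedder=SELF, vector_store="pinecone", llm=SELF)

for name, d in [("fully local", fully_local),
                ("hosted LLM", hosted_llm_only),
                ("hosted vector store", hosted_vectors)]:
    print(f"{name}: {sorted(data_sent_to_third_parties(d)) or 'nothing'}")
```

Under this model, even the "hosted LLM only" setup sends retrieved excerpts of your documents out, so it only partly addresses the data-residency concern raised earlier in the thread.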