by Fischgericht 251 days ago
Offer self-hosted and I would buy.

Do not assume that companies are willing to put ALL of their intellectual property into your hands. Even if you were not a startup where any sysadmin could steal and sell my data at any time without you even noticing, you will get hacked just like everyone else who stores interesting data. The data you have access to is absolutely perfect for the global data blackmailing gangs. As soon as you are successful, you will have every black hat hacker and their dog knocking on your door.

3 comments

Onyx.app has a self hosted option. I just did the docker setup yesterday. It’s not a great home user option imo but seems like it’s functional for enterprise.
Just had a quick look - while they have that self-hosting option, they still assume you will use a cloud LLM. I started digging because I was confused that they do not mention any GPU in the resource requirements. There is some documentation on running it fully self-hosted, including the LLM, but the emphasis here is on "some".

To be clear: I am looking at this from a CEO perspective, not an "I will play with it in my spare time" nerd one.

Going off on a tangent here, but does anyone have a good guide on how to set up an AI/LLM/GPT chatbot/agent for a small business?

Not looking to spend millions but a couple thousand are alright. TIA

Open-webui works well for us: https://github.com/open-webui/open-webui
u/tinodb, thanks
Hmm, I thought this would be a competitive space already, i.e. integrating LLM chatbots on any website.
Yep, makes a lot of sense. We architected our system to be easy to self-host & open-source in the future for this very reason, though we decided to launch with hosted because it's easier to improve and iterate.
Understood. Not my startup, but I would have started the other way round.

Businesses that would be willing to pay (a lot) for such a benefit will often be very conservative. In Germany, for example, the majority of medium-sized businesses using SAP still refuse to be moved from on-premise to SAP's cloud.

C-level types are typically not worried about putting their email credentials etc. into Outlook cloud and getting hacked this way. They are used to "everything is in the cloud". However, as soon as you mention, depending on the type of business, "patents", "sales contacts" or "production plans", the C's will change their mind.

In Germany, where I originally come from, all of these businesses are worried about their trade secrets ending up in China, and rightly so.

As self-hosting is very complex, you could make good money either with consulting (but this means setting up tech teams in all target markets around the globe, using actual competent humans) or by selling it as a plug&play appliance, with that appliance simply being a rack server with a suitable GPU installed.

And again, for your business strategy, the long-term risk of pretty much everyone trying to hack you on a daily basis appears too high to me. You might not have on your radar how serious industrial espionage is. You will definitely have a fake utility company worker coming into your offices, trying to plug a USB keylogger into some PC while nobody is looking.

As an example, a proven strategy: Find the target's internet uplink. Cut it. Customer calls ISP for help. You then send a fake ISP technician who arrives before the real one does. You put a data exfiltration dongle between the modem and the LAN. You then fix the cut outdoor line. Customer is happy that you have fixed it. Later the actual ISP guy arrives. Everyone will be a bit confused that the problem was already fixed, but then agree that it's probably just the ISP once again having screwed up their resource management. Works pretty much every time.

> You put a data exfiltration dongle between the modem and the LAN.

Sounds interesting, and could be used in a movie, but it doesn't look like it is practically applicable in real life. You will have a hard time making sense of the data without full-MITM'ing with SSL decryption, installing your CA certificate on all machines and browsers on the LAN, and solving the certificate pinning problem.

A USB keylogger may be a simpler solution even though it can't sniff the whole LAN.

Well, as this is standard practice the movie would be a ... documentary? ;)

I wasn't clear enough here: At this point the device typically lets you see all devices on the LAN and WLAN on L2. Which means you can do ARP spoofing and all that kind of stuff. One of the first things you would then look at is which printers are available to infect. People often print interesting things :)

And yes, of course the USB keylogger is the cheap lazy solution. These days, due to second factors, it is not as useful as it used to be, but still... you can deploy it in seconds in pretty much every office, shop or government institution.

But to not further drift into off-topic:

I am serious about all this. Should Grapevine be successful and, for example, one day put out a press release like "Procter & Gamble is now using our services", you will have, in addition to state actors (China, Russia, Israel), a thousand kids looking up that P&G makes a profit of $15 billion or whatever per year, and figuring that they surely will pay 1% of that for not having all of their company data published.

If you look at existing knowledge management systems deployed in physical-world companies, you will see that they are actually not allowed to index all the data. You would be running up against a lot of laws and management best practices if, at the next coffee break, everybody laughed about poor Tony, who once had a really stupid concept, created a draft document of it, but then noticed that it wouldn't work and would make him look like a fool... He thought not giving it to his manager would solve that "problem", but it got indexed as company knowledge.

So, erm, yeah: Existing knowledge management systems are to a large extent about NOT sharing knowledge.
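
To make the "NOT sharing" point concrete, here is a minimal sketch of an index that only admits documents their owner has deliberately published and shared, so an abandoned draft never becomes "company knowledge". Everything in it (the `Document` fields, the `"draft"`/`"published"` statuses, the function names) is a hypothetical illustration, not how Grapevine or any existing product works.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Document:
    doc_id: str
    owner: str
    status: str  # hypothetical values: "draft" or "published"
    shared_with: frozenset = field(default_factory=frozenset)


def is_indexable(doc: Document) -> bool:
    """Admit a document only if it is published AND shared with someone."""
    return doc.status == "published" and len(doc.shared_with) > 0


def build_index(docs):
    """Build a doc_id -> Document index, skipping everything not indexable."""
    return {d.doc_id: d for d in docs if is_indexable(d)}


def visible_to(index, user: str):
    """Return the sorted ids of indexed documents this user may see."""
    return sorted(
        doc_id
        for doc_id, d in index.items()
        if user == d.owner or user in d.shared_with
    )


if __name__ == "__main__":
    docs = [
        Document("tony-draft", "tony", "draft"),
        Document("plan", "alice", "published", frozenset({"bob"})),
        Document("private-note", "tony", "published"),
    ]
    index = build_index(docs)
    print(visible_to(index, "bob"))
    print(visible_to(index, "tony"))
```

The design choice is that exclusion happens at indexing time rather than at query time: Tony's draft is never stored in the index at all, so no later permission bug can leak it.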

Sorry for this raw brain dump of mine into this thread :)

Two things to think about:

a) Due to privacy laws, no European company would right now be allowed to use your service. The data your customers want to index will always contain stuff that allows a human to be identified, and once you are there it's basically "game over" for handing over data to a third-party provider like you.

b) My organization is tiny. But we are in a sector where we must be ultra paranoid when it comes to security. We do not use a single external service whatsoever, everything is self-hosted. I would love to be able to AI-index all of our collected knowledge and would pay for the value this provides. So far I have been unable to find any plug&play solution. The open source nature you have mentioned is important so that your system security can be validated, but in the end I would rather pay for it being plug&play AND on-premise AND open source.

Consider allowing customers to deploy into their own AWS/Azure infra as a managed service. Your CI/CD can reach the deployment and you will be one step closer to enterprise customers.
u/eambutu, any timeline for the self-hosted version?

Also willing to buy.

controlcore.io was brought to the market for the exact same reason. It is not AI-powered, but built to control AI and its interactions with your data, APIs, applications etc. And yes, we only offer our service as a self-hostable solution. However good the encryption and SOC compliance may be, we want our clients to know that none of their internal data or interaction transactions leave their control.