Hacker News
by mahmoud-almadi 315 days ago
Are you referring to the LLM being used or where the actions (click, type, etc.) are being executed? The actions can be executed on any Windows machine, so execution can take place locally on your device. The LLMs we're using right now are cloud LLMs. We haven't built an LLM self-hosting option yet. Can I ask what reservations you have about running in the cloud? We have zero-data-retention agreements signed with our LLM vendors, so none of the data sent to them ever gets retained.
2 comments

If this can't run full-local, isn't that basically a botnet? You're talking about installing a kernel-level driver that receives instructions on what to do from a cloud service.
Great point! Yes you are correct in that the actual "agent" lives in the cloud and its actions are executed by a proxy running on the desktop. Hopefully at some point we can set up a straightforward installation procedure to have the AI models running entirely on the desktop, but that's constrained by desktop specs for now. VMs and desktops with the specs to handle that would be prohibitively expensive for a lot of teams trying to build these automations.
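To make the split described above concrete, here is a minimal sketch of a desktop-side proxy that executes actions sent by a remote agent. It is not Cyberdesk's actual protocol or API: the `DesktopProxy` class, the action dictionary shape (`kind`, `args`), and the `run_actions` helper are all hypothetical names chosen for illustration, and the handlers only record calls instead of driving real input.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class DesktopProxy:
    """Hypothetical local proxy that runs actions decided by a remote agent."""

    log: List[str] = field(default_factory=list)

    def click(self, x: int, y: int) -> None:
        # A real proxy would call an OS input API here; this sketch only records the call.
        self.log.append(f"click({x}, {y})")

    def type_text(self, text: str) -> None:
        self.log.append(f"type({text!r})")

    def execute(self, action: dict) -> None:
        # Explicit allowlist: anything the proxy does not know about is rejected.
        handlers: Dict[str, Callable[..., None]] = {
            "click": self.click,
            "type": self.type_text,
        }
        kind = action["kind"]
        if kind not in handlers:
            raise ValueError(f"unsupported action: {kind}")
        handlers[kind](**action["args"])


def run_actions(proxy: DesktopProxy, actions: List[dict]) -> List[str]:
    """Execute a batch of actions (assumed to arrive from the cloud agent)."""
    for action in actions:
        proxy.execute(action)
    return proxy.log
```

In a sketch like this, the allowlist in `execute` is the part that bears on the concern raised above: the local side only performs the small set of actions it explicitly supports, whatever the remote side sends.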
Out of curiosity, what would the minimum specs need to be in order to run this locally?

My PC is just good enough to run a DeepSeek distill. Is that on par with the requirements for your model?

There isn't a viable computer use model that can be run locally yet, unfortunately. Am extremely excited for the day that happens though. Essentially, the key capability that makes a model a computer use model is precise coordinate generation.

So if you come across a local model that can do that well, let us know! We're also keeping a close watch.
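As a rough illustration of what "precise coordinate generation" involves, here is a minimal sketch under stated assumptions: the model is shown a downscaled screenshot and answers with a pixel position in that screenshot, which then has to be mapped back to the real screen and land inside the intended control. The function names and the bounding-box format are hypothetical, not taken from any particular model or product.

```python
from typing import Tuple

Point = Tuple[int, int]
Size = Tuple[int, int]
Box = Tuple[int, int, int, int]  # (left, top, right, bottom), assumed format


def to_screen_coords(model_xy: Point, screenshot_size: Size, screen_size: Size) -> Point:
    """Map a point in screenshot pixels to real screen pixels."""
    mx, my = model_xy
    shot_w, shot_h = screenshot_size
    screen_w, screen_h = screen_size
    x = round(mx * screen_w / shot_w)
    y = round(my * screen_h / shot_h)
    # Clamp so an off-by-one prediction never lands outside the screen.
    return (min(max(x, 0), screen_w - 1), min(max(y, 0), screen_h - 1))


def hits_target(point: Point, box: Box) -> bool:
    """Check whether a click point falls inside the target element's box."""
    x, y = point
    left, top, right, bottom = box
    return left <= x < right and top <= y < bottom
```

Small buttons leave only a few pixels of slack, so a model whose predictions are off by even a small amount in screenshot space can miss the control entirely once the point is scaled up.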

Haven’t looked into them much but I thought the Chinese labs had released some for this kind of thing
You are correct that ByteDance did release UI-TARS, which sounds like a really good open source computer use model according to some articles I read. You could run that locally. We haven't tested it so I wouldn't know how it performs, but it sounds like it's definitely worth exploring!
What would it take to train your own?
I don't know too much about training your own computer use model other than it would probably be a very hefty, very expensive task.

However, I believe ByteDance released UI-TARS which is an excellent open source computer use model according to some articles I read. You could run that locally. We haven't tested it so I wouldn't know how it performs, but sounds like it's definitely worth exploring!

I'm talking about the LLM (and any other infrastructure involved). Reasons are:

- Pricing. If I grow to do this at scale, I don't want to be paying per-action, per-month, per-token, etc.

- Privacy. I don't want my data, screenshots, whatever being sent to you or the cloud AI providers.

- Control. I don't want to be vulnerable to you or other third parties going bankrupt, arbitrarily deciding to kill the product or its dependencies, or restructuring plans/pricing/etc. I also want to be able to keep my day-to-day operations running even if there's a major cloud outage (that's one reason we're still using this "old fashioned", non-cloud software in the first place).

I think I'm simply not your target market.

I advise several companies who could be (they run "legacy" software with vast teams of human operators whose daily tasks include some portion of work that would be a good candidate for increased automation), but most of them are in a space where one or more of the above factors would be potential deal breakers.

The retention agreements between you and your vendors are great (I mean that sincerely), but I'm not party to them so they don't do anything for me. If you offered a contractual agreement with some teeth in it (e.g. underwritten or bond-backed to the tune of several digits, committing to specific security-related measures that are audited, with a tacit acknowledgement that any proven breach of contract in and of itself constitutes damages), it could go a long way to address the privacy issues.

In terms of pricing it feels like the core of your product is an outside vendor's computer-operating AI model, and you've written a prompt wrapper and plumbing around it that ferries screenshots and directives back and forth. This could be totally awesome for a small scale customer that wants to dip their toes into AI automation and try it out as a turnkey solution. But the moat doesn't seem very big, and I'd need to be convinced it's a really slick solution in order to favour that route instead of rolling my own wrapper.

Please don't take this the wrong way, it's just one datapoint of feedback and I do wish you luck with your venture.

These points you're making are excellent!

Self hosting is inevitably a part of our roadmap. Cyberdesk will have a future where we host our entire agentic framework on your own servers. AI models and the whole backend included.

I can totally see myself having the same preferences as you if I were you with regards to cost, privacy, and control.

The unique value in Cyberdesk goes beyond being a wrapper around a computer use AI model. Our intelligent caching is built on large evals that help us produce prompts reliable enough for the caching to work well in the first place. On top of that, there are several tools that make the agent useful (importing/exporting files, failsafes, taking actions using data that was read during the same run). Rebuilding Cyberdesk, while possible, would require several weeks at the very least of very rapid iteration. So for a dev team that wants to build the best computer use agent in the world, I guess that's doable. But for a team trying to be the best "X" in their particular industry, it's probably going to be a time sink that takes away from their ability to compete well in their space, which is why Cyberdesk is a great choice for them.

I hope you keep an eye on what we're doing! I really like your insights here and I'm curious to see what you think as we evolve over the next months and years. Maybe when we do full self hosting you'll be a customer :)