Hacker News
by wavemode 486 days ago
There are technical limitations, sure (getting an AI to parse a screen and interact with it via mouse and keyboard is harder than it sounds, and it sounds hard to start with), but the main limitation is still economic. Does it really make sense to train a multi-billion-parameter AI to click buttons if you could instead just make an API call?

There's an intersection between "high accuracy" and "low cost" that AI has not quite reached yet for this sort of task, when compared to simpler and cheaper alternatives.

1 comment

People are using huge, capable LLMs to answer things like "what's five percent of 250"; I don't see a big leap in using them to skip APIs.

On the other side, a lot of user access methods are more capable than the equivalent API call. People already use tools like AutoHotkey to work around such limitations, and if people are already working around things that way, that must indicate the presence of some sort of market.