| HN Mirror


by mindcrime 542 days ago

Not sure I follow you. If you're building a computer system where the "user" is intended to be another computer, why would you need anything besides an API? Or are you talking about a system where both humans and other computers are meant to consume its services? If so, that would be somewhat interesting I guess.

But it seems to me that if you're going to use an LLM to "use" some other software, the way to go is to use tool-calling support to call an API, and/or something like Anthropic's MCP (Model Context Protocol) stuff. There's some existing work too, around "agent to agent" communications, that one could use to integrate one kind of AI system with another computer system (whether or not the other system has any AI abilities). It ranges from things like FIPA-ACL, KQML, KIF, etc., through all the Semantic Web standards, to some more recent specs that are being worked on. For example, the forthcoming ECMA TC56[1][2] standard for natural language communication between agents. And a similar-ish effort is mentioned in a recent arXiv paper[3].

[1]: https://ecma-international.org/technical-committees/tc56/

[2]: https://github.com/nlip-project

[3]: https://arxiv.org/abs/2411.05828v1

1 comment

itstomo 542 days ago

Hi, thank you for your feedback.

Tool calling is very helpful when humans can define those functions manually. However, it's also very limited, because humans have to define them by hand unless the function is very simple and needs no external resources (for example, accessing an external API with API keys).
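To make the "defined manually" part concrete, here is a rough sketch of what a hand-written tool tends to look like. Everything in it is made up for illustration: the `get_weather` tool, the `TOOLS` registry, and the `dispatch` helper are hypothetical, and the schema layout only loosely follows the JSON-Schema style that tool-calling setups commonly use. It is not any particular vendor's API.

```python
import json

# Hypothetical tool. A real version would call an external API and need an
# API key, which is exactly the part a human has to wire up by hand.
def get_weather(city: str) -> str:
    return f"Sunny in {city}"

# Hand-written registry: one description, parameter schema, and handler per tool.
# The layout is an assumption for this sketch, not a specific vendor format.
TOOLS = {
    "get_weather": {
        "description": "Return the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
        "handler": get_weather,
    }
}

def dispatch(tool_call_json: str) -> str:
    """Run a model-produced call shaped like {"name": ..., "arguments": {...}}."""
    call = json.loads(tool_call_json)
    spec = TOOLS[call["name"]]
    # Reject calls that omit a required argument before invoking the handler.
    missing = [p for p in spec["parameters"]["required"] if p not in call["arguments"]]
    if missing:
        raise ValueError(f"missing arguments: {missing}")
    return spec["handler"](**call["arguments"])

print(dispatch('{"name": "get_weather", "arguments": {"city": "Tokyo"}}'))
```

Every new capability means another entry like this, written and maintained by a person.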

Someday, we'll have AI agents that work like our secretaries or employees. But just like a CEO doesn't have to know how to use all the apps that his employees use, some apps would be created only for AI agents. In such a scenario, the interface of these apps would look very different from HCI. For example, they most likely wouldn't use vision or sound as a part of the main UI, unlike HCI.

Interestingly, most apps can be rewritten as a Decision Tree: if you push this button, you'll be navigated to this page, where a different list of actions is available, and so on. I think an AI-Computer Interface, if it is ever invented, might look like a text-based Decision Tree without vision or sound unless they are absolutely necessary.
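As a minimal sketch of that idea (purely illustrative; the `Node`, `render`, and `step` names and the tiny shop app are assumptions, not an existing interface), each screen becomes a node that lists its available actions as plain text, and choosing an action moves the agent to the next node:

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    """One 'screen' of the app: a text prompt plus the actions available from it."""
    prompt: str
    actions: dict[str, "Node"] = field(default_factory=dict)

def render(node: Node) -> str:
    """Text-only view an agent would read: the prompt, then one line per action."""
    return "\n".join([node.prompt] + [f"- {name}" for name in node.actions])

def step(node: Node, action: str) -> Node:
    """Follow one action; an unknown action is an error rather than a silent no-op."""
    if action not in node.actions:
        raise KeyError(f"unknown action: {action}")
    return node.actions[action]

# Hypothetical three-screen app: home -> cart -> checkout.
checkout = Node("Checkout: confirm the order?")
cart = Node("Cart: 1 item", {"checkout": checkout})
home = Node("Home", {"open_cart": cart})

print(render(home))
current = step(home, "open_cart")
print(render(current))
```

The agent only ever sees the rendered text and replies with an action name, so no vision or sound is involved.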