by krisoft 675 days ago
> My guess since programmer blog post writing (plus autism?) assumes “Everyone already knows everything about my project because I do!”…

Really unnecessary and distasteful to speculate like this. Just ask your question if you don't understand something.

> Is this to the effect of running a local LLM,

Yes. That is what Ollama does.

> that reads your prompt

Yes.

> and then decides which correct/specialized LLM to hand it off to?

No. It does not hand it to a correct/specialized LLM (or at least, that is not the interesting use case). It hands it off to a traditionally coded program: something written without any AI in it. This program does some job for the AI agent and returns a result, which the agent can then use to answer the prompt.

Take a calculator as an example. Imagine you want the AI to answer the following prompt: "How much will I have to pay if I bought an apple ($2) and two bananas ($3 each), and the sales tax is 3%?"

To answer that question the LLM has to perform three steps: 1) understand that the above text stands for (2 + 2 * 3) * 1.03, 2) perform the arithmetic correctly, and 3) format the answer in an appropriate way (for example "You will have to pay $8.24 including tax.").
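For reference, here is what steps 2 and 3 look like when done by ordinary code instead of by the model. This is a minimal sketch that hard-codes the prices from the example prompt (step 1, turning prose into an expression, is assumed to be already done); `Decimal` is used so money does not suffer binary float rounding:

```python
from decimal import Decimal, ROUND_HALF_UP

# Values from the example prompt (step 1 is assumed done: prose -> numbers).
apple = Decimal("2")
banana = Decimal("3")
tax_rate = Decimal("0.03")

# Step 2: the arithmetic, (2 + 2 * 3) * 1.03
subtotal = apple + 2 * banana        # 8
total = subtotal * (1 + tax_rate)    # 8.24

# Step 3: format the answer for a human, rounded to whole cents.
cents = total.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)
answer = f"You will have to pay ${cents} including tax."
print(answer)
```

Running it prints `You will have to pay $8.24 including tax.`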

You can try to train an LLM that does all three steps internally: it reads the prompt and directly produces the final answer. But in general you will have a lot of trouble with that approach, because LLMs are unreliable at exact arithmetic.

So instead you train the LLM to parse the prompt and output something like "<calculator (2 + 2 * 3) * 1.03>". Your UI intercepts this output and recognises that the LLM is asking for a tool to be used: in this case the "calculator" tool with the parameter "(2 + 2 * 3) * 1.03". The UI doesn't display anything to the user; instead it passes "(2 + 2 * 3) * 1.03" to a traditionally coded binary/script, which calculates the result using normally programmed logic. Then the UI prompts the LLM again, this time with a prompt containing the initial prompt text, the LLM's call for the tool, and the output of the tool. Now the LLM can see the right result in front of it and, using the full context of the original prompt, format an answer.
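The round trip described above can be sketched in a few dozen lines. This is an illustration under stated assumptions, not how Ollama or any specific UI implements it: the `<calculator ...>` tag format is the made-up one from the comment, and `fake_llm` is a hypothetical stand-in for a real model call. The calculator deliberately walks the parsed syntax tree and only allows basic arithmetic, rather than calling `eval()` on model output:

```python
import ast
import operator
import re

# Hypothetical tag format from the example: <calculator EXPRESSION>
TOOL_CALL = re.compile(r"<calculator\s+(.+?)>")

# Only plain arithmetic operators are allowed.
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.USub: operator.neg,
}

def calculator(expr: str) -> float:
    """Traditionally coded tool: evaluate an arithmetic expression safely."""
    def ev(node):
        if isinstance(node, ast.Expression):
            return ev(node.body)
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](ev(node.left), ev(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](ev(node.operand))
        raise ValueError(f"unsupported expression: {expr!r}")
    return ev(ast.parse(expr, mode="eval"))

def fake_llm(prompt: str) -> str:
    """Hypothetical stand-in for a real model call, for illustration only."""
    if "Tool result:" not in prompt:
        # First round: the model asks for the calculator tool.
        return "<calculator (2 + 2 * 3) * 1.03>"
    # Second round: the model sees the tool output and formats an answer.
    result = prompt.rsplit("Tool result:", 1)[1].strip()
    return f"You will have to pay ${float(result):.2f} including tax."

def answer(user_prompt: str, llm=fake_llm) -> str:
    reply = llm(user_prompt)
    match = TOOL_CALL.search(reply)
    if match is None:
        return reply  # no tool requested; show the reply as-is
    # The UI hides the tool call from the user and runs the tool itself.
    result = calculator(match.group(1))
    # Re-prompt with: original prompt + the model's tool call + tool output.
    followup = f"{user_prompt}\n{reply}\nTool result: {result}"
    return llm(followup)
```

Calling `answer("How much will I have to pay ...?")` goes through both rounds and returns the formatted sentence from the second model call; the user never sees the `<calculator ...>` text.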

What can a tool do? Anything, really. It can open the pod bay doors. It can query a database. It can use a geo API to plan a route between two cities. It can read a Wikipedia entry. It can write to a knowledge base. It can activate a nuclear bomb. Whatever is appropriate for your use case.
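Because a tool is just ordinary code, a UI that supports several of them typically keeps a registry mapping tool names to functions. A minimal sketch, assuming two hypothetical tools named `wikipedia` and `route` whose bodies are stubs (no real Wikipedia or routing API is called):

```python
from typing import Callable, Dict

# Hypothetical stub tools; real ones would call a database, a geo API, etc.
def lookup_wikipedia(title: str) -> str:
    return f"(stub) summary of the Wikipedia article {title!r}"

def plan_route(cities: str) -> str:
    # Assumed argument format: "Start -> End"
    start, end = (c.strip() for c in cities.split("->"))
    return f"(stub) route from {start} to {end}"

# Registry: the only tools the model is able to trigger.
TOOLS: Dict[str, Callable[[str], str]] = {
    "wikipedia": lookup_wikipedia,
    "route": plan_route,
}

def run_tool(name: str, argument: str) -> str:
    """Dispatch a tool call requested by the model to ordinary code."""
    tool = TOOLS.get(name)
    if tool is None:
        # Report the problem back to the model instead of crashing the UI.
        return f"error: unknown tool {name!r}; available: {', '.join(sorted(TOOLS))}"
    return tool(argument)
```

The design point is that the model can only *ask* for a tool by name; the ordinary code decides what actually runs. That matters once a tool can do something with real consequences.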