by SV_BubbleTime 675 days ago
My guess since programmer blog post writing (plus autism?) assumes “Everyone already knows everything about my project because I do!”

Is this to the effect of running a local LLM, that reads your prompt and then decides which correct/specialized LLM to hand it off to? If that is the case, isn’t it going to be a lot of latency to switch models back and forth as most people usually run the single largest model that will fit on their GPU?


No, this is a bit different. When GPT-4o came out, OpenAI also added new features that allow the models to perform actions. This allows you to do that, but locally.

The reason this is cool is that it allows you to integrate with things like Home Assistant, so you can ask your chat bot or whatever to actually take actions. "Hey bot, turn on the lights in the basement" as an example.

Nitpick, but function calling, which is essentially what Tools are (an earlier evolution), was released before GPT-4o, in June 2023.

https://openai.com/index/function-calling-and-other-api-upda...

No that's a good call out, I got my timing a bit off there.
LLMs are not a niche target and tool use is a major component. It's fair to say that, as an author, I'm assuming the reader has some comprehension, whether that's from a widespread base or because the topic is only interesting to those with prior knowledge.

You wouldn't make this complaint against a JS framework blogposting about their new MVC features.

As an aside, it's actually incredible that these days we idly accuse people of being actual autists just because they didn't condescend to our level first.

> My guess since programmer blog post writing (plus autism?) assumes “Everyone already knows everything about my project because I do!”…

Really unnecessary and distasteful to speculate like this. Just ask your question if you don't understand something.

> Is this to the effect of running a local LLM,

Yes. That is what ollama does.

> that reads your prompt

Yes.

> and then decides which correct/specialized LLM to hand it off to?

No. It does not hand it to a correct/specialized LLM (or, in general, that is not the interesting use case). It hands it off to a traditionally coded program: something written without any AI in it. This traditionally coded program does some job for the AI agent and then returns a result to it. The AI agent can then use that result to answer the prompt.

Imagine a calculator as an example. Imagine you want the AI to answer the following prompt: "How much will I have to pay if I bought an apple ($2) and two bananas ($3 each) and the sales tax is 3%?"

To answer that question the LLM has to perform three steps: 1) understand that the above text stands for (2+2*3)*1.03, 2) perform the arithmetic correctly, and 3) format the answer in an appropriate way (for example, "You will have to pay $8.24 including tax.")
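As a quick check of the arithmetic in that example, here is a minimal sketch in Python. The function and parameter names are made up for illustration; only the prices and the 3% tax rate come from the prompt above.

```python
def total_price(apple: float, banana: float, banana_count: int, tax_rate: float) -> float:
    """Subtotal of one apple plus some bananas, with sales tax applied on top."""
    subtotal = apple + banana_count * banana
    return subtotal * (1 + tax_rate)


# One $2 apple, two $3 bananas, 3% sales tax.
total = total_price(apple=2.00, banana=3.00, banana_count=2, tax_rate=0.03)
print(f"{total:.2f}")
```

The subtotal is 8 dollars and the tax adds 24 cents, so the formatted total is 8.24.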

You can try to train an LLM which does all 3 steps internally. It parses the input and outputs the output. But in general you will have a lot of trouble with that approach.

So instead you train the LLM to parse the prompt and output something like "<calculator (2+2*3)*1.03>". Then your UI intercepts this output from the LLM and recognises that it is asking for a tool to be used. In this case it is trying to use the "calculator" tool with the parameter "(2+2*3)*1.03". So your UI doesn't display anything to the user but passes the "(2+2*3)*1.03" to a traditionally coded binary/script. That script calculates the result using normally programmed logic. Then the UI prompts the LLM again; this time the prompt contains the initial prompt text, the LLM's call for the tool, and the output of the tool. Now the LLM can just see the right response in front of it, and using the full context of the original prompt formats an answer.
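The loop described above could be sketched roughly like this in Python. To be clear about what is assumed: `fake_llm` is a hard-coded stand-in for a real model, and the `<calculator ...>` syntax and the "TOOL RESULT:" line are illustrative conventions, not the format ollama or OpenAI actually use. The calculator is ordinary code that walks a parsed expression instead of calling eval().

```python
import ast
import operator
import re

# Arithmetic operators the calculator tool accepts.
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}


def calculator(expression: str) -> float:
    """Traditionally coded tool: evaluates +, -, *, / on numeric literals."""
    def walk(node):
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](walk(node.left), walk(node.right))
        raise ValueError("unsupported expression")

    return walk(ast.parse(expression, mode="eval").body)


# Hypothetical tool-call format: "<tool_name argument>".
TOOL_CALL = re.compile(r"^<(\w+) (.+)>$")


def fake_llm(context: list[str]) -> str:
    """Stand-in for a real model: asks for the calculator, then formats the result."""
    results = [line for line in context if line.startswith("TOOL RESULT: ")]
    if not results:
        return "<calculator (2+2*3)*1.03>"
    value = float(results[-1].removeprefix("TOOL RESULT: "))
    return f"You will have to pay ${value:.2f}."


def answer(prompt: str) -> str:
    """The 'UI' loop: intercept tool calls, run the tool, then re-prompt the model."""
    context = [prompt]
    while True:
        reply = fake_llm(context)
        match = TOOL_CALL.match(reply)
        if match is None:
            return reply  # plain text, so show it to the user
        name, argument = match.groups()
        if name != "calculator":
            raise ValueError(f"unknown tool: {name}")
        result = calculator(argument)
        # Feed the model its own tool call plus the tool's output.
        context += [reply, f"TOOL RESULT: {result}"]


print(answer("How much for an apple ($2) and two bananas ($3 each) with 3% sales tax?"))
```

The point of the split is that the model only has to translate text into a tool call and back; the arithmetic itself never depends on the model getting the numbers right.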

What can a tool do? Anything really. It can open the pod bay door. It can reach out to a database. It can use a geo api to plan a route between two cities. It can read a wikipedia entry. It can write to a knowledge base. It can activate a nuclear bomb. Whatever is appropriate in your use case.
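One way to picture "anything really" is a plain lookup table from tool names to ordinary functions. The sketch below is only an illustration: the tool names, the in-memory `lights` and `knowledge_base` state, and the `dispatch` helper are all hypothetical stand-ins for real integrations such as a home automation system or a database.

```python
from typing import Callable

# Hypothetical in-memory state standing in for real devices and services.
lights = {"basement": False}
knowledge_base: list[str] = []


def turn_on_lights(room: str) -> str:
    """Pretend home-automation tool: switches on the lights in a known room."""
    if room not in lights:
        return f"error: unknown room {room!r}"
    lights[room] = True
    return f"{room} lights are on"


def write_note(text: str) -> str:
    """Pretend knowledge-base tool: stores a note and reports its number."""
    knowledge_base.append(text)
    return f"stored note #{len(knowledge_base)}"


# Adding a new capability is just adding another entry here.
TOOLS: dict[str, Callable[[str], str]] = {
    "turn_on_lights": turn_on_lights,
    "write_note": write_note,
}


def dispatch(name: str, argument: str) -> str:
    """Run the named tool, or return an error string the model can read."""
    tool = TOOLS.get(name)
    if tool is None:
        return f"error: no tool named {name!r}"
    return tool(argument)


print(dispatch("turn_on_lights", "basement"))
```

Returning an error string for an unknown tool, rather than raising, is a design choice in this sketch: the text goes back to the model, which gets a chance to correct itself.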