Hacker News
by Ozzie_osman 1173 days ago
If you (like me) were wondering how these work, the LLM is given a prompt like:

  Answer the following questions as best you can. You have access to the following tools:
  Search: Use this to search the internet.
  Calculator: Use this to do math.

  Use the following format:
  Question: the input question you must answer
  Thought: you should always think about what to do
  Action: the action to take, should be one of [{tool_names}]
  Action Input: the input to the action
  Observation: the result of the action
  ... (this Thought/Action/Action Input/Observation can repeat N times)
  Thought: I now know the final answer
  Final Answer: the final answer to the original input question

  Question: What is the age of the president of Egypt squared?
  Thought:

To which the LLM will generate a completion like:

  Thought: I need to find the age of the president of Egypt.
  Action: Search
  Action Input: Age of president of Egypt
  Observation:

At which point, the code (LangChain, Haystack, etc.) will parse out the requested tool (Search) and input (Age of president of Egypt), call the right tool or API, and append the output of that action to the prompt.
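For anyone who wants to see the parsing step concretely, here is a minimal sketch. It assumes the completion follows the "Action:" / "Action Input:" format from the prompt above. The regex and the `parse_action` name are my own illustration, not what LangChain or Haystack actually ship.

```python
import re

# Hypothetical pattern for the "Action:" / "Action Input:" lines in the format above.
ACTION_RE = re.compile(r"Action:\s*(.+?)\s*\n\s*Action Input:\s*(.+)")

def parse_action(completion: str):
    """Return (tool_name, tool_input), or None if no action was requested."""
    match = ACTION_RE.search(completion)
    if match is None:
        return None
    return match.group(1).strip(), match.group(2).strip()

completion = (
    "Thought: I need to find the age of the president of Egypt.\n"
    "Action: Search\n"
    "Action Input: Age of president of Egypt\n"
    "Observation:"
)
print(parse_action(completion))  # ('Search', 'Age of president of Egypt')
```

A real parser would probably also need to cope with completions that drift from the format, which this sketch simply reports as `None`.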

This all happens in a loop: at each step, the LLM is given the entire prompt history so far and gets to produce a completion that chooses the next tool and its input. The code then parses those out, executes the tool, and repeats until the LLM decides it has the final answer and returns.
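A rough sketch of that loop, under some loud assumptions: `run_agent`, the scripted fake LLM, and both toy tools are hypothetical stand-ins, the search result "50" is a made-up placeholder, and the model is assumed to stop generating before it writes "Observation:" (for example via a stop sequence). It is not the actual LangChain or Haystack code.

```python
import re

ACTION_RE = re.compile(r"Action:\s*(.+?)\s*\n\s*Action Input:\s*(.+)")
FINAL_RE = re.compile(r"Final Answer:\s*(.+)")

def run_agent(llm, tools, prompt, max_steps=5):
    """Call llm repeatedly, running requested tools, until a final answer appears."""
    for _ in range(max_steps):
        completion = llm(prompt)
        final = FINAL_RE.search(completion)
        if final:
            return final.group(1).strip()
        action = ACTION_RE.search(completion)
        if action is None:
            raise ValueError("completion had neither an action nor a final answer")
        tool_name, tool_input = action.group(1).strip(), action.group(2).strip()
        observation = tools[tool_name](tool_input)
        # Append the model's step plus the tool output, then ask for the next Thought.
        prompt += f"{completion.rstrip()}\nObservation: {observation}\nThought:"
    raise RuntimeError("no final answer within max_steps")

def calculator(expression):
    """Toy calculator: only handles 'a * b' with integers."""
    left, right = expression.split("*")
    return str(int(left) * int(right))

def make_scripted_llm(completions):
    """Stand-in for a real model: returns canned completions in order."""
    remaining = iter(completions)
    return lambda prompt: next(remaining)

tools = {
    "Search": lambda query: "50",  # made-up placeholder, not a real search result
    "Calculator": calculator,
}
llm = make_scripted_llm([
    " I need to find the age of the president of Egypt.\nAction: Search\nAction Input: Age of president of Egypt",
    " I need to square 50.\nAction: Calculator\nAction Input: 50 * 50",
    " I now know the final answer\nFinal Answer: 2500",
])
answer = run_agent(llm, tools, "Question: What is the age of the president of Egypt squared?\nThought:")
print(answer)  # 2500
```

The `max_steps` cap is there because nothing else guarantees the model ever emits a final answer.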

7 comments

I just released something like this embedded in a browser extension. Except the prompt includes a TypeScript interface that GPT4 is asked to follow. Works very well and reliably uses tools like Calculate, RequestDOM, etc.

https://github.com/cantino/browser-friend

I did a manual version of this where I played a dispatch controller in a robot, relaying inputs and outputs from GPT4, which I told was the reasoning brain in this robot. It was very remarkable to watch its train of thought in considering sensor inputs and then giving me actions to take in response.
This looks similar to the WebGPT paper. Is that referenced in any of LangChain's or Haystack's publications?

Introducing the mechanism of internal thought is very interesting, I wonder if there's a way to make it implicit in the model's architecture.

I think the ReAct paper also popularized this approach: https://arxiv.org/abs/2210.03629
Perhaps these papers are also just coincidence. This field is so new, and this type of reasoned completion chaining seems like it was inevitable. I imagine many other active GPT products that got started early hand-rolled similar systems.
Haystack's agent is indeed using the approach suggested in the ReAct paper
My understanding is that the patterns are similar (in that you're enabling an LLM to use external tools/information), and all those patterns would fall under the "agents" pattern.

But I think the difference is that WebGPT was actually fine-tuned / retrained for its specific use case, while the agents in these libraries just use the generic model without fine-tuning. My guess (and I'm not an expert here) is that fine-tuning these models for specific agent use cases would probably result in better outcomes... Though as the models get more powerful, they might just perform well enough out of the box. (Also, some of the most recent OpenAI models don't support fine-tuning, and even for the ones that do, you'd need to generate the data to fine-tune on.)

Are ChatGPT plugins using something comparable to this under the hood?
Yes and no. Whatever they are doing seems more robust than anything else I have tried. Especially with being able to bring context in the conversation to later invocations of tools. I haven't managed to get langchain to do that well.
With Haystack you can also combine the use of [hosted] LLMs and smaller, local models, and different pipelines under the Agent too.
I had some fun with a similar approach, but when generating large outputs, or retrieving large contexts, it can easily run into the context window limit.

I think this could be partially solved by intelligently summarising parts of the prompt history, while storing the original in some vector db, so the relevant parts can be retrieved at will.
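Something like this is what I have in mind, as a sketch only. `compact_history`, `retrieve`, and the `summarize` callback are hypothetical names; in practice `summarize` would be another LLM call and `retrieve` would be an embedding lookup in a vector db. Here a plain list and word overlap stand in for both so the example runs on its own.

```python
def compact_history(steps, keep_last, summarize, archive):
    """Replace all but the last keep_last steps with one summary line.

    The originals are appended to archive so they can be looked up later.
    """
    split = len(steps) - keep_last
    if split <= 0:
        return list(steps)
    old, recent = steps[:split], steps[split:]
    archive.extend(old)
    return [f"Summary of earlier steps: {summarize(old)}"] + recent

def retrieve(archive, query, top_k=1):
    """Crude stand-in for vector search: rank archived steps by shared words."""
    query_words = set(query.lower().split())

    def overlap(step):
        return len(query_words & set(step.lower().split()))

    return sorted(archive, key=overlap, reverse=True)[:top_k]

steps = [
    "Observation: page one text about Egypt",
    "Observation: page two text about Cairo",
    "Thought: I need the calculator",
]
archive = []
compacted = compact_history(
    steps,
    keep_last=1,
    summarize=lambda old: f"{len(old)} steps omitted",  # placeholder for an LLM summary
    archive=archive,
)
print(compacted)
print(retrieve(archive, "text about Cairo"))
```

The prompt then carries only the summary plus the most recent steps, and the agent can pull an archived step back in when a later question needs it.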

Quite fun.

Does observation cover reflexion? Self-observation or is that something else?

Or maybe before Final Answer, you could add a step like "Double Check: I think I have the final answer, but does it look right?" If yes, go to Final Answer; if no, go back up the loop.
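A sketch of how that double-check step might be wired up, assuming the check is just a second model call that answers "yes" or explains what looks wrong. `answer_with_double_check`, `generate`, and `check` are hypothetical names, and the two lambdas are canned stand-ins for real LLM calls.

```python
def answer_with_double_check(generate, check, max_retries=2):
    """Draft an answer, ask for a verdict, and retry with the feedback if rejected."""
    feedback = None
    draft = None
    for _ in range(max_retries + 1):
        draft = generate(feedback)
        verdict = check(draft)
        if verdict.strip().lower().startswith("yes"):
            return draft
        # Feed the objection back so the next draft can address it.
        feedback = verdict
    # Out of retries: return the last draft even though it was not approved.
    return draft

drafts = iter(["2400", "2500"])  # canned drafts standing in for model output
generate = lambda feedback: next(drafts)
check = lambda draft: "Yes" if draft == "2500" else "No, recheck the arithmetic"

result = answer_with_double_check(generate, check)
print(result)  # 2500
```

Whether the model is any good at judging its own draft is a separate question; the sketch only shows the control flow.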

For those interested in an explainer on reflexion (asking the LLM if it made an error and allowing it to correct itself), I found this breakdown useful: https://youtu.be/5SgJKZLBrmg

> GPT 4 can self-correct and improve itself. With exclusive discussions with the lead author of the Reflexions paper, show how significant this will be across a variety of tasks, and how you can benefit.

(Also, tbc, I took these example prompts from LangChain. Not sure if Haystack uses different prompts. LangChain actually has a bunch of versions; this is probably the easiest one.)