by hereforcomments 976 days ago
There's a hackathon coming up at work, and my teammate and I are preparing a kind of hierarchical memory/knowledge solution for it.

Briefly: we tell ChatGPT which API-based tools we have, explain each of them in one sentence, and tell it where it can find their documentation. We exposed the documentation as endpoints: example.com/docs/main is always the starting point and returns a high-level overview of the app plus all the endpoints available to call. Every endpoint has its own documentation as well, e.g. /geocoder has a /docs/geocoder documentation endpoint that describes what it does, what input it expects and what it returns.
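A minimal sketch of the docs-as-endpoints idea, assuming an in-memory dict stands in for the real HTTP endpoints. `/docs/main` and `/docs/geocoder` come from the post; `/docs/isochrone`, the query parameters, the response shapes and the `initial_prompt` helper are made up for illustration.

```python
# Hypothetical documentation tree: /docs/main gives the overview,
# and each endpoint has its own /docs/<name> page.
DOCS: dict[str, str] = {
    "/docs/main": (
        "Overview of the internal tools.\n"
        "/geocoder: turns an address into coordinates (docs: /docs/geocoder)\n"
        "/isochrone: drive-time polygon around a point (docs: /docs/isochrone)"
    ),
    "/docs/geocoder": (
        "GET /geocoder?address=<text>\n"
        "Input: a free-text address.\n"
        'Returns: {"lat": float, "lon": float}'
    ),
    "/docs/isochrone": (
        "GET /isochrone?lat=<float>&lon=<float>&minutes=<int>\n"
        "Returns: a GeoJSON polygon."
    ),
}


def read_docs(path: str) -> str:
    """Return the documentation page at `path`, or a hint if it does not exist."""
    return DOCS.get(path, f"No documentation at {path}. Start from /docs/main.")


def initial_prompt(base_url: str) -> str:
    """Seed the model with only the entry point, not every endpoint's full docs."""
    return (
        f"You can explore the tools at {base_url}. "
        "Call read_docs('/docs/main') first, then read an endpoint's docs "
        "only when you need it."
    )
```

The point of the design is that only `initial_prompt` and the pages the model actually asks for ever enter the context window.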

We also provided ChatGPT with actions such as read_docs, call_endpoint and end_conversation. An action is a structured JSON object with a set of parameters. When ChatGPT wants to interact with one of the resources, it emits an action; we execute it and feed the result back to it.
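A rough sketch of that action loop, assuming a JSON shape like `{"action": "read_docs", "path": ...}` (the exact field names are a guess, not the post's format). A scripted fake model replaces ChatGPT and a stub replaces the HTTP call, so the loop runs offline; the coordinates it returns are dummy values.

```python
import json
from typing import Callable

# Hypothetical stand-ins for the real documentation and tool endpoints.
DOCS = {
    "/docs/main": "Endpoints: /geocoder (docs: /docs/geocoder)",
    "/docs/geocoder": "GET /geocoder?address=<text> -> {lat, lon}",
}


def call_endpoint(path: str, params: dict) -> dict:
    """Pretend HTTP call; a real version would send `params` to the service."""
    if path == "/geocoder":
        return {"lat": 51.5, "lon": -0.15}  # dummy values, no real geocoding
    return {"error": f"unknown endpoint {path}"}


def execute(action: dict) -> str:
    """Run one action emitted by the model and return the observation as text."""
    name = action.get("action")
    if name == "read_docs":
        return DOCS.get(action["path"], "not found")
    if name == "call_endpoint":
        return json.dumps(call_endpoint(action["path"], action.get("params", {})))
    return f"unknown action {name!r}"


def run_agent(model: Callable[[list], str], max_steps: int = 10) -> list:
    """Ask the model for an action, execute it, feed the result back, repeat."""
    history: list = []
    for _ in range(max_steps):
        reply = model(history)            # the model sees everything so far
        history.append(reply)
        action = json.loads(reply)
        if action.get("action") == "end_conversation":
            break
        history.append(execute(action))   # observation goes back into the history
    return history


# Scripted "model" so the loop runs without an LLM: it replays fixed actions.
SCRIPT = [
    {"action": "read_docs", "path": "/docs/main"},
    {"action": "read_docs", "path": "/docs/geocoder"},
    {"action": "call_endpoint", "path": "/geocoder",
     "params": {"address": "15 Bond Street, London"}},
    {"action": "end_conversation", "summary": "Geocoded the address."},
]


def scripted_model(history: list) -> str:
    # Each turn adds an action and an observation, so the turn index is len // 2.
    return json.dumps(SCRIPT[len(history) // 2])


history = run_agent(scripted_model)
```

`max_steps` is there so a model that never emits end_conversation cannot loop forever.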

With this I can give it a task like: "Get a 30-minute drive-time polygon around 15 Bond Street, London and send it to Foster."

It plans and executes the following entirely on its own. First it calls the geocoder to get the coordinates the isochrone endpoint needs, then calls the isochrone endpoint to get the polygon and saves it. Next it calls the Microsoft Graph API and queries my top 50 connections to find out who Foster is, and finally it calls the MS Graph API's send mail endpoint to email the polygon to Foster as an attachment.
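The chain the agent works out boils down to each call's output feeding the next call's input. A sketch with hypothetical stub functions in place of the real geocoder, isochrone and Microsoft Graph calls; none of these names, contact fields or return shapes are the actual APIs, and the coordinates are placeholders.

```python
from typing import Dict, List, Tuple


def geocode(address: str) -> Tuple[float, float]:
    """Stub for the geocoder endpoint; returns placeholder coordinates."""
    return (51.51, -0.15)


def isochrone(lat: float, lon: float, minutes: int) -> dict:
    """Stub for the isochrone endpoint; a real one returns the drive-time polygon."""
    return {"type": "Polygon", "coordinates": [], "center": [lon, lat], "minutes": minutes}


def find_contact(name: str, contacts: List[Dict[str, str]]) -> Dict[str, str]:
    """Pick the first contact whose name contains `name` (case-insensitive)."""
    for contact in contacts:
        if name.lower() in contact["name"].lower():
            return contact
    raise LookupError(f"no contact matching {name!r}")


def send_mail(to: str, subject: str, attachment: dict) -> dict:
    """Stub for the send-mail call; returns the message it would have sent."""
    return {"to": to, "subject": subject, "attachment": attachment}


def run_task(address: str, minutes: int, recipient: str,
             contacts: List[Dict[str, str]]) -> dict:
    lat, lon = geocode(address)                  # 1. address -> coordinates
    polygon = isochrone(lat, lon, minutes)       # 2. coordinates -> polygon
    person = find_contact(recipient, contacts)   # 3. "Foster" -> a concrete contact
    # 4. send the polygon as an attachment
    return send_mail(person["email"], f"{minutes}-minute drive-time polygon", polygon)


contacts = [
    {"name": "Ann Smith", "email": "ann@example.com"},
    {"name": "Jo Foster", "email": "jo.foster@example.com"},
]
message = run_task("15 Bond Street, London", 30, "Foster", contacts)
```

In the agent version nobody writes `run_task`; the model discovers the same order of calls by reading the docs.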

It can explore the available resources hierarchically, so we don't need a huge context window and we don't have to train the model either. We could also implement multiple agents: one would be a manager, and there could be multiple agents that each perform a task and return the results to the manager. That would further reduce the required context window.

Very likely some BS app will win the hackathon as always, like a market price predictor using Weka's multilayer perceptron with default settings, but we believe our solution could be extremely powerful.


This is interesting. Can you expand on how this gets around the context window problem? Are you thinking the agent does a one-off task rather than continuing back and forth with the user?

I do think this will take far less context than listing all of the functions up front, though. I think the discoverability is a novel approach. Honestly, I'm surprised ChatGPT with plugins doesn't do something like this by default rather than making you pick which plugins you want at the beginning of the conversation.

First, the discoverability reduces the required context window. We don't have to explain every app we have; it's enough to give ChatGPT one sentence about each, and it will dig deeper if it thinks that would help it perform the task.

Also, although we haven't implemented it yet, we could have one or more levels of managers, just like at a company. Each would delegate a task to a worker (who could also be a manager), and the worker would report back the result. Just like in real life, a manager doesn't have to know how something is done; it only needs to know that it's done and get the results.
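A minimal sketch of that delegation tree, assuming each manager splits its task into one subtask per report and only collects the results. `Agent`, `work` and `plan` are hypothetical names, and the lambdas stand in for LLM calls so the tree runs offline.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Agent:
    """Hypothetical node in a delegation tree: a worker if it has no reports."""
    name: str
    work: Optional[Callable[[str], str]] = None        # how a worker does its task
    plan: Optional[Callable[[str], List[str]]] = None  # how a manager splits a task
    reports: List["Agent"] = field(default_factory=list)

    def handle(self, task: str) -> str:
        if not self.reports:
            # Worker: sees only its own subtask, not the whole conversation.
            return self.work(task)
        subtasks = self.plan(task)
        if len(subtasks) != len(self.reports):
            raise ValueError(
                f"{self.name} planned {len(subtasks)} subtasks "
                f"for {len(self.reports)} reports")
        # Manager: delegates and collects results, never the details of how.
        results = [agent.handle(sub) for agent, sub in zip(self.reports, subtasks)]
        return f"{self.name}: " + "; ".join(results)


# Lambdas stand in for LLM calls so the tree runs offline.
geo = Agent("geo", work=lambda t: f"coords({t})")
iso = Agent("iso", work=lambda t: f"polygon({t})")
maps = Agent("maps-manager", plan=lambda t: [t, t], reports=[geo, iso])
mailer = Agent("mailer", work=lambda t: f"sent({t})")
boss = Agent("boss", plan=lambda t: t.split(" and "), reports=[maps, mailer])

result = boss.handle("15 Bond Street and Foster")
```

`maps` is both a report of `boss` and a manager of its own workers, which is the "worker who could also be a manager" case. This sketch runs sibling subtasks independently; a real manager would also pass one worker's output (the coordinates) to the next.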

We work for a large company and very likely have hundreds of apps. We could build wrappers around them, e.g. using Selenium, so that it could interact with even old apps.

We could take the same approach with databases. The database itself would have docs, as would each table and each field. So we could ask ChatGPT to query data from the database, and it could fully understand the data before writing the SQL query.
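One way this could look, assuming the table and field docs live in an extra `_docs` table inside the database itself (the table name, its layout and the sample `customers` schema are all hypothetical). The helper merges those hand-written descriptions with SQLite's real schema via `pragma_table_info`, producing text the model could read before writing SQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, region TEXT);

-- Hypothetical docs table: one row per table (col IS NULL) and one per column.
CREATE TABLE _docs (tbl TEXT NOT NULL, col TEXT, description TEXT NOT NULL);
INSERT INTO _docs VALUES
  ('customers', NULL,     'One row per customer account.'),
  ('customers', 'id',     'Internal customer number.'),
  ('customers', 'name',   'Legal company name.'),
  ('customers', 'region', 'Sales region code, e.g. EMEA or APAC.');
""")


def describe_table(conn: sqlite3.Connection, table: str) -> str:
    """Combine the hand-written docs with the real schema into prompt-ready text."""
    docs = dict(conn.execute(
        "SELECT col, description FROM _docs WHERE tbl = ?", (table,)).fetchall())
    lines = [f"Table {table}: {docs.get(None, 'undocumented')}"]
    # pragma_table_info is SQLite's table-valued form of PRAGMA table_info.
    for name, col_type in conn.execute(
            "SELECT name, type FROM pragma_table_info(?)", (table,)):
        lines.append(f"  {name} ({col_type}): {docs.get(name, 'undocumented')}")
    return "\n".join(lines)


print(describe_table(conn, "customers"))
```

Pulling column names and types from the live schema, rather than from the docs alone, means the model still sees a column even if nobody has documented it yet (it shows up as "undocumented").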

I've written about a hierarchical manager system with some friends while exploring how to use AI for larger tasks. While the easy answer is simply to use something with a much larger context (`Claude` is amazing with an API key if you're on the waitlist), we definitely followed the same idea of splitting the context into individual groups.

We actually had some success layering another AI into the mix: one AI looks at a summarized version of the whole context and decides which pieces of context to assign to each manager. This of course requires a detour into some other database to store the "master context" (AKA the full conversation, so you likely already have it in some form of storage), and of course many more calls to the AI, which increases overall latency quite a bit.

1. Use an AI to produce a short summary of each logical piece of context and map it by access ID.
2. Use another AI to determine which pieces provide the most useful additional context for the part of the task being evaluated.
3. Build the context from the generated ID list and pass it to the individual task-manager AI.
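A sketch of those three steps, assuming `summarize` and `pick_relevant` would really be model calls. Here they are swapped for a first-few-words summary and a word-overlap score so the code runs offline; the piece IDs, texts and subtask are made up.

```python
from typing import Dict, List


def summarize(text: str, max_words: int = 8) -> str:
    """Step 1 stand-in for an LLM summary: keep the first few words."""
    return " ".join(text.split()[:max_words])


def build_index(pieces: Dict[str, str]) -> Dict[str, str]:
    """Map each context piece's access ID to its short summary."""
    return {pid: summarize(text) for pid, text in pieces.items()}


def pick_relevant(index: Dict[str, str], subtask: str, k: int = 2) -> List[str]:
    """Step 2 stand-in for the selecting LLM: rank summaries by word overlap."""
    task_words = set(subtask.lower().split())

    def overlap(pid: str) -> int:
        return len(task_words & set(index[pid].lower().split()))

    ranked = sorted(index, key=overlap, reverse=True)
    return [pid for pid in ranked[:k] if overlap(pid) > 0]


def build_context(pieces: Dict[str, str], ids: List[str]) -> str:
    """Step 3: join the full text of the chosen pieces for a task-manager AI."""
    return "\n\n".join(pieces[pid] for pid in ids)


pieces = {
    "c1": "The geocoder endpoint turns addresses into coordinates",
    "c2": "Lunch order on Friday was pizza",
    "c3": "Isochrone endpoint returns drive time polygons around coordinates",
}
subtask = "get coordinates for an address and a drive time polygon"
ids = pick_relevant(build_index(pieces), subtask)
context = build_context(pieces, ids)
```

The selector only ever sees the short summaries, while the task manager gets the full text of the few pieces that were picked, which is where the context savings come from.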