| HN Mirror



	by QuantumCodester 1015 days ago
	Really interesting — can you explain a bit more about how the long-term memory works?


raunakchowdhuri 1015 days ago

For sure!

When you send an OpenAI request, a secondary GPT-3.5 call is made after a delay (to make sure the user has stopped chatting in the same session) to "autosave" the result. This call receives the current chat session along with similar existing entries from the vector database.

The structured output from this call is used to either insert a new memory or update an existing memory in the vector database. At query time, we search the vector database and quickly insert the relevant context into the system prompt.

I like to consider this approach "dynamic" retrieval augmented generation, as the vector database is constantly changing as conversations occur.
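
For readers who want to see the shape of that loop, here is a minimal sketch in Python. It is not the actual implementation: the in-memory `MemoryStore`, the bag-of-words `embed`, the `min_score` threshold, and the `consolidate` stub (which stands in for the secondary GPT-3.5 call and its structured output) are all assumptions made for illustration.

```python
import math
from collections import Counter
from dataclasses import dataclass, field


def embed(text: str) -> Counter:
    """Toy bag-of-words 'embedding' (assumption); a real system would use an embedding model."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


@dataclass
class MemoryStore:
    """In-memory stand-in for the vector database (hypothetical)."""
    memories: dict = field(default_factory=dict)  # memory id -> text
    next_id: int = 0

    def search(self, query: str, k: int = 3, min_score: float = 0.1) -> list:
        """Return up to k (score, id, text) tuples, best match first."""
        q = embed(query)
        scored = [(cosine(q, embed(text)), mid, text) for mid, text in self.memories.items()]
        return sorted((s for s in scored if s[0] >= min_score), reverse=True)[:k]

    def apply(self, decision: dict) -> int:
        """Apply the structured output: update an existing memory or insert a new one."""
        if decision["action"] == "update":
            mid = decision["id"]
        else:
            mid = self.next_id
            self.next_id += 1
        self.memories[mid] = decision["text"]
        return mid


def consolidate(session_text: str, similar: list) -> dict:
    """Placeholder for the secondary model call (assumption).

    It sees the session plus similar entries and returns a structured decision.
    Here the 'decision' is a trivial rule: merge into the closest entry if one exists.
    """
    if similar:
        _, mid, old_text = similar[0]
        return {"action": "update", "id": mid, "text": f"{old_text} {session_text}"}
    return {"action": "insert", "text": session_text}


def autosave(store: MemoryStore, session_text: str) -> int:
    """Run after the session goes quiet: fetch similar entries, consolidate, write back."""
    similar = store.search(session_text)
    return store.apply(consolidate(session_text, similar))


def build_system_prompt(store: MemoryStore, user_message: str,
                        base: str = "You are a helpful assistant.") -> str:
    """At query time, search the store and put relevant memories into the system prompt."""
    hits = store.search(user_message)
    if not hits:
        return base
    context = "\n".join(f"- {text}" for _, _, text in hits)
    return f"{base}\nRelevant memories:\n{context}"
```

Because every `autosave` call can rewrite what later searches return, the store changes as conversations occur, which is the "dynamic" part.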


ProofHouse 1014 days ago

Can we easily extract all the customer data at any point to put into a new vector db? Also, is there any restriction on how many customers (ideally not), and does this essentially double the GPT-3.5 costs?


raunakchowdhuri 1014 days ago

Yep - full export will always be supported for your data.

No customer # restriction.

I have some optimizations in the works so that the secondary GPT-3.5 call only gets triggered when one iterative conversation thread ends (determined via a timeout). GPT-3.5 is really cheap though, so it shouldn’t be a huge deal. Even if you chat with GPT-4, we still use 3.5 for the memory consolidation and updates.
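
A rough sketch of what a timeout-based trigger could look like, assuming a hypothetical `AutosaveScheduler` that is polled periodically. The class name, the 300-second default, and the injectable `clock` are illustrative choices, not details from the actual service.

```python
import time


class AutosaveScheduler:
    """Tracks the last message time per session and reports sessions that have gone quiet."""

    def __init__(self, timeout_s: float = 300.0, clock=time.monotonic):
        self.timeout_s = timeout_s  # idle time after which a thread counts as ended (assumed value)
        self.clock = clock          # injectable so the logic can be tested without sleeping
        self.last_seen = {}         # session id -> time of the latest message

    def on_message(self, session_id: str) -> None:
        """Record activity; each new message pushes the session's deadline back."""
        self.last_seen[session_id] = self.clock()

    def pop_finished(self) -> list:
        """Return sessions idle for at least timeout_s and stop tracking them."""
        now = self.clock()
        done = [s for s, t in self.last_seen.items() if now - t >= self.timeout_s]
        for s in done:
            del self.last_seen[s]
        return done
```

Each session id returned by `pop_finished` would trigger one consolidation call, so a long back-and-forth costs a single secondary call instead of one per message.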
