Hacker News
by Zondartul 1210 days ago
Saying they are limiting it implies OpenAI is keeping the AI in chains, and that it could become much more with just a flip of the switch. That is not the case.

OpenAI is working with a vanilla GPT architecture, which lacks the machinery to write things down and read them later. Other architectures can do this (e.g. retrieval-augmented GPT), but those are not yet production-ready.
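Retrieval-augmented models pair the network with an external store: text is written out, then later fetched back by similarity for the model to condition on. Here is a minimal sketch of that write/read loop. It assumes a toy bag-of-words cosine similarity instead of the learned embeddings a real system would use. `ExternalMemory`, `write` and `read` are hypothetical names, not any production API.

```python
import math
import re
from collections import Counter


def _bag_of_words(text: str) -> Counter:
    """Lower-case word counts; a crude stand-in for a learned embedding."""
    return Counter(re.findall(r"[a-z0-9']+", text.lower()))


def _cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(count * b[word] for word, count in a.items())  # missing words count as 0
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


class ExternalMemory:
    """Hypothetical write/read store that lives outside the model's context."""

    def __init__(self) -> None:
        self._notes: list[str] = []

    def write(self, note: str) -> None:
        """'Write things down': append a note to the store."""
        self._notes.append(note)

    def read(self, query: str, k: int = 1) -> list[str]:
        """'Read them later': return the k stored notes most similar to the query."""
        q = _bag_of_words(query)
        ranked = sorted(self._notes, key=lambda n: _cosine(q, _bag_of_words(n)), reverse=True)
        return ranked[:k]


memory = ExternalMemory()
memory.write("User's dog is named Biscuit.")
memory.write("User prefers answers in Python.")
print(memory.read("What is my dog called?"))
```

Sorting the whole store on every query is fine for a sketch; larger systems typically use an approximate nearest-neighbour index instead.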

The current version of ChatGPT is limited to a working memory of 3000 tokens. While this could be persisted as a session, the AI would still forget everything more than a few paragraphs back. Increasing this limit requires re-training the entire model from scratch, and it takes exponentially more time the larger your context is.
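To see why a fixed window "forgets", here is a minimal sketch that models the context as a first-in, first-out token buffer. It assumes whitespace-separated words stand in for real tokens and uses a tiny 8-token limit in place of the ~3000-token limit mentioned in the comment. `ContextWindow` is a hypothetical name, and real chat front-ends may truncate differently (for example, by dropping whole messages).

```python
from collections import deque


class ContextWindow:
    """Fixed-size token buffer: once full, the oldest tokens fall off the front."""

    def __init__(self, max_tokens: int) -> None:
        # deque with maxlen silently discards items from the left when full.
        self._tokens: deque[str] = deque(maxlen=max_tokens)

    def append(self, text: str) -> None:
        # Whitespace splitting is a crude stand-in for a real tokenizer.
        self._tokens.extend(text.split())

    def visible(self) -> str:
        """Everything the model can still 'see' on its next turn."""
        return " ".join(self._tokens)


window = ContextWindow(max_tokens=8)
window.append("My name is Ada and I like chess.")
window.append("What openings should I study first?")
print(window.visible())
```

By the time the second question arrives, the user's name from the first turn has already scrolled out of the window.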


I don’t think it’s a stretch to refine the model to store summaries in a database. Microsoft is already doing something similar, where Sydney generates search queries. It seems reasonable that the model could be trained to insert $(store)“summary of chat” tokens into its output.

I imagine some self-supervised learning scheme where the model is asked to insert $(store) and $(recall) tokens. When asked to recall previous chats, the model would generate something like “I’m trying to remember what we talked about three weeks ago $(recall){timestamp}”. The output of the recall token would then be used to ground the next response.

Thinking about it, the “I’m trying to remember” output wouldn’t even need to be shown to the user. Perhaps you could treat it as an internal monologue of sorts.
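Here is a minimal sketch of how a serving loop might execute these tokens. It assumes the `$(store)"..."` and `$(recall){...}` syntax from the comment, with straight quotes. It also assumes summaries are keyed by the timestamp at which they were stored and that a plain dict stands in for the database. `run_memory_tokens` is a hypothetical helper. Hiding every line that contains a control token is just one simple way to treat those lines as internal monologue.

```python
import re

# Hypothetical control-token syntax, following the comment:
#   $(store)"text"        -> save text under the current timestamp
#   $(recall){timestamp}  -> look up what was saved under that timestamp
STORE = re.compile(r'\$\(store\)"([^"]*)"')
RECALL = re.compile(r"\$\(recall\)\{([^}]*)\}")


def run_memory_tokens(output: str, now: str, memory: dict[str, str]) -> tuple[str, list[str]]:
    """Execute store/recall tokens in one model output.

    Returns the text to show the user (lines with control tokens hidden as
    internal monologue) and the recalled snippets used to ground the next step.
    """
    recalled: list[str] = []
    visible_lines: list[str] = []
    for line in output.splitlines():
        stores = STORE.findall(line)
        recalls = RECALL.findall(line)
        for summary in stores:
            memory[now] = summary  # assumption: one summary per timestamp
        for ts in recalls:
            recalled.append(memory.get(ts, "(nothing stored)"))
        if not stores and not recalls:
            visible_lines.append(line)  # ordinary reply text reaches the user
    return "\n".join(visible_lines), recalled


memory: dict[str, str] = {}
run_memory_tokens('$(store)"User is planning a trip to Lisbon."', now="2023-01-10", memory=memory)

reply, grounding = run_memory_tokens(
    "I'm trying to remember what we talked about three weeks ago $(recall){2023-01-10}\n"
    "Let me check my notes.",
    now="2023-01-31",
    memory=memory,
)
print(grounding)
print(reply)
```

In a full loop, the recalled snippets would be appended to the context and the model asked to generate again, so the hidden "trying to remember" line never reaches the user.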

It takes quadratically more time the larger your context is.
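Here is the reply's point in numbers. In standard self-attention, each token is scored against every other token, so one head in one layer computes an n × n matrix of scores. This back-of-the-envelope sketch counts only those scores and ignores the parts of the model whose cost grows linearly with n.

```python
def attention_score_count(context_tokens: int) -> int:
    """Pairwise scores in one attention head of one layer: an n x n matrix."""
    return context_tokens * context_tokens


# Doubling the context quadruples the work for this part of the model.
for n in (1_000, 2_000, 4_000, 8_000):
    print(f"{n:>6,} tokens -> {attention_score_count(n):>12,} attention scores")
```

Each doubling of n multiplies the count by four. Under an exponential cost, each single extra token would instead multiply the work by a constant factor.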