Hacker News
by cjbprime 1056 days ago
The limit here is the model's "context window" length, measured in tokens. It will quickly become too short to contain all of your previous conversations, so the model has to answer questions without access to all of that text. Within a single conversation, it means the model starts forgetting the text from the start of the conversation once the [conversation + new prompt] reaches the context length.
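
To make the forgetting concrete, here is a rough sketch of what a chat frontend might do when the history no longer fits. Everything in it is an assumption for illustration: `count_tokens` pretends one whitespace-separated word is one token (real tokenizers split text differently), and `fit_to_context` is a hypothetical helper, not any particular product's behaviour.

```python
def count_tokens(text: str) -> int:
    # Assumption: one whitespace-separated word counts as one token.
    # Real tokenizers use subword units, so real counts will differ.
    return len(text.split())


def fit_to_context(history: list[str], prompt: str, context_len: int) -> list[str]:
    """Keep the newest messages that fit in the window alongside the prompt."""
    budget = context_len - count_tokens(prompt)
    kept = []
    # Walk backwards from the most recent message and stop at the first
    # one that does not fit; everything older is "forgotten".
    for message in reversed(history):
        cost = count_tokens(message)
        if cost > budget:
            break
        kept.append(message)
        budget -= cost
    kept.reverse()
    return kept + [prompt]


history = ["a b c", "d e", "f g h i"]
window = fit_to_context(history, "j k", context_len=8)
print(window)  # the oldest message "a b c" no longer fits and is dropped
```

With an 8-token window, the prompt takes 2 tokens and the two newest messages take the remaining 6, so the first message falls out.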

The kind of hacks that work around this don't retrain the model itself. Instead, you run chunks of the past conversations through an embedding model to get a vector for each one, and store those vectors (usually in a separate database). Later you rely on similarity in vector space to pull the right (lossy) data back out, based on its similarity to your question, and include it (or a summary of it, since summaries are smaller) within the context window for your new conversation, combined with your prompt. This is what people are talking about when they use the term "embeddings".
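
A minimal sketch of that retrieval step, under loud assumptions: `embed` here is a toy bag-of-words counter standing in for a real embedding model, the "database" is a plain Python list, and the function names and sample conversations are made up for illustration. Only the overall shape (embed, store, rank by similarity, prepend to the prompt) is the point.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    # Toy stand-in for a real embedding model: a bag-of-words count vector.
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse count vectors.
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(store: list[tuple[str, Counter]], question: str, k: int = 1) -> list[str]:
    # Rank stored snippets by similarity to the question and keep the top k.
    q = embed(question)
    ranked = sorted(store, key=lambda item: cosine(q, item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]


# Hypothetical past conversation snippets, embedded once and stored.
past = [
    "my dog is named Rex",
    "I work on compilers in Rust",
    "my favourite food is ramen",
]
store = [(text, embed(text)) for text in past]

question = "what is my dog called"
context = retrieve(store, question, k=1)

# The retrieved text goes into the context window ahead of the new prompt.
prompt = "\n".join(context + [question])
print(prompt)
```

The retrieval is lossy in exactly the sense described above: only the snippets that happen to score highest make it back into the window, and anything else stays invisible to the model.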