> I was never under the impression that gaps in conversations would increase costs nor reduce quality. Both are surprising and disappointing.

You didn't do your due diligence on an expensive API. A naïve implementation of an LLM chat is going to have O(N^2) costs from prompting with the entire context every time. Caching is needed to bring that down to O(N), but the cache itself takes resources, so evictions have to happen eventually.
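To put rough numbers on the O(N^2) vs O(N) point, here is a toy sketch that just counts prompt tokens processed over a conversation. Everything in it is an assumption for illustration: the function name `prompt_tokens_processed`, the fixed 100 tokens per turn, and the simplification that a cache hit makes the old context free (in practice a cached read would likely still cost something, just less). It is not any provider's billing formula.

```python
def prompt_tokens_processed(turn_sizes, cached, miss_turns=()):
    """Count prompt tokens the model has to process across a conversation.

    turn_sizes: new tokens added at each turn (hypothetical numbers).
    cached:     if True, earlier context is assumed to be served from cache.
    miss_turns: turn indices where the cache entry was evicted (e.g. after
                a long gap), so the full context is re-processed once.
    """
    total = 0
    context = 0  # tokens already in the conversation history
    for i, new_tokens in enumerate(turn_sizes):
        if cached and i not in miss_turns:
            # Cache hit: only the newly added tokens are processed.
            total += new_tokens
        else:
            # No cache (or evicted): the whole history is re-processed.
            total += context + new_tokens
        context += new_tokens
    return total


turns = [100] * 50  # 50 turns of 100 tokens each (made-up workload)
naive = prompt_tokens_processed(turns, cached=False)
with_cache = prompt_tokens_processed(turns, cached=True)
one_eviction = prompt_tokens_processed(turns, cached=True, miss_turns={25})
print(naive, with_cache, one_eviction)
```

Under these assumptions the naive total grows with the square of the number of turns, the cached total grows linearly, and a single eviction partway through adds a one-time re-processing of everything said up to that point.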
You're also talking about internal technical implementations of a chat bot. 99.99% of users won't even understand the words that are being used.