by stavros
22 days ago
We deployed OpenWebUI with the Claude API the other day for employees. Someone sent ten messages (which appeared to just be reasonable day-to-day work), and we paid $200 for it. There were 44M input tokens, 100k output tokens, no cache hits at all. OpenWebUI reports 3M tokens used, Claude reports 44M, and I have no idea where the rest of the tokens went. This was all on a brand new API key, installed directly to the service, too. With this kind of opaque billing, how can I reasonably deploy any AI?

I've only done a cursory search, but it seems OpenWebUI doesn't support Anthropic caching, and they don't intend to? Other providers (apparently?) handle caching automatically, but with Anthropic the client has to manage caching explicitly. If it's correct that OpenWebUI doesn't support it, that would really send your costs spiralling, because you're being billed for all the tokens in the entire multi-turn conversation on every turn:
https://github.com/open-webui/open-webui/issues/4887
I have no experience with OpenWebUI though (honestly, first time I've heard of it). Just trying to be helpful. If I'm completely incorrect then apologies in advance for sending you down the wrong path.
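To make the "billed for the whole conversation on every turn" point concrete, here is a rough back-of-the-envelope sketch. The function name and the per-turn sizes are made up for illustration (they are not taken from your logs), and it assumes each turn resends the full uncached history as input. It shows how the "new" tokens a UI might count can be much smaller than the input tokens the API bills, though it is not meant to reproduce your exact 3M vs 44M gap:

```python
def billed_input_tokens(turn_sizes):
    """Total input tokens billed when every turn resends the full history uncached.

    turn_sizes: tokens newly added to the conversation on each turn
    (hypothetical values; user message, attachments, and prior replies combined).
    """
    total = 0
    history = 0
    for size in turn_sizes:
        history += size   # the conversation grows by this turn's new tokens
        total += history  # the whole history is sent, and billed, again
    return total


# Hypothetical conversation: 10 turns, each adding 300k tokens of new content.
turns = [300_000] * 10
new_tokens = sum(turns)               # what was actually new across the chat
billed = billed_input_tokens(turns)   # what gets billed as input without caching

print(f"new tokens:    {new_tokens:,}")
print(f"billed tokens: {billed:,}")
print(f"ratio:         {billed / new_tokens:.1f}x")
```

The billed total grows roughly with the square of the number of turns, so long conversations with large attachments get expensive quickly when nothing is served from cache.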