
by weird-eye-issue 179 days ago

They absolutely are segregated.

With OpenAI, at least, you can specify the cache key, and they even have this in the docs:

> Use the prompt_cache_key parameter consistently across requests that share common prefixes. Select a granularity that keeps each unique prefix-prompt_cache_key combination below 15 requests per minute to avoid cache overflow.
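A minimal sketch of what "select a granularity" could look like, assuming you can estimate how many requests per minute hit one shared prefix. Only the `prompt_cache_key` parameter name comes from the quoted docs; the helper names (`shards_needed`, `cache_key_for`, `build_request`), the `"<prefix_id>:<shard>"` key format and the `"MODEL_NAME"` placeholder are hypothetical. The idea is to split traffic for a busy prefix across just enough distinct keys that each one stays under 15 RPM.

```python
import hashlib
import math

# Per-key rate the quoted OpenAI docs say to stay below (requests/minute).
MAX_RPM_PER_KEY = 15


def shards_needed(expected_rpm: float) -> int:
    """Smallest number of keys keeping each key strictly below MAX_RPM_PER_KEY,
    assuming traffic splits evenly across them."""
    return max(1, math.floor(expected_rpm / MAX_RPM_PER_KEY) + 1)


def cache_key_for(prefix_id: str, user_id: str, expected_rpm: float) -> str:
    """Hypothetical key scheme '<prefix_id>:<shard>'; a given user always gets the same shard."""
    n = shards_needed(expected_rpm)
    # Stable hash: Python's built-in hash() of a str is randomized per process.
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    shard = int.from_bytes(digest[:8], "big") % n
    return f"{prefix_id}:{shard}"


def build_request(prefix_id: str, user_id: str, messages: list, expected_rpm: float) -> dict:
    """Request body with prompt_cache_key set; "MODEL_NAME" is a placeholder."""
    return {
        "model": "MODEL_NAME",
        "messages": messages,
        "prompt_cache_key": cache_key_for(prefix_id, user_id, expected_rpm),
    }
```

With roughly 120 requests/minute on one prefix, this spreads users over 9 keys (about 13 RPM each if traffic splits evenly; hash-based sharding is only approximately even). Hashing the user ID rather than picking a random shard keeps each user's consecutive turns on the same key. The resulting dict would be sent as the body of a Chat Completions request, assuming an SDK version that accepts `prompt_cache_key`.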


ambicapter 178 days ago

> Select a granularity that keeps each unique prefix-prompt_cache_key combination below 15 requests per minute to avoid cache overflow.

Why below a certain number? Usually in caches, a high number of requests keeps the cached entry from expiring or being evicted, no?


weird-eye-issue 175 days ago

Requests need to go to the same machine to hit the cache, and each machine can only handle so many requests.
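A toy model of that explanation, not OpenAI's actual routing: every request with the same cache key hashes to one "home" machine that holds the cached prefix, that machine accepts only a fixed number of those requests per minute, and anything beyond that spills to a machine without the prefix. The machine count, the capacity of 15 (chosen to mirror the docs' figure) and the "overflow always misses" rule are all assumptions, as are the function names.

```python
import hashlib


def home_machine(cache_key: str, num_machines: int) -> int:
    """Hypothetical router: the same key always maps to the same machine."""
    digest = hashlib.sha256(cache_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_machines


def simulate_minute(cache_key: str, num_requests: int,
                    num_machines: int = 8, machine_capacity: int = 15):
    """Return (home, hits, misses) for one minute of requests sharing a prefix and key."""
    home = home_machine(cache_key, num_machines)
    served_on_home = 0
    hits = misses = 0
    for _ in range(num_requests):
        if served_on_home < machine_capacity:
            # Routed to the home machine, which holds (or will hold) the prefix.
            if served_on_home == 0:
                misses += 1  # first request populates the cache
            else:
                hits += 1
            served_on_home += 1
        else:
            # Home machine is saturated: in this toy model the request spills
            # to another machine that does not have the prefix cached.
            misses += 1
    return home, hits, misses
```

Under these assumptions, 10 requests in a minute give 9 hits and 1 miss, while 60 requests give only 14 hits and 46 misses: piling more traffic onto one key does not keep it "warmer", it overflows the one machine that has it.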


psadri 179 days ago

Does anyone actually compute/use this cache key feature, or do you rely on implicit caching? I wish HN had a poll feature for comments.


weird-eye-issue 179 days ago

It becomes important for relatively high-traffic use cases.

Let's say you have a chatbot with hundreds of active users. Their requests could get routed to different machines, which would mean implicit caching wouldn't work.

If you set the cache key to the user ID, then each user's chat is more likely to hit the cache on subsequent requests.
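A rough simulation of the chatbot scenario, under simplifying assumptions: each machine remembers every user prefix it has seen, a request hits only if it lands on a machine that has already seen that user, and per-machine capacity is ignored (each per-user key carries little traffic). "Implicit" routing is modeled as a uniformly random machine choice and keyed routing as a hash of the user ID; both are hypothetical stand-ins for the real router, and the function names are illustrative.

```python
import hashlib
import random


def keyed_machine(user_id: str, num_machines: int) -> int:
    """Hypothetical keyed routing: hash the user ID (the cache key) to a machine."""
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_machines


def hit_rate(num_users: int, requests_per_user: int, num_machines: int,
             use_cache_key: bool, seed: int = 0) -> float:
    """Fraction of requests that land on a machine already holding that user's prefix."""
    rng = random.Random(seed)
    warm = set()  # (machine, user_id) pairs whose prefix is cached
    hits = total = 0
    for _ in range(requests_per_user):  # each round, every user sends one more turn
        for u in range(num_users):
            user_id = f"user-{u}"
            if use_cache_key:
                machine = keyed_machine(user_id, num_machines)
            else:
                machine = rng.randrange(num_machines)  # stand-in for implicit routing
            if (machine, user_id) in warm:
                hits += 1
            else:
                warm.add((machine, user_id))
            total += 1
    return hits / total


print(hit_rate(200, 5, 16, use_cache_key=True))   # 0.8: only each user's first turn misses
print(hit_rate(200, 5, 16, use_cache_key=False))  # much lower: turns scatter across machines
```

With keyed routing only each user's first request misses, so 5 turns per user give a hit rate of 4/5; with random routing a user's turns are spread over up to 16 machines and most of them start cold.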
