Hacker News
by ambicapter 177 days ago
> Select a granularity that keeps each unique prefix-prompt_cache_key combination below 15 requests per minute to avoid cache overflow.

Why below a certain number? Usually in caches a high number of requests keeps the cached bit from expiring or being replaced, no?

1 comment

Requests with the same prefix and prompt_cache_key need to go to the same machine to hit its cache, and a machine can only handle so many requests.
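
To make that concrete, here is a minimal sketch of the idea, not the provider's actual routing logic. It assumes a hypothetical `Router` that hashes the prefix and `prompt_cache_key` to pick a "home" machine, and spills to a neighbouring machine (with a cold cache) once one combination exceeds a per-minute limit. The fleet size, the spill rule, and all names other than `prompt_cache_key` are made up for illustration; the limit of 15 comes from the quoted guidance.

```python
import hashlib
from collections import defaultdict, deque

MACHINES = 8           # hypothetical fleet size
LIMIT_PER_MINUTE = 15  # figure taken from the quoted guidance


class Router:
    """Toy router: same (prefix, prompt_cache_key) -> same machine, until it overflows."""

    def __init__(self, machines=MACHINES, limit=LIMIT_PER_MINUTE):
        self.machines = machines
        self.limit = limit
        # (prefix, cache_key) -> timestamps of requests seen in the last 60 seconds
        self.recent = defaultdict(deque)

    def home_machine(self, prefix, cache_key):
        # Stable hash so the same combination always maps to the same machine.
        digest = hashlib.sha256(f"{prefix}|{cache_key}".encode()).digest()
        return int.from_bytes(digest[:8], "big") % self.machines

    def route(self, prefix, cache_key, now):
        """Return (machine, on_home_machine) for a request arriving at time `now` (seconds)."""
        window = self.recent[(prefix, cache_key)]
        # Drop timestamps that have fallen out of the 60-second window.
        while window and now - window[0] >= 60:
            window.popleft()
        window.append(now)

        home = self.home_machine(prefix, cache_key)
        if len(window) <= self.limit:
            return home, True  # lands where the prefix is likely cached
        # Overflow: the home machine is saturated, so spill to a neighbour
        # that has not seen this prefix and cannot serve it from cache.
        return (home + 1) % self.machines, False


router = Router()
# 20 requests for one combination within 20 seconds.
results = [router.route("long system prompt", "user-1", t) for t in range(20)]
on_home = sum(1 for _, hit in results if hit)
spilled = len(results) - on_home
```

In this toy model, more traffic on one key does keep the home machine's cache warm, but only up to what that machine can serve; anything beyond that lands on a different machine and misses the cache. Using a finer-grained `prompt_cache_key` spreads the same traffic across more home machines.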