How would one achieve something similarly locally, short of just running a proxy and stuffing the request/response pairs into a DB? I'm sure it wouldn't be too terribly hard to write something, but I figure something open source already exists for OpenAI-compatible APIs.
Started with nginx proxy with rules to cache base on url/params. Wanted more control over it and explored lua/redis apis, and opted to build a app to do be a little more smart for what i wanted. Extra ec2 cost is negligible compared to cache savings.
Yes!
It's amazing how many things you can do with lua in nginx.
I had a server that served static websites where the files and the certificates for each website were stored in a bucket.
Over 20k websites with 220ms overhead if the certificate wasn't cached.
Like this is not a big thing to implement, that's my point. There are already libraries like OpenLLMetry and sink to a DB. We are doing something like this already.
Yes, the ol' Dropbox "you can already build such a system yourself quite trivially by getting an FTP account" comment. Even after 17 years, people still feel the need to make this point.