OpenAI replacement ? Some really tall claims there. What's the context length for this setup ? And any latency numbers ? If this really is what it says it is, you should shout these numbers.
They have more information at the link. This is an abstraction layer that allows anybody who is using OpenAI's API to use them instead, and they can point your API requests to any other LLM. This would be good for anybody using OpenAI but who doesn't want to have to worry about being locked in.
I mean you could click the link and see what it is lol, it's an OpenAI API layer replacement, that allows you to aim it at any llamacpp instance and gguf model