GPT-4 via the OpenAI API is too slow: a 100-token output takes more than 3 seconds.
Where can I find a faster LLM?
I've tried to optimize it by reducing the token length and by other methods, but I'm wondering if there are any better (faster) LLMs.
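
For comparing candidates, a figure like "100 tokens in more than 3 seconds" (under roughly 33 tokens per second) can be measured the same way for every model. Below is a minimal sketch of such a measurement, not a definitive benchmark. `measure_throughput` and `fake_generate` are hypothetical names made up for illustration; `fake_generate` is a stand-in that sleeps briefly and returns dummy tokens, and it would need to be swapped for a real call to whichever model is being tested.

```python
import time
from typing import Callable, Dict, List


def measure_throughput(generate: Callable[[str], List[str]], prompt: str) -> Dict[str, float]:
    """Time a single generation call and report latency and tokens per second."""
    start = time.perf_counter()
    tokens = generate(prompt)
    elapsed = time.perf_counter() - start
    return {
        "tokens": len(tokens),
        "seconds": elapsed,
        # Guard against a zero elapsed time on very fast stand-ins.
        "tokens_per_second": len(tokens) / elapsed if elapsed > 0 else float("inf"),
    }


def fake_generate(prompt: str) -> List[str]:
    """Hypothetical stand-in for a real model call (assumption, not a real API)."""
    time.sleep(0.05)       # simulate network and inference delay
    return ["tok"] * 100   # pretend the model produced 100 tokens


if __name__ == "__main__":
    print(measure_throughput(fake_generate, "Hello"))
```

Running the same prompt and output length through each model and averaging over several runs should give numbers that are directly comparable.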