Looking for fast GPT-4 level LLM usable via API. (not via OpenAI)


2 points by eerop 817 days ago

Hey, I'm looking for a fast LLM to use via API that's as good or almost as good as GPT-4, or at least better than GPT-3.5-turbo.

GPT-4 via OpenAI is too slow -- a 100-token output takes >3 seconds.

Where can I find one?

2 comments

Use streaming?

Not possible, unfortunately. The thing I'm building on top of doesn't make it possible. I need it all at once.

Claude Haiku.

Thanks. I just tried it. It's definitely faster, but it still sometimes takes >3 seconds (my app requires the completion to be done in <3 seconds).

I've tried to optimize it by reducing token length and other methods, but I'm wondering if there are any better LLMs.
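
For what it's worth, the hard <3 second requirement could also be enforced on the client side, whichever model ends up being used. Below is a minimal sketch, not a tested production setup: it sends the same prompt to several backends at once and returns whichever answer arrives first, or fails once the deadline passes. The name `first_within` and the `calls` list of async functions are hypothetical placeholders; each function would wrap a real, non-streaming API call.

```python
import asyncio


async def first_within(calls, prompt, deadline_s=3.0):
    """Run every coroutine function in `calls` on `prompt` concurrently.

    Returns the result of the first call to finish. Raises TimeoutError
    if none finishes within `deadline_s` seconds. `calls` is a
    hypothetical list of async functions, each wrapping one LLM API.
    """
    tasks = [asyncio.create_task(call(prompt)) for call in calls]
    done, pending = await asyncio.wait(
        tasks, timeout=deadline_s, return_when=asyncio.FIRST_COMPLETED
    )
    # Stop the slower requests so they do not keep running in the background.
    for task in pending:
        task.cancel()
    await asyncio.gather(*pending, return_exceptions=True)
    if not done:
        raise TimeoutError(f"no completion within {deadline_s} seconds")
    # Note: if the first call to finish failed, its exception is re-raised here.
    return next(iter(done)).result()
```

The trade-off is cost: every request is paid for on each backend, so this only makes sense if the occasional slow response is worse than the extra spend.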