OpenRouter is generally a good option (already mentioned), the best part is that you have a unified API for all LLMs, and the pricing is the same as with the providers themselves. Although for OpenAI/Anthropic models they were forced (by the respective companies) to enable filtering for inputs/outputs.
Both already mentioned, but I am using Anyscale Endpoints with great success, very fast and will work on ten jobs at a go out of the box. Together.ai also seems to work fine in my initial tests, but haven't tried it at scale yet.
I work for Groq and we serve the fastest available version of Mixtral (by far) and we also have a web chat app. I'll refrain from linking it because it has already been linked and I don't want to spam, but I'm available to answer any questions people have about Groq's hardware and service.