We seriously need a service that is as cheap and fast as the OpenAI/Anthropic APIs but allow us to run the various community-fine-tuned versions of Mixtral and LLaMA 3 that are not/less censored.
Such services already exists. I don't want to promote any in particular, but if you do a research on pay-as-you-go inference of e.g. mixtral or llama3 you will find offerings that offer an API and charge just cents for XY amount of tokens, exactly as OpenAI does.
* https://www.together.ai/
Here are all the models:
* https://docs.together.ai/docs/chat-models
* https://docs.together.ai/docs/language-and-code-models