| It's not really available for real use. I tried it on Azure AI Foundry through their serverless API with a paid subscription. It took 80s to answer a basic question that OpenAI gpt-4o answered in 7s. And there was not much of a thought process; it was just very slow to output each token. I guess the slowness is explained by the pricing; they are still figuring out how to run inference for this model: > DeepSeek R1 use is currently priced at $0, and use is subject to rate limits which may change at any time. Pricing may change, and your continued use will be subject to the new price. The model is in preview; a new deployment may be required for continued use. There is also a hard limit of 4k input tokens (the context window of the DeepSeek model is 120k tokens), which rules out RAG use cases: > Message: Request body too large for deepseek-r1 model. Max size: 4000 tokens. Also, the documentation and Python type hints of their inference lib contain a lot of outright errors (they confuse the class attributes `model=` and `model_name=` in many places in the docs; spoiler: the one that works is `model_name=`, even though the type hints recommend `model=`). I also tried more stable models like Mistral Large, but the streaming experience is really bad: whole sentences arrive at a time, with several seconds of waiting between them. It does not feel smooth at all compared to any other provider out there. I would not recommend Azure AI Foundry for production use (or any use, to be honest). It is not worth the pain of navigating the documentation. We will be using the DeepSeek API directly, or fireworks.ai, or together.ai. |
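For anyone who wants to put numbers on the streaming complaint above (time until the first chunk, and the longest pause between chunks), here is a minimal sketch. It is provider-agnostic and makes no assumption about any SDK: `stream_timing` is a hypothetical helper name, and `chunks` is assumed to be any iterable that yields text pieces as they arrive.

```python
import time
from typing import Callable, Iterable


def stream_timing(
    chunks: Iterable[str],
    clock: Callable[[], float] = time.perf_counter,
) -> dict:
    """Consume a stream of text chunks and report how smoothly it arrived.

    `chunks` is assumed to be any iterable that yields pieces of the
    response as the provider sends them (hypothetical input, not a real SDK type).
    `clock` is injectable so the function can be checked without real delays.
    """
    start = clock()
    arrivals = []  # seconds since start at which each chunk arrived
    for _ in chunks:
        arrivals.append(clock() - start)

    if not arrivals:
        return {"chunks": 0, "first_chunk_s": None, "max_gap_s": None, "total_s": clock() - start}

    # Pause between each pair of consecutive chunks.
    gaps = [later - earlier for earlier, later in zip(arrivals, arrivals[1:])]
    return {
        "chunks": len(arrivals),
        "first_chunk_s": arrivals[0],
        "max_gap_s": max(gaps) if gaps else 0.0,
        "total_s": arrivals[-1],
    }
```

Wrapping each provider's streaming iterator in `stream_timing(...)` gives comparable figures: a stream that arrives a sentence at a time should show a small `chunks` count and a `max_gap_s` of several seconds, while a token-by-token stream should show many chunks with short gaps.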