Sure, but they didn't spend on training the model. If DeepSeek is providing the model for the same price as third parties, then it's probably still losing money when you account for the training.
DeepSeek bypasses CUDA and has a few other optimisations that neither llama.cpp nor vLLM supports.
Furthermore, V4 pro was designed to run on 4 Huawei Ascend NPUs, which are much cheaper than the Nvidia setup others use, and DeepSeek probably also got some free hardware for their collab.
Hence it is entirely possible their inference costs are significantly lower than other providers'.
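To make the argument concrete, here is a rough back-of-envelope sketch. Every number in it is a made-up placeholder, not a real DeepSeek or third-party figure, and the function name `margin_per_million_tokens` is just something I picked for illustration. It only shows how a lower inference cost could offset the training cost that third parties never pay.

```python
def margin_per_million_tokens(price, inference_cost, training_cost, lifetime_volume):
    """Profit per million tokens once training cost is spread over lifetime volume.

    price, inference_cost: USD per million tokens
    training_cost: total USD spent on training (0 for a third-party host)
    lifetime_volume: millions of tokens served over the model's lifetime
    """
    amortised_training = training_cost / lifetime_volume
    return price - inference_cost - amortised_training


# Hypothetical placeholder numbers, chosen only to illustrate the reasoning.
price = 1.00                 # same price charged by everyone
training_cost = 5_000_000.0  # paid only by the model's developer
volume = 10_000_000.0        # millions of tokens served in total

# Third party: no training bill, standard inference cost.
third_party = margin_per_million_tokens(price, 0.80, 0.0, volume)

# Developer with the same inference cost as third parties: loses money.
developer_same_cost = margin_per_million_tokens(price, 0.80, training_cost, volume)

# Developer with much cheaper inference: can still come out ahead.
developer_cheaper = margin_per_million_tokens(price, 0.20, training_cost, volume)

print(third_party, developer_same_cost, developer_cheaper)
```

With these placeholder inputs, the developer is in the red at third-party inference costs and in the black at the cheaper ones. Whether that holds in reality depends entirely on the real figures, which none of us have.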