|
|
|
|
|
by daemonologist
23 days ago
|
|
If this is accurate it raises the question: why is this model so expensive? DeepSeek v4 Flash is 284B total/13B active, FP4/FP8 mixed, and only costs $0.14/$0.28 - even less from OpenRouter. Of course Gemini 3.5 Flash is most likely a better product, and therefore it can command a higher price from an economics perspective, but does this imply Google is taking roughly a 90% profit margin on inference? If so they're either very compute-limited or confident in the model and wanting to recoup training/fixed costs (or both). |
|
I think it’s pure economics. Flash models are OP for the price, leads to too much demand, google cannot serve it. This is likely expensive to reduce load and hey, if it still makes money just keep the margin.