Some of these models are open-weight, so you can try hosting them and do the price calculation yourself.
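A rough sketch of that price calculation, assuming you rent GPUs by the hour and have measured your own aggregate throughput. The function name and all numbers here are hypothetical placeholders, not measurements from any real deployment:

```python
def cost_per_million_tokens(gpu_hourly_usd: float, num_gpus: int,
                            tokens_per_second: float) -> float:
    """Estimated USD cost to generate one million tokens on rented GPUs.

    Ignores idle time, storage, networking, and engineering cost, so a real
    bill will be higher than this estimate.
    """
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    hourly_cost = gpu_hourly_usd * num_gpus    # total rental cost per hour
    tokens_per_hour = tokens_per_second * 3600  # aggregate throughput per hour
    return hourly_cost / tokens_per_hour * 1_000_000


if __name__ == "__main__":
    # Placeholder inputs: 8 GPUs at $2.00/hour each, 2000 tokens/s in total.
    print(f"${cost_per_million_tokens(2.0, 8, 2000.0):.2f} per 1M tokens")
```

Plug in your own rental price and the throughput you actually measure under load, then compare the result against the API price per million tokens.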
They also publish papers on how to reduce KV cache size and compute cost. Because they currently don't have the most powerful NVIDIA cards, training and inference efficiency is very important for them.
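To see why the KV cache matters so much, here is a back-of-the-envelope size estimate. It assumes a plain transformer with standard multi-head or grouped-query attention and an uncompressed cache. The techniques in those papers may work differently, and the model dimensions below are made-up placeholders:

```python
def kv_cache_bytes(num_layers: int, num_kv_heads: int, head_dim: int,
                   seq_len: int, batch_size: int = 1,
                   bytes_per_value: int = 2) -> int:
    """Bytes needed to cache keys and values for every layer and token.

    The leading factor of 2 covers one key tensor plus one value tensor.
    bytes_per_value=2 corresponds to fp16/bf16 storage.
    """
    return (2 * num_layers * num_kv_heads * head_dim
            * seq_len * batch_size * bytes_per_value)


if __name__ == "__main__":
    gib = 1024 ** 3
    # Hypothetical model: 32 layers, head_dim 128, 4096-token context.
    full = kv_cache_bytes(32, 32, 128, 4096)   # 32 KV heads
    fewer = kv_cache_bytes(32, 8, 128, 4096)   # 8 shared KV heads
    print(f"32 KV heads: {full / gib:.2f} GiB per sequence")
    print(f" 8 KV heads: {fewer / gib:.2f} GiB per sequence")
```

The cache grows linearly with context length and batch size, so anything that shrinks it lets the same card serve more concurrent requests.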