by stocknoob
540 days ago
> How much VRAM and inference compute is required to run 3.1-70B vs 2-70B?

We aren’t trying to mindlessly consume the same VRAM as last year and hope costs magically drop. We are noticing that we can get last year’s mid-level performance on this year’s low-end model, leading to cost savings at that perf level. The same thing happens next year, leading to a drop in cost at any given perf level over time.

> For training. Not for inference. GPU prices remained about the same, give or take.

See: https://epoch.ai/blog/trends-in-gpu-price-performance

We don’t care about the absolute price. Is the cost per FLOP or cost per GB decreasing over time with each new GPU?

--

If it isn’t clear why inference costs at any given performance level will drop given the points above, unfortunately I can’t help you further.
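For anyone who wants the arithmetic spelled out, here is a rough sketch of the two effects compounding. The function name and every number you feed it are hypothetical placeholders, not figures from the Epoch post or from any real model: one factor is how much less compute a newer model needs to hit the same perf level, the other is how much cheaper a FLOP gets per GPU generation.

```python
def yearly_costs(initial_flops, initial_dollars_per_flop,
                 model_efficiency_gain, hardware_price_perf_gain, years):
    """Cost of serving a FIXED performance level, year by year.

    All inputs are illustrative assumptions:
      model_efficiency_gain    - factor by which the compute needed for the
                                 same perf level shrinks each year
      hardware_price_perf_gain - factor by which dollars per FLOP shrink
                                 each year
    """
    costs = []
    flops = initial_flops
    dollars_per_flop = initial_dollars_per_flop
    for _ in range(years + 1):
        # Cost at this perf level = compute needed * price of that compute.
        costs.append(flops * dollars_per_flop)
        # Next year: a smaller model matches the same perf level...
        flops /= model_efficiency_gain
        # ...and each FLOP is cheaper, even if the GPU sticker price is flat.
        dollars_per_flop /= hardware_price_perf_gain
    return costs


if __name__ == "__main__":
    # Made-up inputs: compute halves yearly, price per FLOP improves 1.25x.
    for year, cost in enumerate(yearly_costs(100.0, 1.0, 2.0, 1.25, 3)):
        print(f"year {year}: relative cost {cost:.2f}")
```

The cost at a fixed perf level falls every year as long as either factor is above 1, which is the whole point: the absolute GPU price never enters into it.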