by usrbinbash
363 days ago
I disagree, tbh. I accept that new silicon will have better power usage and will probably be more efficient in terms of FLOPs/joule, but it would take a major technical breakthrough to get a logarithmic relationship between the number of requests and inference cost. If N requests take P FLOPs, then C x N requests still take C x P FLOPs. A not-so-steep linear relationship is still linear.
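
To put rough numbers on it, here is a quick sketch. The values for N, P and C are made up, and it assumes every request costs the same fixed number of FLOPs (no batching or caching effects). Better hardware is modeled as nothing more than a smaller per-request cost:

```python
def total_flops(n_requests: int, flops_per_request: int) -> int:
    """Total compute under the assumption of a fixed cost per request."""
    return n_requests * flops_per_request


def scale_factor(n_requests: int, flops_per_request: int, c: int) -> float:
    """How much total compute grows when the request count grows by a factor of c."""
    return total_flops(c * n_requests, flops_per_request) / total_flops(
        n_requests, flops_per_request
    )


# Hypothetical numbers, for illustration only.
N = 1_000            # baseline number of requests
C = 10               # growth in request volume
P_TODAY = 10**12     # assumed FLOPs per request on current hardware
P_BETTER = 10**11    # assumed FLOPs per request after a 10x efficiency gain

print(scale_factor(N, P_TODAY, C))
print(scale_factor(N, P_BETTER, C))
```

Both calls give the same ratio, C. The efficiency gain lowers the slope of the line, but the cost still grows in proportion to the number of requests.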