by amazingamazing
514 days ago
each card is not 20x more useful lol. there's no evidence yet that the DeepSeek architecture would even yield a substantially (20x) more performant model with more compute. if there's evidence to the contrary I'd love to see it. in any case I don't think an H800 is even 20x better than an H100, so the 20x increase has to be wrong.

Also, everything we know about LLMs points to an entirely predictable correlation between training compute and performance.