by porridgeraisin
252 days ago
Inference will be cheapest when run in a shared cloud environment, simply because of the LLM roofline: batching many users' requests amortizes each read of the weights. Thus, most B2B use cases are likely to be datacenter-based, like AWS today. Of course, CERN is still going to use their FPGA hyper-optimized for their specific trigger model for the LHC, and Apple is going to use a specialized low-power ASIC running a quantized model for "Hey Siri", but I meant the majority use case.
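To put rough numbers on the roofline point, here is a back-of-the-envelope sketch. Everything in it is an assumption for illustration, not a measurement: a hypothetical 70B-parameter model with 2-byte weights, an accelerator with 1e15 FLOP/s of compute and 3e12 B/s of memory bandwidth, about 2 FLOPs per parameter per generated token, and KV-cache traffic ignored. The function names are made up for this example.

```python
def decode_step_time(batch, params=70e9, bytes_per_param=2,
                     peak_flops=1e15, mem_bw=3e12):
    """Roofline lower bound (seconds) for one decode step over a batch.

    All defaults are hypothetical round numbers, not a real chip or model.
    """
    # Roughly 2 FLOPs per parameter for each token generated in the batch.
    compute_time = 2 * params * batch / peak_flops
    # The weights are streamed from memory once per step, whatever the batch size.
    # Assumption: KV-cache reads are ignored to keep the sketch simple.
    memory_time = params * bytes_per_param / mem_bw
    # The step cannot finish faster than the slower of the two limits.
    return max(compute_time, memory_time)


def tokens_per_second(batch, **hardware):
    """Aggregate throughput across the whole batch."""
    return batch / decode_step_time(batch, **hardware)


if __name__ == "__main__":
    for batch in (1, 16, 256, 1024):
        print(batch, round(tokens_per_second(batch)))
```

Under these assumptions a single stream is memory-bound, so throughput grows almost linearly with batch size until the compute limit takes over. That is the sense in which a shared deployment that can keep large batches full pays much less per token than a machine serving one user.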
I think that there are enough competitors in the "LLMs with open weights" space to make the models essentially a commodity. All that is left is the compute cost, and there is no way that someone will be running a datacenter more cheaply than "the computer that I already have running on my desk".