|
|
|
|
|
by aurareturn
220 days ago
|
|
It seems that using hardware acceleration would decrease inference costs and they could make it up on unit economics over time.
Nvidia GPUs are accelerators too. The reason they can do this so fast is because they're storing entire models in SRAM. |
|
Is this incorrect?