|
|
|
|
|
by cschneid
217 days ago
|
|
so apparently they have custom hardware that is basically absolutely gigantic chips - across the scale of a whole wafer at a time. Presumably they keep the entire model right on chip, in effectively L3 cache or whatever. So the memory bandwidth is absurdly fast, allowing very fast inference. It's more expensive to get the same raw compute as a cluster of nvidia chips, but they don't have the same peak throughput. As far as price as a coder, I am giving a month of the $50 plan a shot. I haven't figured out how to adapt my workflow yet to faster speeds (also learning and setting up opencode). |
|