by Tuna-Fish
53 days ago
Time for my daily "HBF is coming" comment. The next step for models is to put the weights on high-bandwidth flash (HBF), connected to the accelerator over a very wide interface. The first users will be datacenters, but it should trickle down to consumer hardware eventually. A single 512GB stack is expected to cost about $200 and deliver 1.6TB/s of read bandwidth. You still need some fast DRAM for the KV cache and activations, but the weights should sit on flash.
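To put rough numbers on those figures, here is a back-of-the-envelope sketch in Python. The capacity, bandwidth and price are the ones quoted above; everything else is my assumption: decimal units, batch size 1, every active weight read exactly once per token, and weight reads being the only bottleneck. The 40GB "active weights" figure is a made-up mixture-of-experts example, not a real model.

```python
# Back-of-the-envelope numbers for one flash stack.
# Decimal units throughout (1 TB = 1000 GB).

STACK_CAPACITY_GB = 512      # quoted in the comment
STACK_READ_GB_PER_S = 1600   # 1.6 TB/s, quoted in the comment
STACK_COST_USD = 200         # "about $200", quoted in the comment

# Hypothetical: a sparse (MoE) model that only touches 40 GB of weights per token.
HYPOTHETICAL_ACTIVE_GB = 40


def full_read_seconds(capacity_gb: float, read_gb_per_s: float) -> float:
    """Time to stream the whole stack once at full read bandwidth."""
    return capacity_gb / read_gb_per_s


def decode_tokens_per_second(read_gb_per_s: float, weights_read_per_token_gb: float) -> float:
    """Upper bound on batch-1 decode speed if weight reads are the only bottleneck."""
    return read_gb_per_s / weights_read_per_token_gb


def cost_per_gb(cost_usd: float, capacity_gb: float) -> float:
    """Price per GB of weight storage."""
    return cost_usd / capacity_gb


if __name__ == "__main__":
    sweep = full_read_seconds(STACK_CAPACITY_GB, STACK_READ_GB_PER_S)
    dense = decode_tokens_per_second(STACK_READ_GB_PER_S, STACK_CAPACITY_GB)
    sparse = decode_tokens_per_second(STACK_READ_GB_PER_S, HYPOTHETICAL_ACTIVE_GB)
    price = cost_per_gb(STACK_COST_USD, STACK_CAPACITY_GB)

    print(f"full sweep of the stack: {sweep:.2f} s")
    print(f"dense model filling the stack: {dense:.1f} tokens/s (upper bound)")
    print(f"hypothetical 40 GB active per token: {sparse:.1f} tokens/s (upper bound)")
    print(f"storage cost: ${price:.2f}/GB")
```

The takeaway under these assumptions: a dense model that fills the whole stack is stuck at a few tokens per second per stream, so this looks most attractive for sparse models or batched serving, where each weight read is amortized over many tokens.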