by jiggawatts
117 days ago
Next-gen inference chips will use High Bandwidth Flash (HBF) to store model weights. HBF is made similarly to HBM but is lower power and much higher capacity. It can also be used for caching to reduce costs when processing long chat sessions.