by aurareturn
299 days ago
Again, prompt processing isn't the major problem here. Memory bandwidth is. At 256GB/s (maybe ~210GB/s in the real world), bandwidth limits tokens per second well before prompt processing does. I'm also not sure how your ARM statement matters here. This is unified memory.
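To put rough numbers on that, here's a back-of-the-envelope sketch. It assumes a dense model where every generated token has to read all of the weights from memory once, and it ignores KV cache traffic and compute time, so it gives a ceiling rather than a prediction. The 40GB model size is a made-up example, and `max_decode_tokens_per_second` is just an illustrative helper name.

```python
def max_decode_tokens_per_second(bandwidth_gb_s: float, model_size_gb: float) -> float:
    """Upper bound on decode speed if each token requires one full read of the weights.

    Assumption: dense model, memory-bound decoding, no KV cache or compute overhead.
    """
    if bandwidth_gb_s <= 0 or model_size_gb <= 0:
        raise ValueError("bandwidth and model size must be positive")
    return bandwidth_gb_s / model_size_gb


# Hypothetical 40 GB of weights; bandwidth figures are the ones quoted above.
MODEL_SIZE_GB = 40.0
for label, bandwidth in (("spec", 256.0), ("real-world guess", 210.0)):
    ceiling = max_decode_tokens_per_second(bandwidth, MODEL_SIZE_GB)
    print(f"{label}: {bandwidth:.0f} GB/s -> at most {ceiling:.2f} tokens/s")
```

Under these assumptions the ceiling scales linearly with bandwidth and inversely with model size, and faster prompt processing doesn't move it at all.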