by sharms
1040 days ago
The problem is memory bandwidth rather than CPU cores: "Memory bandwidth is the limiting factor in almost everything to do with sampling from transformers. Anything that reduces the memory requirements for these models makes them much easier to serve."
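A rough way to see why: when a model generates one token at a time for a single request, every weight has to be streamed from memory once per token. That puts a ceiling on speed of roughly bandwidth divided by model size in bytes. The sketch below is a back-of-envelope estimate under those assumptions. It ignores the KV cache, activations, and batching, which lets one weight read serve many requests. The 7B parameter count and 1000 GB/s bandwidth are hypothetical example values, not measurements.

```python
def max_tokens_per_second(n_params: float, bytes_per_param: float, bandwidth_gb_s: float) -> float:
    """Upper bound on single-stream decode speed, assuming every weight
    is read from memory exactly once per generated token (batch size 1,
    KV cache and activations ignored)."""
    model_bytes = n_params * bytes_per_param
    return bandwidth_gb_s * 1e9 / model_bytes

# Hypothetical example: a 7B-parameter model on a device with 1000 GB/s of memory bandwidth.
fp16 = max_tokens_per_second(7e9, 2.0, 1000)  # 16-bit weights: 2 bytes each
int8 = max_tokens_per_second(7e9, 1.0, 1000)  # 8-bit quantized weights: 1 byte each
print(f"fp16: {fp16:.1f} tok/s, int8: {int8:.1f} tok/s")
```

Halving the bytes per weight doubles the ceiling, which is the sense in which shrinking memory requirements makes these models easier to serve. Adding CPU cores does not move this bound at all.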