by michaelt
914 days ago
As I understand things, these LLMs are mostly constrained by memory bandwidth. A respectable desktop CPU like the Intel Core i9-13900F has a memory bandwidth of 89.6 GB/s [1]. An Nvidia 4090 has a memory bandwidth of 1008 GB/s [2], i.e. roughly 11x as much.

Using these together is like a parcel delivery which goes 10 miles by Formula 1 race car, then 10 miles on foot. You don't want the race car or the handoff to go wrong, but in terms of the total delivery time they're insignificant compared to the 10 miles on foot.

I'm not sure there's much potential for cleverness here, unless someone trains a model specifically targeting this use case.

[1] https://www.intel.com/content/www/us/en/products/sku/230502/...

[2] https://www.notebookcheck.net/NVIDIA-GeForce-RTX-4090-GPU-Be...
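
To put rough numbers on the analogy, here is a back-of-envelope sketch. It assumes every weight is read from memory exactly once per generated token, that the GPU and CPU portions run one after the other, and that transfer overhead between them is negligible. The 40 GB model size and the `seconds_per_token` helper are hypothetical, chosen only for illustration. The two bandwidth figures are the ones quoted above.

```python
# Bandwidth figures quoted in the comment above (GB/s).
CPU_BW_GBPS = 89.6    # Intel Core i9-13900F
GPU_BW_GBPS = 1008.0  # Nvidia 4090


def seconds_per_token(model_gb, gpu_fraction,
                      gpu_bw=GPU_BW_GBPS, cpu_bw=CPU_BW_GBPS):
    """Estimate time per token for a model split between GPU and CPU.

    Assumption: generation is purely memory-bandwidth bound, each weight
    is read once per token, and the two portions run sequentially.
    """
    if not 0.0 <= gpu_fraction <= 1.0:
        raise ValueError("gpu_fraction must be between 0 and 1")
    gpu_time = model_gb * gpu_fraction / gpu_bw
    cpu_time = model_gb * (1.0 - gpu_fraction) / cpu_bw
    return gpu_time + cpu_time


if __name__ == "__main__":
    model_gb = 40.0  # hypothetical model size, not from the comment
    for frac in (0.0, 0.5, 0.9, 1.0):
        t = seconds_per_token(model_gb, frac)
        print(f"{frac:.0%} on GPU: {t * 1000:.1f} ms/token, "
              f"{1 / t:.1f} tokens/s")
```

Under these assumptions, a 50/50 split spends over 90% of each token's time in the CPU portion, which is the "10 miles on foot" part of the trip.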