by remorses
1144 days ago
Running GPT models on the CPU with quantized weights makes sense, and llama.cpp is an example. That's because inference on these models is constrained by memory bandwidth rather than compute (low arithmetic intensity), so shrinking the weights directly cuts the bytes that have to be read per token. https://github.com/ggerganov/llama.cpp
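A rough back-of-the-envelope sketch of that argument, assuming single-stream decoding where every weight is read from memory once per generated token and each weight contributes one multiply-add. The function names, the 7B parameter count, and the 50 GB/s bandwidth figure are illustrative assumptions, not measurements of llama.cpp, and the estimate ignores the KV cache and activations.

```python
def decode_tokens_per_second(n_params, bits_per_weight, bandwidth_gb_s):
    """Bandwidth-bound upper limit on tokens/s (hypothetical helper).

    Assumes all weights are streamed from memory once per token.
    """
    model_bytes = n_params * bits_per_weight / 8
    return bandwidth_gb_s * 1e9 / model_bytes


def arithmetic_intensity(bits_per_weight):
    """FLOPs per byte of weights read, assuming 2 FLOPs (multiply + add) per weight."""
    bytes_per_weight = bits_per_weight / 8
    return 2 / bytes_per_weight


if __name__ == "__main__":
    n_params = 7e9        # assumed model size
    bandwidth_gb_s = 50   # assumed memory bandwidth in GB/s
    for bits in (16, 8, 4):
        tps = decode_tokens_per_second(n_params, bits, bandwidth_gb_s)
        ai = arithmetic_intensity(bits)
        print(f"{bits:>2}-bit: <= {tps:.1f} tokens/s, {ai:.0f} FLOPs/byte")
```

Under these assumptions the intensity stays at a few FLOPs per byte, so memory traffic sets the ceiling, and going from 16-bit to 4-bit weights raises that ceiling by about 4x without any extra compute.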