by menaerus
540 days ago
In August 2023, Llama 2 34B was released, and at that time, without model quantization, fitting this model required a GPU, or set of GPUs, with a total of ~34 × 2.5 = 85 GB of VRAM. That said, can you be more specific about what those "algorithmic" and "hardware" improvements are that have driven the cost and hardware requirements down? AFAIK I still need the same hardware to run this very same model.
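The back-of-the-envelope VRAM estimate is just parameter count times bytes per parameter. A minimal sketch, assuming the 2.5 bytes/param factor means roughly 2 bytes for fp16/bf16 weights plus about 0.5 bytes of headroom for KV cache, activations and framework overhead (that breakdown is my assumption, not something stated in the comment). The function name `estimate_vram_gb` is hypothetical:

```python
def estimate_vram_gb(params_billions: float, bytes_per_param: float = 2.5) -> float:
    """Rough VRAM (in GB) needed to load a model unquantized.

    Assumption: bytes_per_param=2.5 ~= 2 bytes of fp16/bf16 weights
    plus ~0.5 bytes of headroom (KV cache, activations, overhead).
    """
    # (params_billions * 1e9 params) * bytes_per_param / (1e9 bytes per GB):
    # the 1e9 factors cancel, so the result is simply the product.
    return params_billions * bytes_per_param


if __name__ == "__main__":
    print(estimate_vram_gb(34))       # 85.0  (34B with overhead)
    print(estimate_vram_gb(34, 2.0))  # 68.0  (34B fp16 weights only)
```

Dropping the headroom gives the weights-only floor (~68 GB for 34B in fp16), which is why a single 80 GB card is borderline for this model without quantization.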
You aren’t trying to run an old 2023 model as-is; you’re trying to match its capabilities. The old models just show what capabilities are possible.