by throwaway1249
922 days ago
This comparison is not fair, since the RTX 4090 does not have enough VRAM to hold the whole model. I have tested llama.cpp on both an M2 and a 4090:
- Prompt ingestion on the M2 is pretty slow.
- The extra memory of the M2 allows one to try more interesting models (Mixtral) and run multiple models at the same time.
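
To make the "does it fit" point concrete, here is a rough back-of-the-envelope sketch. It only counts the weights and ignores the KV cache and runtime overhead. The parameter count for Mixtral (about 46.7B total) and the 64 GiB memory budget for the M2 are assumptions for illustration, not figures from my tests, and the helper names are made up.

```python
# Rough estimate: weights only, no KV cache, no runtime overhead.

GIB = 2**30


def weight_footprint_gib(n_params: float, bits_per_weight: float) -> float:
    """Approximate size of the model weights in GiB."""
    return n_params * bits_per_weight / 8 / GIB


def fits(n_params: float, bits_per_weight: float, memory_gib: float) -> bool:
    """True if the weights alone fit in the given memory budget."""
    return weight_footprint_gib(n_params, bits_per_weight) <= memory_gib


RTX_4090_VRAM_GIB = 24      # nominal VRAM of the card
M2_MEMORY_GIB = 64          # ASSUMPTION: hypothetical M2 configuration
MIXTRAL_PARAMS = 46.7e9     # ASSUMPTION: approximate total parameter count

if __name__ == "__main__":
    for bits in (16, 8):
        size = weight_footprint_gib(MIXTRAL_PARAMS, bits)
        print(
            f"{bits}-bit weights: {size:.1f} GiB | "
            f"4090: {fits(MIXTRAL_PARAMS, bits, RTX_4090_VRAM_GIB)} | "
            f"M2 (assumed 64 GiB): {fits(MIXTRAL_PARAMS, bits, M2_MEMORY_GIB)}"
        )
```

Once the weights do not fit in VRAM, some layers have to be kept in system RAM, and the benchmark ends up measuring that split rather than the GPU itself.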