by ac29 557 days ago
The model you are running isn't the one used in the benchmarks you link.

The default llama3.3 model in ollama is heavily quantized (~4 bit). Running the full fp16 model, or even an 8-bit quant, wouldn't be possible on your laptop with 64 GB of RAM.
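For a rough sense of why, here is a back-of-the-envelope sketch of the memory needed for the weights alone. It assumes the 70B parameter count from the linked post and counts only nominal bits per weight; the helper name `weight_memory_gb` is just for illustration. Real quant formats typically store a bit more than their nominal width, and the KV cache and runtime overhead come on top, so treat these as lower bounds.

```python
def weight_memory_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate memory for the model weights alone, in GB (10^9 bytes)."""
    return n_params * bits_per_weight / 8 / 1e9


# Assumption: 70 billion parameters (Llama 3.3 70B).
N_PARAMS = 70e9

# Nominal bit widths only; actual quantized files are usually somewhat larger.
for label, bits in [("fp16", 16), ("8-bit", 8), ("~4-bit", 4)]:
    print(f"{label}: ~{weight_memory_gb(N_PARAMS, bits):.0f} GB")
```

By this estimate only the ~4-bit version (about 35 GB) leaves headroom on a 64 GB machine; the 8-bit weights alone (about 70 GB) already exceed it.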

1 comment

Thanks, yeah, I should have mentioned that. I just added a note directly above this heading: https://simonwillison.net/2024/Dec/9/llama-33-70b/#honorable...