| HN Mirror



	by ac29 557 days ago
	The model you are running isn't the one used in the benchmarks you link. The default llama3.3 model in ollama is heavily quantized (~4-bit). Running the full fp16 model, or even an 8-bit quant, wouldn't be possible on your laptop with 64 GB of RAM.
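
	For anyone who wants to check the arithmetic, here is a rough back-of-the-envelope sketch. It assumes a round 70 billion parameters (taken from the "70b" in the model name) and counts only the weights, ignoring the KV cache, activations, and per-block quantization overhead, so real usage would be somewhat higher. The function name `weight_memory_gb` is made up for illustration.

```python
def weight_memory_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate memory for the model weights alone, in decimal gigabytes."""
    return n_params * bits_per_weight / 8 / 1e9


N_PARAMS = 70e9  # assumption: Llama 3.3 70B rounded to 70 billion parameters
RAM_GB = 64      # the laptop RAM mentioned in the comment

for label, bits in [("fp16", 16), ("8-bit", 8), ("~4-bit", 4)]:
    need = weight_memory_gb(N_PARAMS, bits)
    verdict = "fits" if need < RAM_GB else "does not fit"
    print(f"{label:>6}: about {need:.0f} GB of weights, {verdict} in {RAM_GB} GB")
```

	Under those assumptions the weights alone come to roughly 140 GB at fp16 and 70 GB at 8-bit, both over 64 GB, while a ~4-bit quant is around 35 GB.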

1 comment

Thanks - yeah, I should have mentioned that. I just added a note directly above this heading https://simonwillison.net/2024/Dec/9/llama-33-70b/#honorable...