HN Mirror

Hacker News


	by jdsully 849 days ago
	For inference the common answer will be "no": you use the model you get, and it takes a constant amount of time to process. However, the truth is that inference platforms do take shortcuts that affect accuracy. E.g., llama.cpp will down-convert fp32 intermediates to 8-bit quantized values so it can do the work using 8-bit integers. This trades some of the computation's accuracy for performance.
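To make the trade-off concrete, here is a rough sketch of the idea, not llama.cpp's actual code. It assumes a simple symmetric "absmax" scheme: each block of 32 floats is mapped to int8 codes in [-127, 127] plus one float scale, the dot product is accumulated in integers, and the result is rescaled once per block. The block size, the scaling rule, and the names `quantize_block`, `dot_q8`, and `dot_fp` are illustrative assumptions. In llama.cpp the weights are typically already stored quantized, so it is mostly the activations that get converted at run time, and the real kernels are C with SIMD intrinsics.

```python
import random

BLOCK = 32  # assumed block size for this sketch


def quantize_block(xs):
    """Symmetric absmax quantization: floats -> (float scale, int8-range codes)."""
    amax = max(abs(x) for x in xs)
    scale = amax / 127.0 if amax > 0 else 1.0  # avoid dividing by zero on an all-zero block
    qs = [max(-127, min(127, round(x / scale))) for x in xs]
    return scale, qs


def dot_fp(a, b):
    """Reference dot product in full floating point."""
    return sum(x * y for x, y in zip(a, b))


def dot_q8(a, b):
    """Dot product done block by block on 8-bit codes, rescaled once per block."""
    total = 0.0
    for i in range(0, len(a), BLOCK):
        sa, qa = quantize_block(a[i:i + BLOCK])
        sb, qb = quantize_block(b[i:i + BLOCK])
        acc = sum(x * y for x, y in zip(qa, qb))  # integer-only multiply-accumulate
        total += sa * sb * acc  # one float multiply per block restores the scale
    return total


random.seed(0)
a = [random.uniform(-1, 1) for _ in range(256)]
b = [random.uniform(-1, 1) for _ in range(256)]
exact = dot_fp(a, b)
approx = dot_q8(a, b)
print(f"fp: {exact:.6f}  int8: {approx:.6f}  abs error: {abs(exact - approx):.2e}")
```

The integer inner loop is what makes this fast on real hardware; the rounding to 127 levels per block is where the small accuracy loss comes from.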

1 comment

frannyg 848 days ago

I have no freaking idea what you said in the second paragraph, but I love it, and it will linger in the back of my head until I understand enough to look it up.

[nodding repeatedly with a serious face and a lot of resolve]
