Hacker News new | ask | show | jobs
by ericlewis 612 days ago
Higher the precision the better. Use what works within your memory constraints.
1 comments

With serious diminishing returns. At inference time, no reason to use fp64 and should probably use fp8 or less. The accuracy loss is far less than you'd expect. AFAIK Llama 3.2 3B fp4 will outperform Llama 3.2 1B at fp32 in accuracy and speed, despite 8x precision.