Hacker News
by tarruda 650 days ago
Have you run the model in full FP16? A lot of performance may be lost when running quantized versions.