Hacker News
by hoschicz 1110 days ago
Absolutely. I am guessing they quantized the model (ran it at, say, 8-bit precision instead of 32-bit, which saves memory and compute). Just like they did with 3.5-turbo.
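
For anyone unfamiliar with the term, here is a rough sketch of what 8-bit quantization means. It is only an illustration of the general idea, not how any particular provider does it (that isn't public). It assumes simple symmetric per-tensor quantization, and the function names and the random "weights" are made up for the example.

```python
import random

def quantize_int8(weights):
    """Symmetric per-tensor quantization (an assumed, simple scheme):
    map each float to an integer in [-127, 127] using one shared scale."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Recover approximate float values from the 8-bit integers."""
    return [v * scale for v in q]

# Hypothetical weights standing in for one layer of a model.
random.seed(0)
weights = [random.gauss(0.0, 0.02) for _ in range(1000)]

q, scale = quantize_int8(weights)
restored = dequantize(q, scale)

# Rounding error per weight is at most half a quantization step.
max_error = max(abs(a - b) for a, b in zip(weights, restored))

# Storage cost: 4 bytes per 32-bit float vs 1 byte per 8-bit integer.
fp32_bytes = 4 * len(weights)
int8_bytes = 1 * len(weights)

print(f"max error: {max_error:.6f}, step: {scale:.6f}")
print(f"fp32: {fp32_bytes} bytes, int8: {int8_bytes} bytes")
```

The trade-off is the point: storage drops by 4x, and each weight picks up a small rounding error of at most half a step. Whether that error noticeably hurts output quality depends on the model and the scheme used.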