Hacker News
by Gareth321 16 days ago
Given the incredible progress of local models, I think that on the present trajectory we will see performance comparable to frontier models within two years, running on 128GB of unified RAM with 6-bit quantisation. Note how the frontier models are now posting superior benchmark results with only 200,000 tokens. I think we still have a long way to go with distillation.
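For a rough sense of what 128GB at 6-bit quantisation buys you, here is a back-of-the-envelope sketch. It assumes 1 GB = 10^9 bytes and counts weights only; the `usable_fraction` parameter is a hypothetical knob of mine for memory reserved for the OS, KV cache and activations, not a measured figure.

```python
def max_params_billions(ram_gb: float, bits_per_weight: float,
                        usable_fraction: float = 1.0) -> float:
    """Upper bound on parameter count (in billions) whose weights fit in ram_gb.

    Assumes 1 GB = 1e9 bytes and ignores quantisation metadata (scales, zero points).
    usable_fraction is an illustrative assumption for memory left over for weights.
    """
    usable_bytes = ram_gb * 1e9 * usable_fraction
    params = usable_bytes * 8 / bits_per_weight  # 8 bits per byte
    return params / 1e9


# Weights only, all 128 GB available
print(round(max_params_billions(128, 6), 1))        # 170.7
# Same machine, assuming only 75% of RAM can hold weights
print(round(max_params_billions(128, 6, 0.75), 1))  # 128.0
```

So the ceiling is somewhere around 130 to 170 billion parameters, depending on how much memory you can actually hand to the weights.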