| HN Mirror

by dlcarrier 5 days ago

That's a big if.

The big commercial models seem to gain far more from pre-processing than they do from size, and you can already run pretty useful models on desktop hardware.

Check out this video (https://youtu.be/Dkqzqw8rxXI) about how DeepMind significantly improved performance. They basically ran the LLM tuning through an old-school genetic- or annealing-style algorithm and trounced what a larger model could do alone.