by 6r17 32 days ago
Very cool work! I'm running a harness system myself and measured a 2x to 10x improvement in token use on GSM8K just by running a math harness. I'm confident the future is bright for people who know how to sell tech that is appropriately scaled to one's needs. We absolutely do not need to run Claude 123 for most tasks, and we had better prepare for the rug pull!

A while back, when the latest Big Model came out with very impressive benchmarks, I tested it on some coding tasks.

I gave it 3 simple changes to make. It did them perfectly.

Then I tried with a much smaller model. It also did them perfectly, except 3x faster and 9x cheaper.

I used to think "best model" was what's at the top of the benchmarks, but for most tasks that just means you're going to wait longer and pay more money. The right model depends on the job.

(Also, speed itself is a feature -- when you get the really fast models, it enables a kind of real-time interactive usage that is otherwise not possible in the "alt tab and hope it's done" workflow.)

Definitely! A lot of tasks are within reach of small models, many more than people would think. Big models still shine in vague contexts, for breadth, or for very long-running tasks, but yeah. The small ones just need help on longer multi-step workflows.

What small models have you used most/found most stable?