by 6r17
32 days ago
Very cool work! I'm running a harness system myself and measured a 2x to 10x improvement in token use on gsm8k just by running a math harness. I'm confident the future is bright for people who know how to sell tech that is appropriately scaled to one's needs. We absolutely do not need to run Claude 123 for most tasks, and we'd better prepare for the rug-pull!
I gave it 3 simple changes to make. It did it perfectly.
Then I tried with a much smaller model. It also did it perfectly, except 3x faster and 9x cheaper.
I used to think "best model" was what's at the top of the benchmarks, but for most tasks that just means you're going to wait longer and pay more money. The right model depends on the job.
(Also, speed itself is a feature -- when you get the really fast models, it enables a kind of real-time interactive usage that is otherwise not possible in the "alt tab and hope it's done" workflow.)