by kcorbitt 469 days ago
To be honest, I don't expect the performance to generalize to other task types with this specific training regime. If we had a panel of, say, 30 logic puzzles and cross-trained against all of them simultaneously, it might, though.

I think there's a lot of benefit to discovering a training regime that allows small specialized models to do extremely well in one narrow task; if we can figure out how to make small models that beat SOTA on a specific task and are cheap to train and run, that's in some ways a more useful outcome than a very large model that is good at many tasks (but is more expensive to run for each of them).

2 comments

The question to me is whether you can call that deduction in that case. Isn't it just a type of pattern matching that fits this particular task?
Once the problem gets narrow enough, do you risk training a model that reinvents a straightforward classic algorithm at far higher cost?
Well, in this case there is a much more straightforward method with the same CP-SAT solver used to create the puzzles. This is more of a fun experiment to see if we can train LLMs to solve these kinds of logical deduction problems.
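
To make the "straightforward classic algorithm" point concrete, here is a rough sketch of solving a deduction puzzle by plain search. It is not CP-SAT and not the generator used for the puzzles in the post: it's a brute-force stand-in in standard-library Python, and the people, rooms, and clues are all made up for illustration. A real constraint solver would prune the search instead of enumerating every assignment, but the idea (state the clues as constraints, keep the assignments that satisfy all of them) is the same.

```python
from itertools import permutations

# Hypothetical toy puzzle: each person is in a different room.
PEOPLE = ("Alice", "Bob", "Carol")
ROOMS = ("kitchen", "library", "study")

# Hypothetical clues, each a predicate over a {person: room} assignment.
CLUES = [
    lambda where: where["Alice"] != "kitchen",   # Alice is not in the kitchen
    lambda where: where["Carol"] != "kitchen",   # Carol is not in the kitchen
    lambda where: where["Alice"] != "library",   # Alice is not in the library
]


def solve():
    """Return every assignment of people to distinct rooms that satisfies all clues."""
    solutions = []
    for rooms in permutations(ROOMS):
        where = dict(zip(PEOPLE, rooms))
        if all(clue(where) for clue in CLUES):
            solutions.append(where)
    return solutions


if __name__ == "__main__":
    for solution in solve():
        print(solution)
```

With three people there are only six assignments to check, so this finishes instantly; the cost only becomes interesting as the puzzle grows, which is where a dedicated solver earns its keep.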