by snovv_crash 467 days ago
Do you have any other logic puzzles you could use to see if the performance generalises?

To be honest, I don't expect the performance to generalize to other task types with this specific training regime. If we had a panel of like 30 logic puzzles and cross-trained against all of them simultaneously it might though.

I think there's a lot of benefit to discovering a training regime that allows small specialized models to do extremely well in one narrow task; if we can figure out how to make small models that beat SOTA on a specific task and are cheap to train and run, that's in some ways a more useful outcome than a very large model that is good at many tasks (but is more expensive to run for each of them).

The question to me is whether you can call that deduction in that case. Isn't it just a type of pattern matching that fits this particular task?
Once the problem gets narrow enough, do you risk training a model that reinvents a straightforward classic algorithm at far higher cost?
Well, in this case there is a much more straightforward method with the same CP-SAT solver used to create the puzzles. This is more of a fun experiment to see if we can train LLMs to solve these kinds of logical deduction problems.
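
For a sense of how small the "straightforward method" can be, here is a minimal sketch. It is not the CP-SAT setup mentioned above: it swaps in plain brute-force enumeration from the standard library, and the puzzle itself (the people, rooms, and clues) is made up purely for illustration.

```python
from itertools import permutations

# Hypothetical toy puzzle: three people, three rooms, one person per room.
PEOPLE = ("Ann", "Bob", "Cy")
ROOMS = ("Hall", "Kitchen", "Study")

# Hypothetical clues, each a predicate over a {person: room} assignment.
CLUES = [
    lambda a: a["Ann"] != "Hall",     # Ann is not in the Hall
    lambda a: a["Bob"] != "Hall",     # Bob is not in the Hall
    lambda a: a["Bob"] != "Kitchen",  # Bob is not in the Kitchen
]


def solve(people=PEOPLE, rooms=ROOMS, clues=CLUES):
    """Return every one-person-per-room assignment that satisfies all clues."""
    solutions = []
    for perm in permutations(rooms):
        assignment = dict(zip(people, perm))
        if all(clue(assignment) for clue in clues):
            solutions.append(assignment)
    return solutions


if __name__ == "__main__":
    for solution in solve():
        print(solution)
```

Enumeration only works because the search space here is six assignments; a constraint solver like CP-SAT would be the tool for anything larger, but the idea of "state the clues as constraints, keep what satisfies all of them" is the same.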