by infogulch 1595 days ago
Would it do better if you asked it to "show its work"? I.e. work it out in long form, one step at a time, like you'd ask a school kid to do. Maybe an example prompt would look like this:

    Work out 2241 + 19873.
    02241 + 19873 ~ ____4
    02241 + 19873 ~ ___14 carry 1
    02241 + 19873 ~ __114 carry 1
    02241 + 19873 ~ _2114 carry 1
    02241 + 19873 = 22114.
I'm not sure what the best way is to represent each step, including details like carry digits. And you'd have to design a separate scheme for each operation.
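For what it's worth, the scheme in the example prompt above can be generated mechanically, which makes it easy to produce many few-shot examples. This is only a rough sketch: `addition_steps` is a name I made up, it assumes non-negative integers, and it simply reproduces the layout above rather than claiming that layout is a good one for the model.

```python
def addition_steps(a: int, b: int) -> str:
    """Render a + b as a digit-by-digit prompt with explicit carries.

    Hypothetical helper: mirrors the hand-written layout in the comment,
    assumes a and b are non-negative integers.
    """
    width = max(len(str(a)), len(str(b)))
    xs, ys = str(a).zfill(width), str(b).zfill(width)  # pad to equal length
    lines = [f"Work out {a} + {b}."]
    digits = ""  # result digits found so far, least significant on the right
    carry = 0
    # Walk the columns right to left; the leftmost column is folded into the
    # final "=" line, as in the hand-written example.
    for i in range(width - 1, 0, -1):
        carry, d = divmod(int(xs[i]) + int(ys[i]) + carry, 10)
        digits = str(d) + digits
        note = " carry 1" if carry else ""
        lines.append(f"{xs} + {ys} ~ {'_' * i}{digits}{note}")
    lines.append(f"{xs} + {ys} = {a + b}.")
    return "\n".join(lines)


if __name__ == "__main__":
    print(addition_steps(2241, 19873))
```

Calling `addition_steps(2241, 19873)` gives back the same six lines as the prompt above. Subtraction, multiplication and so on would each need their own renderer.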

If these models are symbol manipulators, maybe the key is to break the task down into steps that are closer to being solvable with symbol manipulation.

1 comment

I tried something like that for 3-digit multiplication with GPT-3 in another comment[1], successfully. You have to lay things out in a different manner than you did here, because GPT-*s have no sense of layout on a page; their byte-pair encoding (BPE) destroys their ability to learn it efficiently. Further, transformers are optimized to look things up by similarity, because that's what typically occurs in text, so you're better off writing out things the model can anchor off of.
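To make "things it can anchor off of" concrete, here is one guess at a layout that doesn't depend on column alignment. It is not the format from the linked comment, just an illustration: every step restates the digits, the incoming carry, and the position in words, so each line can be matched by similarity to the previous one. The function name and the exact wording of each line are my own assumptions, and I haven't measured whether this particular wording helps.

```python
def anchored_addition_steps(a: int, b: int) -> str:
    """Spell out a + b one digit at a time without relying on page layout.

    Hypothetical format, assumes non-negative integers.
    """
    xs, ys = str(a)[::-1], str(b)[::-1]  # reversed so index 0 is the ones digit
    lines = [f"Work out {a} + {b}."]
    written = []  # result digits, least significant first
    carry = 0
    for i in range(max(len(xs), len(ys))):
        x = int(xs[i]) if i < len(xs) else 0
        y = int(ys[i]) if i < len(ys) else 0
        total = x + y + carry
        # Every line names its position and carry explicitly.
        lines.append(
            f"digit {i + 1} from the right: {x} + {y} + carry {carry} = {total}, "
            f"write {total % 10}, carry {total // 10}"
        )
        written.append(str(total % 10))
        carry = total // 10
    if carry:
        lines.append(f"leftover carry {carry}: write {carry}")
        written.append(str(carry))
    # Space-separated digits, so the answer is not packed into one number string.
    lines.append("digits written, left to right: " + " ".join(reversed(written)))
    lines.append(f"{a} + {b} = {a + b}.")
    return "\n".join(lines)


if __name__ == "__main__":
    print(anchored_addition_steps(2241, 19873))
```

The trade-off is length: this spends many more tokens per problem than the underscore layout, in exchange for never asking the model to count characters or line things up.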

There are ways to fix these issues, but BPEs micro-optimize for the primary text benchmarks that papers want good scores on, so they are the standard for now. I'm sure they'll get replaced eventually, once the costs outrun the wins and more scalable (alternatives to?) transformers become popular.

[1] https://news.ycombinator.com/item?id=30309302