by dlkf 1596 days ago
I'll preface this by saying that I am 100% in the camp that thinks these language models are neither intelligent nor a promising avenue towards understanding intelligence.

But your conclusion here is entirely wrong: the model clearly is learning something. From eyeballing this, the model is right about 10% of the time. If it were spitting out random digits, the accuracy would effectively be zero. So what exactly is it learning? Is it memorising the exact equations that it saw in training? Is it learning n-gram patterns that occur frequently in arithmetic equations?
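To put a rough number on the "effectively zero" baseline, here is a minimal sketch. It assumes a guesser that already knows how many digits the answer has and picks each digit independently and uniformly at random; `chance_accuracy` is a hypothetical helper for illustration, not anything from the experiment being discussed.

```python
from fractions import Fraction


def chance_accuracy(num_digits: int) -> Fraction:
    """Probability that num_digits uniformly random decimal digits
    exactly match one fixed answer (assumed guessing model)."""
    return Fraction(1, 10 ** num_digits)


if __name__ == "__main__":
    # Compare the random-guess baseline against the eyeballed ~10% figure.
    for n in (1, 2, 3, 5):
        print(f"{n}-digit answer: chance accuracy = {float(chance_accuracy(n)):.5%}")
```

Under that assumption, anything longer than a single-digit answer puts pure guessing well below the roughly 10% eyeballed above.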

I'm not an expert on these things and I'd love to hear from someone who is.

1 comment

I think that, fundamentally, these models compress the training data into network weights and connections. So if the training data contained 6 + 10 = 16 and 9 + 10 = 19 and you give it 7 + 10, it will interpolate between what it has seen, or something of the sort, and give you something approximately right. The compression is also not lossless, so what it has actually stored may be 9 + 10 = 18, in which case even the interpolated answer comes out wrong.
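A toy sketch of that intuition, using the numbers above. This is only an analogy: it assumes the "memory" is a plain lookup table keyed by the first operand (with the second operand fixed at 10) and that unseen inputs are answered by linear interpolation. Real language models do not work this way internally, and `interpolate_sum` is a hypothetical name.

```python
def interpolate_sum(memory: dict, x: int) -> float:
    """Answer x + 10 by linearly interpolating between the two stored
    operands that bracket x. Assumes x lies within the stored range."""
    lo = max(k for k in memory if k <= x)
    hi = min(k for k in memory if k >= x)
    if lo == hi:
        # x itself was stored, so just recall it.
        return float(memory[lo])
    t = (x - lo) / (hi - lo)
    return memory[lo] + t * (memory[hi] - memory[lo])


# Faithful memory: 6 + 10 = 16 and 9 + 10 = 19.
exact = {6: 16, 9: 19}
# Lossy memory: the second fact was stored as 9 + 10 = 18.
lossy = {6: 16, 9: 18}

exact_guess = interpolate_sum(exact, 7)
lossy_guess = interpolate_sum(lossy, 7)

if __name__ == "__main__":
    print("7 + 10 from exact memory:", exact_guess)
    print("7 + 10 from lossy memory:", lossy_guess)
```

With the faithful table the interpolated answer for 7 + 10 lands on 17, while the lossy table drags it down to roughly 16.67: close, but not right.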