Hacker News new | ask | show | jobs
by eapriv 405 days ago
Great, we can spend crazy amount of computational resources and hand-holding in order to (maybe) reproduce three lines of code.
3 comments

The significance of this is that we can fully understand this problem because it’s only 3 lines of code.

Like for learning the English language we don’t fully understand the way LLMs work. We can’t fully characterize it. So we have debates on whether the LLM actually understands English or understands what it’s talking about. We simply don’t know.

The results of this show that the transformer understands the game of life. Or whatever the transformer does with the rules of the game of life it’s safe to say that it fits a definition of understanding as mankind knows it.

Like much of machine learning where we use the abstraction of curve fitting to understand higher dimensional learning we can do the same extrapolation here.

If the transformer understands the game of life then that understanding must translate over to the LLM. The LLM understands English and understands the contents of what it is talking about.

There was a clear gradient of understanding before understanding the game of life hit saturation. The transformer lived in a state where it didn’t get everything right but it understood the game of life to a degree.

We can extrapolate that gradient to LLMs as well. LLMs are likely on that gradient, not yet at saturation. Either way, I think it’s safe to say that LLMs understand what they are talking about. It’s just that they haven’t hit saturation yet. There’s clearly things that we as humans understand better than the LLM.

But let’s extrapolate this concept to an even higher level:

Have we as humans hit saturation yet?

It's a theoretical result to help determine what they're capable of, not a practical solution. Of course you can write the code yourself - but that's not the point!
Well, you could also implement this by hand-writing weights for one convolution layer.

There are only 512 training examples needed for that, and it would be a lot more interesting if a learning algorithm were able to fit that 3x3 convolution layer from those 512 examples. IIRC, and don't quote me on that, but that's not been done.

Exactly my thoughts. This is not useful at all. We already know how to write exact and correct code to implement that. This is no task that we should throw ANNs at.
Basic research has non-obvious utility and it deserves its own spotlight.

It’s similar to comparing hardware radio and software-defined radio: Yes, we already know how to build a radio with hardware but a software-defined one offers greater flexibility.