by IIAOPSW 351 days ago
There's something fascinating about this, because the human ability to "transfer knowledge" (e.g., pick up some other never-before-seen video game and quickly understand it) isn't really that general. There's a very particular "Overton window" of the sorts of degrees of difference where it is possible.

If I were to hand you a version of a 2D platformer (let's say Mario) where the gimmick is that you're actually playing the Fourier transform of the normal game, it would be hopeless. You might not ever catch on that the images on screen are completely isomorphic to a game you're quite familiar with and possibly even good at.
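To make "completely isomorphic" concrete, here is a minimal sketch in plain Python. It assumes a made-up 4x4 grid as a stand-in for a game frame and uses a naive 2D discrete Fourier transform (the helper name `dft2` and the frame are hypothetical, not from any real game or library). The point is only that the inverse transform recovers the original frame up to floating-point error, so nothing is lost, even though the spectrum looks nothing like the picture.

```python
import cmath

def dft2(frame, inverse=False):
    """Naive 2D discrete Fourier transform of a small grid (list of rows)."""
    h, w = len(frame), len(frame[0])
    sign = 1 if inverse else -1
    out = [[0j] * w for _ in range(h)]
    for u in range(h):
        for v in range(w):
            total = 0j
            for y in range(h):
                for x in range(w):
                    angle = 2 * cmath.pi * (u * y / h + v * x / w)
                    total += frame[y][x] * cmath.exp(sign * 1j * angle)
            # The inverse transform carries the 1/(h*w) normalization.
            out[u][v] = total / (h * w) if inverse else total
    return out

# Hypothetical 4x4 "frame": 1 marks a sprite pixel, 0 is background.
frame = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 0, 0],
    [0, 0, 0, 0],
]

spectrum = dft2(frame)                    # what the player would be shown
recovered = dft2(spectrum, inverse=True)  # the original frame, recovered

# Largest difference between the original frame and the round trip.
round_trip_error = max(
    abs(recovered[y][x] - frame[y][x]) for y in range(4) for x in range(4)
)
```

Here `round_trip_error` comes out at the level of floating-point noise: the two representations carry the same information, and only one of them is readable to a human.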

But some range of spatial transform gimmicks is cleanly intuitive. We've seen this with games like VVVVVV and Braid.

So the general rule seems to be that intelligence is transferable to situations that are isomorphic up to certain "natural" transforms, but not to "matching any possible embedding of the same game in a different representation".

Our failure to produce anything more than hyper-specialists forces us to question exactly what is meant by the ability to generalize, other than just "mimicking an ability humans seem to have".

2 comments

When studying physics, people eventually learn about the Fourier transform, and they learn about quantum mechanics, where the Fourier transform switches between describing things in terms of position and in terms of momentum. And amazingly, the harmonic oscillator is the same in position and momentum space! So maybe there are other creatures that perceive in momentum space! Everything is relative!

Except that's of course superficial nonsense. Position space isn't an accident of evolution, just one of many possible encodings of spatial data. It's an extremely special encoding: the physical laws are local in position space. What happens on the moon does not much affect what happens when I eat breakfast. But points arbitrarily far apart in momentum space do interact. Locality of action is a very, very deep physical principle, and it's absolutely central to our ability to reason about the world at all, to break it apart into independent pieces.

So I strongly reject your example. It makes no sense to present the pictures of a video game in Fourier space. It's highly unnatural for very profound reasons. Our difficulty stems entirely from the fact that our vision system is built for interpreting a world with local rules and laws.
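A rough illustration of why Fourier-space pictures fight a vision system tuned for locality, as a sketch under stated assumptions: a hypothetical 4x4 frame, a naive DFT helper (`dft2` is an illustrative name), and a single pixel switched on. This is only an analogy on images, not a demonstration of the physics claim about locality of action.

```python
import cmath

def dft2(frame):
    """Naive forward 2D discrete Fourier transform of a small grid."""
    h, w = len(frame), len(frame[0])
    return [
        [
            sum(
                frame[y][x] * cmath.exp(-2j * cmath.pi * (u * y / h + v * x / w))
                for y in range(h)
                for x in range(w)
            )
            for v in range(w)
        ]
        for u in range(h)
    ]

# Hypothetical frames: identical except for one extra lit pixel.
before = [[0] * 4 for _ in range(4)]
before[1][1] = 1
after = [row[:] for row in before]
after[2][3] = 1

# Count how many entries differ in each representation.
changed_pixels = sum(
    before[y][x] != after[y][x] for y in range(4) for x in range(4)
)
fb, fa = dft2(before), dft2(after)
changed_coeffs = sum(
    abs(fb[u][v] - fa[u][v]) > 1e-9 for u in range(4) for v in range(4)
)
```

One pixel changes in position space, but every one of the 16 Fourier coefficients changes: a local event on screen becomes a global one in the transformed picture.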

I also see no reason why an AI could transfer between the two representations easily. If you start from scratch, it could train on the Fourier-space data, but that's more akin to using different eyes than to transfer.

But you're not really rejecting my example, you're proving it. The human ability to generalize the concept of a 2D platformer is limited to a very narrow range of "intuitive" generalizations that have deeply baked-in assumptions like "locality of action". So when we try to replicate the ability to "generalize", at some point we have to recognize that we can't "generalize in general"; rather, we have to deeply bake in certain assumptions about what sorts of variations on the learned theme are possible. Mario with some sort of gimmick that still respects locality of action is doable; the Fourier transform of Mario isn't.

This is a problem because we are approaching AI from an angle of no a priori assumptions about the variations on the pattern that it should be able to generalize to. We just imagine that there's some magic way to recognize any isomorphic representation and transfer our knowledge to the new variables, when the reality is we can only recognize when the domain being transferred to is only different in a narrow set of ways like being upside down or on a bent surface. The set of possible variations on a 2d platformer we can generalize well enough to just pick up and play is a tiny subset of all the ways you could map the pixels on the screen to something else without technically losing information.
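For a sense of scale, here is a back-of-the-envelope sketch. It assumes a toy 4x4 screen, counts only pixel rearrangements as the "lossless" maps (there are many other kinds), and uses rotations and mirror flips as a stand-in for the "intuitive" variations; both choices are illustrative assumptions, not a definition of what humans can handle.

```python
import math

def rotate(grid):
    """Rotate a square grid 90 degrees clockwise."""
    return [list(row) for row in zip(*grid[::-1])]

def flip(grid):
    """Mirror a grid left to right."""
    return [row[::-1] for row in grid]

n = 4
# Give every pixel a unique label so distinct rearrangements stay distinct.
labeled = [[y * n + x for x in range(n)] for y in range(n)]

# Collect every orientation reachable by rotating and mirroring.
intuitive = set()
g = labeled
for _ in range(4):
    g = rotate(g)
    intuitive.add(tuple(map(tuple, g)))
    intuitive.add(tuple(map(tuple, flip(g))))

# Every permutation of the n*n pixels is also lossless.
lossless_permutations = math.factorial(n * n)
fraction = len(intuitive) / lossless_permutations
```

Even on this toy screen there are 8 rotations and flips against 16! (about 2 x 10^13) lossless pixel shuffles, so the variations that are easy to pick up are a vanishing fraction of the information-preserving ones.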

We could probably make an AI that bakes in the sort of assumptions where it can easily generalize what it learns to Fourier-space representations of the same data, but then it probably wouldn't be good at generalizing the same sorts of things we are good at generalizing.

My point (hypothesis really) is that the ability to "generalize in general" is a fiction. We can't do it either. But the sort of things we can generalize are exactly the sort that tend to occur in nature anyway so we don't notice the blind spot in what we can't do because it never comes up.

One of my favourite examples of games that are hard to train an AI on is The Legend of Zelda for NES. Many other games of the NES era have (at least in the short term) a goal function which almost perfectly corresponds to some simple memory value such as score or x-position.

Not Zelda. That game is highly nonlinear and its measurable goals (triforce pieces) are long-term objectives that take a lot of gameplay to obtain. As far as I’m aware, no AI has been able to make even modest progress without any prior knowledge of the game itself.

Yet many humans can successfully play and complete the first dungeon without any outside help. While completing the full game is a challenge that takes dedication, many people achieved it long before having access to the internet and its spoiler resources.

So why is this? Why are humans so much better at Zelda than AIs? I believe that knowledge transfer has a lot to do with it. For starters, Link is approximately human (technically Hylian, but they are considered a race of humans, not a separate species), which means his method of sensing and interacting with his world will be instantly familiar to humans. He's not at all like an earthworm or an insect in that regard.

Secondly, many of the objects Link interacts with are familiar to most modern humans today: swords, shields, keys, arrows, money, bombs, boomerangs, a ladder, a raft, a letter, a bottle of medicine, etc. Since these objects in-game have real world analogues, players will already understand their function without having to figure it out. Even the triforce itself functions similarly to a jigsaw puzzle, making it obvious what the player’s final objective should be. Furthermore, many players would be familiar with the tropes of heroic myths from many cultures which the Zelda plot closely adheres to (undertake a quest of personal growth, defeat the nemesis, rescue the princess).

All of this cultural knowledge is something we take for granted when we sit down to play Zelda for the first time. We’re able to transfer it to the game without any effort whatsoever, something I have yet to witness an AI achieve (train an AI on a general cultural corpus containing all of the background cultural information above and get it to transfer that knowledge into gameplay as effectively as an unspoiled Zelda beginner).

As for the Fourier transform, I don't know. I do know that The Legend of Zelda has been successfully completed while playing entirely blindfolded. Of course, this wasn't with Fourier-transformed sound, though since the blindfolded run relies on sound cues, I imagine a player could adjust to Fourier-transformed sound effects.