by sentdex 1829 days ago
In the end, everything boils down to matrix math, so you can always argue that no neural network is impressive if you want to.

The model's size is ~173MB, depending on settings. That's not much space to have memorized every single possible combination of events, nor was our data enough to cover that either.
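To put ~173MB in perspective, here's a rough conversion from checkpoint size to parameter count. It assumes the weights are stored uncompressed as float32 (or float16) with negligible file overhead, which the thread doesn't actually state, so treat the numbers as ballpark only.

```python
# Back-of-envelope: how many parameters fit in a checkpoint of a given size?
# Assumption: weights stored uncompressed at a fixed width per parameter,
# with negligible metadata overhead (real checkpoint formats add some).

BYTES_PER_PARAM = {"float32": 4, "float16": 2}

def approx_param_count(size_mb: float, dtype: str = "float32") -> int:
    """Estimate parameter count from file size in megabytes (10**6 bytes)."""
    return int(size_mb * 10**6 // BYTES_PER_PARAM[dtype])

if __name__ == "__main__":
    for dtype in ("float32", "float16"):
        print(dtype, approx_param_count(173, dtype))
```

Under these assumptions that's roughly 43 million float32 parameters (or about twice that at float16).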


Your original self-driving GTA5 videos are what helped me understand machine learning in the first place (along with some of SethBling's MarI/O, and a bit of Tom7's learnfun/playfun magic). I used your tech to make an AI that played Donkey Kong Country in the LSNES emulator shortly before Gym-Retro was released.

So, thanks a bunch, Sentdex. You are rad.

Hah, awesome! Any plans to apply GAN Theft Auto to something else? :o
Not offhand, but you've probably inspired a lot of creativity across the internet with this... and a lot of copycats. I'm looking forward to seeing what gets made.
>> The model's size is ~173MB, depending on settings. That's not much space to have memorized every single possible combination of events, nor was our data enough to cover that either.

The resolution of the images output by the model is very low (what is it exactly, btw?). It's not impossible that your model has memorised at least a large part of its training data.
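A rough way to frame the memorisation question: how many raw, uncompressed frames would fit in the same ~173MB as the model? The resolutions below are hypothetical examples (the actual output resolution isn't given in the thread), and this says nothing about how many frames were in the training set; it only shows the order of magnitude.

```python
# How many raw frames would fit in the same space as the model?
# The frame sizes used below are hypothetical; the thread does not
# state the model's actual output resolution.

def frames_that_fit(budget_bytes: int, width: int, height: int, channels: int = 3) -> int:
    """Number of uint8 frames (1 byte per channel value) that fit in budget_bytes."""
    bytes_per_frame = width * height * channels
    return budget_bytes // bytes_per_frame

if __name__ == "__main__":
    budget = 173 * 10**6  # ~173MB, as quoted for the model
    for w, h in [(64, 64), (128, 128), (256, 256)]:
        print(f"{w}x{h}: {frames_that_fit(budget, w, h):,} frames")
```

At a hypothetical 64x64 RGB, that's about 14,000 raw frames, and any compression (e.g. storing only what changes between frames) would stretch that further.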

In fact, the simplest explanation of your model's output (as with much of deep learning for machine vision) is that it's a combination of memorisation and interpolation. There was a fairly recent paper by Pedro Domingos that proposed explaining deep learning as the memorisation of exemplars, similar to support vectors (if I understood it correctly; I only gave it a high-level read).
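The "memorisation plus interpolation" idea can be made concrete with a deliberately simple stand-in: Nadaraya-Watson kernel regression, where the model literally is the stored training exemplars and each prediction is a similarity-weighted average of them. This is a swapped-in illustration, not Domingos' construction and not how GAN Theft Auto works; the Gaussian kernel, bandwidth, and toy linear data are arbitrary assumptions.

```python
import numpy as np

def kernel_predict(x_query, x_train, y_train, bandwidth=0.5):
    """Nadaraya-Watson kernel regression with a Gaussian kernel.

    The "model" is just the stored training exemplars: each prediction is a
    similarity-weighted average of the memorised targets.
    """
    x_query = np.atleast_1d(np.asarray(x_query, dtype=float))
    # Pairwise squared distances between queries and stored exemplars.
    d2 = (x_query[:, None] - x_train[None, :]) ** 2
    weights = np.exp(-d2 / (2 * bandwidth**2))
    # Normalise so each query's weights sum to 1.
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ y_train

if __name__ == "__main__":
    x_train = np.linspace(0, 10, 21)
    y_train = 2.0 * x_train  # a simple linear "physics"
    # Inside the training range the prediction tracks the true value...
    print(kernel_predict(5.25, x_train, y_train))
    # ...but far outside it, it collapses to the nearest stored exemplar.
    print(kernel_predict(20.0, x_train, y_train))
```

Between stored exemplars the output lands close to the true line (about 10.5 at x = 5.25), but at x = 20 it returns about 20, the value of the nearest exemplar, where the "physics" says 40. That gap is exactly why extrapolation to unseen situations is the more telling test.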

It's also difficult to tell from your demonstration exactly what the relation between the output and the input images is. You're showing some very simple situations in the video (go left, go right), but is that all that was in the input?

For example, I'd like to see what happens when you try to drive the car over the barrier. Was that situation in the input? And if so, how is it modelled in the output?

Finally, how do you see this having real-world applications? I don't necessarily mean right now, but, say, in 30 years' time. So far, you need a fully working game engine to model a tiny part of an entire game at very low resolution and in very poor detail. Do you see this somehow being extended to creating a whole novel game from scratch? If so, how?

Edit: on memorisation, it's not necessary to memorise events, only the differences between sets of pixels in different frames. For instance, most of the background and the road stays the same during most of the "game". Again, the resolution is so low that it's not unfathomable that the model has memorised the background and the small changes to it necessary to model the input. So, it interpolates, but can it extrapolate to unseen situations that are nevertheless predicted by the physics you suggest it has learned, like driving over the barrier?
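The point about only needing to capture differences between frames can be illustrated by measuring how few pixels actually change from one frame to the next. This minimal sketch uses synthetic frames (a flat background with a small block standing in for the car), not real GTA V footage; the frame size, block size, and threshold are assumptions.

```python
import numpy as np

def changed_fraction(prev_frame, next_frame, threshold=0):
    """Fraction of pixels whose value changes by more than `threshold`
    between two frames (H x W x C uint8 arrays)."""
    # Widen to int16 so the subtraction can't wrap around like uint8 would.
    diff = np.abs(next_frame.astype(np.int16) - prev_frame.astype(np.int16))
    # A pixel counts as changed if any of its channels changed.
    changed = (diff > threshold).any(axis=-1)
    return changed.mean()

if __name__ == "__main__":
    # Synthetic example: a static 64x64 background with a 4x4 "car"
    # that moves one pixel to the right between frames.
    background = np.full((64, 64, 3), 100, dtype=np.uint8)
    frame_a = background.copy()
    frame_b = background.copy()
    frame_a[30:34, 10:14] = 255
    frame_b[30:34, 11:15] = 255
    print(changed_fraction(frame_a, frame_b))
```

In this toy case only 8 of 4,096 pixels change (about 0.2%), so the per-frame "new information" is tiny compared with the full frame.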

Video frame resolution is pretty small...
> The model's size is ~173MB

That is impressive! Less than twice the size of ResNet-50 weights. Surely that is within an order of magnitude of an equivalent Unity or Godot game+models.
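A quick check of the "less than twice ResNet-50" comparison, assuming ResNet-50's commonly cited ~25.6 million parameters stored as float32 and ignoring file-format overhead (both are assumptions, not figures from the thread):

```python
# Rough size comparison, assuming float32 weights and no file overhead.
RESNET50_PARAMS = 25.6e6   # commonly cited approximate parameter count
BYTES_PER_FLOAT32 = 4
GAN_MODEL_MB = 173         # as quoted for the GAN Theft Auto model

resnet50_mb = RESNET50_PARAMS * BYTES_PER_FLOAT32 / 1e6   # ~102.4 MB
ratio = GAN_MODEL_MB / resnet50_mb                        # ~1.69
print(f"ResNet-50 ~ {resnet50_mb:.1f} MB; GAN model is ~{ratio:.2f}x that")
```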