| HN Mirror


by godelski 655 days ago

  > An ideally trained network could, in principle, learn the data-generating program

No disagreement

  > I might have a NN that naively looks like it takes up GBs of space, but it might actually be parameterizing a much simpler function (hence our ability to prune/compress the weights without performance loss - most of the capacity wasn't being used for any interesting computation).

Also no disagreement.

I suggested that this probably isn't the case here since they tried distillation and saw no effect. While this isn't proof that this particular model can't be compressed further, it does suggest that doing so is non-trivial. This is especially true given the huge difference in size. I mean, we're talking about 700x...

I think our disagreement is that I read the OP as talking about __this__ network. If we're talking about a theoretical network, well... nothing I've said anywhere disagrees with that. I even said in the post I linked to that the difference shows there's still a long way to go, but that this is still cool.

Why did I assume OP was talking about __this__ network? Because we're in a thread about a paper and, well... yes, we're talking about compression machines, so theoretically (well, not actually supported by any math theory) this is true for so many things, and that's a bit elementary. So it makes more sense (imo) that we're talking about this network, and I wanted to make it clear that this network is nowhere near compression.

Can further research later result in something that is better than the source code? Who knows? For all the reasons we've both mentioned. We know they are universal approximators (which are not universal mimickers and have limits), but we have no guarantee of global convergence (let alone proof that such a thing exists in many problems).

And I'm not sure why you're trying to explain the basic concepts to me. I mentioned I was an ML researcher. I see you're a PhD at Oxford. I'm sure you would be annoyed if I was doing the same to you. We can talk at a different level.


_hark 655 days ago

Totally fair points all. Sorry if it came across as condescending!

I agree with you that this network probably has not found the source code or something like a minimal description in its weights.

Honestly, I'm writing a paper on model compression/complexity right now, so I may have co-opted the discussion to practice talking about these things...! Just a bit over-eager (,,>﹏<,,)

Have you given much thought to how we can encourage models to be more compressible? I'd love to be able to explicitly penalize the filesize during training, but in some usefully learnable way. Proxies like weight norm penalties have problems in the limit.
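To make the gap between a norm penalty and actual file size concrete, here is a minimal sketch. It is purely illustrative and not from the paper or from anyone's method in this thread: the helper names `l1_norm` and `compressed_size` are made up, and "filesize" is assumed to mean the zlib-compressed size of the float32 weights. It builds two weight vectors with the same L1 norm whose compressed sizes differ by a large factor, so the norm alone does not say how small the file can get.

```python
import random
import struct
import zlib


def l1_norm(weights):
    """Sum of absolute values: a common differentiable penalty."""
    return sum(abs(w) for w in weights)


def compressed_size(weights):
    """Bytes after packing as float32 and zlib-compressing.

    Assumption: this stands in for "filesize". It is not differentiable,
    which is why it cannot be dropped into a loss directly.
    """
    raw = struct.pack(f"<{len(weights)}f", *weights)
    return len(zlib.compress(raw, 9))


rng = random.Random(0)
n = 4096

# Dense vector: every weight is a small random value.
dense = [rng.uniform(-1.0, 1.0) for _ in range(n)]

# Sparse vector: mostly zeros with a few equal large weights,
# rescaled so its L1 norm matches the dense vector.
sparse = [0.0] * n
for i in range(0, n, 64):
    sparse[i] = 1.0
scale = l1_norm(dense) / l1_norm(sparse)
sparse = [w * scale for w in sparse]

print("L1 norms:        ", l1_norm(dense), l1_norm(sparse))
print("compressed bytes:", compressed_size(dense), compressed_size(sparse))
```

The two norms match up to floating-point error, while the sparse vector compresses far better than the dense one. This only shows one way the proxy and the target can come apart; it is not a claim about which failure mode matters most in practice.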


godelski 655 days ago

Haha totally fair, and it happens to me too, but I'm trying to work on it.

I actually have some stuff I'm working on in that area that is having some success. I do need to extend it to diffusion but I see nothing stopping me.

Personally I think a major slowdown for our community is its avoidance of math. Like, you don't need to have tons of math in the papers, but many of the lessons you learn in the higher-level topics do translate to usable techniques in ML. Though I would also like to see a stronger push on theory, because empirical results can be deceiving (Von Neumann's elephant and all).
