|
|
|
|
|
by vessenes
850 days ago
|
|
I wasn't sure if this paper was parody on reading the abstract. It's not parody. Two things stand out to me: first is the idea of distilling these networks down into a smaller latent space, and then mucking around with that. That's interesting, and cross-sections a bunch of interesting topics like interpretability, compression, training, over- and under-.. The second is that they show the diffusion models don't just converge on similar parameters as the ones they train against/diffuse into, and that's also interesting. I confess I'm not sure what I'd do with this in the random grab bag of Deep Learning knowledge I have, but I think it's pretty fascinating. I might like to see a trained latent encoder that works well on a bunch of different neural networks; maybe that thing would be a good tool for interpreting / inspecting. |
|
Or maybe some metaparameter that mucks with the sizes during training produces better results. Start large to get a baseline, then reduce size to increase coherence and learning speed, then scale up again once that is maxed out.