Hacker News new | ask | show | jobs
by ffast-math 1462 days ago
Yes. It's another research project to make this happen, but I think it would be fairly straightforward. The issue is that you can't backprop through the assignment step, so you get no gradient with respect to the input. This mandates a progressive layer freezing strategy. I don't think it would be too hard to get working though; you'd likely just need to train for longer, or start with a pretrained model and fine-tune it as you freeze + approximate the layers.