HN Mirror



	by in-silico 165 days ago
	Why can't you just leave H_res as the identity matrix (or just not use it at all)? In that case, the model is basically a ResNet again and you don't need to worry about exploding/vanishing gradients from H_res. I would think that H_post and H_pre could cover the lost expressiveness.
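	A rough sketch of the concern behind this question, assuming the layer update has the form x_{l+1} = H_res x_l + H_post^T F(H_pre x_l). That form is an assumption about the setup, not something quoted in this thread. Only the skip path is modeled here: the end-to-end skip map is the product of the per-layer H_res matrices, so an identity H_res stays at identity while any other fixed matrix compounds with depth. The helper names and the 2x2 example values are made up for illustration, and this says nothing about whether H_pre and H_post recover the lost expressiveness.

```python
# Hypothetical sketch: how the residual mixing matrix H_res compounds over depth.
# Only the skip path is modeled; the layer function F, H_pre and H_post are omitted.

def matmul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def identity(n):
    """Return the n x n identity matrix as a list of rows."""
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]

def skip_path_gain(h_res, depth):
    """Product of `depth` copies of h_res: the map the skip path applies end to end."""
    out = identity(len(h_res))
    for _ in range(depth):
        out = matmul(h_res, out)
    return out

n_streams, depth = 2, 50

# H_res fixed to the identity: the skip path is an exact identity map at any depth.
fixed = skip_path_gain(identity(n_streams), depth)

# Made-up H_res that drifts slightly from identity: one stream grows, one decays.
drifted = skip_path_gain([[1.1, 0.0], [0.0, 0.9]], depth)

print(fixed)
print(drifted)
```

	In this toy setting the identity case never changes, while a 10% per-layer deviation grows one stream by more than 100x and shrinks the other below 1% after 50 layers. Real H_res matrices would differ per layer, so treat this only as an illustration of the compounding effect.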