by Bayes7 937 days ago
Okay, I see that for inference. But for training it shouldn't matter, since I need to hold on to all my activations for the backward pass anyway, right? But yeah, fair point!