by ScoutOrgo 1015 days ago
Hey Jeremy, it seems like you could measure exactly how much a model learns in a single step by computing the loss for a batch a second time (with no_grad) after the first loss has been computed and the weights have been updated. Graphing the difference between the first and second losses, at the batch level or per observation/question, seems like it could produce interesting results.
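For what it's worth, here is a rough sketch of what I mean, assuming PyTorch. The tiny linear model, the random batch, and the helper name `per_example_learning` are all made up for illustration and are not from anyone's actual training code. The loss is built with `reduction="none"` so you get one value per observation rather than a single batch mean.

```python
import torch
from torch import nn


def per_example_learning(model, loss_fn, optimizer, xb, yb):
    """Take one optimizer step on (xb, yb) and return per-example losses
    measured before and after that step.

    Hypothetical helper: loss_fn must use reduction="none" so that it
    returns one loss value per example.
    """
    model.train()
    optimizer.zero_grad()
    loss_before = loss_fn(model(xb), yb)   # shape: (batch,)
    loss_before.mean().backward()          # gradients of the batch-mean loss
    optimizer.step()                       # weights change here

    # Second forward pass on the same batch; no graph is needed for this.
    with torch.no_grad():
        loss_after = loss_fn(model(xb), yb)
    return loss_before.detach(), loss_after


# Toy setup (assumed for the sketch): 8 examples, 4 features, 3 classes.
torch.manual_seed(0)
model = nn.Linear(4, 3)
loss_fn = nn.CrossEntropyLoss(reduction="none")
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
xb = torch.randn(8, 4)
yb = torch.randint(0, 3, (8,))

before, after = per_example_learning(model, loss_fn, optimizer, xb, yb)
delta = before - after   # positive means the loss dropped on that example
print(delta)             # per-observation change
print(delta.mean())      # batch-level change
```

One caveat: if the model has dropout or batch norm, the two forward passes are not directly comparable in train mode, so you would probably want both measurements taken in eval mode (at the cost of an extra forward pass before the step).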