| HN Mirror


by lschneider 405 days ago

That's a great point; it would be better to keep the gradients separate from the Scalars.

However, I think PyTorch does it the same way (?). At least, their docs say something like this:

"This function accumulates gradients in the leaves - you might need to zero .grad attributes or set them to None before calling it." - https://docs.pytorch.org/docs/stable/generated/torch.autogra...

The Rust burn crate does it better: it stores the backprop'd gradients in a separate container and returns that container: https://github.com/tracel-ai/burn/blob/af381ee18566fc27f5c98...
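
To make the "separate container" idea concrete, here is a rough Python sketch. It is not burn's code and not the project's actual Scalar; the `Scalar` class and the `backward` function below are hypothetical names made up for illustration. The point is only that the nodes carry no `.grad` field, and backward hands back a fresh dict instead of mutating the graph.

```python
class Scalar:
    """Hypothetical minimal autodiff node. Note: there is no .grad field."""

    def __init__(self, value, parents=()):
        self.value = value
        # Each entry is (parent_node, d(self)/d(parent)).
        self.parents = parents

    def __add__(self, other):
        return Scalar(self.value + other.value, ((self, 1.0), (other, 1.0)))

    def __mul__(self, other):
        return Scalar(self.value * other.value,
                      ((self, other.value), (other, self.value)))


def backward(root):
    """Return a fresh {node: d(root)/d(node)} dict; the nodes are not mutated."""
    # Post-order DFS: every node lands in `order` after all of its parents.
    order, seen = [], set()

    def visit(node):
        if id(node) in seen:
            return
        seen.add(id(node))
        for parent, _ in node.parents:
            visit(parent)
        order.append(node)

    visit(root)

    # Walk from the root back to the leaves, summing chain-rule contributions.
    grads = {root: 1.0}
    for node in reversed(order):
        for parent, local in node.parents:
            grads[parent] = grads.get(parent, 0.0) + local * grads[node]
    return grads


x, y = Scalar(3.0), Scalar(4.0)
z = x * y + x

first = backward(z)
second = backward(z)  # a second call starts from an empty container
print(first[x], first[y], second[x])
```

Since every call builds its own dict, there is nothing to zero between calls, which is exactly the step the PyTorch note above warns about.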